
****************
create_evaluator
****************



.. py:method:: BedrockAgentCoreControl.Client.create_evaluator(**kwargs)

  

  Creates a custom evaluator for agent quality assessment. Custom evaluators use LLM-as-a-Judge configurations with user-defined prompts, rating scales, and model settings to evaluate agent performance at the tool call, trace, or session level.

  

  See also: `AWS API Documentation <https://docs.aws.amazon.com/goto/WebAPI/bedrock-agentcore-control-2023-06-05/CreateEvaluator>`_  


  **Request Syntax**
  ::

    response = client.create_evaluator(
        clientToken='string',
        evaluatorName='string',
        description='string',
        evaluatorConfig={
            'llmAsAJudge': {
                'instructions': 'string',
                'ratingScale': {
                    'numerical': [
                        {
                            'definition': 'string',
                            'value': 123.0,
                            'label': 'string'
                        },
                    ],
                    'categorical': [
                        {
                            'definition': 'string',
                            'label': 'string'
                        },
                    ]
                },
                'modelConfig': {
                    'bedrockEvaluatorModelConfig': {
                        'modelId': 'string',
                        'inferenceConfig': {
                            'maxTokens': 123,
                            'temperature': ...,
                            'topP': ...,
                            'stopSequences': [
                                'string',
                            ]
                        },
                        'additionalModelRequestFields': {...}|[...]|123|123.4|'string'|True|None
                    }
                }
            }
        },
        level='TOOL_CALL'|'TRACE'|'SESSION',
        tags={
            'string': 'string'
        }
    )
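
  As a rough illustration of the syntax above, the following sketch assembles a request with a numerical rating scale and passes it to ``create_evaluator``. The evaluator name, instructions, scale options, token limit, and model ID are hypothetical placeholders, and the call is wrapped in a function so that nothing is sent until you invoke it with a configured ``bedrock-agentcore-control`` client.

  ```python
  def build_evaluator_request(name, instructions, model_id, level="TRACE"):
      """Assemble keyword arguments for create_evaluator (illustrative values)."""
      return {
          "evaluatorName": name,
          "evaluatorConfig": {
              "llmAsAJudge": {
                  "instructions": instructions,
                  # Tagged union: set exactly one of 'numerical' or 'categorical'.
                  "ratingScale": {
                      "numerical": [
                          {"value": 0.0, "label": "Poor", "definition": "Does not address the request."},
                          {"value": 1.0, "label": "Good", "definition": "Fully addresses the request."},
                      ]
                  },
                  "modelConfig": {
                      "bedrockEvaluatorModelConfig": {
                          "modelId": model_id,
                          # Placeholder settings; a low temperature keeps scoring more deterministic.
                          "inferenceConfig": {"maxTokens": 512, "temperature": 0.0},
                      }
                  },
              }
          },
          "level": level,
      }


  def create_evaluator(client, **request):
      """Send the request and return the new evaluator's ID and status."""
      response = client.create_evaluator(**request)
      return response["evaluatorId"], response["status"]
  ```

  The optional ``description``, ``tags``, and ``clientToken`` arguments are omitted here and can be added to the returned dictionary as needed.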
    
  :type clientToken: string
  :param clientToken: 

    A unique, case-sensitive identifier that ensures the API request completes no more than one time. If this token matches a previous request, the service ignores the request and does not return an error. For more information, see `Ensuring idempotency <https://docs.aws.amazon.com/AWSEC2/latest/APIReference/Run_Instance_Idempotency.html>`__.

    This field is autopopulated with a randomly generated value if not provided.
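
    A minimal sketch of supplying your own token so that a retried call reuses the same value. The helper name ``create_with_retry`` and the attempt count are hypothetical; the point is only that the token is generated once, outside the retry loop.

    ```python
    import uuid


    def create_with_retry(client, request, attempts=3):
        """Call create_evaluator, reusing one clientToken across retries."""
        token = str(uuid.uuid4())  # generated once, so every retry is the same request
        last_error = None
        for _ in range(attempts):
            try:
                return client.create_evaluator(clientToken=token, **request)
            except Exception as error:  # narrow this to retryable errors in real code
                last_error = error
        raise last_error
    ```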

  
  :type evaluatorName: string
  :param evaluatorName: **[REQUIRED]** 

    The name of the evaluator. Must be unique within your account.

    

  
  :type description: string
  :param description: 

    The description of the evaluator that explains its purpose and evaluation criteria.

    

  
  :type evaluatorConfig: dict
  :param evaluatorConfig: **[REQUIRED]** 

    The configuration for the evaluator, including LLM-as-a-Judge settings with instructions, rating scale, and model configuration.

    .. note:: This is a Tagged Union structure. Only one of the following top-level keys can be set: ``llmAsAJudge``.

  
    - **llmAsAJudge** *(dict) --* 

      The LLM-as-a-Judge configuration that uses a language model to evaluate agent performance based on custom instructions and rating scales.

      

    
      - **instructions** *(string) --* **[REQUIRED]** 

        The evaluation instructions that guide the language model in assessing agent performance, including criteria and evaluation guidelines.

        

      
      - **ratingScale** *(dict) --* **[REQUIRED]** 

        The rating scale that defines how the evaluator should score agent performance, either numerical or categorical.

        .. note:: This is a Tagged Union structure. Only one of the following top-level keys can be set: ``numerical``, ``categorical``.

      
        - **numerical** *(list) --* 

          The numerical rating scale with defined score values and descriptions for quantitative evaluation.

          

        
          - *(dict) --* 

            The definition of a numerical rating scale option that provides a numeric value with its description for evaluation scoring.

            

          
            - **definition** *(string) --* **[REQUIRED]** 

              The description that explains what this numerical rating represents and when it should be used.

              

            
            - **value** *(float) --* **[REQUIRED]** 

              The numerical value for this rating scale option.

              

            
            - **label** *(string) --* **[REQUIRED]** 

              The label or name that describes this numerical rating option.

              

            
          
      
        - **categorical** *(list) --* 

          The categorical rating scale with named categories and definitions for qualitative evaluation.

          

        
          - *(dict) --* 

            The definition of a categorical rating scale option that provides a named category with its description for evaluation scoring.

            

          
            - **definition** *(string) --* **[REQUIRED]** 

              The description that explains what this categorical rating represents and when it should be used.

              

            
            - **label** *(string) --* **[REQUIRED]** 

              The label or name of this categorical rating option.

              

            
          
      
      
      - **modelConfig** *(dict) --* **[REQUIRED]** 

        The model configuration that specifies which foundation model to use and how to configure it for evaluation.

        .. note:: This is a Tagged Union structure. Only one of the following top-level keys can be set: ``bedrockEvaluatorModelConfig``.

      
        - **bedrockEvaluatorModelConfig** *(dict) --* 

          The Amazon Bedrock model configuration for evaluation.

          

        
          - **modelId** *(string) --* **[REQUIRED]** 

            The identifier of the Amazon Bedrock model to use for evaluation. Must be a supported foundation model available in your AWS Region.

            

          
          - **inferenceConfig** *(dict) --* 

            The inference configuration parameters that control model behavior during evaluation, including temperature, token limits, and sampling settings.

            

          
            - **maxTokens** *(integer) --* 

              The maximum number of tokens to generate in the model response during evaluation.

              

            
            - **temperature** *(float) --* 

              The temperature value that controls randomness in the model's responses. Lower values produce more deterministic outputs.

              

            
            - **topP** *(float) --* 

              The top-p sampling parameter that controls the diversity of the model's responses by limiting the cumulative probability of token choices.

              

            
            - **stopSequences** *(list) --* 

              The list of sequences that will cause the model to stop generating tokens when encountered.

              

            
              - *(string) --* 

              
          
          
          - **additionalModelRequestFields** (:ref:`document<document>`) -- 

            Additional model-specific request fields to customize model behavior beyond the standard inference configuration.
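
    Because ``ratingScale`` is a tagged union, a quick client-side check can catch a request that sets both scale types, or omits a required field, before it reaches the service. The sketch below is one possible pre-validation step; the helper names are hypothetical, and the service performs its own validation regardless.

    ```python
    def single_union_key(union, allowed):
        """Return the one key set in a tagged-union dict, or raise ValueError."""
        present = [key for key in allowed if key in union]
        if len(present) != 1 or len(union) != 1:
            raise ValueError(
                f"exactly one of {sorted(allowed)} must be set, got {sorted(union)}"
            )
        return present[0]


    def check_rating_scale(rating_scale):
        """Verify the scale kind and the required fields of each option."""
        kind = single_union_key(rating_scale, {"numerical", "categorical"})
        # Both kinds require 'definition' and 'label'; numerical options also need 'value'.
        required = {"definition", "label"} | ({"value"} if kind == "numerical" else set())
        for option in rating_scale[kind]:
            missing = required - option.keys()
            if missing:
                raise ValueError(f"{kind} option is missing {sorted(missing)}")
        return kind
    ```

    The same ``single_union_key`` helper could be applied to ``evaluatorConfig`` and ``modelConfig``, which currently accept a single key each.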

            

          
        
      
    
  
  :type level: string
  :param level: **[REQUIRED]** 

    The evaluation level that determines the scope of evaluation. Valid values are ``TOOL_CALL`` for individual tool invocations, ``TRACE`` for single request-response interactions, or ``SESSION`` for entire conversation sessions.
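
    As a small illustration, the three values can be kept in one place and checked before the call. The ``LEVEL_SCOPES`` mapping and ``check_level`` helper below are hypothetical conveniences, not part of the SDK.

    ```python
    # Scope evaluated at each level, as described above.
    LEVEL_SCOPES = {
        "TOOL_CALL": "individual tool invocation",
        "TRACE": "single request-response interaction",
        "SESSION": "entire conversation session",
    }


    def check_level(level):
        """Return the level unchanged if it is one of the three valid values."""
        if level not in LEVEL_SCOPES:
            raise ValueError(f"level must be one of {list(LEVEL_SCOPES)}, got {level!r}")
        return level
    ```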

    

  
  :type tags: dict
  :param tags: 

    A map of tag keys and values to assign to an AgentCore Evaluator. Tags enable you to categorize your resources in different ways, for example, by purpose, owner, or environment.

    

  
    - *(string) --* 

    
      - *(string) --* 

      


  
  :rtype: dict
  :returns: 
    
    **Response Syntax**

    
    ::

      {
          'evaluatorArn': 'string',
          'evaluatorId': 'string',
          'createdAt': datetime(2015, 1, 1),
          'status': 'ACTIVE'|'CREATING'|'CREATE_FAILED'|'UPDATING'|'UPDATE_FAILED'|'DELETING'
      }
      
    **Response Structure**

    

    - *(dict) --* 
      

      - **evaluatorArn** *(string) --* 

        The Amazon Resource Name (ARN) of the created evaluator.

        
      

      - **evaluatorId** *(string) --* 

        The unique identifier of the created evaluator.

        
      

      - **createdAt** *(datetime) --* 

        The timestamp when the evaluator was created.

        
      

      - **status** *(string) --* 

        The status of the evaluator creation operation.
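
    Since the returned status may still be ``CREATING``, callers sometimes poll until the evaluator settles. The sketch below assumes you supply a ``fetch_status`` callable that looks up the evaluator and returns its current status string; that callable, the function name, and the timing values are all hypothetical.

    ```python
    import time


    def wait_until_created(fetch_status, initial_status, delay=2.0, max_checks=30):
        """Poll while the evaluator is still CREATING; return the last status seen."""
        status = initial_status
        checks = 0
        while status == "CREATING" and checks < max_checks:
            time.sleep(delay)
            status = fetch_status()
            checks += 1
        return status
    ```

    The function returns whatever status it last observed, so the caller still needs to distinguish ``ACTIVE`` from ``CREATE_FAILED`` or from a wait that ran out of checks.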

        
  
  **Exceptions**
  
  *   :py:class:`BedrockAgentCoreControl.Client.exceptions.ServiceQuotaExceededException`

  
  *   :py:class:`BedrockAgentCoreControl.Client.exceptions.ValidationException`

  
  *   :py:class:`BedrockAgentCoreControl.Client.exceptions.AccessDeniedException`

  
  *   :py:class:`BedrockAgentCoreControl.Client.exceptions.ConflictException`

  
  *   :py:class:`BedrockAgentCoreControl.Client.exceptions.ThrottlingException`

  
  *   :py:class:`BedrockAgentCoreControl.Client.exceptions.InternalServerException`
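
  One way to act on these exceptions is to branch on the error code. The sketch below reads the code from the ``response`` attribute that botocore's ``ClientError`` carries; which codes are worth retrying is an assumption made for illustration, and the function name is hypothetical.

  ```python
  # Assumption: throttling and internal errors are transient; the others need a changed request.
  RETRYABLE_CODES = {"ThrottlingException", "InternalServerException"}


  def is_retryable(error):
      """Return True if the error's code suggests retrying the same request."""
      response = getattr(error, "response", None) or {}
      code = response.get("Error", {}).get("Code", "")
      return code in RETRYABLE_CODES
  ```

  With a real client you would typically catch ``botocore.exceptions.ClientError`` around ``create_evaluator`` and pass the caught error to ``is_retryable``.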

  