
*******************************
ingest_knowledge_base_documents
*******************************



.. py:method:: AgentsforBedrock.Client.ingest_knowledge_base_documents(**kwargs)

  

  Ingests documents directly into the knowledge base that is connected to the data source. The ``dataSourceType`` specified in the content for each document must match the type of the data source identified by ``dataSourceId``. For more information, see `Ingest changes directly into a knowledge base <https://docs.aws.amazon.com/bedrock/latest/userguide/kb-direct-ingestion.html>`__ in the Amazon Bedrock User Guide.
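
  The matching rule above can be checked locally before a request is sent. The following is a minimal sketch, not part of the API: the helper ``check_data_source_type`` and all sample identifiers and URIs are hypothetical and exist only for illustration.

  ```python
  def check_data_source_type(documents, expected_type):
      """Return the indexes of documents whose content type does not match."""
      mismatched = []
      for index, document in enumerate(documents):
          actual = document.get("content", {}).get("dataSourceType")
          if actual != expected_type:
              mismatched.append(index)
      return mismatched


  # Two sample documents in the request shape shown under Request Syntax.
  documents = [
      {
          "content": {
              "dataSourceType": "CUSTOM",
              "custom": {
                  "customDocumentIdentifier": {"id": "doc-001"},  # hypothetical ID
                  "sourceType": "IN_LINE",
                  "inlineContent": {
                      "type": "TEXT",
                      "textContent": {"data": "Hello, knowledge base."},
                  },
              },
          }
      },
      {
          "content": {
              "dataSourceType": "S3",
              # Hypothetical S3 URI.
              "s3": {"s3Location": {"uri": "s3://my-bucket/report.pdf"}},
          }
      },
  ]

  # The second document targets an S3 data source, so it is reported
  # when the target data source is assumed to be of type CUSTOM.
  mismatched = check_data_source_type(documents, "CUSTOM")
  print(mismatched)
  ```

  A caller could refuse to send the request when the returned list is not empty, rather than waiting for the service to reject it.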

  

  See also: `AWS API Documentation <https://docs.aws.amazon.com/goto/WebAPI/bedrock-agent-2023-06-05/IngestKnowledgeBaseDocuments>`_  


  **Request Syntax**
  ::

    response = client.ingest_knowledge_base_documents(
        knowledgeBaseId='string',
        dataSourceId='string',
        clientToken='string',
        documents=[
            {
                'metadata': {
                    'type': 'IN_LINE_ATTRIBUTE'|'S3_LOCATION',
                    'inlineAttributes': [
                        {
                            'key': 'string',
                            'value': {
                                'type': 'BOOLEAN'|'NUMBER'|'STRING'|'STRING_LIST',
                                'numberValue': 123.0,
                                'booleanValue': True|False,
                                'stringValue': 'string',
                                'stringListValue': [
                                    'string',
                                ]
                            }
                        },
                    ],
                    's3Location': {
                        'uri': 'string',
                        'bucketOwnerAccountId': 'string'
                    }
                },
                'content': {
                    'dataSourceType': 'CUSTOM'|'S3',
                    'custom': {
                        'customDocumentIdentifier': {
                            'id': 'string'
                        },
                        'sourceType': 'IN_LINE'|'S3_LOCATION',
                        's3Location': {
                            'uri': 'string',
                            'bucketOwnerAccountId': 'string'
                        },
                        'inlineContent': {
                            'type': 'BYTE'|'TEXT',
                            'byteContent': {
                                'mimeType': 'string',
                                'data': b'bytes'
                            },
                            'textContent': {
                                'data': 'string'
                            }
                        }
                    },
                    's3': {
                        's3Location': {
                            'uri': 'string'
                        }
                    }
                }
            },
        ]
    )
    
  :type knowledgeBaseId: string
  :param knowledgeBaseId: **[REQUIRED]** 

    The unique identifier of the knowledge base to ingest the documents into.

    

  
  :type dataSourceId: string
  :param dataSourceId: **[REQUIRED]** 

    The unique identifier of the data source connected to the knowledge base that you're adding documents to.

    

  
  :type clientToken: string
  :param clientToken: 

    A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If this token matches a previous request, Amazon Bedrock ignores the request, but does not return an error. For more information, see `Ensuring idempotency <https://docs.aws.amazon.com/AWSEC2/latest/APIReference/Run_Instance_Idempotency.html>`__.

    This field is autopopulated if not provided.
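
    If a caller wants a retry to be recognized as the same request, one option is to generate the token once and resend it. The sketch below assumes a UUID string is an acceptable token value; the helper ``ingest_with_token`` is hypothetical, and ``client`` is expected to be a client created with ``boto3.client('bedrock-agent')``.

    ```python
    import uuid


    def ingest_with_token(client, knowledge_base_id, data_source_id, documents, token=None):
        """Call the API with an explicit clientToken and return (token, response)."""
        # Generate the token once so that a later retry can resend the same value.
        token = token or str(uuid.uuid4())
        response = client.ingest_knowledge_base_documents(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            clientToken=token,
            documents=documents,
        )
        return token, response
    ```

    Passing the returned token back through the ``token`` argument on a retry reuses the same ``clientToken`` value instead of generating a new one.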

  
  :type documents: list
  :param documents: **[REQUIRED]** 

    A list of objects, each of which contains information about a document to add.

    

  
    - *(dict) --* 

      Contains information about a document to ingest into a knowledge base and metadata to associate with it.

      

    
      - **metadata** *(dict) --* 

        Contains the metadata to associate with the document.

        

      
        - **type** *(string) --* **[REQUIRED]** 

          The type of the source from which to add metadata.

          

        
        - **inlineAttributes** *(list) --* 

          An array of objects, each of which defines a metadata attribute to associate with the content to ingest. You define the attributes inline.

          

        
          - *(dict) --* 

            Contains information about a metadata attribute.

            

          
            - **key** *(string) --* **[REQUIRED]** 

              The key of the metadata attribute.

              

            
            - **value** *(dict) --* **[REQUIRED]** 

              Contains the value of the metadata attribute.

              

            
              - **type** *(string) --* **[REQUIRED]** 

                The type of the metadata attribute.

                

              
              - **numberValue** *(float) --* 

                The value of the numeric metadata attribute.

                

              
              - **booleanValue** *(boolean) --* 

                The value of the Boolean metadata attribute.

                

              
              - **stringValue** *(string) --* 

                The value of the string metadata attribute.

                

              
              - **stringListValue** *(list) --* 

                An array of strings that define the value of the metadata attribute.

                

              
                - *(string) --* 

                
            
            
          
      
        - **s3Location** *(dict) --* 

          The Amazon S3 location of the file containing metadata to associate with the content to ingest.

          

        
          - **uri** *(string) --* **[REQUIRED]** 

            The S3 URI of the file containing the metadata.

            

          
          - **bucketOwnerAccountId** *(string) --* 

            The identifier of the Amazon Web Services account that owns the S3 bucket containing the metadata file.

            

          
        
      
      - **content** *(dict) --* **[REQUIRED]** 

        Contains the content of the document.

        

      
        - **dataSourceType** *(string) --* **[REQUIRED]** 

          The type of the data source connected to the knowledge base into which this document is ingested.

          

        
        - **custom** *(dict) --* 

          Contains information about the content to ingest into a knowledge base connected to a custom data source.

          

        
          - **customDocumentIdentifier** *(dict) --* **[REQUIRED]** 

            A unique identifier for the document.

            

          
            - **id** *(string) --* **[REQUIRED]** 

              The identifier of the document to ingest into a custom data source.

              

            
          
          - **sourceType** *(string) --* **[REQUIRED]** 

            The source of the data to ingest.

            

          
          - **s3Location** *(dict) --* 

            Contains information about the Amazon S3 location of the file from which to ingest data.

            

          
            - **uri** *(string) --* **[REQUIRED]** 

              The S3 URI of the file containing the content to ingest.

              

            
            - **bucketOwnerAccountId** *(string) --* 

              The identifier of the Amazon Web Services account that owns the S3 bucket containing the content to ingest.

              

            
          
          - **inlineContent** *(dict) --* 

            Contains information about content defined inline to ingest into a knowledge base.

            

          
            - **type** *(string) --* **[REQUIRED]** 

              The type of inline content to define.

              

            
            - **byteContent** *(dict) --* 

              Contains information about content defined inline in bytes.

              

            
              - **mimeType** *(string) --* **[REQUIRED]** 

                The MIME type of the content. For a list of MIME types, see `Media Types <https://www.iana.org/assignments/media-types/media-types.xhtml>`__. The following MIME types are supported:

                 

                
                * text/plain
                 
                * text/html
                 
                * text/csv
                 
                * text/vtt
                 
                * message/rfc822
                 
                * application/xhtml+xml
                 
                * application/pdf
                 
                * application/msword
                 
                * application/vnd.ms-word.document.macroenabled.12
                 
                * application/vnd.ms-word.template.macroenabled.12
                 
                * application/vnd.ms-excel
                 
                * application/vnd.ms-excel.addin.macroenabled.12
                 
                * application/vnd.ms-excel.sheet.macroenabled.12
                 
                * application/vnd.ms-excel.template.macroenabled.12
                 
                * application/vnd.ms-excel.sheet.binary.macroenabled.12
                 
                * application/vnd.ms-spreadsheetml
                 
                * application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
                 
                * application/vnd.openxmlformats-officedocument.spreadsheetml.template
                 
                * application/vnd.openxmlformats-officedocument.wordprocessingml.document
                 
                * application/vnd.openxmlformats-officedocument.wordprocessingml.template
                

                

              
              - **data** *(bytes) --* **[REQUIRED]** 

                The base64-encoded string of the content.

                

              
            
            - **textContent** *(dict) --* 

              Contains information about content defined inline in text.

              

            
              - **data** *(string) --* **[REQUIRED]** 

                The text of the content.

                

              
            
          
        
        - **s3** *(dict) --* 

          Contains information about the content to ingest into a knowledge base connected to an Amazon S3 data source.

          

        
          - **s3Location** *(dict) --* **[REQUIRED]** 

            The S3 location of the file containing the content to ingest.

            

          
            - **uri** *(string) --* **[REQUIRED]** 

              The location's URI. For example, ``s3://my-bucket/chunk-processor/``.
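
    The nested structure above can be assembled with small helpers. The following sketch builds one entry of ``documents`` for a custom data source with inline text and inline metadata attributes. Both helper names are hypothetical, and the sketch assumes that only the value field matching the attribute ``type`` should be populated.

    ```python
    def to_inline_attribute(key, value):
        """Map a Python value to the inline metadata attribute shape."""
        # bool is tested first because bool is a subclass of int in Python.
        if isinstance(value, bool):
            typed = {"type": "BOOLEAN", "booleanValue": value}
        elif isinstance(value, (int, float)):
            typed = {"type": "NUMBER", "numberValue": float(value)}
        elif isinstance(value, str):
            typed = {"type": "STRING", "stringValue": value}
        elif isinstance(value, (list, tuple)) and all(isinstance(v, str) for v in value):
            typed = {"type": "STRING_LIST", "stringListValue": list(value)}
        else:
            raise TypeError(f"unsupported metadata value for {key!r}: {value!r}")
        return {"key": key, "value": typed}


    def build_custom_text_document(document_id, text, attributes=None):
        """Build one 'documents' entry with inline text for a custom data source."""
        document = {
            "content": {
                "dataSourceType": "CUSTOM",
                "custom": {
                    "customDocumentIdentifier": {"id": document_id},
                    "sourceType": "IN_LINE",
                    "inlineContent": {"type": "TEXT", "textContent": {"data": text}},
                },
            }
        }
        # metadata is optional, so it is added only when attributes are given.
        if attributes:
            document["metadata"] = {
                "type": "IN_LINE_ATTRIBUTE",
                "inlineAttributes": [
                    to_inline_attribute(key, value) for key, value in attributes.items()
                ],
            }
        return document


    # Hypothetical document ID, text, and attribute names.
    document = build_custom_text_document(
        "doc-001",
        "Quarterly summary text.",
        {"year": 2024, "draft": False, "tags": ["finance", "summary"]},
    )
    ```

    The resulting dictionary can be passed as one element of the ``documents`` list.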

              

            
          
        
      
    

  
  :rtype: dict
  :returns: 
    
    **Response Syntax**

    
    ::

      {
          'documentDetails': [
              {
                  'knowledgeBaseId': 'string',
                  'dataSourceId': 'string',
                  'status': 'INDEXED'|'PARTIALLY_INDEXED'|'PENDING'|'FAILED'|'METADATA_PARTIALLY_INDEXED'|'METADATA_UPDATE_FAILED'|'IGNORED'|'NOT_FOUND'|'STARTING'|'IN_PROGRESS'|'DELETING'|'DELETE_IN_PROGRESS',
                  'identifier': {
                      'dataSourceType': 'CUSTOM'|'S3',
                      's3': {
                          'uri': 'string'
                      },
                      'custom': {
                          'id': 'string'
                      }
                  },
                  'statusReason': 'string',
                  'updatedAt': datetime(2015, 1, 1)
              },
          ]
      }
      
    **Response Structure**

    

    - *(dict) --* 
      

      - **documentDetails** *(list) --* 

        A list of objects, each of which contains information about a document that was submitted for ingestion.

        
        

        - *(dict) --* 

          Contains the details for a document that was ingested or deleted.

          
          

          - **knowledgeBaseId** *(string) --* 

            The identifier of the knowledge base that the document was ingested into or deleted from.

            
          

          - **dataSourceId** *(string) --* 

            The identifier of the data source connected to the knowledge base that the document was ingested into or deleted from.

            
          

          - **status** *(string) --* 

            The ingestion status of the document. The following statuses are possible:

             

            
            * STARTING – You submitted the ingestion job containing the document.
             
            * PENDING – The document is waiting to be ingested.
             
            * IN_PROGRESS – The document is being ingested.
             
            * INDEXED – The document was successfully indexed.
             
            * PARTIALLY_INDEXED – The document was partially indexed.
             
            * METADATA_PARTIALLY_INDEXED – You submitted metadata for an existing document and it was partially indexed.
             
            * METADATA_UPDATE_FAILED – You submitted a metadata update for an existing document but it failed.
             
            * FAILED – The document failed to be ingested.
             
            * NOT_FOUND – The document wasn't found.
             
            * IGNORED – The document was ignored during ingestion.
             
            * DELETING – You submitted the delete job containing the document.
             
            * DELETE_IN_PROGRESS – The document is being deleted.
            

            
          

          - **identifier** *(dict) --* 

            Contains information that identifies the document.

            
            

            - **dataSourceType** *(string) --* 

              The type of data source connected to the knowledge base that contains the document.

              
            

            - **s3** *(dict) --* 

              Contains information that identifies the document in an S3 data source.

              
              

              - **uri** *(string) --* 

                The location's URI. For example, ``s3://my-bucket/chunk-processor/``.

                
          
            

            - **custom** *(dict) --* 

              Contains information that identifies the document in a custom data source.

              
              

              - **id** *(string) --* 

                The identifier of the document to ingest into a custom data source.

                
          
        
          

          - **statusReason** *(string) --* 

            The reason for the status. Appears alongside the status ``IGNORED``.

            
          

          - **updatedAt** *(datetime) --* 

            The date and time at which the document was last updated.
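
    As an illustration of reading the response structure above, the sketch below groups the returned documents by ``status``. The helper ``group_by_status`` and the sample response values are hypothetical; a real response comes from the client call.

    ```python
    from collections import defaultdict
    from datetime import datetime


    def group_by_status(response):
        """Group document references (S3 URI or custom ID) by ingestion status."""
        groups = defaultdict(list)
        for detail in response.get("documentDetails", []):
            identifier = detail.get("identifier", {})
            # The identifier carries either an S3 URI or a custom document ID.
            if identifier.get("dataSourceType") == "S3":
                reference = identifier.get("s3", {}).get("uri")
            else:
                reference = identifier.get("custom", {}).get("id")
            groups[detail.get("status")].append(reference)
        return dict(groups)


    # Hand-written sample in the documented response shape.
    sample_response = {
        "documentDetails": [
            {
                "knowledgeBaseId": "KB12345678",
                "dataSourceId": "DS12345678",
                "status": "STARTING",
                "identifier": {"dataSourceType": "CUSTOM", "custom": {"id": "doc-001"}},
                "updatedAt": datetime(2015, 1, 1),
            },
            {
                "knowledgeBaseId": "KB12345678",
                "dataSourceId": "DS12345678",
                "status": "IGNORED",
                "identifier": {
                    "dataSourceType": "S3",
                    "s3": {"uri": "s3://my-bucket/report.pdf"},
                },
                "statusReason": "hypothetical reason text",
                "updatedAt": datetime(2015, 1, 1),
            },
        ]
    }

    print(group_by_status(sample_response))
    ```

    Statuses such as ``STARTING``, ``PENDING``, and ``IN_PROGRESS`` describe ingestion that has not finished yet.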

            
      
    
  
  **Exceptions**
  
  *   :py:class:`AgentsforBedrock.Client.exceptions.ThrottlingException`

  
  *   :py:class:`AgentsforBedrock.Client.exceptions.AccessDeniedException`

  
  *   :py:class:`AgentsforBedrock.Client.exceptions.ValidationException`

  
  *   :py:class:`AgentsforBedrock.Client.exceptions.InternalServerException`

  
  *   :py:class:`AgentsforBedrock.Client.exceptions.ResourceNotFoundException`

  
  *   :py:class:`AgentsforBedrock.Client.exceptions.ServiceQuotaExceededException`

  