
*******************
create_data_catalog
*******************



.. py:method:: Athena.Client.create_data_catalog(**kwargs)

  

  Creates (registers) a data catalog with the specified name and properties. Catalogs created are visible to all users of the same Amazon Web Services account.

   

  For a ``FEDERATED`` catalog, this API operation creates the following resources.

   

  
  * A CloudFormation (CFN) stack. Its name has a maximum length of 128 characters and takes the form ``athenafederatedcatalog-CATALOG_NAME_SANITIZED``, where the ``athenafederatedcatalog-`` prefix is 23 characters long.
   
  * A Lambda function. Its name has a maximum length of 64 characters and takes the form ``athenafederatedcatalog_CATALOG_NAME_SANITIZED``, where the ``athenafederatedcatalog_`` prefix is 23 characters long.
   
  * A Glue connection. Its name has a maximum length of 255 characters and takes the form ``athenafederatedcatalog_CATALOG_NAME_SANITIZED``, where the ``athenafederatedcatalog_`` prefix is 23 characters long.
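
  As a rough illustration of how the 23-character prefix reduces the room left for the catalog name in each generated resource name, the following sketch subtracts the prefix length from the three limits listed above. The dictionary keys are labels chosen for this example only; they are not API fields.

  ```python
  # Prefix shared by the generated resources. The separator is "-" for the
  # CFN stack and "_" for the Lambda function and Glue connection; both
  # variants have the same length.
  PREFIX = "athenafederatedcatalog-"

  # Maximum name lengths quoted in the list above (labels are illustrative).
  NAME_LIMITS = {"cfn_stack": 128, "lambda_function": 64, "glue_connection": 255}

  # Characters left for the sanitized catalog name in each resource name.
  remaining = {kind: limit - len(PREFIX) for kind, limit in NAME_LIMITS.items()}

  # The resource with the least room bounds the usable catalog name length.
  tightest = min(remaining, key=remaining.get)
  print(tightest, remaining[tightest])
  ```

  The Lambda function name leaves the least room (64 - 23 = 41 characters), so it bounds the length of a ``FEDERATED`` catalog name.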
  

  

  See also: `AWS API Documentation <https://docs.aws.amazon.com/goto/WebAPI/athena-2017-05-18/CreateDataCatalog>`_  


  **Request Syntax**
  ::

    response = client.create_data_catalog(
        Name='string',
        Type='LAMBDA'|'GLUE'|'HIVE'|'FEDERATED',
        Description='string',
        Parameters={
            'string': 'string'
        },
        Tags=[
            {
                'Key': 'string',
                'Value': 'string'
            },
        ]
    )
    
  :type Name: string
  :param Name: **[REQUIRED]** 

    The name of the data catalog to create. The catalog name must be unique for the Amazon Web Services account and can use a maximum of 127 alphanumeric, underscore, at sign, or hyphen characters. The remainder of the length constraint of 256 is reserved for use by Athena.

     

    For the ``FEDERATED`` type, the catalog name has the following considerations and limits:

     

    
    * The catalog name can contain the special characters ``_``, ``@``, ``\``, and ``-``. These characters are replaced with a hyphen (``-``) when creating the CFN Stack Name and with an underscore (``_``) when creating the Lambda Function and Glue Connection Names.
     
    * The catalog name has a theoretical limit of 128 characters. However, because the name is used to create other resources that allow fewer characters, and a 23-character prefix is prepended to it, the actual catalog name limit for a ``FEDERATED`` catalog is 64 - 23 = 41 characters.
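
  The considerations above can be sketched as a small client-side check. This is only an approximation written for illustration: the helper name ``federated_resource_names``, the regular expression, and the returned dictionary keys are assumptions, and the service's actual sanitization logic may differ.

  ```python
  import re

  # Limit for FEDERATED catalog names stated above: 64 - 23 = 41 characters.
  MAX_FEDERATED_NAME_LENGTH = 41

  # Special characters named above: underscore, at sign, backslash, hyphen.
  _SPECIAL_CHARACTERS = re.compile(r"[_@\\-]")


  def federated_resource_names(catalog_name):
      """Approximate the resource names derived from a FEDERATED catalog name.

      Hypothetical helper: it mirrors the documented substitution rules but
      is not part of boto3 or Athena.
      """
      if not 1 <= len(catalog_name) <= MAX_FEDERATED_NAME_LENGTH:
          raise ValueError(
              f"FEDERATED catalog names must be 1 to {MAX_FEDERATED_NAME_LENGTH} characters"
          )
      with_hyphens = _SPECIAL_CHARACTERS.sub("-", catalog_name)
      with_underscores = _SPECIAL_CHARACTERS.sub("_", catalog_name)
      return {
          "cfn_stack": "athenafederatedcatalog-" + with_hyphens,
          "lambda_function": "athenafederatedcatalog_" + with_underscores,
          "glue_connection": "athenafederatedcatalog_" + with_underscores,
      }
  ```

  Running a check like this before calling ``create_data_catalog`` can surface an over-long name early instead of waiting for the asynchronous creation to fail.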
    

    

  
  :type Type: string
  :param Type: **[REQUIRED]** 

    The type of data catalog to create: ``LAMBDA`` for a federated catalog, ``GLUE`` for a Glue Data Catalog, and ``HIVE`` for an external Apache Hive metastore. ``FEDERATED`` is a federated catalog for which Athena creates the connection and the Lambda function for you based on the parameters that you pass.

     

    The ``FEDERATED`` type does not support IAM Identity Center.

    

  
  :type Description: string
  :param Description: 

    A description of the data catalog to be created.

    

  
  :type Parameters: dict
  :param Parameters: 

    Specifies the Lambda function or functions to use for creating the data catalog. This is a mapping whose values depend on the catalog type.

     

    
    * For the ``HIVE`` data catalog type, use the following syntax. The ``metadata-function`` parameter is required. The ``sdk-version`` parameter is optional and defaults to the currently supported version. ``metadata-function=lambda_arn, sdk-version=version_number``
     
    * For the ``LAMBDA`` data catalog type, use one of the following sets of required parameters, but not both. 

      
      * If you have one Lambda function that processes metadata and another for reading the actual data, use the following syntax. Both parameters are required. ``metadata-function=lambda_arn, record-function=lambda_arn``
       
      * If you have a composite Lambda function that processes both metadata and data, use the following syntax to specify your Lambda function. ``function=lambda_arn``
      

    
     
    * The ``GLUE`` type takes a ``catalog-id`` parameter, which is required. The ``catalog_id`` value is the account ID of the Amazon Web Services account to which the Glue Data Catalog belongs. ``catalog-id=catalog_id``

      
      * The ``GLUE`` data catalog type also applies to the default ``AwsDataCatalog`` that already exists in your account. You can have only one such catalog and cannot modify it.
      

    
     
    * The ``FEDERATED`` data catalog type uses one of the following parameter sets, but not both. Use ``connection-arn`` for an existing Glue connection. Use ``connection-type`` and ``connection-properties`` to specify the configuration settings for a new connection.

      
      * ``connection-arn:<glue_connection_arn_to_reuse>``
       
      * ``lambda-role-arn`` (optional): The execution role to use for the Lambda function. If not provided, one is created.
       
      * ``connection-type:MYSQL|REDSHIFT|...., connection-properties:"<json_string>"`` For ``<json_string>`` , use escaped JSON text, as in the following example. ``"{\"spill_bucket\":\"my_spill\",\"spill_prefix\":\"athena-spill\",\"host\":\"abc12345.snowflakecomputing.com\",\"port\":\"1234\",\"warehouse\":\"DEV_WH\",\"database\":\"TEST\",\"schema\":\"PUBLIC\",\"SecretArn\":\"arn:aws:secretsmanager:ap-south-1:111122223333:secret:snowflake-XHb67j\"}"``
      

    
    

    

  
    - *(string) --* 

    
      - *(string) --* 
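
  The per-type rules above can be sketched as small builder functions that return the ``Parameters`` mapping. The function names are hypothetical and exist only for this example. Two assumptions are made here: ``lambda-role-arn`` is accepted alongside either ``FEDERATED`` parameter set, and ``json.dumps`` is enough to produce the JSON text for ``connection-properties``, since the backslash escaping shown above is only needed when the JSON is embedded inside another quoted string.

  ```python
  import json


  def hive_parameters(metadata_function, sdk_version=None):
      """HIVE: metadata-function is required, sdk-version is optional."""
      params = {"metadata-function": metadata_function}
      if sdk_version is not None:
          params["sdk-version"] = sdk_version
      return params


  def lambda_parameters(function=None, metadata_function=None, record_function=None):
      """LAMBDA: one composite function, or a metadata/record pair, not both."""
      if function is not None:
          if metadata_function is not None or record_function is not None:
              raise ValueError("pass either function or the metadata/record pair")
          return {"function": function}
      if metadata_function is None or record_function is None:
          raise ValueError("metadata-function and record-function are both required")
      return {"metadata-function": metadata_function, "record-function": record_function}


  def glue_parameters(account_id):
      """GLUE: catalog-id is the ID of the account that owns the catalog."""
      return {"catalog-id": account_id}


  def federated_parameters(connection_arn=None, connection_type=None,
                           connection_properties=None, lambda_role_arn=None):
      """FEDERATED: reuse a connection, or describe a new one, not both."""
      describes_new = connection_type is not None or connection_properties is not None
      if connection_arn is not None:
          if describes_new:
              raise ValueError("pass connection-arn or connection-type/properties")
          params = {"connection-arn": connection_arn}
      elif connection_type is not None and connection_properties is not None:
          params = {
              "connection-type": connection_type,
              # Parameter values are strings, so the dict is serialized to JSON.
              "connection-properties": json.dumps(connection_properties),
          }
      else:
          raise ValueError("connection-type and connection-properties go together")
      if lambda_role_arn is not None:
          params["lambda-role-arn"] = lambda_role_arn
      return params
  ```

  The returned dictionary is meant to be passed as the ``Parameters`` argument, for example ``client.create_data_catalog(Name=..., Type='GLUE', Parameters=glue_parameters('111122223333'))``.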

      


  :type Tags: list
  :param Tags: 

    A list of comma-separated tags to add to the data catalog that is created. All the resources that are created by the ``CreateDataCatalog`` API operation with ``FEDERATED`` type will have the tag ``federated_athena_datacatalog="true"``. This includes the CFN Stack, Glue Connection, Athena DataCatalog, and all the resources created as part of the CFN Stack (Lambda Function, IAM policies/roles).

    

  
    - *(dict) --* 

      A label that you assign to a resource. Athena resources include workgroups, data catalogs, and capacity reservations. Each tag consists of a key and an optional value, both of which you define. For example, you can use tags to categorize Athena resources by purpose, owner, or environment. Use a consistent set of tag keys to make it easier to search and filter the resources in your account. For best practices, see `Tagging Best Practices <https://docs.aws.amazon.com/whitepapers/latest/tagging-best-practices/tagging-best-practices.html>`__. Tag keys can be from 1 to 128 UTF-8 Unicode characters, and tag values can be from 0 to 256 UTF-8 Unicode characters. Tags can use letters and numbers representable in UTF-8, and the following characters: + - = . _ : / @. Tag keys and values are case-sensitive. Tag keys must be unique per resource. If you specify more than one tag, separate them by commas.

      

    
      - **Key** *(string) --* 

        A tag key. The tag key length is from 1 to 128 Unicode characters in UTF-8. You can use letters and numbers representable in UTF-8, and the following characters: + - = . _ : / @. Tag keys are case-sensitive and must be unique per resource.

        

      
      - **Value** *(string) --* 

        A tag value. The tag value length is from 0 to 256 Unicode characters in UTF-8. You can use letters and numbers representable in UTF-8, and the following characters: + - = . _ : / @. Tag values are case-sensitive.
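
  The tag constraints above can be checked locally before sending the request. The sketch below converts a plain dictionary into the ``Tags`` list shape. The helper name ``build_tags`` is hypothetical, and the character check follows only the characters listed above (letters, numbers, and ``+ - = . _ : / @``), which may be stricter or looser than what the service actually enforces.

  ```python
  # Punctuation explicitly allowed by the tag description above.
  ALLOWED_PUNCTUATION = set("+-=._:/@")


  def _uses_allowed_characters(text):
      # str.isalnum() accepts Unicode letters and digits, not only ASCII.
      return all(ch.isalnum() or ch in ALLOWED_PUNCTUATION for ch in text)


  def build_tags(tags):
      """Turn {'key': 'value'} into [{'Key': ..., 'Value': ...}] with checks.

      Dictionary keys are unique and case-sensitive, which matches the
      documented requirement that tag keys be unique per resource.
      """
      result = []
      for key, value in tags.items():
          if not 1 <= len(key) <= 128:
              raise ValueError(f"tag key must be 1 to 128 characters: {key!r}")
          if len(value) > 256:
              raise ValueError(f"tag value must be at most 256 characters: {key!r}")
          if not (_uses_allowed_characters(key) and _uses_allowed_characters(value)):
              raise ValueError(f"tag contains unsupported characters: {key!r}")
          result.append({"Key": key, "Value": value})
      return result
  ```

  The result can be passed directly as the ``Tags`` argument of ``create_data_catalog``.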

        

      
    

  
  :rtype: dict
  :returns: 
    
    **Response Syntax**

    
    ::

      {
          'DataCatalog': {
              'Name': 'string',
              'Description': 'string',
              'Type': 'LAMBDA'|'GLUE'|'HIVE'|'FEDERATED',
              'Parameters': {
                  'string': 'string'
              },
              'Status': 'CREATE_IN_PROGRESS'|'CREATE_COMPLETE'|'CREATE_FAILED'|'CREATE_FAILED_CLEANUP_IN_PROGRESS'|'CREATE_FAILED_CLEANUP_COMPLETE'|'CREATE_FAILED_CLEANUP_FAILED'|'DELETE_IN_PROGRESS'|'DELETE_COMPLETE'|'DELETE_FAILED',
              'ConnectionType': 'DYNAMODB'|'MYSQL'|'POSTGRESQL'|'REDSHIFT'|'ORACLE'|'SYNAPSE'|'SQLSERVER'|'DB2'|'OPENSEARCH'|'BIGQUERY'|'GOOGLECLOUDSTORAGE'|'HBASE'|'DOCUMENTDB'|'CMDB'|'TPCDS'|'TIMESTREAM'|'SAPHANA'|'SNOWFLAKE'|'DATALAKEGEN2'|'DB2AS400',
              'Error': 'string'
          }
      }
      
    **Response Structure**

    

    - *(dict) --* 
      

      - **DataCatalog** *(dict) --* 

        Contains information about a data catalog in an Amazon Web Services account.

         

        .. note::

          

          In the Athena console, data catalogs are listed as "data sources" on the **Data sources** page under the **Data source name** column.

          

        
        

        - **Name** *(string) --* 

          The name of the data catalog. The catalog name must be unique for the Amazon Web Services account and can use a maximum of 127 alphanumeric, underscore, at sign, or hyphen characters. The remainder of the length constraint of 256 is reserved for use by Athena.

          
        

        - **Description** *(string) --* 

          An optional description of the data catalog.

          
        

        - **Type** *(string) --* 

          The type of data catalog to create: ``LAMBDA`` for a federated catalog, ``GLUE`` for a Glue Data Catalog, and ``HIVE`` for an external Apache Hive metastore. ``FEDERATED`` is a federated catalog for which Athena creates the connection and the Lambda function for you based on the parameters that you pass.

          
        

        - **Parameters** *(dict) --* 

          Specifies the Lambda function or functions to use for the data catalog. This is a mapping whose values depend on the catalog type.

           

          
          * For the ``HIVE`` data catalog type, use the following syntax. The ``metadata-function`` parameter is required. The ``sdk-version`` parameter is optional and defaults to the currently supported version. ``metadata-function=lambda_arn, sdk-version=version_number``
           
          * For the ``LAMBDA`` data catalog type, use one of the following sets of required parameters, but not both. 

            
            * If you have one Lambda function that processes metadata and another for reading the actual data, use the following syntax. Both parameters are required. ``metadata-function=lambda_arn, record-function=lambda_arn``
             
            * If you have a composite Lambda function that processes both metadata and data, use the following syntax to specify your Lambda function. ``function=lambda_arn``
            

          
           
          * The ``GLUE`` type takes a ``catalog-id`` parameter, which is required. The ``catalog_id`` value is the account ID of the Amazon Web Services account to which the Glue catalog belongs. ``catalog-id=catalog_id``

            
            * The ``GLUE`` data catalog type also applies to the default ``AwsDataCatalog`` that already exists in your account. You can have only one such catalog and cannot modify it.
            

          
           
          * The ``FEDERATED`` data catalog type uses one of the following parameter sets, but not both. Use ``connection-arn`` for an existing Glue connection. Use ``connection-type`` and ``connection-properties`` to specify the configuration settings for a new connection.

            
            * ``connection-arn:<glue_connection_arn_to_reuse>``
             
            * ``connection-type:MYSQL|REDSHIFT|...., connection-properties:"<json_string>"`` For ``<json_string>`` , use escaped JSON text, as in the following example. ``"{\"spill_bucket\":\"my_spill\",\"spill_prefix\":\"athena-spill\",\"host\":\"abc12345.snowflakecomputing.com\",\"port\":\"1234\",\"warehouse\":\"DEV_WH\",\"database\":\"TEST\",\"schema\":\"PUBLIC\",\"SecretArn\":\"arn:aws:secretsmanager:ap-south-1:111122223333:secret:snowflake-XHb67j\"}"``
            

          
          

          
          

          - *(string) --* 
            

            - *(string) --* 
      
    
        

        - **Status** *(string) --* 

          The status of the creation or deletion of the data catalog.

           

          
          * The ``LAMBDA``, ``GLUE``, and ``HIVE`` data catalog types are created synchronously. Their status is either ``CREATE_COMPLETE`` or ``CREATE_FAILED``.
           
          * The ``FEDERATED`` data catalog type is created asynchronously.
          

           

          Data catalog creation status:

           

          
          * ``CREATE_IN_PROGRESS``: Federated data catalog creation in progress.
           
          * ``CREATE_COMPLETE``: Data catalog creation complete.
           
          * ``CREATE_FAILED``: Data catalog could not be created.
           
          * ``CREATE_FAILED_CLEANUP_IN_PROGRESS``: Federated data catalog creation failed and is being removed.
           
          * ``CREATE_FAILED_CLEANUP_COMPLETE``: Federated data catalog creation failed and was removed.
           
          * ``CREATE_FAILED_CLEANUP_FAILED``: Federated data catalog creation failed but could not be removed.
          

           

          Data catalog deletion status:

           

          
          * ``DELETE_IN_PROGRESS``: Federated data catalog deletion in progress.
           
          * ``DELETE_COMPLETE``: Federated data catalog deleted.
           
          * ``DELETE_FAILED``: Federated data catalog could not be deleted.
          

          
        

        - **ConnectionType** *(string) --* 

          The type of connection for a ``FEDERATED`` data catalog (for example, ``REDSHIFT``, ``MYSQL``, or ``SQLSERVER``). For information about individual connectors, see `Available data source connectors <https://docs.aws.amazon.com/athena/latest/ug/connectors-available.html>`__.

          
        

        - **Error** *(string) --* 

          Text of the error that occurred during data catalog creation or deletion.
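
  Because ``FEDERATED`` catalogs are created asynchronously, a caller typically has to poll until the status settles. The following is a minimal sketch, assuming the catalog is re-read with ``get_data_catalog`` and that any status not ending in ``_IN_PROGRESS`` can be treated as settled. The helper name ``wait_for_catalog`` and its ``delay``, ``max_attempts``, and ``sleep`` parameters are hypothetical.

  ```python
  import time


  def wait_for_catalog(client, name, delay=5.0, max_attempts=60, sleep=time.sleep):
      """Poll a data catalog until its status is no longer *_IN_PROGRESS.

      `client` is expected to be an Athena client, for example
      boto3.client("athena"). `sleep` is injectable so the loop can be
      exercised without real waiting.
      """
      for _ in range(max_attempts):
          catalog = client.get_data_catalog(Name=name)["DataCatalog"]
          # Assumption: a missing Status is treated as already settled.
          status = catalog.get("Status", "")
          if not status.endswith("_IN_PROGRESS"):
              return catalog
          sleep(delay)
      raise TimeoutError(f"catalog {name!r} did not settle after {max_attempts} checks")
  ```

  The caller still needs to inspect the returned ``Status`` and ``Error`` fields, because a settled status can be any of the ``CREATE_FAILED`` variants as well as ``CREATE_COMPLETE``.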

          
    
  
  **Exceptions**
  
  *   :py:class:`Athena.Client.exceptions.InternalServerException`

  
  *   :py:class:`Athena.Client.exceptions.InvalidRequestException`
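
  One possible way to handle the two exceptions listed above is sketched below. The wrapper name ``create_catalog`` is hypothetical, and mapping the service errors to ``ValueError`` and ``RuntimeError`` is a design choice made for this example rather than anything boto3 does itself.

  ```python
  def create_catalog(client, **request):
      """Call create_data_catalog and translate the documented exceptions.

      `client` is expected to be an Athena client, for example
      boto3.client("athena"); `request` holds the keyword arguments shown
      in the request syntax (Name, Type, Parameters, and so on).
      """
      try:
          response = client.create_data_catalog(**request)
      except client.exceptions.InvalidRequestException as err:
          # The request itself was rejected, so retrying unchanged is unlikely to help.
          raise ValueError(f"Athena rejected the request: {err}") from err
      except client.exceptions.InternalServerException as err:
          # Service-side failure; the caller may choose to retry later.
          raise RuntimeError(f"Athena internal error: {err}") from err
      return response.get("DataCatalog", {})
  ```

  Keeping the client as a parameter makes the wrapper easy to exercise with a stub in tests.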

  