
****************
describe_dataset
****************



.. py:method:: GlueDataBrew.Client.describe_dataset(**kwargs)

  

  Returns the definition of a specific DataBrew dataset.

  

  See also: `AWS API Documentation <https://docs.aws.amazon.com/goto/WebAPI/databrew-2017-07-25/DescribeDataset>`_  


  **Request Syntax**
  ::

    response = client.describe_dataset(
        Name='string'
    )
    
  :type Name: string
  :param Name: **[REQUIRED]** 

    The name of the dataset to be described.

    

  
  
  :rtype: dict
  :returns: 
    
    **Response Syntax**

    
    ::

      {
          'CreatedBy': 'string',
          'CreateDate': datetime(2015, 1, 1),
          'Name': 'string',
          'Format': 'CSV'|'JSON'|'PARQUET'|'EXCEL'|'ORC',
          'FormatOptions': {
              'Json': {
                  'MultiLine': True|False
              },
              'Excel': {
                  'SheetNames': [
                      'string',
                  ],
                  'SheetIndexes': [
                      123,
                  ],
                  'HeaderRow': True|False
              },
              'Csv': {
                  'Delimiter': 'string',
                  'HeaderRow': True|False
              }
          },
          'Input': {
              'S3InputDefinition': {
                  'Bucket': 'string',
                  'Key': 'string',
                  'BucketOwner': 'string'
              },
              'DataCatalogInputDefinition': {
                  'CatalogId': 'string',
                  'DatabaseName': 'string',
                  'TableName': 'string',
                  'TempDirectory': {
                      'Bucket': 'string',
                      'Key': 'string',
                      'BucketOwner': 'string'
                  }
              },
              'DatabaseInputDefinition': {
                  'GlueConnectionName': 'string',
                  'DatabaseTableName': 'string',
                  'TempDirectory': {
                      'Bucket': 'string',
                      'Key': 'string',
                      'BucketOwner': 'string'
                  },
                  'QueryString': 'string'
              },
              'Metadata': {
                  'SourceArn': 'string'
              }
          },
          'LastModifiedDate': datetime(2015, 1, 1),
          'LastModifiedBy': 'string',
          'Source': 'S3'|'DATA-CATALOG'|'DATABASE',
          'PathOptions': {
              'LastModifiedDateCondition': {
                  'Expression': 'string',
                  'ValuesMap': {
                      'string': 'string'
                  }
              },
              'FilesLimit': {
                  'MaxFiles': 123,
                  'OrderedBy': 'LAST_MODIFIED_DATE',
                  'Order': 'DESCENDING'|'ASCENDING'
              },
              'Parameters': {
                  'string': {
                      'Name': 'string',
                      'Type': 'Datetime'|'Number'|'String',
                      'DatetimeOptions': {
                          'Format': 'string',
                          'TimezoneOffset': 'string',
                          'LocaleCode': 'string'
                      },
                      'CreateColumn': True|False,
                      'Filter': {
                          'Expression': 'string',
                          'ValuesMap': {
                              'string': 'string'
                          }
                      }
                  }
              }
          },
          'Tags': {
              'string': 'string'
          },
          'ResourceArn': 'string'
      }
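
    Optional keys such as ``FormatOptions`` and the individual ``Input`` definitions may be absent from a given response, so it is safer to read them with ``dict.get``. The following is a minimal sketch of a hypothetical ``summarize_dataset`` helper (not part of boto3). It picks the options that match ``Format`` and builds a readable location from ``Source`` and ``Input``, assuming that the populated ``Input`` member corresponds to ``Source``. The ``sample`` dictionary is hand-written to follow the syntax above and is not real service output.

    ::

      from typing import Any, Dict, Optional

      # Hypothetical mapping from ``Format`` to the matching ``FormatOptions`` key.
      # PARQUET and ORC have no options structure in the response syntax.
      _OPTIONS_KEY = {"CSV": "Csv", "JSON": "Json", "EXCEL": "Excel"}


      def summarize_dataset(response: Dict[str, Any]) -> Dict[str, Any]:
          """Return the dataset's name, format, format options, and input location."""
          fmt = response.get("Format")
          options_key = _OPTIONS_KEY.get(fmt)
          options = response.get("FormatOptions", {}).get(options_key, {}) if options_key else {}

          source = response.get("Source")
          inp = response.get("Input", {})
          location: Optional[str] = None
          if source == "S3" and "S3InputDefinition" in inp:
              s3 = inp["S3InputDefinition"]
              location = f"s3://{s3['Bucket']}/{s3.get('Key', '')}"
          elif source == "DATA-CATALOG" and "DataCatalogInputDefinition" in inp:
              cat = inp["DataCatalogInputDefinition"]
              location = f"{cat['DatabaseName']}.{cat['TableName']}"
          elif source == "DATABASE" and "DatabaseInputDefinition" in inp:
              db = inp["DatabaseInputDefinition"]
              # Assumption: the input is described by a table name or a custom SQL query.
              target = db.get("DatabaseTableName") or db.get("QueryString")
              location = f"{db['GlueConnectionName']}: {target}"

          return {"name": response.get("Name"), "format": fmt,
                  "format_options": options, "location": location}


      sample = {
          "Name": "sales-2024",
          "Format": "CSV",
          "FormatOptions": {"Csv": {"Delimiter": ",", "HeaderRow": True}},
          "Source": "S3",
          "Input": {"S3InputDefinition": {"Bucket": "example-bucket", "Key": "sales/2024/"}},
      }
      print(summarize_dataset(sample))

    For a dataset whose ``Source`` is ``DATA-CATALOG``, the same helper reports the location as ``DatabaseName.TableName``.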
      
    **Response Structure**

    

    - *(dict) --* 
      

      - **CreatedBy** *(string) --* 

        The identifier (user name) of the user who created the dataset.

        
      

      - **CreateDate** *(datetime) --* 

        The date and time that the dataset was created.

        
      

      - **Name** *(string) --* 

        The name of the dataset.

        
      

      - **Format** *(string) --* 

        The file format of a dataset that is created from an Amazon S3 file or folder.

        
      

      - **FormatOptions** *(dict) --* 

        Represents a set of options that define the structure of either comma-separated value (CSV), Excel, or JSON input.

        
        

        - **Json** *(dict) --* 

          Options that define how JSON input is to be interpreted by DataBrew.

          
          

          - **MultiLine** *(boolean) --* 

            A value that specifies whether JSON input contains embedded new line characters.

            
      
        

        - **Excel** *(dict) --* 

          Options that define how Excel input is to be interpreted by DataBrew.

          
          

          - **SheetNames** *(list) --* 

            One or more named sheets in the Excel file that will be included in the dataset.

            
            

            - *(string) --* 
        
          

          - **SheetIndexes** *(list) --* 

            One or more sheet numbers in the Excel file that will be included in the dataset.

            
            

            - *(integer) --* 
        
          

          - **HeaderRow** *(boolean) --* 

            A value that specifies whether the first row in the file is parsed as the header. If this value is false, column names are auto-generated.

            
      
        

        - **Csv** *(dict) --* 

          Options that define how CSV input is to be interpreted by DataBrew.

          
          

          - **Delimiter** *(string) --* 

            A single character that specifies the delimiter being used in the CSV file.

            
          

          - **HeaderRow** *(boolean) --* 

            A value that specifies whether the first row in the file is parsed as the header. If this value is false, column names are auto-generated.

            
      
    
      

      - **Input** *(dict) --* 

        Represents information on how DataBrew can find data: in Amazon S3, in the Glue Data Catalog, or in a database reached through a Glue connection.

        
        

        - **S3InputDefinition** *(dict) --* 

          The Amazon S3 location where the data is stored.

          
          

          - **Bucket** *(string) --* 

            The Amazon S3 bucket name.

            
          

          - **Key** *(string) --* 

            The unique name of the object in the bucket.

            
          

          - **BucketOwner** *(string) --* 

            The Amazon Web Services account ID of the bucket owner.

            
      
        

        - **DataCatalogInputDefinition** *(dict) --* 

          The Glue Data Catalog parameters for the data.

          
          

          - **CatalogId** *(string) --* 

            The unique identifier of the Amazon Web Services account that holds the Data Catalog that stores the data.

            
          

          - **DatabaseName** *(string) --* 

            The name of a database in the Data Catalog.

            
          

          - **TableName** *(string) --* 

            The name of a database table in the Data Catalog. This table corresponds to a DataBrew dataset.

            
          

          - **TempDirectory** *(dict) --* 

            Represents an Amazon S3 location where DataBrew can store intermediate results.

            
            

            - **Bucket** *(string) --* 

              The Amazon S3 bucket name.

              
            

            - **Key** *(string) --* 

              The unique name of the object in the bucket.

              
            

            - **BucketOwner** *(string) --* 

              The Amazon Web Services account ID of the bucket owner.

              
        
      
        

        - **DatabaseInputDefinition** *(dict) --* 

          Connection information for dataset input files stored in a database.

          
          

          - **GlueConnectionName** *(string) --* 

            The Glue Connection that stores the connection information for the target database.

            
          

          - **DatabaseTableName** *(string) --* 

            The table within the target database.

            
          

          - **TempDirectory** *(dict) --* 

            Represents an Amazon S3 location (bucket name, bucket owner, and object key) where DataBrew can read input data, or write output from a job.

            
            

            - **Bucket** *(string) --* 

              The Amazon S3 bucket name.

              
            

            - **Key** *(string) --* 

              The unique name of the object in the bucket.

              
            

            - **BucketOwner** *(string) --* 

              The Amazon Web Services account ID of the bucket owner.

              
        
          

          - **QueryString** *(string) --* 

            Custom SQL to run against the provided Glue connection. This SQL will be used as the input for DataBrew projects and jobs.

            
      
        

        - **Metadata** *(dict) --* 

          Contains additional resource information needed for specific datasets.

          
          

          - **SourceArn** *(string) --* 

            The Amazon Resource Name (ARN) associated with the dataset. Currently, DataBrew only supports ARNs from Amazon AppFlow.

            
      
    
      

      - **LastModifiedDate** *(datetime) --* 

        The date and time that the dataset was last modified.

        
      

      - **LastModifiedBy** *(string) --* 

        The identifier (user name) of the user who last modified the dataset.

        
      

      - **Source** *(string) --* 

        The location of the data for this dataset: Amazon S3 (``S3``), the Glue Data Catalog (``DATA-CATALOG``), or a database (``DATABASE``).

        
      

      - **PathOptions** *(dict) --* 

        A set of options that defines how DataBrew interprets an Amazon S3 path of the dataset.

        
        

        - **LastModifiedDateCondition** *(dict) --* 

          If provided, this structure defines a date range for matching Amazon S3 objects based on their LastModifiedDate attribute in Amazon S3.

          
          

          - **Expression** *(string) --* 

            An expression that includes condition names followed by substitution variables, optionally grouped and combined with other conditions. For example, ``(starts_with :prefix1 or starts_with :prefix2) and (ends_with :suffix1 or ends_with :suffix2)``. Substitution variables should start with a colon (``:``).

            
          

          - **ValuesMap** *(dict) --* 

            A map of the substitution variable names used in this filter expression to their values.

            
            

            - *(string) --* 
              

              - *(string) --* 
        
      
      
        

        - **FilesLimit** *(dict) --* 

          If provided, this structure limits the number of files that are selected.

          
          

          - **MaxFiles** *(integer) --* 

            The number of Amazon S3 files to select.

            
          

          - **OrderedBy** *(string) --* 

            The criterion used to sort Amazon S3 files before they are selected. Defaults to ``LAST_MODIFIED_DATE``, which is currently the only allowed value.

            
          

          - **Order** *(string) --* 

            The order in which Amazon S3 files are sorted before they are selected. Defaults to ``DESCENDING``, so the most recent files are selected first. The other allowed value is ``ASCENDING``.

            
      
        

        - **Parameters** *(dict) --* 

          A structure that maps names of parameters used in the Amazon S3 path of a dataset to their definitions.

          
          

          - *(string) --* 
            

            - *(dict) --* 

              Represents a dataset parameter that defines type and conditions for a parameter in the Amazon S3 path of the dataset.

              
              

              - **Name** *(string) --* 

                The name of the parameter that is used in the dataset's Amazon S3 path.

                
              

              - **Type** *(string) --* 

                The type of the dataset parameter: ``String``, ``Number``, or ``Datetime``.

                
              

              - **DatetimeOptions** *(dict) --* 

                Additional parameter options, such as the format and timezone. Required for datetime parameters.

                
                

                - **Format** *(string) --* 

                  Required option that defines the datetime format used for a date parameter in the Amazon S3 path. The format should use only supported datetime specifiers and separator characters, and all literal a-z or A-Z characters should be escaped with single quotes. For example, ``MM.dd.yyyy-'at'-HH:mm``.

                  
                

                - **TimezoneOffset** *(string) --* 

                  Optional timezone offset for the datetime parameter value in the Amazon S3 path. Don't use this option if ``Format`` for this parameter includes timezone fields. If no offset is specified, UTC is assumed.

                  
                

                - **LocaleCode** *(string) --* 

                  Optional non-US locale code, needed to interpret some date formats correctly.

                  
            
              

              - **CreateColumn** *(boolean) --* 

                Optional boolean value that defines whether the captured value of this parameter should be used to create a new column in a dataset.

                
              

              - **Filter** *(dict) --* 

                An optional filter expression that applies additional matching criteria to the parameter.

                
                

                - **Expression** *(string) --* 

                  An expression that includes condition names followed by substitution variables, optionally grouped and combined with other conditions. For example, ``(starts_with :prefix1 or starts_with :prefix2) and (ends_with :suffix1 or ends_with :suffix2)``. Substitution variables should start with a colon (``:``).

                  
                

                - **ValuesMap** *(dict) --* 

                  A map of the substitution variable names used in this filter expression to their values.

                  
                  

                  - *(string) --* 
                    

                    - *(string) --* 
              
            
            
          
      
    
    
      

      - **Tags** *(dict) --* 

        Metadata tags associated with this dataset.

        
        

        - *(string) --* 
          

          - *(string) --* 
    
  
      

      - **ResourceArn** *(string) --* 

        The Amazon Resource Name (ARN) of the dataset.
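
    ``PathOptions.Parameters[...].DatetimeOptions.Format`` uses letter specifiers with single-quoted literals, as in the ``MM.dd.yyyy-'at'-HH:mm`` example above. If you want to parse the matching part of an Amazon S3 key locally, one approach is to translate the pattern into a Python ``strptime`` format. The sketch below is a hypothetical ``to_strptime`` helper that handles only an assumed subset of specifiers (``yyyy``, ``MM``, ``dd``, ``HH``, ``mm``, ``ss``) and rejects any other unescaped letter; it is not how DataBrew itself evaluates dataset parameters.

    ::

      import re
      from datetime import datetime

      # Assumed subset of specifiers; DataBrew supports more than these.
      _SPECIFIERS = {"yyyy": "%Y", "MM": "%m", "dd": "%d",
                     "HH": "%H", "mm": "%M", "ss": "%S"}
      # Try quoted literals first, then known specifiers, then any single character.
      _TOKENS = re.compile(r"'[^']*'|yyyy|MM|dd|HH|mm|ss|.")


      def to_strptime(fmt: str) -> str:
          """Translate a DataBrew-style datetime format into a strptime format."""
          parts = []
          for tok in _TOKENS.findall(fmt):
              if len(tok) >= 2 and tok.startswith("'") and tok.endswith("'"):
                  parts.append(tok[1:-1].replace("%", "%%"))  # escaped literal text
              elif tok in _SPECIFIERS:
                  parts.append(_SPECIFIERS[tok])
              elif tok.isalpha():
                  raise ValueError(f"unsupported specifier or unescaped letter: {tok!r}")
              else:
                  parts.append(tok.replace("%", "%%"))  # separator character
          return "".join(parts)


      pattern = to_strptime("MM.dd.yyyy-'at'-HH:mm")
      print(pattern)  # %m.%d.%Y-at-%H:%M
      print(datetime.strptime("01.31.2024-at-13:45", pattern))  # 2024-01-31 13:45:00

    Raising on unknown letters is deliberate: silently copying an unsupported specifier such as ``MMM`` into the output would produce a pattern that never matches.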

        
  
  **Exceptions**
  
  *   :py:class:`GlueDataBrew.Client.exceptions.ResourceNotFoundException`

  
  *   :py:class:`GlueDataBrew.Client.exceptions.ValidationException`
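
  Both exceptions are exposed on the client's ``exceptions`` attribute, so a missing dataset can be handled without parsing error codes. The following minimal sketch uses a hypothetical ``find_dataset`` helper that treats ``ResourceNotFoundException`` as "no such dataset" and returns ``None``, while letting ``ValidationException`` and any other error propagate. It also drops the ``ResponseMetadata`` key that botocore adds to every response, since that key is not part of the dataset definition.

  ::

    from typing import Any, Dict, Optional


    def find_dataset(client: Any, name: str) -> Optional[Dict[str, Any]]:
        """Return the dataset definition, or None if no dataset has that name."""
        try:
            response = client.describe_dataset(Name=name)
        except client.exceptions.ResourceNotFoundException:
            return None
        # ResponseMetadata is request bookkeeping added by botocore, not dataset data.
        response.pop("ResponseMetadata", None)
        return response

  Against the real service, pass a client created with ``boto3.client("databrew")``, for example ``find_dataset(boto3.client("databrew"), "my-dataset")``, where ``my-dataset`` is a placeholder name.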

  