
****************************
start_ml_data_processing_job
****************************



.. py:method:: NeptuneData.Client.start_ml_data_processing_job(**kwargs)

  

  Creates a new Neptune ML data processing job for processing the graph data exported from Neptune for training. See `The dataprocessing command <https://docs.aws.amazon.com/neptune/latest/userguide/machine-learning-api-dataprocessing.html>`__.

   

  When invoking this operation in a Neptune cluster that has IAM authentication enabled, the IAM user or role making the request must have a policy attached that allows the `neptune-db\:StartMLModelDataProcessingJob <https://docs.aws.amazon.com/neptune/latest/userguide/iam-dp-actions.html#startmlmodeldataprocessingjob>`__ IAM action in that cluster.

  

  See also: `AWS API Documentation <https://docs.aws.amazon.com/goto/WebAPI/neptunedata-2023-08-01/StartMLDataProcessingJob>`_  


  **Request Syntax**
  ::

    response = client.start_ml_data_processing_job(
        id='string',
        previousDataProcessingJobId='string',
        inputDataS3Location='string',
        processedDataS3Location='string',
        sagemakerIamRoleArn='string',
        neptuneIamRoleArn='string',
        processingInstanceType='string',
        processingInstanceVolumeSizeInGB=123,
        processingTimeOutInSeconds=123,
        modelType='string',
        configFileName='string',
        subnets=[
            'string',
        ],
        securityGroupIds=[
            'string',
        ],
        volumeEncryptionKMSKey='string',
        s3OutputEncryptionKMSKey='string'
    )
    
  :type id: string
  :param id: 

    A unique identifier for the new job. The default is an autogenerated UUID.

    

  
  :type previousDataProcessingJobId: string
  :param previousDataProcessingJobId: 

    The job ID of a completed data processing job run on an earlier version of the data.

    

  
  :type inputDataS3Location: string
  :param inputDataS3Location: **[REQUIRED]** 

    The URI of the Amazon S3 location where you want SageMaker to download the data needed to run the data processing job.

    

  
  :type processedDataS3Location: string
  :param processedDataS3Location: **[REQUIRED]** 

    The URI of the Amazon S3 location where you want SageMaker to save the results of a data processing job.

    

  
  :type sagemakerIamRoleArn: string
  :param sagemakerIamRoleArn: 

    The ARN of an IAM role for SageMaker execution. This must be listed in your DB cluster parameter group or an error will occur.

    

  
  :type neptuneIamRoleArn: string
  :param neptuneIamRoleArn: 

    The Amazon Resource Name (ARN) of an IAM role that SageMaker can assume to perform tasks on your behalf. This must be listed in your DB cluster parameter group or an error will occur.

    

  
  :type processingInstanceType: string
  :param processingInstanceType: 

    The type of ML instance used during data processing. Its memory should be large enough to hold the processed dataset. The default is the smallest ml.r5 type whose memory is ten times larger than the size of the exported graph data on disk.

    

  
  :type processingInstanceVolumeSizeInGB: integer
  :param processingInstanceVolumeSizeInGB: 

    The disk volume size of the processing instance, in gigabytes. Both input data and processed data are stored on disk, so the volume must be large enough to hold both data sets. The default is 0. If not specified or 0, Neptune ML chooses the volume size automatically based on the data size.

    

  
  :type processingTimeOutInSeconds: integer
  :param processingTimeOutInSeconds: 

    Timeout in seconds for the data processing job. The default is 86,400 (1 day).

    

  
  :type modelType: string
  :param modelType: 

    One of the two model types that Neptune ML currently supports: heterogeneous graph models (``heterogeneous``) and knowledge graph models (``kge``). The default is None. If not specified, Neptune ML chooses the model type automatically based on the data.

    

  
  :type configFileName: string
  :param configFileName: 

    A data specification file that describes how to load the exported graph data for training. The file is automatically generated by the Neptune export toolkit. The default is ``training-data-configuration.json``.

    

  
  :type subnets: list
  :param subnets: 

    The IDs of the subnets in the Neptune VPC. The default is None.

    

  
    - *(string) --* 

    

  :type securityGroupIds: list
  :param securityGroupIds: 

    The VPC security group IDs. The default is None.

    

  
    - *(string) --* 

    

  :type volumeEncryptionKMSKey: string
  :param volumeEncryptionKMSKey: 

    The AWS Key Management Service (AWS KMS) key that SageMaker uses to encrypt data on the storage volume attached to the ML compute instances that run the data processing job. The default is None.

    

  
  :type s3OutputEncryptionKMSKey: string
  :param s3OutputEncryptionKMSKey: 

    The AWS Key Management Service (AWS KMS) key that SageMaker uses to encrypt the output of the processing job. The default is None.

    

  
  
  :rtype: dict
  :returns: 
    
    **Response Syntax**

    
    ::

      {
          'id': 'string',
          'arn': 'string',
          'creationTimeInMillis': 123
      }
      
    **Response Structure**

    

    - *(dict) --* 
      

      - **id** *(string) --* 

        The unique ID of the new data processing job.

        
      

      - **arn** *(string) --* 

        The ARN of the data processing job.

        
      

      - **creationTimeInMillis** *(integer) --* 

        The time it took to create the new processing job, in milliseconds.

        
  
  **Exceptions**
  
  *   :py:class:`NeptuneData.Client.exceptions.UnsupportedOperationException`

  
  *   :py:class:`NeptuneData.Client.exceptions.BadRequestException`

  
  *   :py:class:`NeptuneData.Client.exceptions.InvalidParameterException`

  
  *   :py:class:`NeptuneData.Client.exceptions.MLResourceNotFoundException`

  
  *   :py:class:`NeptuneData.Client.exceptions.ClientTimeoutException`

  
  *   :py:class:`NeptuneData.Client.exceptions.PreconditionsFailedException`

  
  *   :py:class:`NeptuneData.Client.exceptions.ConstraintViolationException`

  
  *   :py:class:`NeptuneData.Client.exceptions.InvalidArgumentException`

  
  *   :py:class:`NeptuneData.Client.exceptions.MissingParameterException`

  
  *   :py:class:`NeptuneData.Client.exceptions.IllegalArgumentException`

  
  *   :py:class:`NeptuneData.Client.exceptions.TooManyRequestsException`
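
As a minimal sketch of how the parameters and response described above fit together, the following Python assembles the keyword arguments for the call and reads the ``id`` field of the response. The names ``build_data_processing_request``, ``start_job``, and ``OPTIONAL_PARAMS`` are hypothetical helpers introduced for illustration and are not part of boto3. The ``s3://`` prefix check is an assumed local sanity check, not documented service behavior.

```python
from typing import Any, Dict

# Optional parameter names, copied from the Request Syntax above.
OPTIONAL_PARAMS = frozenset({
    "id", "previousDataProcessingJobId", "sagemakerIamRoleArn",
    "neptuneIamRoleArn", "processingInstanceType",
    "processingInstanceVolumeSizeInGB", "processingTimeOutInSeconds",
    "modelType", "configFileName", "subnets", "securityGroupIds",
    "volumeEncryptionKMSKey", "s3OutputEncryptionKMSKey",
})


def build_data_processing_request(
    input_data_s3_location: str,
    processed_data_s3_location: str,
    **optional: Any,
) -> Dict[str, Any]:
    """Assemble keyword arguments for start_ml_data_processing_job.

    Hypothetical helper, not part of boto3. Optional values set to None
    are dropped so that the defaults described above apply.
    """
    request: Dict[str, Any] = {
        "inputDataS3Location": input_data_s3_location,
        "processedDataS3Location": processed_data_s3_location,
    }
    for name, value in request.items():
        # Assumption: both required locations are s3:// URIs. This is only
        # a local check; the service performs its own validation.
        if not str(value).startswith("s3://"):
            raise ValueError(f"{name} must be an s3:// URI, got {value!r}")

    unknown = set(optional) - OPTIONAL_PARAMS
    if unknown:
        # Catch misspelled parameter names before the request is sent.
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    request.update({k: v for k, v in optional.items() if v is not None})
    return request


def start_job(client: Any, **request: Any) -> str:
    """Call the operation on a NeptuneData client and return the new job ID."""
    response = client.start_ml_data_processing_job(**request)
    return response["id"]
```

Assuming boto3 is installed and ``client`` was created with ``boto3.client('neptunedata', endpoint_url=...)`` pointing at your cluster endpoint, a call might look like ``start_job(client, **build_data_processing_request('s3://my-bucket/export/', 's3://my-bucket/processed/', modelType='kge'))``. The bucket names here are placeholders.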

  