Skip to content

DataQualityJobDefinition ​

The DataQualityJobDefinition resource lets you create and manage AWS SageMaker DataQualityJobDefinitions that define the inputs, outputs, and settings for running data quality monitoring jobs on your datasets.

Minimal Example ​

This example demonstrates how to create a basic DataQualityJobDefinition with required properties and some common optional settings.

ts
import AWS from "alchemy/aws/control";

const dataQualityJobDefinition = await AWS.SageMaker.DataQualityJobDefinition("basicDataQualityJob", {
  DataQualityJobInput: {
    S3Input: {
      S3Uri: "s3://my-bucket/data",
      LocalPath: "/data/input",
      ContentType: "application/json"
    }
  },
  DataQualityAppSpecification: {
    ImageUri: "123456789012.dkr.ecr.us-west-2.amazonaws.com/my-data-quality-image:latest",
    ContainerEntrypoint: ["python3", "data_quality_script.py"],
    ContainerArguments: ["--input", "/data/input"]
  },
  JobResources: {
    ClusterConfig: {
      InstanceType: "ml.m5.large",
      InstanceCount: 1,
      VolumeSizeInGB: 20
    }
  },
  RoleArn: "arn:aws:iam::123456789012:role/my-sagemaker-role",
  DataQualityJobOutputConfig: {
    S3Output: {
      S3Uri: "s3://my-bucket/output",
      LocalPath: "/data/output"
    }
  },
  EndpointName: "my-endpoint"
});

Advanced Configuration ​

This example illustrates how to configure a DataQualityJobDefinition with additional advanced settings like a stopping condition and network configuration.

ts
const advancedDataQualityJobDefinition = await AWS.SageMaker.DataQualityJobDefinition("advancedDataQualityJob", {
  DataQualityJobInput: {
    S3Input: {
      S3Uri: "s3://my-bucket/data",
      LocalPath: "/data/input",
      ContentType: "application/json"
    }
  },
  DataQualityAppSpecification: {
    ImageUri: "123456789012.dkr.ecr.us-west-2.amazonaws.com/my-data-quality-image:latest",
    ContainerEntrypoint: ["python3", "data_quality_script.py"],
    ContainerArguments: ["--input", "/data/input"]
  },
  JobResources: {
    ClusterConfig: {
      InstanceType: "ml.m5.large",
      InstanceCount: 1,
      VolumeSizeInGB: 20
    }
  },
  RoleArn: "arn:aws:iam::123456789012:role/my-sagemaker-role",
  DataQualityJobOutputConfig: {
    S3Output: {
      S3Uri: "s3://my-bucket/output",
      LocalPath: "/data/output"
    }
  },
  StoppingCondition: {
    MaxRuntimeInSeconds: 3600
  },
  NetworkConfig: {
    EnableNetworkIsolation: true,
    VpcConfig: {
      SecurityGroupIds: ["sg-0abcd1234efgh5678"],
      Subnets: ["subnet-0abcd1234efgh5678"]
    }
  }
});

Using Tags ​

This example demonstrates how to create a DataQualityJobDefinition with tags for better resource management and tracking.

ts
const taggedDataQualityJobDefinition = await AWS.SageMaker.DataQualityJobDefinition("taggedDataQualityJob", {
  DataQualityJobInput: {
    S3Input: {
      S3Uri: "s3://my-bucket/data",
      LocalPath: "/data/input",
      ContentType: "application/json"
    }
  },
  DataQualityAppSpecification: {
    ImageUri: "123456789012.dkr.ecr.us-west-2.amazonaws.com/my-data-quality-image:latest",
    ContainerEntrypoint: ["python3", "data_quality_script.py"],
    ContainerArguments: ["--input", "/data/input"]
  },
  JobResources: {
    ClusterConfig: {
      InstanceType: "ml.m5.large",
      InstanceCount: 1,
      VolumeSizeInGB: 20
    }
  },
  RoleArn: "arn:aws:iam::123456789012:role/my-sagemaker-role",
  DataQualityJobOutputConfig: {
    S3Output: {
      S3Uri: "s3://my-bucket/output",
      LocalPath: "/data/output"
    }
  },
  Tags: [
    { Key: "Environment", Value: "Production" },
    { Key: "Team", Value: "DataScience" }
  ]
});