Skip to content
GitHubXDiscord

DataQualityJobDefinition

The DataQualityJobDefinition resource lets you create and manage AWS SageMaker DataQualityJobDefinitions that define the inputs, outputs, and settings for running data quality monitoring jobs on your datasets.

This example demonstrates how to create a basic DataQualityJobDefinition with required properties and some common optional settings.

import AWS from "alchemy/aws/control";
const dataQualityJobDefinition = await AWS.SageMaker.DataQualityJobDefinition("basicDataQualityJob", {
DataQualityJobInput: {
S3Input: {
S3Uri: "s3://my-bucket/data",
LocalPath: "/data/input",
ContentType: "application/json"
}
},
DataQualityAppSpecification: {
ImageUri: "123456789012.dkr.ecr.us-west-2.amazonaws.com/my-data-quality-image:latest",
ContainerEntrypoint: ["python3", "data_quality_script.py"],
ContainerArguments: ["--input", "/data/input"]
},
JobResources: {
ClusterConfig: {
InstanceType: "ml.m5.large",
InstanceCount: 1,
VolumeSizeInGB: 20
}
},
RoleArn: "arn:aws:iam::123456789012:role/my-sagemaker-role",
DataQualityJobOutputConfig: {
S3Output: {
S3Uri: "s3://my-bucket/output",
LocalPath: "/data/output"
}
},
EndpointName: "my-endpoint"
});

This example illustrates how to configure a DataQualityJobDefinition with additional advanced settings like a stopping condition and network configuration.

const advancedDataQualityJobDefinition = await AWS.SageMaker.DataQualityJobDefinition("advancedDataQualityJob", {
DataQualityJobInput: {
S3Input: {
S3Uri: "s3://my-bucket/data",
LocalPath: "/data/input",
ContentType: "application/json"
}
},
DataQualityAppSpecification: {
ImageUri: "123456789012.dkr.ecr.us-west-2.amazonaws.com/my-data-quality-image:latest",
ContainerEntrypoint: ["python3", "data_quality_script.py"],
ContainerArguments: ["--input", "/data/input"]
},
JobResources: {
ClusterConfig: {
InstanceType: "ml.m5.large",
InstanceCount: 1,
VolumeSizeInGB: 20
}
},
RoleArn: "arn:aws:iam::123456789012:role/my-sagemaker-role",
DataQualityJobOutputConfig: {
S3Output: {
S3Uri: "s3://my-bucket/output",
LocalPath: "/data/output"
}
},
StoppingCondition: {
MaxRuntimeInSeconds: 3600
},
NetworkConfig: {
EnableNetworkIsolation: true,
VpcConfig: {
SecurityGroupIds: ["sg-0abcd1234efgh5678"],
Subnets: ["subnet-0abcd1234efgh5678"]
}
}
});

This example demonstrates how to create a DataQualityJobDefinition with tags for better resource management and tracking.

const taggedDataQualityJobDefinition = await AWS.SageMaker.DataQualityJobDefinition("taggedDataQualityJob", {
DataQualityJobInput: {
S3Input: {
S3Uri: "s3://my-bucket/data",
LocalPath: "/data/input",
ContentType: "application/json"
}
},
DataQualityAppSpecification: {
ImageUri: "123456789012.dkr.ecr.us-west-2.amazonaws.com/my-data-quality-image:latest",
ContainerEntrypoint: ["python3", "data_quality_script.py"],
ContainerArguments: ["--input", "/data/input"]
},
JobResources: {
ClusterConfig: {
InstanceType: "ml.m5.large",
InstanceCount: 1,
VolumeSizeInGB: 20
}
},
RoleArn: "arn:aws:iam::123456789012:role/my-sagemaker-role",
DataQualityJobOutputConfig: {
S3Output: {
S3Uri: "s3://my-bucket/output",
LocalPath: "/data/output"
}
},
Tags: [
{ Key: "Environment", Value: "Production" },
{ Key: "Team", Value: "DataScience" }
]
});