By this point, we have built an image for training and prediction. One problem remains: every time we change the code in the train file, we have to rebuild the image and push it to ECR again. This is very time-consuming, especially in real-world scenarios where images and models can be quite large.
To address this, we can make a few modifications that separate the image from the file it executes (specifically, the train file).
You might recall that requirements.txt includes awscli. We installed awscli so that the container can interact with S3, which stores the train file and similar resources.
We can therefore create a file named train.py with the same content as the original train file, minus the first line, #!/usr/bin/env python. We then upload this file to the customized-sagemaker-image-decision-tree-bucket bucket, inside the train-script/ folder.
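The step of producing train.py from the original train file can be sketched as follows. This is a minimal sketch, not the author's tooling: the helper names strip_shebang and write_train_script are hypothetical, and it assumes the original train file is available locally.

```python
from pathlib import Path


def strip_shebang(source: str) -> str:
    """Return `source` without its first line when that line is a shebang."""
    lines = source.splitlines(keepends=True)
    if lines and lines[0].startswith("#!"):
        lines = lines[1:]
    return "".join(lines)


def write_train_script(original: Path, target: Path) -> None:
    """Write a copy of `original` to `target` with the shebang line removed.

    Hypothetical helper: the local paths are assumptions, not part of the
    original setup.
    """
    target.write_text(strip_shebang(original.read_text()))
```

Calling write_train_script(Path("train"), Path("train.py")) would produce the new file. It can then be uploaded through the S3 console or, assuming the AWS CLI is configured locally, with aws s3 cp train.py s3://customized-sagemaker-image-decision-tree-bucket/train-script/train.py.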
Next, we update the code of the Lambda function create-training-job-decision-tree-function, specifically the AlgorithmSpecification argument in the create_training_job call.
    AlgorithmSpecification={
        'TrainingImage': estimator['image_uri'],
        'TrainingInputMode': 'File',
        # Override the image's entrypoint: download train.py from S3, then run it
        'ContainerEntrypoint': [
            '/bin/bash',
            '-c',
            'aws s3 cp {}/{} /opt/ml/code/{} && python /opt/ml/code/{}'.format(
                's3://customized-sagemaker-image-decision-tree-bucket/train-script',
                'train.py',
                'train.py',
                'train.py'
            )
        ]
    }
When the training job starts, this command overrides the image's default train entrypoint: the container downloads train.py from S3 and executes it instead.
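Because the script name appears three times in the format call, it can be convenient to build the entrypoint in one place. The following is a minimal sketch under that assumption; build_entrypoint is a hypothetical helper, not part of boto3 or SageMaker, and it produces the same list used for ContainerEntrypoint in the snippet shown earlier.

```python
from typing import List


def build_entrypoint(s3_prefix: str, script_name: str,
                     code_dir: str = "/opt/ml/code") -> List[str]:
    """Build a ContainerEntrypoint that downloads a script from S3 and runs it.

    Hypothetical helper: `s3_prefix` is the S3 folder holding the script,
    and `code_dir` is the directory inside the container to copy it into.
    """
    remote_path = f"{s3_prefix.rstrip('/')}/{script_name}"
    local_path = f"{code_dir}/{script_name}"
    # `&&` ensures python only runs if the download succeeded
    command = f"aws s3 cp {remote_path} {local_path} && python {local_path}"
    return ["/bin/bash", "-c", command]


entrypoint = build_entrypoint(
    "s3://customized-sagemaker-image-decision-tree-bucket/train-script",
    "train.py",
)
```

The resulting entrypoint list can be passed as the value of 'ContainerEntrypoint'. With this approach, pointing the job at a different script only requires changing the arguments, not rebuilding the image.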