Bonus

At this point, we have a working image for both training and prediction. There is one drawback, however: because the train file is baked into the image, every change to the training code means rebuilding the image and pushing it to ECR again. That round trip is slow, and it becomes a real bottleneck in real-world scenarios where images and models can be quite large.

To address this, we can make a few changes that decouple the image from the script it executes (specifically, the train file).

You may recall that we included awscli in requirements.txt. It is there so that the container can interact with S3, where the train file (or any similar resource) can be stored and fetched at runtime.

We therefore create a file named train.py with the same content as the original train file, minus its first line, #!/usr/bin/env python. We then upload this file to the train-script/ folder of the customized-sagemaker-image-decision-tree-bucket bucket.
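The preparation and upload step can be sketched in Python as follows. This is a minimal sketch, assuming the original script is a local file named `train` and that AWS credentials with write access to the bucket are available; the helper names `strip_shebang`, `make_train_py`, and `upload_train_script` are hypothetical. The upload uses boto3's S3 `upload_file`, and boto3 is imported inside the function so the file-preparation helpers work without it.

```python
from pathlib import Path

BUCKET = "customized-sagemaker-image-decision-tree-bucket"
KEY = "train-script/train.py"


def strip_shebang(source: str) -> str:
    """Return the source without its first line if that line is a shebang."""
    lines = source.splitlines(keepends=True)
    # Only drop the first line when it starts with "#!", e.g. "#!/usr/bin/env python".
    if lines and lines[0].startswith("#!"):
        lines = lines[1:]
    return "".join(lines)


def make_train_py(src: str = "train", dst: str = "train.py") -> str:
    """Copy the container's `train` script (assumed local path) to `train.py` without the shebang."""
    Path(dst).write_text(strip_shebang(Path(src).read_text()))
    return dst


def upload_train_script(path: str = "train.py", bucket: str = BUCKET, key: str = KEY) -> None:
    """Upload the script to s3://<bucket>/<key>; requires boto3 and AWS credentials."""
    import boto3  # lazy import: only needed for the upload itself

    boto3.client("s3").upload_file(path, bucket, key)
```

Running `upload_train_script(make_train_py())` from the directory that holds the original `train` file would place the script at `s3://customized-sagemaker-image-decision-tree-bucket/train-script/train.py`.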


Next, we update the code of the Lambda function create-training-job-decision-tree-function, specifically the AlgorithmSpecification argument passed to create_training_job:

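AlgorithmSpecification={
    'TrainingImage': estimator['image_uri'],
    'TrainingInputMode': 'File',
    # Override the image's entrypoint: copy train.py from S3 into
    # /opt/ml/code/, then run it (python only runs if the copy succeeds).
    'ContainerEntrypoint': [
        '/bin/bash',
        '-c',
        'aws s3 cp {}/{} /opt/ml/code/{} && python /opt/ml/code/{}'.format(
            's3://customized-sagemaker-image-decision-tree-bucket/train-script',
            'train.py',  # object name in S3
            'train.py',  # local file name inside the container
            'train.py'   # script that python executes
        )
    ]
}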

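As a result, when the training job starts, ContainerEntrypoint overrides the image's default training command: the container first copies train.py from S3 into /opt/ml/code/ and then runs it with python. Changing the training code now only requires uploading a new train.py to S3, with no image rebuild.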
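If the script's location or name changes between jobs, the entrypoint can be generated by a small helper rather than edited by hand. The sketch below is hypothetical (`build_algorithm_specification` is not part of boto3 or SageMaker) and assumes the image provides `bash`, the AWS CLI, and a `python` executable on its `PATH`. It builds the same `AlgorithmSpecification` dictionary from a bucket, prefix, and script name, and quotes the interpolated paths with `shlex.quote` so that names containing spaces survive the `bash -c` call.

```python
import shlex


def build_algorithm_specification(
    image_uri: str,
    bucket: str = "customized-sagemaker-image-decision-tree-bucket",
    prefix: str = "train-script",
    script: str = "train.py",
    code_dir: str = "/opt/ml/code",
) -> dict:
    """Return an AlgorithmSpecification that downloads a script from S3 and runs it."""
    s3_uri = f"s3://{bucket}/{prefix}/{script}"
    local_path = f"{code_dir}/{script}"
    # `&&` ensures python only runs if the download succeeded.
    command = (
        f"aws s3 cp {shlex.quote(s3_uri)} {shlex.quote(local_path)}"
        f" && python {shlex.quote(local_path)}"
    )
    return {
        "TrainingImage": image_uri,
        "TrainingInputMode": "File",
        # Replaces the image's default entrypoint for this training job only.
        "ContainerEntrypoint": ["/bin/bash", "-c", command],
    }
```

Inside the Lambda function, the call would then pass `AlgorithmSpecification=build_algorithm_specification(estimator['image_uri'])` to `create_training_job`, leaving the other arguments unchanged. With the default arguments, the generated command string is identical to the hand-written one.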