Making a Sagemaker DeepChem Development Environment

In an earlier post, “Deploying a DeepChem Sagemaker Notebook Instance,” we showed how to set up an AWS SageMaker instance that runs DeepChem. (Today’s installation instructions differ slightly, since we’re now on DeepChem 2.5+ instead of DeepChem 2.3, but they mostly carry over as-is.) However, that setup has a major limitation: each time you shut down and restart your SageMaker instance, you need to reinstall DeepChem. Ideally, we’d have a stable cloud development environment that keeps a working DeepChem installation across shutdowns and restarts.

AWS provides a tool called “Lifecycle Configurations” for setting up SageMaker environments. To set up your lifecycle configuration, navigate to your AWS SageMaker console and go to the lifecycle configurations page.

Screen Shot 2021-07-11 at 4.59.04 PM

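Next, click “Create lifecycle configuration” and go to the “Create Notebook” section under Scripts, as shown below. Note that AWS runs the “Create notebook” script only once, when the instance is first created, whereas the “Start notebook” script runs every time the instance starts.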

Screen Shot 2021-07-11 at 5.02.36 PM

Paste the following script in:

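#!/bin/bash

set -e

# OVERVIEW
# This script installs DeepChem (with the PyTorch extra) and RDKit into a
# single SageMaker conda environment.

sudo -u ec2-user -i <<'EOF'
# PARAMETERS
# Name of the conda environment to install into.
ENVIRONMENT=python3
source /home/ec2-user/anaconda3/bin/activate "$ENVIRONMENT"
# Quote the requirement so the shell does not treat [torch] as a glob pattern.
pip install --pre "deepchem[torch]"
# -y answers conda's confirmation prompt, since this script runs non-interactively.
conda install -y -c conda-forge rdkit
source /home/ec2-user/anaconda3/bin/deactivate
EOF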
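
The script above hard-codes the torch extra. If you expect to switch backends, one option is to derive the pip requirement from a variable instead. The following is only a minimal sketch: the function name `deepchem_requirement` and the `BACKEND` variable are hypothetical names introduced for illustration, and the list of extras (`torch`, `tensorflow`, `jax`) is an assumption you should check against the install docs for your DeepChem release.

```shell
#!/bin/sh
# Hypothetical helper (not part of DeepChem or AWS): map a backend name
# to the pip requirement string used in the lifecycle script.
# Assumption: your DeepChem release provides these extras.
deepchem_requirement() {
  case "$1" in
    torch|tensorflow|jax) printf 'deepchem[%s]\n' "$1" ;;
    "")                   printf 'deepchem\n' ;;
    *) echo "unknown backend: $1" >&2; return 1 ;;
  esac
}

# Default to torch, as in the script above; override with BACKEND=...
BACKEND="${BACKEND:-torch}"
REQUIREMENT="$(deepchem_requirement "$BACKEND")"

# Print the install line rather than running it, so it can be reviewed first.
echo "pip install --pre \"$REQUIREMENT\""
```

Once the printed line looks right, you could paste it into the lifecycle script in place of the hard-coded pip install.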

Note that the lifecycle script installs deepchem[torch]; change the extra to match your preferred backend. The script installs DeepChem into the built-in python3 conda environment; change ENVIRONMENT if you’d rather install into a different conda environment. Click “Create Configuration.” Next, create a new SageMaker notebook instance that uses this lifecycle configuration, as shown in the image below (note that I’ve named my lifecycle configuration “DeepChem-Torch”).

Screen Shot 2021-07-11 at 5.05.54 PM

Click “Create Notebook Instance” and wait for your instance to come up. I’ve named my notebook instance DeepChem-K80 since I set it up to use a K80 GPU; name yours as you like. Once the instance is up, launch JupyterLab as shown below.

Screen Shot 2021-07-11 at 5.08.37 PM

Launch a notebook with the conda_python3 kernel, as shown below.

Screen Shot 2021-07-11 at 5.10.17 PM

And now you should be all set to use DeepChem!

Screen Shot 2021-07-11 at 5.11.15 PM

Make sure to shut down your notebook instance after you’re done using it! AWS will keep charging you for instances that are left running overnight. It can be dangerously easy to rack up large AWS expenses. Happy hacking!
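
If you’d rather stop the instance from a terminal than from the console, the AWS CLI has a `stop-notebook-instance` command. Here is a small sketch of a wrapper around it. The function name `stop_notebook` and the `DRY_RUN` switch are hypothetical conveniences added for illustration, and the sketch assumes the AWS CLI is installed and configured with credentials for your account.

```shell
#!/bin/sh
# Hypothetical wrapper around the AWS CLI call that stops a notebook instance.
# With DRY_RUN=1 it only prints the command, so you can check the name first.
stop_notebook() {
  name="$1"
  if [ -z "$name" ]; then
    echo "usage: stop_notebook INSTANCE_NAME" >&2
    return 2
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "aws sagemaker stop-notebook-instance --notebook-instance-name $name"
  else
    aws sagemaker stop-notebook-instance --notebook-instance-name "$name"
  fi
}

# Example: preview the command for the instance name used in this post.
DRY_RUN=1 stop_notebook DeepChem-K80
```

Drop the `DRY_RUN=1` prefix to actually stop the instance.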


Following discussion on Twitter, here are a few extra pointers:

  1. If you modify the lifecycle configuration script above, watch out for timeouts. A lifecycle configuration script that runs for longer than 5 minutes fails, so your install needs to finish within that time. DeepChem’s new pip installs are quite fast, but other dependencies may slow installs down (H/T Angelica Parente: https://twitter.com/draparente/status/1414411797239459841).
  2. If you modify a lifecycle configuration that an existing notebook instance is already using, you may run into strange errors. I found it cleanest to delete the existing notebook instances and relaunch them whenever I wanted to change the lifecycle configuration. If you do hit strange errors, check the CloudWatch logs for the notebook instance.
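
To get a feel for whether a modified install will fit inside the 5-minute limit, you could time the commands in a notebook terminal before pasting them into the lifecycle configuration. This is only a rough sketch: `run_timed` is a hypothetical helper written for this post, and timings in a terminal are only an approximation of what the lifecycle script will see when AWS runs it.

```shell
#!/bin/sh
# Hypothetical helper: run a command, report how long it took, and warn
# when it exceeds a budget given in seconds (300 = the 5-minute limit).
run_timed() {
  budget="$1"; shift
  start="$(date +%s)"
  status=0
  "$@" || status=$?
  elapsed=$(( $(date +%s) - $start ))
  echo "elapsed: ${elapsed}s (budget: ${budget}s)" >&2
  if [ "$elapsed" -gt "$budget" ]; then
    echo "warning: command exceeded the time budget" >&2
  fi
  return "$status"
}

# Example: time a stand-in command; swap in your pip or conda install line.
run_timed 300 sleep 1
```

The helper returns the wrapped command’s exit status, so a failed install still shows up as a failure.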