DeepChem 2.3: Keras modeling and groundwork for TensorFlow 2

peastman · October 23, 2019, 7:06pm

DeepChem 2.3 includes many architectural changes. Some are mostly invisible, while others will directly affect how you work with it. All of them have the same goal: to make sure DeepChem can continue to function well as part of the larger TensorFlow ecosystem.

DeepChem has already been through its share of architectural changes. After all, it has been around longer than TensorFlow. The earliest versions relied on Theano instead. Once TensorFlow was released we migrated the DeepChem models to it, but they were written “from scratch” in raw TensorFlow code. We realized this was not a sustainable way of implementing them and that we needed a higher level modeling API. Unfortunately, TensorFlow had no standard modeling API. After surveying various options, we decided to create TensorGraph to fill this need.

TensorGraph has served us well for the last few years, but the world keeps moving. Google has decided to standardize on Keras as the primary modeling API for TensorFlow. If DeepChem is to interoperate well with the rest of the ecosystem, it needs to adopt that standard. That is what we have done in DeepChem 2.3.

Migrating from TensorGraph to Keras

Depending on how you use DeepChem, this might hardly affect you at all. Consider the following code that creates a standard model, trains it on a training set, and evaluates it on a test set.

```python
from deepchem.models import MultitaskClassifier

model = MultitaskClassifier(n_tasks, n_features)
model.fit(training_set)
print(model.evaluate(test_set, [metric]))
```

This code works the same way in DeepChem 2.2 and 2.3. As long as you are just using standard model types, your code should continue to work and require few or no changes.

If you are building custom TensorGraph models out of layers, your code should also continue to work. TensorGraph hasn’t gone away, and it still works the same way it always has. It is, however, deprecated and will probably be removed in a future release, so you should start thinking about converting your TensorGraph models to Keras. For any new models you write, you should definitely use Keras instead.

The TensorGraph package consists of three parts:

- The deepchem.models.TensorGraph class.
- A large collection of layers from which to build models, mostly found in the deepchem.models.tensorgraph.layers module.
- All the standard model classes like MultitaskRegressor, GraphConvModel, etc.

Let’s consider how each of these has been changed or replaced.

TensorGraph Layers and Keras Layers

The first step was to convert the TensorGraph layers (subclasses of deepchem.models.tensorgraph.layers.Layer) to Keras layers (subclasses of tensorflow.keras.layers.Layer). Not all of them actually needed to be converted, though. Some perform unique calculations provided by DeepChem, but many are just standard neural network calculations. There’s no need for DeepChem to provide its own Dense or Conv2D layers when TensorFlow already includes perfectly good ones. So we identified just the important, non-trivial layers and converted them into Keras layers, which are found in the deepchem.models.layers module. The old TensorGraph layers are still present in deepchem.models.tensorgraph.layers just as they were before, but they are now thin wrappers around the corresponding Keras layers.

`tensorflow.keras.Model` and `deepchem.models.KerasModel`

The next step was to create a class called KerasModel that acts as a wrapper around a Keras model in the same way that SklearnModel acts as a wrapper around a scikit-learn model. It provides an API for training and inference that is as close as possible to the one in TensorGraph.

You might reasonably ask why we need KerasModel at all. After all, tf.keras.Model already provides an API for training and inference, so why not just use that? In fact, that was our original plan, but we quickly discovered that the Keras API had serious limitations that made it unsuitable for our needs. The KerasModel class lets us overcome those limitations while still allowing arbitrary Keras-based models to be used with DeepChem. Of course, if you have an application for which those limitations aren’t a problem, you are free to use the Keras API directly. Because DeepChem’s calculations are now exposed as Keras layers, they can be used directly in any Keras-based code.

The final step was to convert standard model classes like MultitaskRegressor to be subclasses of KerasModel instead of TensorGraph. In most cases this involved only minimal changes to their public APIs. The main exceptions are classes that expect you to define parts of the calculation, like GAN or the reinforcement learning classes. Those now expect the calculation to be defined with Keras instead of TensorGraph.

Example of Converting a Model

Let’s work through a detailed example of converting a model, using the one from the MNIST example notebook. Here is the TensorGraph version.

```python
import deepchem as dc
import tensorflow as tf
from deepchem.models.tensorgraph.layers import Feature, Reshape, Conv2D, Flatten, Dense, Label
from deepchem.models.tensorgraph.layers import SoftMaxCrossEntropy, ReduceMean, SoftMax

model = dc.models.TensorGraph()

# Network: flat 784-pixel input -> 28x28 image -> two conv layers -> two dense layers
feature = Feature(shape=(None, 784))
make_image = Reshape(shape=(-1, 28, 28, 1), in_layers=[feature])
conv2d_1 = Conv2D(num_outputs=32, in_layers=[make_image])
conv2d_2 = Conv2D(num_outputs=64, in_layers=[conv2d_1])
flatten = Flatten(in_layers=[conv2d_2])
dense1 = Dense(out_channels=1024, activation_fn=tf.nn.relu, in_layers=[flatten])
dense2 = Dense(out_channels=10, in_layers=[dense1])

# Loss: softmax cross entropy computed from dense2, averaged over the batch
label = Label(shape=(None, 10))
smce = SoftMaxCrossEntropy(in_layers=[label, dense2])
loss = ReduceMean(in_layers=[smce])
model.set_loss(loss)

# Output returned by predict(): class probabilities
output = SoftMax(in_layers=[dense2])
model.add_output(output)
```

Now let’s look at a Keras version. The initial part is very similar. We use Keras layers instead of TensorGraph layers, and the syntax for specifying their inputs is a little different, but otherwise not much has changed.

```python
import deepchem as dc
import tensorflow as tf
from tensorflow.keras.layers import Input, Reshape, Conv2D, Flatten, Dense, Softmax

# Same network built from standard Keras layers; the batch dimension is implicit
feature = Input(shape=(784,))
make_image = Reshape((28, 28, 1))(feature)
conv2d_1 = Conv2D(filters=32, kernel_size=5, activation=tf.nn.relu)(make_image)
conv2d_2 = Conv2D(filters=64, kernel_size=5, activation=tf.nn.relu)(conv2d_1)
flatten = Flatten()(conv2d_2)
dense1 = Dense(1024, activation=tf.nn.relu)(flatten)
dense2 = Dense(10)(dense1)
output = Softmax()(dense2)
keras_model = tf.keras.Model(inputs=[feature], outputs=[output])

# Wrap the Keras model; the loss is computed from the probabilities in `output`
model = dc.models.KerasModel(keras_model, dc.models.losses.CategoricalCrossEntropy())
```

Note one subtle difference. In the Keras Input and Reshape layers, the first dimension, which indexes the samples in a batch, is implicit, whereas the TensorGraph version lists it explicitly (as None or -1). This can be a source of confusion when converting models.
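When converting shapes by hand, the rule can be applied mechanically: drop the leading batch entry from the TensorGraph shape. The helper below is a hypothetical convenience for illustration only (it is not part of DeepChem or Keras), and it assumes the batch dimension is always the first entry, written as None or -1.

```python
from typing import Optional, Tuple


def tensorgraph_to_keras_shape(shape: Tuple[Optional[int], ...]) -> Tuple[Optional[int], ...]:
    """Convert a TensorGraph-style shape with an explicit batch dimension
    (e.g. (None, 784) or (-1, 28, 28, 1)) to the Keras convention, where the
    batch dimension is implicit (e.g. (784,) or (28, 28, 1)).

    Hypothetical helper; assumes the batch dimension is the first entry
    and is written as None or -1.
    """
    if not shape or shape[0] not in (None, -1):
        raise ValueError(f"expected a leading batch dimension of None or -1, got {shape!r}")
    return tuple(shape[1:])


# Shapes taken from the two MNIST versions above
print(tensorgraph_to_keras_shape((None, 784)))       # Feature -> Input: (784,)
print(tensorgraph_to_keras_shape((-1, 28, 28, 1)))   # Reshape -> Reshape: (28, 28, 1)
```

This only captures the batch-dimension rule; layer arguments such as num_outputs versus filters still have to be translated individually.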

The most important difference between the two versions is in how the loss function is defined. With TensorGraph, you build up the loss function out of layers and then call set_loss() to tell it what loss to use. With KerasModel, you instead provide a Python function that takes outputs, labels, and weights and returns the loss. You can implement any loss function you want, but DeepChem provides many standard ones built in. In this case, we specify CategoricalCrossEntropy.
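To make the calling convention concrete, here is a framework-free sketch of a loss that takes outputs, labels, and weights. It uses plain Python lists instead of TensorFlow tensors so that it runs on its own, and it assumes (hypothetically, for illustration) that each argument is a list with one entry per model output and that the result is the mean over samples of weight times per-sample cross entropy. A loss actually used with KerasModel would do the equivalent arithmetic with TensorFlow operations.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def weighted_categorical_cross_entropy(outputs: Sequence[Matrix],
                                       labels: Sequence[Matrix],
                                       weights: Sequence[List[float]]) -> float:
    """Hypothetical loss following the (outputs, labels, weights) convention.

    outputs[0]: per-sample class probabilities (the model's first output)
    labels[0]:  per-sample one-hot labels
    weights[0]: per-sample weights
    Returns the mean over samples of weight * cross entropy.
    """
    probs, onehot, w = outputs[0], labels[0], weights[0]
    eps = 1e-7  # clip probabilities so log(0) cannot occur
    total = 0.0
    for p_row, y_row, w_i in zip(probs, onehot, w):
        # Cross entropy for one sample: -sum_k y_k * log(p_k)
        ce = -sum(y * math.log(max(p, eps)) for p, y in zip(p_row, y_row))
        total += w_i * ce
    return total / len(probs)


# Two samples, three classes; the second sample has weight 0 and contributes nothing.
loss = weighted_categorical_cross_entropy(
    outputs=[[[0.5, 0.25, 0.25], [0.1, 0.8, 0.1]]],
    labels=[[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]],
    weights=[[1.0, 0.0]],
)
print(loss)  # equals log(2) / 2
```

Zero weights are a convenient way to mask samples, for example missing labels in a multitask dataset, without changing the shape of the batch.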

There is one respect in which this Keras version is not ideal. Both versions output probabilities computed with the softmax function, but notice that the TensorGraph version uses SoftMaxCrossEntropy to compute the loss from dense2 (the input to softmax, often called the logits) rather than from the probabilities (the output of softmax). In contrast, the Keras version uses CategoricalCrossEntropy to compute the loss from the probabilities. The two approaches are mathematically equivalent, but computing the loss from the logits is more numerically stable.
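A small numeric experiment shows why computing the loss from the logits is preferred. This sketch uses only Python's math module rather than DeepChem or TensorFlow, and the function names are hypothetical. With a large gap between logits, the softmax probability of the true class underflows to exactly 0.0, so a cross entropy computed naively from the probabilities becomes infinite, while the log-sum-exp form computed from the logits returns the correct finite value.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Subtracting the max prevents exp() overflow, but tiny probabilities can still underflow to 0.0
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_probs(probs: List[float], label: int) -> float:
    # Naive version with no clipping: -log(p) is infinite once p has underflowed to 0.0
    p = probs[label]
    return math.inf if p == 0.0 else -math.log(p)


def cross_entropy_from_logits(logits: List[float], label: int) -> float:
    # -log(softmax(z)[label]) = logsumexp(z) - z[label], without ever forming the probabilities
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


moderate = [2.0, 1.0, 0.5]
extreme = [1000.0, 0.0]
# For moderate logits the two agree to floating-point precision
print(cross_entropy_from_probs(softmax(moderate), 1), cross_entropy_from_logits(moderate, 1))
# For extreme logits only the logit-based version survives
print(cross_entropy_from_probs(softmax(extreme), 1), cross_entropy_from_logits(extreme, 1))  # inf 1000.0
```

Clipping the probability to a small epsilon avoids the infinity, but the loss then saturates at -log(epsilon) instead of the true value of 1000, so the gradient signal is still distorted.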

Here is an alternate version that uses the more accurate approach.

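```python
# The model now returns both the probabilities (output) and the logits (dense2)
keras_model = tf.keras.Model(inputs=[feature], outputs=[output, dense2])
# output_types says which output predict() returns and which one the loss receives
model = dc.models.KerasModel(keras_model, dc.models.losses.SoftmaxCrossEntropy(),
                             output_types=['prediction', 'loss'])
```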

This tf.keras.Model outputs two different values: output and dense2. When we create the dc.models.KerasModel, we specify the output_types argument to tell it how to interpret each of the outputs. The first one has type 'prediction', so it will be returned by predict(). The second one has type 'loss', so it will be passed to the loss function. This allows the loss to be computed based on different values than the ones returned by predict().
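The bookkeeping that output_types describes can be sketched in a few lines of plain Python. The helper below is hypothetical and models only the routing, not DeepChem's actual code. It also assumes that when no output is tagged 'loss', the loss falls back to the 'prediction' outputs, which is what the first Keras example relies on when it omits output_types.

```python
from typing import Any, List, Sequence, Tuple


def route_outputs(outputs: Sequence[Any],
                  output_types: Sequence[str]) -> Tuple[List[Any], List[Any]]:
    """Split model outputs into (values returned by predict(), values passed to the loss).

    Hypothetical sketch: outputs tagged 'prediction' go to predict(), outputs
    tagged 'loss' go to the loss function, and if nothing is tagged 'loss'
    the prediction outputs are used for the loss as well (an assumption).
    """
    if len(outputs) != len(output_types):
        raise ValueError("need exactly one output type per output")
    predictions = [o for o, t in zip(outputs, output_types) if t == 'prediction']
    loss_inputs = [o for o, t in zip(outputs, output_types) if t == 'loss']
    if not loss_inputs:
        loss_inputs = list(predictions)
    return predictions, loss_inputs


# Mirrors the alternate MNIST model, whose outputs are [output, dense2]
preds, loss_in = route_outputs(['probabilities', 'logits'], ['prediction', 'loss'])
print(preds, loss_in)  # ['probabilities'] ['logits']
```

Keeping this routing explicit is what lets predict() return probabilities while the loss sees the logits, without building two separate models.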

The Road to TensorFlow 2

You probably know that TensorFlow 2 has just been released, and that it makes huge changes in how you create and use models. A major focus of DeepChem 2.3 is to lay the groundwork for that transition.

To be clear, DeepChem 2.3 is still written for TensorFlow 1 (more specifically, 1.14). Given the size of the changes in TensorFlow 2, we expect many people will continue to use 1.x for a long time to come. DeepChem 2.3 is intended to be a stable base for those people to work on.

At the same time, it contains many internal changes in preparation for transitioning to TensorFlow 2 in the next major release. Our goal is that models and code created with DeepChem 2.3 should continue to work in that release. Mostly you shouldn’t need to worry about the details of those changes. But if you have been wondering about our plans for TensorFlow 2 support, be assured that we are already at work on it!