In Making DeepChem a Better Framework for AI-Driven Science, I suggested the addition of a ModelHub to DeepChem. In this post, I sketch the design of a ModelHub. Here is the rough sequence of steps:
- Overhaul `dc.models.Model` to provide a generic weights interface
- Introduce a standard weight-naming convention
- Create a uniform weight-saving format
- Implement the new interface for `KerasModel`, `TorchModel`, and `SklearnModel`
- Implement a generic `Model.from_pretrained` method
- Introduce a `DEEPCHEM_MODEL_HUB` environment variable
I’ll discuss each of these steps below.
Overhaul Model Superclass
The current `dc.models.Model` class should really be named `Estimator`, since it only provides a sensible API for regressors/classifiers. We should rename the current model class to `Estimator` and introduce a new abstract model class that specifies the following methods:
```python
from typing import Dict

class Model:
    def __init__(self, **kwargs):
        # The same as the __init__ method we have now
        ...
    def get_weights(self) -> Dict:
        # Returns a dictionary mapping weight names to weights
        ...
    def set_weight(self, name, value):
        # Sets the specified weight to the specified value
        ...
    def get_parameters(self) -> Dict:
        # Returns a dict of hyperparameters
        ...
    def to_pretrained(self):
        # More details on this below
        ...
    @staticmethod
    def from_pretrained(pretrained_weights) -> "Model":
        # A constructor method that creates a Model from specified pretrained weights
        ...
```
This API provides a standard way of accessing, updating, and saving the weights of a model. Optionally, we could implement dictionary-like semantics so you could do things like:
```python
gc = GraphConvModel()
# Get a weight from the model
gc["graph-conv-1"]
```
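As a rough sketch, the dictionary-like access could be layered on top of the `get_weights`/`set_weight` methods from the abstract class above (this is just one possible implementation, not settled API):

```python
class Model:
    # ... the methods sketched above ...

    def __getitem__(self, name):
        # Look up a single weight by its standard name
        return self.get_weights()[name]

    def __setitem__(self, name, value):
        # Update a single weight by its standard name
        self.set_weight(name, value)
```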
Note that each model must be able to set its weights. This will have to be implemented separately for different models like `KerasModel`, `TorchModel`, and `SklearnModel`. We should also make `Metalearner` and `Policy` implement this API since these classes are models in the generic sense as well.
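To make this concrete, here is a rough sketch of how `KerasModel` might implement `get_weights` and `set_weight` by walking the variables of its wrapped Keras model (assumed to be available as `self.model`); the `_weight_name` helper, which would apply the naming convention below, is hypothetical:

```python
class KerasModel(Model):

    def get_weights(self):
        # Map standard weight names to the underlying variable values
        return {self._weight_name(v): v.numpy() for v in self.model.weights}

    def set_weight(self, name, value):
        # Find the matching variable and overwrite its value in place
        for v in self.model.weights:
            if self._weight_name(v) == name:
                v.assign(value)
                return
        raise KeyError(f"No weight named {name}")
```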
Implement a standard weight-naming convention
I propose a rough weight-naming convention of `"model_name-layer_name-i"`, where `model_name` is the name of the model, `layer_name` is the name of the layer, and `i` is a number that denotes which weight this is within the layer. The advantage of a standard weight-naming convention is that inspecting models will become easier.
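For example, under this convention the saved weights of a graph convolution model might be named as follows (the specific layer names are illustrative, not an existing DeepChem scheme):

```python
# Example weight names following "model_name-layer_name-i"
example_names = [
    "graphconvmodel-graph_conv_1-0",  # e.g. the kernel of layer graph_conv_1
    "graphconvmodel-graph_conv_1-1",  # e.g. the bias of layer graph_conv_1
    "graphconvmodel-dense_1-0",       # first weight of the dense readout layer
]
```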
Create Standard Weight Saving Format
At present, different models are stored differently on disk. We may want to consider adopting a standard format for saving weights. For simplicity, we could do something like the following directory structure:
model_weights/
-> params.json
-> weight-name1.npy
-> ...
-> weight-name-n.npy
Here `params.json` holds the parameters of the model, that is, its class and its constructor arguments. As an example, for a `GraphConvModel`:
```json
{
  "class": "GraphConvModel",
  "n_tasks": "10",
  ...
  "graph_conv_layers": "[100, 100, 100]"
}
```
The actual weights are stored in `.npy` files on disk. The filename of each `.npy` file is the name of the corresponding weight. We would want a method on each model:
```python
def to_pretrained(self):
    # Generate the pretrained folder on disk
    ...
```
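A minimal sketch of what `to_pretrained` could look like in terms of the `get_parameters`/`get_weights` API above; the `model_dir` argument and the exact file layout details are assumptions, not settled design:

```python
import json
import os

import numpy as np


def to_pretrained(self, model_dir):
    # Create the pretrained folder and write the constructor arguments to params.json
    os.makedirs(model_dir, exist_ok=True)
    params = {"class": self.__class__.__name__, **self.get_parameters()}
    with open(os.path.join(model_dir, "params.json"), "w") as f:
        json.dump(params, f, indent=2)
    # Write each weight to its own .npy file, named by its standard weight name
    for name, value in self.get_weights().items():
        np.save(os.path.join(model_dir, name + ".npy"), value)
```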
The advantage of having our own format is that we won’t break stored weights if TensorFlow/PyTorch change their checkpointing formats. We previously had stored models in TF1, but they all broke with the TF2 upgrade. Ideally, by having a simple format we control, we can reduce the risk of breakage.
Implement from_pretrained
Here is a simple implementation for `from_pretrained`:
```python
@staticmethod
def from_pretrained(pretrained):
    # Read params.json and the .npy files; this returns the model class,
    # the constructor arguments, and a dict mapping weight names to values
    model_class, params, weights = load_pretrained(pretrained)
    # Initialize the model and copy in the pretrained weights
    model = model_class(**params)
    for weight_name, weight_value in weights.items():
        model[weight_name] = weight_value
    return model
```
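The `load_pretrained` helper used above doesn’t exist yet; here is one possible sketch, assuming a registry mapping class names from `params.json` to DeepChem model classes (the `MODEL_REGISTRY` dict is an assumption for illustration):

```python
import json
import os
from glob import glob

import numpy as np
import deepchem as dc

# Hypothetical registry mapping class names in params.json to model classes
MODEL_REGISTRY = {"GraphConvModel": dc.models.GraphConvModel}


def load_pretrained(model_dir):
    # Read the model class name and constructor arguments from params.json
    with open(os.path.join(model_dir, "params.json")) as f:
        params = json.load(f)
    model_class = MODEL_REGISTRY[params.pop("class")]
    # Load each .npy file into a dict keyed by weight name
    weights = {}
    for path in glob(os.path.join(model_dir, "*.npy")):
        name = os.path.splitext(os.path.basename(path))[0]
        weights[name] = np.load(path)
    return model_class, params, weights
```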
Environment Variable
We should introduce a new environment variable, `DEEPCHEM_MODEL_HUB`. By default this should point at the DeepChem S3 bucket for now. In the future, different companies may want to provide their own DeepChem model hubs. The model hub is simply a directory with the following structure:
modelhub/
-> pretrained_model1/
-> pretrained_model2/
...
To load from the model hub, we simply download `pretrained_model1/` (in the file format we specified above) and call `from_pretrained` to load this model.
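Putting the pieces together, loading from the hub could look roughly like the sketch below. The default path is a placeholder, and for simplicity the hub is treated as a directly readable directory; an S3-backed hub would first download the pretrained folder locally:

```python
import os

import deepchem as dc


def load_from_hub(model_name):
    # Resolve the hub location from the environment variable; the default is a placeholder
    hub = os.environ.get("DEEPCHEM_MODEL_HUB", "/data/modelhub")
    # Locate modelhub/<model_name>/ (an S3 hub would be downloaded here instead)
    pretrained_dir = os.path.join(hub, model_name)
    # Reconstruct the model using the proposed from_pretrained method
    return dc.models.Model.from_pretrained(pretrained_dir)
```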
Potential Complications
We will need to implement a method for setting weights for each of the different model types, which will take considerable work.
Feedback?
This is a first sketch of the design, so I’d love folks’ feedback on whether this makes sense. A lot of this design is still rough, but I wanted to get something written down for early review. @peastman @ncfrey I’d love your feedback in particular here! Feel free to suggest changes to this design as needed to make things sensible.