I found a solution, but it only works with the nightly build of deepchem, not with the stable release.
The actual problem was that rdkit was not supported on M1 Macs because of its pycairo dependency. That dependency has apparently been removed (https://github.com/conda-forge/rdkit-feedstock/pull/72), so rdkit (at least the newer builds) should now be compatible with arm64. Before that, only custom builds could run on an M1 Mac (https://github.com/conda-forge/rdkit-feedstock/issues/63).
First, install Miniforge3 (https://github.com/conda-forge/miniforge#miniforge3) for the arm64 architecture. Then install TensorFlow following this article (https://medium.com/codex/installing-tensorflow-on-m1-macs-958767a7a4b3). If everything goes well, rdkit can be installed into the same environment as TensorFlow:
conda config --add channels conda-forge
conda config --set channel_priority strict
conda install rdkit rdkit-dev
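Since the whole setup depends on a native arm64 Python, it may be worth confirming that the interpreter in this environment is not an x86_64 build running under Rosetta. Here is a minimal sketch using only the standard library; the helper name `is_native_arm64` is my own and not part of any of the packages above:

```python
import platform


def is_native_arm64(machine: str, system: str) -> bool:
    """Return True for a macOS interpreter that reports the arm64 architecture."""
    return system == "Darwin" and machine == "arm64"


if __name__ == "__main__":
    # On an M1 Mac a native build is expected to report 'arm64';
    # an Intel build running under Rosetta would report 'x86_64'.
    machine = platform.machine()
    system = platform.system()  # 'Darwin' on macOS
    print(f"system={system} machine={machine} "
          f"native_arm64={is_native_arm64(machine, system)}")
```

If this prints `native_arm64=False` on an M1 machine, the environment was probably created with an Intel conda installer rather than the arm64 Miniforge3.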
As the final step, we install deepchem nightly:
pip install --pre deepchem
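Before opening a notebook, a quick way to see whether all three packages ended up in the same environment is to look them up without importing them. This is only a sketch; `missing_modules` is a hypothetical helper name, and the check only tells you that a module can be located, not that its compiled parts actually load on arm64:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Top-level import names of the three packages installed in the steps above.
    required = ["tensorflow", "rdkit", "deepchem"]
    missing = missing_modules(required)
    print("missing:", missing if missing else "none")
```

An empty result means the modules are at least discoverable; the real test is still importing them and running something.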
I ran a quick test using Tutorial 4 and, apart from a few warnings, it ran fine:
In [2]: import deepchem as dc
dc.__version__
Out [2]: '2.6.0.dev'
In [3]: tasks, datasets, transformers = dc.molnet.load_tox21(featurizer='ECFP')
train_dataset, valid_dataset, test_dataset = datasets
print(train_dataset)
Out [3]: RDKit WARNING: [13:26:39] WARNING: not removing hydrogen atom without neighbors
RDKit WARNING: [13:26:48] WARNING: not removing hydrogen atom without neighbors
<DiskDataset X.shape: (6264, 1024), y.shape: (6264, 12), w.shape: (6264, 12), task_names: ['NR-AR' 'NR-AR-LBD' 'NR-AhR' ... 'SR-HSE' 'SR-MMP' 'SR-p53']>
In [4]: train_dataset.w
Out [4]: array([[1.04502242, 1.03632599, 1.12502653, ..., 1.05576503, 1.17464996,
1.05288369],
[1.04502242, 1.03632599, 1.12502653, ..., 1.05576503, 1.17464996,
1.05288369],
[1.04502242, 1.03632599, 1.12502653, ..., 1.05576503, 0. ,
1.05288369],
...,
[1.04502242, 0. , 1.12502653, ..., 1.05576503, 6.7257384 ,
1.05288369],
[1.04502242, 1.03632599, 1.12502653, ..., 1.05576503, 6.7257384 ,
1.05288369],
[1.04502242, 1.03632599, 1.12502653, ..., 0. , 1.17464996,
1.05288369]])
In [5]: model = dc.models.MultitaskClassifier(n_tasks=12, n_features=1024, layer_sizes=[1000])
In [6]: import numpy as np
model.fit(train_dataset, nb_epoch=10)
metric = dc.metrics.Metric(dc.metrics.roc_auc_score)
print('training set score:', model.evaluate(train_dataset, [metric], transformers))
print('test set score:', model.evaluate(test_dataset, [metric], transformers))
Out [6]: WARNING:tensorflow:AutoGraph could not transform <function KerasModel._create_gradient_fn.<locals>.apply_gradient_for_batch at 0x13de945e0> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Index'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform <function KerasModel._create_gradient_fn.<locals>.apply_gradient_for_batch at 0x13de945e0> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Index'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
training set score: {'roc_auc_score': 0.947299617865711}
test set score: {'roc_auc_score': 0.6867514307999948}
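The `Cause:` line in the AutoGraph warning names the `gast` package. One plausible explanation, which I have not verified, is that the installed `gast` release does not match what this TensorFlow build expects. A small sketch to print the installed versions when reporting the issue; `installed_version` is a hypothetical helper, and the distribution name `tensorflow-macos` is an assumption about how TensorFlow may be packaged on M1:

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # 'tensorflow-macos' is an assumed alternative distribution name on M1.
    for package in ("tensorflow", "tensorflow-macos", "gast", "deepchem"):
        print(package, installed_version(package))
```

In my run the warning did not prevent training, as the scores above show, so this is only useful for diagnosing or reporting it.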
I will do some further testing and post an update if I find anything. I hope this helps someone.
Cheers!