ChemBERTa latest?

Hi all, I’m Aron. I recently started working at the Netherlands eScience Center on a project that involves learning properties of molecules, hence my interest in DeepChem.

Specifically, I’m looking into fine-tuning ChemBERTa, and I was wondering what the latest developments are since the paper and the tutorial. More concretely, I noticed that DeepChem has a profile on HuggingFace where several ChemBERTa models were uploaded 3 weeks ago. One of them is called ChemBERTa-77M-MLM, which sounds to me like it was trained on the whole 77M curated PubChem dataset. Is that correct, and if so, what tokenizer does it use? Can I use it as a drop-in replacement in the tutorial linked above? Does it actually outperform the older version?



The latest paper is from the ELLIS ML for Molecules workshop. The new weights are from this workshop paper. We will have a new arXiv paper up soon!


Any update on this?
Also, where can I find the list of 200 properties that the MTR model was trained on?

I believe the descriptors are all at @seyonec Do you know if we used any other descriptors?

Sorry for the delay. We’ve been trying to add a set of experiments to the paper before the arXiv post, and it’s been going slower than we’d like.