Hi all, I’m Aron. I recently started working at the Netherlands eScience Center on a project that involves learning properties of molecules, hence my interest in DeepChem.
Specifically, I’m looking into fine-tuning ChemBERTa, and I was wondering what the latest developments are since the paper and the tutorial. More concretely, I noticed that DeepChem has a profile on HuggingFace where several ChemBERTa models were uploaded 3 weeks ago. One of them is called ChemBERTa-77M-MLM, which sounds to me like it was trained on the whole curated 77M-molecule PubChem dataset. Is that correct, and if so, which tokenizer does it use? Can I use it as a drop-in replacement in the tutorial linked above? And does it actually outperform the older version?
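For reference, this is roughly what I was hoping would work; the hub id `DeepChem/ChemBERTa-77M-MLM` is my guess based on the profile name, so please correct me if that's not the right checkpoint:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Assumed hub id, taken from the DeepChem HuggingFace profile
model_name = "DeepChem/ChemBERTa-77M-MLM"

# Load whatever tokenizer the checkpoint ships with, plus the MLM head
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Tokenize a SMILES string to inspect how the tokenizer splits it
smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin
print(tokenizer.tokenize(smiles))
```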
Thanks!