Comparison of ChemBERTa to MolBERT

Hi, I was just wondering whether anyone has ever compared ChemBERTa to MolBERT?

I think MolBERT has interesting training strategies, like predicting whether two SMILES strings are equivalent (i.e., encode the same molecule).

I don’t believe we’ve done a head-to-head comparison, but conceptually they are similar architectures.

@seyonec Would you have an idea of similarities/differences offhand?

Yes, the architectures are very similar, but the training strategies differ. MolBERT uses additional pretraining tasks, such as predicting physicochemical properties and distinguishing equivalent SMILES, and in their analysis the additional tasks provided better “latent representations” than MLM alone.

I am also curious whether anyone has tried to fine-tune ChemBERTa on a classification task while continuing to use MLM at the same time?

ChemBERTa-2 similarly uses chemical property prediction during pretraining and finds similar results: it can outperform pure SMILES (MLM-only) training. We don’t use SMILES distinguishing, though, which would be cool to check out. I don’t believe anyone has explored continuing to use MLM while fine-tuning either!
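
For anyone who wants to experiment with that, here is a rough sketch of what a joint MLM + classification fine-tuning objective could look like with the Hugging Face tooling. This is not something we have run; the checkpoint name, the binary label, and the loss weighting are placeholders for illustration.

```python
# Sketch: fine-tune with a classification loss while keeping an MLM loss.
# Checkpoint name, labels, and the 0.5 weighting are assumptions, not from the thread.
import torch
from torch import nn
from transformers import AutoTokenizer, AutoModelForMaskedLM, DataCollatorForLanguageModeling

model_name = "DeepChem/ChemBERTa-77M-MLM"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
mlm_model = AutoModelForMaskedLM.from_pretrained(model_name)
clf_head = nn.Linear(mlm_model.config.hidden_size, 2)  # e.g. a binary property label

collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

def joint_loss(smiles_batch, labels, mlm_weight=0.5):
    enc = tokenizer(smiles_batch, padding=True, truncation=True, return_tensors="pt")
    # Apply random MLM masking to a copy of the input ids
    masked = collator([{"input_ids": ids} for ids in enc["input_ids"]])
    out = mlm_model(
        input_ids=masked["input_ids"],
        attention_mask=enc["attention_mask"],
        labels=masked["labels"],
        output_hidden_states=True,
    )
    mlm_loss = out.loss
    # Classify from the first (start/[CLS]) token of the last hidden layer
    cls_vec = out.hidden_states[-1][:, 0, :]
    clf_loss = nn.functional.cross_entropy(clf_head(cls_vec), labels)
    return mlm_weight * mlm_loss + (1 - mlm_weight) * clf_loss

loss = joint_loss(["CCO", "c1ccccc1"], torch.tensor([0, 1]))
loss.backward()
```

In practice you would put both the encoder and the classification head in one optimizer and tune the MLM/classification weighting; the sketch just shows how the two losses can be combined in a single backward pass.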

Alright, thank you very much. As far as I am aware, ChemBERTa-2 is not yet publicly available, right?

The ChemBERTa-2 paper is at https://cloud.ml.jku.at/s/dZ7CwqBkHX97C6S (it should be on arXiv in the next few weeks), and all model weights have already been uploaded to Hugging Face: https://huggingface.co/DeepChem. Everything is public already; we just don’t have the arXiv paper up yet!

Oh okay, I see now. Thank you very much. Two last questions regarding the architectures: was the MTR model trained only on MTR, or was MLM also part of the training? And for the MTR model, were only the hidden states of the start token used, or the states of all tokens?
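To make the second question concrete, this is the distinction I mean, using one of the uploaded checkpoints (the model name is just an assumption; I don’t know which pooling ChemBERTa-2 actually used):

```python
# Sketch: first-token pooling vs. mean pooling of ChemBERTa hidden states.
# Checkpoint name is an assumption taken from the Hugging Face DeepChem org.
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "DeepChem/ChemBERTa-77M-MTR"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)  # loads the base encoder only

enc = tokenizer(["CCO", "c1ccccc1O"], padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**enc).last_hidden_state      # (batch, seq_len, hidden)

# Option 1: only the start token's hidden state (position 0)
start_token_emb = hidden[:, 0, :]

# Option 2: mean over all tokens, masking out padding
mask = enc["attention_mask"].unsqueeze(-1)       # (batch, seq_len, 1)
mean_emb = (hidden * mask).sum(1) / mask.sum(1)

print(start_token_emb.shape, mean_emb.shape)
```

So the question is whether the MTR head was fed something like `start_token_emb` only, or a pooling over all token states.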