Comparison of ChemBERTa to MolBERT

Hi, I was just wondering whether anyone has ever compared ChemBERTa to MolBERT?

I think MolBERT has interesting training strategies, like predicting whether two SMILES strings are equivalent (i.e., encode the same molecule).

I don’t believe we’ve done a head-to-head comparison, but conceptually they are similar architectures.

@seyonec Would you have an idea of similarities/differences offhand?

Yes, the architectures are very similar, but the training strategies differ. MolBERT uses additional pretraining tasks, such as predicting physicochemical properties and distinguishing equivalent SMILES, and in their analysis the additional tasks provided better “latent representations” than MLM alone.

I am also curious whether anyone has tried to fine-tune ChemBERTa on a classification task while continuing to use MLM at the same time?

ChemBERTa-2 similarly uses chemical property prediction during pretraining and finds similar results: it can outperform pure SMILES (MLM-only) training. We don’t use SMILES distinguishing, though, which would be cool to check out. I don’t believe anyone has explored continuing to use MLM while fine-tuning either!
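
For anyone who wants to experiment with that, here is a rough sketch of what a joint MLM + classification fine-tuning objective could look like with the Hugging Face tooling. This is not something we have run; the checkpoint name, the binary label, and the loss weighting are placeholders for illustration.

```python
# Sketch: fine-tune with a classification loss while keeping an MLM loss.
# Checkpoint name, labels, and the 0.5 weighting are assumptions, not from the thread.
import torch
from torch import nn
from transformers import AutoTokenizer, AutoModelForMaskedLM, DataCollatorForLanguageModeling

model_name = "DeepChem/ChemBERTa-77M-MLM"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
mlm_model = AutoModelForMaskedLM.from_pretrained(model_name)
clf_head = nn.Linear(mlm_model.config.hidden_size, 2)  # e.g. a binary property label

collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

def joint_loss(smiles_batch, labels, mlm_weight=0.5):
    enc = tokenizer(smiles_batch, padding=True, truncation=True, return_tensors="pt")
    # Apply random MLM masking to a copy of the input ids
    masked = collator([{"input_ids": ids} for ids in enc["input_ids"]])
    out = mlm_model(
        input_ids=masked["input_ids"],
        attention_mask=enc["attention_mask"],
        labels=masked["labels"],
        output_hidden_states=True,
    )
    mlm_loss = out.loss
    # Classify from the first (start/[CLS]) token of the last hidden layer
    cls_vec = out.hidden_states[-1][:, 0, :]
    clf_loss = nn.functional.cross_entropy(clf_head(cls_vec), labels)
    return mlm_weight * mlm_loss + (1 - mlm_weight) * clf_loss

loss = joint_loss(["CCO", "c1ccccc1"], torch.tensor([0, 1]))
loss.backward()
```

In practice you would put both the encoder and the classification head in one optimizer and tune the MLM/classification weighting; the sketch just shows how the two losses can be combined in a single backward pass.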

Alright, thank you very much. As far as I am aware, ChemBERTa-2 is not yet publicly available, right?

The ChemBERTa-2 paper is at https://cloud.ml.jku.at/s/dZ7CwqBkHX97C6S (it should be on arXiv in the next few weeks), and all model weights have already been uploaded to Hugging Face: https://huggingface.co/DeepChem. Everything is public already; we just don’t have the arXiv paper up yet!

Oh okay, I see now. Thank you very much. Two last questions regarding the architectures: was the MTR model trained only on MTR, or was MLM also part of the training? And for the MTR model, were only the hidden states of the start token used, or the states of all tokens?
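To make the second question concrete, this is the distinction I mean, using one of the uploaded checkpoints (the model name is just an assumption; I don’t know which pooling ChemBERTa-2 actually used):

```python
# Sketch: first-token pooling vs. mean pooling of ChemBERTa hidden states.
# Checkpoint name is an assumption taken from the Hugging Face DeepChem org.
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "DeepChem/ChemBERTa-77M-MTR"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)  # loads the base encoder only

enc = tokenizer(["CCO", "c1ccccc1O"], padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**enc).last_hidden_state      # (batch, seq_len, hidden)

# Option 1: only the start token's hidden state (position 0)
start_token_emb = hidden[:, 0, :]

# Option 2: mean over all tokens, masking out padding
mask = enc["attention_mask"].unsqueeze(-1)       # (batch, seq_len, 1)
mean_emb = (hidden * mask).sum(1) / mask.sum(1)

print(start_token_emb.shape, mean_emb.shape)
```

So the question is whether the MTR head was fed something like `start_token_emb` only, or a pooling over all token states.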