I’m wondering how feasible it would be to use ChemBERTa as a fragment-growing molecule generator. The idea is as follows:
- Fine-tuning inputs would be small-molecule fragments:
  - from fragment screens against multiple targets: good or even low-affinity hits that chemists have since optimized and expanded
  - from bioactive molecules broken down into fragments, as a data augmentation strategy
- Outputs would be the full inhibitor molecules that are “grown” or linked from the input fragments, with multiple fragments mapping to the same output or to several outputs.
- After fine-tuning, specific input fragments would be used to generate output molecules.
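To make the data layout concrete, here is a rough sketch of how I imagine the fragment-to-molecule training pairs being organized. It is only an illustration under my own assumptions: the SMILES strings are toy placeholders rather than real screening data, and the names `RAW_PAIRS`, `build_pairs`, and `group_by_fragment` are hypothetical helpers, not part of ChemBERTa or any library.

```python
from collections import defaultdict

# Hypothetical toy records: (fragment SMILES, full molecule SMILES).
# Illustrative placeholders only, not real fragment-screen data.
RAW_PAIRS = [
    ("Oc1ccccc1", "CC(=O)Nc1ccc(O)cc1"),
    ("CC(N)=O", "CC(=O)Nc1ccc(O)cc1"),    # second fragment, same output
    ("Oc1ccccc1", "OC(=O)c1ccccc1O"),     # same fragment, different output
    ("Oc1ccccc1", "CC(=O)Nc1ccc(O)cc1"),  # exact duplicate, should be dropped
]


def build_pairs(raw_pairs):
    """Deduplicate records and return source/target dicts in first-seen order."""
    seen = set()
    pairs = []
    for fragment, molecule in raw_pairs:
        key = (fragment, molecule)
        if key in seen:
            continue
        seen.add(key)
        pairs.append({"source": fragment, "target": molecule})
    return pairs


def group_by_fragment(pairs):
    """Map each input fragment to the list of full molecules grown from it."""
    grouped = defaultdict(list)
    for pair in pairs:
        grouped[pair["source"]].append(pair["target"])
    return dict(grouped)


pairs = build_pairs(RAW_PAIRS)
by_fragment = group_by_fragment(pairs)

for fragment, molecules in by_fragment.items():
    print(fragment, "->", molecules)
```

Grouping by fragment makes the one-to-many structure explicit, which might also help when splitting into train and validation sets so that the same full molecule does not appear in both.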
Does this sound reasonable? I have struggled to find much in the literature that tackles the fragment growing problem in this way.
A second, related question: how much data would be needed to fine-tune a model effectively? Is there a minimum number of samples, or does it depend on so many factors that there’s no single answer?
Thanks for your time!