DeepChem GSoC 2026 Potential Ideas

We have a lot of newcomers joining. Welcome to the community! I am scoping out potential projects for GSoC 2026 (remember that we have to apply to get in, so there is no guarantee yet that DeepChem will be selected). Here are some tentative project directions (I will update this forum post as we get new ideas):

  • Symbolic machine learning (along the lines of https://arxiv.org/abs/2305.01582, but in Python)
  • MLIP support (like https://github.com/instadeepai/mlip, but implemented in PyTorch)
  • LLM support for 7B models in DeepChem (e.g., make an OLMo model in DeepChem: https://huggingface.co/allenai/OLMo-7B). It should be possible to both train these models and run inference with them
  • Implement RFDiffusion, RFDiffusion-2 or other protein design models in DeepChem
  • Improve DFT support in DeepChem: DeepChem has preliminary density functional theory support (https://arxiv.org/abs/2309.15985). Build on this! Can you solve new systems, make this scale better, or implement other exchange-correlation (XC) functionals?
  • Improve materials machine learning in DeepChem: DeepChem has simple crystal graph convolutions and lattice adsorption model support from a few years ago. Test these models on real systems and improve them. Possibly implement new papers from the last few years.

If you are looking to apply this year, please start scoping out these directions. The more work you do up front, the more likely we are to pick you!

I will restart office hours in a limited format (at least 1 day a week) by the start of next year, once I am fully back from paternity leave.
