DeepChem GSoC 2025 Project Ideas

This post lists draft ideas for GSoC 2025 projects. (Note that we plan to submit an application, but we will not know whether we have been accepted until later.) We have divided projects into beginner, intermediate, and advanced categories. Each project is also listed with a suggested GSoC project size (small, medium, or large; see https://developers.google.com/open-source/gsoc/faq).

Beginner Friendly Projects

Beginner projects should be accessible to new developers and focus on work that doesn’t require engineering sophistication.

  • Layer Tutorials
    • Length: Small (90 hours)
    • Description: DeepChem has been moving towards first-class layers and now has a collection of general layers. We still need to improve the documentation for existing layers to make them more useful for the community. This project should add tutorials for using existing layers to the DeepChem tutorial series, and should plan to add a few new layers that would be useful to the community.
    • Educational Value: Students will learn to improve their technical communication skills and learn how to construct useful Jupyter/Colab tutorials. Layers are easier to add than full models since they are effectively functions.
    • Potential Mentors: Aryan, Jose, Riya, Maithili, Nimisha, Shreyas
    • Note: This was also a 2024 project, but there remains more work to be done for 2025 expanding tutorials/ideas.
  • Improving New Drug Modality Support
    • Length: Small (90 hours)
    • Description: DeepChem at present doesn’t have much tooling or support for working with emerging drug modalities. These include PROTACs, antibody-drug conjugates, macrocycles, oligonucleotides, and more. This project would add new tutorials introducing these drug modalities and provide examples of how to work with them in DeepChem. It would also be useful to identify and process relevant datasets.
    • Educational Value: New drug modalities drive many emerging startups in the space. Improving DeepChem’s support for these new modalities of therapeutics could help drive discovery of new medicine at the cutting edge. It would prepare students to potentially find jobs at these up-and-coming biotech firms as well.
    • Potential Mentors: Jose, David, Bharath
    • Note: This was also a 2024 project, but there remains more work to be done for 2025 expanding support for new modalities.
  • Improving Support for Drug Formulations
    • Length: Small (90 hours)
    • Description: Drug formulations are a rich area of industrial study that is often critical for actually bringing a drug to patients. See https://drughunter.com/resource/the-modern-medicinal-chemist-s-guide-to-formulations/ for an example guide. In this project, you will build a tutorial introducing readers to the study of drug formulations, along with DeepChem examples of how to computationally help design a potential formulation.
    • Educational Value: Formulations are critical for bringing drugs to patients. Improving DeepChem’s support for formulations could help drive the development of new medicines. It will also prepare students to find jobs at large biotech/pharma firms.
    • Potential Mentors: Jose, David, Bharath

Intermediate Projects

These projects require some degree of hacking, but likely won’t raise challenging engineering difficulties.

  • Improve Equivariance Support

    • Length: Medium (175 hours)
    • Description: DeepChem has limited support for equivariant models. This project would extend DeepChem’s support for equivariance and add additional equivariant models, such as tensor field networks.
    • Educational Value: Equivariance is one of the most interesting ideas in modern machine learning and underpins powerful systems like AlphaFold2. Contributors will learn more about this field and could potentially write a research paper about their work on this project.
    • Potential Mentors: Aryan, Riya, Nimisha, Bharath, Shreyas
    • Note: This was also a 2024 project, but this project was not taken up by a student last year.
  • NumPy 2.0 Upgrade

    • Length: Large (175 hours)
    • Description: DeepChem is currently on NumPy < 2.0. The upgrade to 2.0 is not backwards compatible, so we need to fix any incompatibilities that the upgrade exposes.
    • Educational Value: Complex version upgrades take a lot of sophistication and will teach students challenging debugging skills.
    • Potential Mentors: Bharath
  • Conversion of SMILES to IUPAC and IUPAC to SMILES

    • Length: Large (300 hours) / Medium (175 hours)
    • Description: This project focuses on developing tools within DeepChem to enable accurate, bidirectional conversion between SMILES (Simplified Molecular Input Line Entry System) strings and IUPAC (International Union of Pure and Applied Chemistry) names. The final deliverables will include user-friendly APIs, thorough documentation, and comprehensive testing to facilitate reliable molecular representation transformations.
    • Educational Value: Contributors will deepen their understanding of chemical data structures, learn to optimize algorithms for molecular conversions, and contribute to the DeepChem ecosystem.
    • Potential Mentors: Shreyas, Bharath
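
For readers new to the idea behind the Improve Equivariance Support project: a function f of atomic coordinates is rotation-equivariant if rotating the input rotates the output in the same way, that is, f(x Rᵀ) = f(x) Rᵀ for row-vector coordinates x and a rotation matrix R. The sketch below is a minimal NumPy illustration of that property only. It is not DeepChem code, and the helper names (`random_rotation`, `pairwise_repulsion`, `is_rotation_equivariant`) are made up for this example.

```python
import numpy as np


def random_rotation(rng):
    """Draw a random 3x3 rotation matrix (orthogonal, determinant +1)."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs of the QR factorization
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # flip one axis to turn a reflection into a rotation
    return q


def pairwise_repulsion(coords):
    """Toy vector-valued function: inverse-square repulsion summed over atom pairs."""
    diff = coords[:, None, :] - coords[None, :, :]  # (N, N, 3) displacement vectors
    dist = np.linalg.norm(diff, axis=-1)            # (N, N) distances
    np.fill_diagonal(dist, np.inf)                  # remove self-interaction
    return (diff / dist[..., None] ** 3).sum(axis=1)  # (N, 3), one vector per atom


def is_rotation_equivariant(f, coords, rotation, atol=1e-8):
    """Numerically check f(x R^T) == f(x) R^T for one input and one rotation."""
    rotated_then_mapped = f(coords @ rotation.T)
    mapped_then_rotated = f(coords) @ rotation.T
    return np.allclose(rotated_then_mapped, mapped_then_rotated, atol=atol)


rng = np.random.default_rng(0)
coords = rng.normal(size=(5, 3))  # five hypothetical atoms
rot = random_rotation(rng)

# Built only from displacement vectors and distances, so it should pass.
print(is_rotation_equivariant(pairwise_repulsion, coords, rot))
# Element-wise squaring depends on the coordinate axes, so it should fail.
print(is_rotation_equivariant(lambda x: x ** 2, coords, rot))
```

A numerical check like this is also a handy unit test when implementing an equivariant layer: it catches symmetry-breaking bugs without needing any training data.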

Advanced Projects

These projects raise considerable technical and engineering challenges. We recommend that students who want to tackle these projects have past experience working in large codebases and tackling code reviews for complex code.

  • Implement a Wishlist Model

    • Length: Large (300 hours)
    • Description: DeepChem has an extensive wishlist of models (https://github.com/deepchem/deepchem/issues/2680). Pick a model from the wishlist and implement it in DeepChem. We suggest tackling a model such as Hamiltonian or Lagrangian Neural Networks or Physics-Informed Neural Operators (PINO) that will improve DeepChem’s physics support.
    • Educational Value: Implementing a machine learning model from scratch or from an academic reference into a production grade library like DeepChem is a challenging task. Doing so requires understanding the base model, dealing with numerical issues in implementation, and benchmarking the model correctly. Multiple past GSoC contributors have leveraged their implementations to write papers on their work and have gained skills that they have used subsequently in industry or in academia.
    • Potential Mentors: Depends on model.
  • PyTorch Porting

    • Length: Medium (175 hours)
    • Description: DeepChem has mostly shifted to PyTorch as its primary backend, but a couple of models are still implemented in TensorFlow, in particular our Chemception implementation. This project would port Chemception and do final testing to fix issues with PyTorch/DeepChem compatibility on implementations. See https://github.com/deepchem/deepchem/issues/2863.
    • Educational Value: Porting models while preserving numerical properties requires a strong understanding of deep learning implementations. It serves as a test of machine learning know-how that will serve students well in future machine learning positions in academia or industry.
    • Potential Mentors: Aryan, Jose, Riya, Nimisha, Bharath, Shreyas
  • HuggingFace-style easy pretrained-model Load

    • Length: Large (300 hours)
    • Description: DeepChem requires you to know the parameters used to train a model in order to reload it from disk. This is unfriendly for distributing pretrained models. In this project, you will implement an easy HuggingFace-style function call to load weights from disk without having to know training parameters. To do this, you will set a standard metadata format for saving model parameters that can be used behind the scenes to autoload models from disk.
    • Educational Value: This is a technically challenging project which will require understanding metadata formats and changing saving/reloading for existing models.
    • Potential Mentors: Aryan, Bharath
  • Model-Parallel DeepChem Model Training

    • Length: Large (300 hours)
    • Description: DeepChem now has good support for training LLMs through HuggingFace. At present, though, these models cannot be too large and must fit on a single GPU. In this project, you will implement basic support for model-parallel training to train models with weights that don’t fit on a single GPU.
    • Educational Value: This is a technically challenging project which will require understanding multi-GPU training methods. You may need to explore existing PyTorch frameworks for model-parallel training and adapt them to DeepChem.
    • Potential Mentors: Aryan, Bharath
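
As a rough illustration of the metadata idea in the HuggingFace-style loading project, the sketch below stores the constructor arguments next to the weights so that a loader can rebuild the model without the caller knowing how it was trained. This is only one possible design under stated assumptions, not DeepChem’s API: `ToyModel`, `MODEL_REGISTRY`, `register`, `save_pretrained`, `from_pretrained`, and the `config.json` / `weights.json` file layout are all hypothetical names chosen for this example, and plain Python lists stand in for real tensors.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical registry mapping a saved class name back to the class itself.
MODEL_REGISTRY = {}


def register(cls):
    """Class decorator that makes a model discoverable by name at load time."""
    MODEL_REGISTRY[cls.__name__] = cls
    return cls


@register
class ToyModel:
    """Stand-in for a real model; it is not a DeepChem class."""

    def __init__(self, n_tasks, n_features, dropout=0.0):
        # Remember the constructor arguments so they can be written to disk.
        self.config = {"n_tasks": n_tasks, "n_features": n_features, "dropout": dropout}
        self.weights = [[0.0] * n_features for _ in range(n_tasks)]

    def save_pretrained(self, model_dir):
        """Write metadata and weights as two separate files in model_dir."""
        model_dir = Path(model_dir)
        model_dir.mkdir(parents=True, exist_ok=True)
        metadata = {"model_class": type(self).__name__, "init_kwargs": self.config}
        (model_dir / "config.json").write_text(json.dumps(metadata, indent=2))
        (model_dir / "weights.json").write_text(json.dumps(self.weights))


def from_pretrained(model_dir):
    """Rebuild a model from disk using only the saved metadata."""
    model_dir = Path(model_dir)
    metadata = json.loads((model_dir / "config.json").read_text())
    cls = MODEL_REGISTRY[metadata["model_class"]]  # look the class up by name
    model = cls(**metadata["init_kwargs"])         # re-create the architecture
    model.weights = json.loads((model_dir / "weights.json").read_text())
    return model


with tempfile.TemporaryDirectory() as tmp:
    original = ToyModel(n_tasks=2, n_features=3, dropout=0.1)
    original.weights[0][1] = 0.5  # pretend training changed a weight
    original.save_pretrained(tmp)
    restored = from_pretrained(tmp)  # no training parameters passed in
    print(type(restored).__name__, restored.config)
```

The key design choice in this sketch is that the class name and constructor arguments travel with the weights; a real implementation would also need to decide on versioning of the metadata format and on how to handle models whose constructor arguments are not JSON-serializable.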

Community members, please add more suggestions!

Hi DeepChem team,

I’m interested in applying for GSoC 2025 under DeepChem and have explored the project ideas. Two ideas particularly interest me:

  1. Implementing a Wishlist Model – I’d love to work on integrating a new model into DeepChem. Implementing such models aligns with my deep learning background and interest in applying ML to scientific computing.
  2. PyTorch Porting – Given DeepChem’s shift towards PyTorch, I’m also interested in porting Chemception to PyTorch and ensuring full compatibility with DeepChem’s ecosystem. I have prior experience working with deep learning frameworks like PyTorch and TensorFlow, so this would be an exciting challenge.

While I have not contributed to DeepChem yet, I have been an active open-source contributor, particularly in sktime, where I implemented the BoxCoxBiasAdjustedForecaster and am currently adding a deep learning-based forecasting model. I have experience debugging complex ML issues, writing clean and maintainable code, and ensuring proper integration of my implementations.

I’d love to hear your thoughts on how I can best align my skills with the project’s needs. Additionally, do you have any recommendations on specific areas I should explore or contributions I can make before the application period?

Looking forward to your guidance!

Best,
Sanskar Modi

Hi there,

I’m interested in the “Improve Equivariance Support” project for DeepChem. I’d like to be assigned and would appreciate a brief on the issue as well as guidance on how to start contributing. Could you please let me know if this is the right place to connect and, if so, how I should get started? I’m eager to learn more about equivariant models and help extend DeepChem’s capabilities in this area.

Thanks for your time,
Ritik Thakur

Hi DeepChem team,
This is Dikshant Jha. I am a machine learning enthusiast and have read all the project ideas. After going through them, my interest aligns with the Implement a Wishlist Model project. I would like to contribute to this project and integrate a new model into DeepChem. I would be very grateful if anyone could guide me on where to start while I explore the listed models that need to be implemented.

Dear DeepChem team,

I am a final-year Physics student at UCL and would like to apply to DeepChem under GSoC 2025. I have worked with ML and DL models for a while, with my current research being in deep RL for enhanced sampling in molecular dynamics simulations. I believe my expertise in both physics and ML aligns well with DeepChem.

Within the project ideas, I find Implementing a Wishlist Model to be quite interesting and would love to be able to work on DeepChem’s physics support. Please let me know if there is anything specific I should know before I get started.

Kind Regards,
Faraaz Akhtar

Hi,

I am Mattia Rigon from Italy. I have a bachelor’s degree in Computer Engineering and am currently pursuing a master’s in Artificial Intelligence Systems at the University of Trento.

I have one year of experience as a software engineer, where I learned to develop scalable and professional systems. During my master’s, I have focused my studies on deep learning applied to various domains, including:

  • Computer Vision
  • Natural Language Processing (NLP)
  • Graph Learning – I have contributed to a paper on positional encodings for graphs (arXiv:2502.09365) and am currently working on generative models such as discrete diffusion models and discrete flow matching applied to graph data.

Given my experience in generative models on graphs I am really interested in the Implement a Wishlist Model project.

Additionally, I would like to ask whether this project, even partially, could be used as the basis for my master’s thesis.

Looking forward to your response.

Best regards,
Mattia Rigon

Here can be found some of my projects: https://github.com/MattiaRigon

Good Evening DeepChem,
I am looking forward to GSoC 2025 and have explored the ideas given above. Two projects that particularly interest me are:

  1. Implementing a wishlist model
  2. PyTorch Porting

Both projects align with my goals of learning about new architectures, and I also have working knowledge of PyTorch. I look forward to hearing from the team about how to get started and contribute, and whether there are any prior readings to complete.

Hi Aryan and Bharath,

I’m really interested in your GSoC project, “Model-Parallel DeepChem Model Training.”

I have solid experience with model training, including using FSDP, DeepSpeed, and Megatron. I’ve also worked on multi-node and multi-GPU setups. This seems like a great fit for your goal of implementing model-parallel training for bigger models. I am also keen on contributing to the open-source community, including FastChat and so on.

I’m a master’s student at the University of Chinese Academy of Sciences, and you can check out my Github Page. It includes some projects on distributed training and LLM optimization that might give you a sense of my background.

To show my experience early on, is there something I could prep—like a small prototype of model-parallel training adapted to a DeepChem-like setup? Let me know what you think! Looking forward to hearing from you and hopefully contributing to this project.

Dear DeepChem Team,

I’m a sophomore pursuing a double major in physics and computer science. I’m interested in applying for GSoC 2025 under DeepChem and exploring the “Improve Equivariance Support” project.
I have made a few contributions and am familiar with the Git workflow. I also have experience in Python, particularly in ML applications.
I’d love some guidance on how I can get started.

Thanks and Regards

Pranav Suryawanshi

Subject: Interest in GSoC 2025 – SMILES ↔ IUPAC Conversion Project

Dear DeepChem Team,

I’m a sophomore pursuing a degree in Computer Science and am interested in applying for GSoC 2025 under DeepChem. The SMILES ↔ IUPAC conversion project caught my attention as it aligns with my interest in computational chemistry and AI-driven solutions.

I have experience with Python, particularly in machine learning applications, and am familiar with the Git workflow. I would love to contribute and gain deeper insights into molecular representations and conversion algorithms.

Could you please guide me on how to get started and make meaningful contributions? Any resources or pointers would be greatly appreciated!

Looking forward to your response.

Best regards,
Manasa Gayathri & M Rohit Sri Krishna

Dear DeepChem Team,

I’m a junior pursuing a computer science degree at University of Engineering and Technology, Vietnam National. I’m interested in applying for GSoC 2025 under DeepChem for the “Improve Equivariance Support” project.

About my background: I work in the bioinformatics laboratory of the Department of Computational Science and Engineering at my university. I have some research experience and have published two small works on applying machine learning and heuristic algorithms to cancer biomarker discovery. I also have experience in Python, particularly in ML applications, and general knowledge of NLP, CV, and LLMs. Here is my GitHub link: https://github.com/t1hachane

Is this the right place to connect? If so, how should I get started?

Thanks and Regards

Ha Tang Vinh

Hi DeepChem Team,

My name is David Nwosu, and I’m a first-year Electronics and Computer Engineering student at the University of Port-Harcourt, Nigeria. I’m excited to apply for GSoC 2025 under DeepChem and am particularly interested in two projects:

  1. Layer Tutorials: It’s really exciting to see this project because I just worked on a project that involved layers (I’ve built a mini deep learning framework from scratch, implementing layers and automatic differentiation), and I’d love to gain more knowledge while contributing by improving DeepChem’s layer documentation and tutorials.
  2. Improve Equivariance Support: This project also caught my attention because it introduces me to cutting-edge concepts like equivariant models, which I’ve read about but haven’t yet had the chance to explore in depth. I’m eager to learn how these models respect symmetries in data and contribute to their implementation in DeepChem.

As a college freshman, I may not have extensive professional experience, but I’ve taken the initiative to learn and apply machine learning concepts on my own, and I believe the GSoC with DeepChem would be an invaluable opportunity to expand and enhance my knowledge while contributing meaningfully to the organization.

I’d love to start contributing to DeepChem while waiting for the application period. Could you guide me on how to get started or suggest beginner-friendly issues I can work on?

Looking forward to your guidance!

Best regards,
David Nwosu

Dear DeepChem Team,

I’m Xu Jiahao, a freshman from the Global Institute of Future Technology at Shanghai Jiao Tong University. I’m new to advanced studies and open-source, but I’m highly enthusiastic and eager to learn. I’ve been self-studying machine learning, improving my Python and Git skills, and working on projects like reproducing AI models and handling bioinformatics data.

Among the GSoC 2025 projects, I’m most excited about “Improve Equivariance Support.” I find the concept of equivariance fascinating and believe it’s crucial for advancing DeepChem’s capabilities in handling complex molecular structures. I’m eager to dive deep into this area and contribute meaningfully to the project. I’m also interested in the other projects, such as “Layer Tutorials” and “SMILES to IUPAC Conversion,” which I see as great opportunities to expand my knowledge.

I’m thrilled about the possibility of joining GSoC with DeepChem. It would be a game-changing experience for me.

Is this the right place to connect? How can I get started?

Thanks and regards,
Xu Jiahao

Hello, DeepChem Team,

I am a senior student with extensive experience in deep learning and model optimization. I am actively involved in research and practical applications in the field of machine learning. I have accumulated rich experience in deep learning.

My areas of specialization include:

  • Biological protein analysis - I have had competitive results in Kaggle competitions in this area, which is very close to DeepChem’s requirements.
  • Deep learning implementations - I have extensive hands-on experience in model architecture design, training, and optimization, especially for large language models.
  • Model engineering - I have contributed to the vLLM open source project, and my work is significant enough to be compiled into a research paper (currently unpublished).
  • PyTorch Development - I have extensive experience with the PyTorch framework, including model implementation and optimization.

Given my background in deep learning frameworks and model engineering, I am particularly interested in the PyTorch porting and HuggingFace style easy pre-trained model loading projects. My experience with PyTorch implementations and model architecture optimization makes me well suited to tackle these challenges.

Additionally, I am interested in exploring how these projects might align with research goals and potentially form the basis of future academic publications. Looking forward to contributing to this project.

Hello Team,
I am Thembo Jonathan, a software engineering student at Makerere University, Uganda.
I have had the privilege of contributing to Ersilia, which equips laboratories in low- and middle-income countries with state-of-the-art AI/ML tools for infectious and neglected disease research.
I hope to make a positive contribution here too during this summer.

Thank you

Hi there,

I’m interested in the “numpy 2.0” project for DeepChem. I’d like to be assigned and would appreciate a brief on the issue as well as guidance on how to start contributing. Could you please let me know if this is the right place to connect and, if so, how I should get started?

Thanks for your time,
Pritush Kr Singh

Hi Team,

I’m interested in contributing to the “Layer Tutorials” project for GSoC 2025. I have experience in Python, Django, Docker, and backend development, and I’m eager to contribute to DeepChem by improving tutorials and adding useful deep learning layers.

I have a few questions regarding the project:

  1. What are the key areas where documentation needs improvement? Are there specific layers that require detailed tutorials?
  2. Are there any recommended resources to understand DeepChem layers better before contributing?
  3. Would you suggest any beginner-friendly issues I can explore to get started?

I’ve gone through the existing DeepChem tutorials and started exploring how layers work. Looking forward to your guidance on how I can contribute effectively.

Thanks!
Thanshir Mohammed

Hi Aryan and Bharath,
I am Amogh Joshi, currently pursuing a Master’s in Artificial Intelligence at the Indian Institute of Technology Kharagpur. As my current research involves working with large language models, I have gained a lot of experience in fine-tuning LLMs on domain-specific tasks. I am interested in the following two projects:

  1. HuggingFace-style easy pretrained-model Load
  2. Model-Parallel DeepChem Model Training

I would love to hear from you guys.
Please let me know, if I need to do anything before I get started.

Thanks and Regards,
Amogh Joshi

Subject: Interest in GSoC 2025 Project – Conversion of SMILES to IUPAC and Vice Versa

Dear DeepChem Team,

I am Pradyumna Prahas, a B.Tech student in Artificial Intelligence and Machine Learning (AIML) at KMIT, Hyderabad. I have a strong foundation in Python, Java, C++, Machine Learning, Deep Learning, Neural Networks, and Transformers, and I have worked on two projects in this domain. I am particularly interested in the “Conversion of SMILES to IUPAC and Vice Versa” project, as I find molecular representation learning fascinating and would love to contribute to improving DeepChem’s capabilities in this area.

I wanted to check if this is the right platform to communicate regarding GSoC 2025. If so, I would love your guidance on how to get started. Could you help me understand the best way to begin contributing? Are there any beginner-friendly issues or resources I should explore first? Additionally, it would be great if you could provide the best way to reach out for further discussions—whether via email, forum, or any other preferred contact channel.

Looking forward to your response!

Best regards,
Pradyumna Prahas