DeepChem GSoC 2025 Project Ideas

Hello Dear DeepChem Team,

I am writing about my interest in contributing to DeepChem’s “PyTorch Porting” idea during Google Summer of Code. First, I would like to tell you a little about myself, and then explain how I can contribute to this idea. Thank you in advance. I would be thrilled to contribute to the work of the highly talented scientists and engineers there, with or without a GSoC award. I should add that I am a computer science master’s student and that I also work. Given the 12-week limit, I was aiming for a 175-hour project, but if the work is not limited to the GSoC period, I can support the project beyond 12 weeks until it is completed.

I graduated from the physics department in 2023. I was a physics major, but my interest was always in computer science, so during my undergraduate education I took part in two machine learning projects as an intern researcher. One was a deep learning project that aimed to identify protein families from their molecular structure; we used the PDB database and TensorFlow. The other was a binary classification model for the largest telecommunications company in Turkey, which looked at the technical specifications of customers’ internet connections (for example, the signal strength they receive from the switchboard, the type of cable they use, etc.) and determined whether or not they had an internet problem when they called customer service. I coded the PCA implementation for this project from scratch, which reduced the number of features from around 500 to 80 and brought a large performance gain and noise reduction. Later, in my senior year, I came second in an artificial intelligence competition and joined KPMG Turkey as a data scientist. Currently, in addition to my 2 years of data science experience, I am a Computer Science graduate student at Ege University. I have developed machine learning models in a wide variety of areas, and I think my fundamental knowledge, especially of artificial neural networks, is very good. I am also familiar with Transformers.

I also have a sample project on AI agents: given a project description and a list of team members, it produces tasks, task assignments, a Gantt chart, project plans, an evaluation of the project, and development suggestions, and outputs them as a report. I have deployed it on HuggingFace Spaces; you can try it from this link: HuggingFace - Project Manager AI Agent

The idea I want to contribute to within the scope of GSoC is “PyTorch Porting”. I aim to develop (port), test, and document the necessary artificial neural network models.

Also, can you help me with these questions?

  1. Are my knowledge and experience sufficient for this project?
  2. Are there any topics that are needed other than the topics in the description of this idea? If so, I would be happy to help with those topics as well.

I know this was long; thank you for reading up to this point. I will be sending my resume as an attachment. I am very excited and eager to contribute to DeepChem, and I hope I can do quality work on this subject. Take care of yourself, and have a good day.

Selcuk Senturk | LinkedIn

Sincerely,

Selcuk Senturk

Interest in GSoC 2025 – HuggingFace-style Model Loading

Dear DeepChem Team,

My name is Hemant Gaikwad, and I am a Software Engineer with over 4 years of experience. I am excited to apply for GSoC 2025 under DeepChem, particularly for the HuggingFace-style easy pretrained-model Load project.

I have hands-on experience with fine-tuning LLMs, optimizing model inference, and implementing ML frameworks. Recently, I worked on improving a BM25 model’s accuracy to 95% and developed AI/ML-powered applications using TensorFlow and Scikit-learn. I also have experience with metadata handling and efficient model serialization.

I have explored DeepChem’s current model saving/loading mechanism and would love to contribute by designing an intuitive from_pretrained()-like function. Could you provide guidance on how best to get started? Are there any specific aspects of metadata handling that I should focus on?
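As a starting point for discussion, here is a minimal sketch of what a metadata-driven from_pretrained()-style loader could look like. It is not DeepChem’s actual API: `ToyModel`, `MODEL_REGISTRY`, `register`, `save_pretrained`, the `config.json`/`weights.json` file names, and the JSON weight format are all hypothetical choices made only to keep the example dependency-free.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical registry mapping a saved class name back to a Python class.
MODEL_REGISTRY = {}


def register(cls):
    """Class decorator that records the class under its own name."""
    MODEL_REGISTRY[cls.__name__] = cls
    return cls


@register
class ToyModel:
    """Stand-in for a real model; holds hyperparameters and a weight list."""

    def __init__(self, n_tasks=1, hidden=8):
        self.n_tasks = n_tasks
        self.hidden = hidden
        self.weights = [0.0] * hidden

    def save_pretrained(self, directory):
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        # Metadata: which class to rebuild and with which constructor arguments.
        config = {
            "model_class": type(self).__name__,
            "init_kwargs": {"n_tasks": self.n_tasks, "hidden": self.hidden},
        }
        (directory / "config.json").write_text(json.dumps(config, indent=2))
        # Weights are stored as JSON only to keep the sketch dependency-free.
        (directory / "weights.json").write_text(json.dumps(self.weights))


def from_pretrained(directory):
    """Rebuild a model from a directory written by save_pretrained."""
    directory = Path(directory)
    config = json.loads((directory / "config.json").read_text())
    cls = MODEL_REGISTRY[config["model_class"]]
    model = cls(**config["init_kwargs"])
    model.weights = json.loads((directory / "weights.json").read_text())
    return model


def demo_round_trip():
    """Save a model to a temporary directory and load it back."""
    original = ToyModel(n_tasks=3, hidden=4)
    original.weights = [0.1, 0.2, 0.3, 0.4]
    with tempfile.TemporaryDirectory() as tmp:
        original.save_pretrained(tmp)
        return from_pretrained(tmp)
```

The point of the sketch is the split between metadata (class name plus constructor arguments) and weights: the caller only needs a directory path, and the loader decides which class to instantiate.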

Looking forward to your insights!

Best regards,
Hemant Gaikwad

Interest in GSoC 2025 - Improving New Drug Modality Support
Hi DeepChem Team,
I’m Parimi Pujitha, and I’m interested in contributing to the “Improving New Drug Modality Support” project for GSoC 2025. I would appreciate any guidance on how to get started and connect with mentors (Jose, David, Bharath).
Looking forward to your response!
Thanks!

Respected DeepChem Team,

I am a pre-final year Computer Science Engineering student at MAIT, India, and I am excited about the possibility of contributing to the Implement a Wishlist Model project at DeepChem.

I have gained extensive experience with PyTorch and building models from scratch. For instance, I implemented, trained and evaluated a Fully Convolutional Network (FCN) from scratch, achieving scores close to those published in the original paper. I have also worked on implementing and training a DCGAN (Deep Convolutional Generative Adversarial Network); although, due to compute constraints, I was unable to fully train it.

More recently, I have been working on a paper titled “Optimizing Vision Transformers Strategy: A Novel Layer Merging Strategy for Resource-Constrained Environments”, where I extensively used PyTorch, NumPy, and Vision Transformers from Hugging Face. The paper is scheduled for publication at ICDAM in June. Additionally, I am developing a transformer + MoE based chessbot from scratch in PyTorch.

I have a passion for learning about new models and understanding how they work. With my enthusiasm for research and hands-on experience, I believe I can make a valuable contribution to this project, and I am eager to learn from the DeepChem team along the way.

You can review my work on my GitHub, linked below.

I look forward to hearing from you.

Regards,
Shivam Gupta
Email: shivamguptaxia2@gmail.com
GitHub: https://github.com/shivamgcodes

Hello Dear DeepChem Team,

I am writing about my interest in contributing to DeepChem’s “Improve Equivariance Support” idea during Google Summer of Code. First, I would like to tell you a little about myself, and then explain how I can contribute to this idea. Thank you in advance. I would be thrilled to contribute to the work of the highly talented scientists and engineers there, with or without a GSoC award. I should add that I am a computer science master’s student and that I also work. Given the 12-week limit, I am aiming for a 175-hour project, but if the work is not limited to the GSoC period, I can support the project beyond 12 weeks until it is completed.

I graduated from the physics department in 2023. I was a physics major, but my interest was always in computer science, so during my undergraduate education I took part in two machine learning projects as an intern researcher. One was a deep learning project that aimed to identify protein families from their molecular structure; we used the PDB database and TensorFlow. The other was a binary classification model for the largest telecommunications company in Turkey, which looked at the technical specifications of customers’ internet connections (for example, the signal strength they receive from the switchboard, the type of cable they use, etc.) and determined whether or not they had an internet problem when they called customer service. I coded the PCA implementation for this project from scratch, which reduced the number of features from around 500 to 80 and brought a large performance gain and noise reduction. Later, in my senior year, I came second in an artificial intelligence competition and joined KPMG Turkey as a data scientist. Currently, in addition to my 2 years of data science experience, I am a Computer Science graduate student at Ege University. I have developed machine learning models in a wide variety of areas, and I think my fundamental knowledge, especially of artificial neural networks, is very good. I am also familiar with Transformers.

The idea I want to contribute to within the scope of GSoC is “Improve Equivariance Support”. I aim to develop, test, and document the necessary machine learning/artificial neural network models. If necessary, I can also conduct a literature review to identify scientific findings that may be useful.

Also, can you help me with these questions?

  1. Are my knowledge and experience sufficient for this project?
  2. Are there any topics that are needed other than the topics in the description of this idea? If so, I would be happy to help with those topics as well.

I would like to convey that I am very excited and eager to contribute to DeepChem. I hope I can do quality work on this subject. Take care of yourself, and have a good day.

Selcuk Senturk | LinkedIn

Sincerely,

Selcuk Senturk

1) Project Title:
Improve Equivariance Support in DeepChem

2) Abstract / Project Summary:
This project aims to extend DeepChem’s capabilities by integrating robust equivariant model support and adding new equivariant architectures, such as tensor field networks. By enhancing the library’s support for equivariance—a concept central to modern machine learning—we hope to improve model performance and provide researchers with powerful new tools. The work will involve both developing and benchmarking new modules as well as thorough documentation to facilitate their use within the DeepChem ecosystem.
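To make the term concrete: a function f is equivariant to a transformation R when f(R x) = R f(x). The sketch below checks this numerically for a toy 2D case. It is only an illustration of the property, not a tensor field network and not DeepChem code; tensor field networks target 3D rotations, and every function name here is hypothetical.

```python
import math


def rotate(points, angle):
    """Rotate a list of 2D points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def centroid_offsets(points):
    """Toy 'model': map each point to its displacement from the centroid."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [(x - cx, y - cy) for x, y in points]


def shift_x(points):
    """A non-equivariant counterexample: shift every point by +1 in x."""
    return [(x + 1.0, y) for x, y in points]


def is_rotation_equivariant(f, points, angle, tol=1e-9):
    """Check f(R x) == R f(x) for one rotation R and one input x."""
    rotate_then_apply = f(rotate(points, angle))
    apply_then_rotate = rotate(f(points), angle)
    return all(
        math.isclose(a, b, abs_tol=tol)
        for p, q in zip(rotate_then_apply, apply_then_rotate)
        for a, b in zip(p, q)
    )
```

A check of this shape, run over random rotations and inputs, is one plausible form for the unit tests of new equivariant modules; passing for a handful of samples is evidence of equivariance, not a proof.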

3) Contributor Name:
Aryan Singh

4) Contributor Email and GitHub ID:
Email: aryansingh4653@gmail.com
GitHub: imAryanSingh

5) Potential Mentor(s):
Riya, Nimisha, Bharath, Shreyas

6) Personal Background (Brief CV):
I am a recent Computer Science graduate with hands-on experience in artificial intelligence, machine learning, and data science. I have completed research internships at ISRO’s Space Applications Centre and IIT RPR, where I developed innovative projects ranging from a Voice Activation System to several AI-based solutions. My technical expertise spans C++, Python, HTML, JavaScript, and more. I have a proven track record in competitive programming and software development, and I am passionate about applying my skills to open source projects like DeepChem.

7) Project Goals / Major Contributions:

  • Extend DeepChem’s support for equivariant models by integrating new modules.
  • Develop and implement tensor field networks for improved model generalization.
  • Benchmark the performance improvements of the new equivariant models.
  • Collaborate on detailed documentation and tutorials to help the community adopt these features.
  • Contribute to testing and debugging to ensure the robustness of the new implementations.

8) Project Schedule:

8.1) Community Bonding Period (Weeks 1-3):

  • Familiarize myself with the DeepChem codebase and current equivariance support.
  • Participate in community discussions and mentor meetings.
  • Identify key areas for improvement and gather initial feedback.

8.2) Development Phase (Weeks 4-10):

  • Design and implement new equivariant modules and tensor field networks.
  • Create unit tests and benchmark experiments for the new modules.
  • Regular commits, pull requests, and weekly progress reports.

8.3) Project Completion, Testing, and Documentation (Weeks 11-12):

  • Finalize implementations based on feedback and test results.
  • Prepare comprehensive documentation and tutorials.
  • Conduct final testing and integration, and prepare the project for submission.

9) Planned GSoC Work Hours:
As this is a medium-sized project, I plan to work approximately 18 hours per week (half-time). I will primarily work in IST (UTC+5:30) but remain flexible to collaborate across time zones. Daily commits and regular meetings (via Zoom/Skype) are planned to ensure steady progress.

10) Planned Absence/Vacation Days and Other Commitments:
I have no conflicting lectures or examinations during the GSoC period. I will ensure that any planned absences are communicated well in advance to maintain project momentum.

11) Skill Set:

  • Programming Languages: Proficient in C++, Python, HTML, and JavaScript.
  • Machine Learning & AI: Experience in developing ML algorithms, deep learning models, and AI-based systems.
  • Research & Problem Solving: Demonstrated ability to work on complex projects at ISRO and IIT RPR, with strong analytical and debugging skills.
  • Software Development: Proven track record in competitive programming and open source contributions.
  • Tools & Technologies: Git, VS Code, Jupyter Notebook, and various ML frameworks.

I am enthusiastic about the opportunity to contribute to DeepChem through this project. My experience in AI and machine learning, combined with my passion for tackling challenging problems, makes me confident that I can successfully implement and enhance equivariant model support in DeepChem. I look forward to the possibility of working with the mentor team and contributing to the advancement of open-source research tools.

Thank you for your consideration.

Hello DeepChem Team,

I am Tapash Darji, a CSE student at IIT Dharwad. I am interested in the NumPy 2.0 Upgrade project. My areas of expertise include Python, C++, and machine learning.
I believe this project will be a great opportunity for me to further improve my debugging skills.
Looking forward to hearing from you and, hopefully, contributing to this project.

Subject: Interest in GSoC 2025 – DeepChem Projects

Dear DeepChem Team,

I hope you are doing well. My name is Kaneez Fatma and I am currently pursuing a Master’s degree in Computer Applications, with a background in Biotechnology. I am eager to apply for Google Summer of Code (GSoC) 2025 under DeepChem, specifically for the following projects:

  1. SMILES ↔ IUPAC Conversion
  2. Improving New Drug Modality Support
  3. Enhancing Support for Drug Formulations

I have knowledge of frontend development. With experience in Python, machine learning, and Git, along with an AI and Data Analytics internship through AICTE, I have a strong interest in computational chemistry and AI-driven molecular representation. I am excited about the opportunity to contribute to DeepChem’s development while deepening my understanding of molecular conversion algorithms and drug formulation technologies.

Could you kindly provide guidance on how to get started and make meaningful contributions? Any recommended resources or initial steps would be greatly appreciated!

Looking forward to your response.

Best regards,
Kaneez Fatma
Email: shaikhkaneezfatima2@gmail.com

Hello @bharath
I wish to contribute to the “Layer Tutorials” project for Google Summer of Code 2025. As a final-year engineering student from India with a deep passion for open-source AI research, I’m excited by the opportunity to enhance the accessibility and educational value of DeepChem’s layer ecosystem.

I have hands-on experience with Python and a wide range of AI/ML libraries including TensorFlow, PyTorch, and scikit-learn. I have built over 30 AIML-based projects, all of which are available on my GitHub. Additionally, I regularly write technical articles on Medium, where I break down complex AI concepts for a broader audience. This background aligns well with the goals of this project, which emphasizes both technical clarity and educational outreach.

I would be thrilled to discuss any ideas or requirements further and look forward to the possibility of working with the amazing DeepChem community. I would love to connect and talk about the project as I submit my project.
Looking forward to hearing from you.
Regards,
Samvardhan Singh
Website: https://samvardhan.vercel.app/
github: https://github.com/samvardhan03

Hello @bharath
I came across the NumPy 2.0 Upgrade project for GSoC 2025 and found it very interesting. I would love to contribute to this effort as part of my GSoC journey.

I have experience with Python, data structures, and algorithms, and am familiar with NumPy and with debugging compatibility issues in Python projects. I am eager to deepen my understanding of complex version upgrades and contribute to making DeepChem fully compatible with NumPy 2.0.

Before drafting my proposal, I would appreciate some guidance on:

  • The main compatibility issues expected during the upgrade.
  • Any ongoing discussions or references regarding this transition.
  • The best way to get started with understanding DeepChem’s dependencies on NumPy.
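One known class of NumPy 2.0 breakage is the removal of old aliases from the main namespace (for example `np.float_` and `np.NaN`). The sketch below scans source text for a few of them using only the standard library. The alias table is deliberately not exhaustive, the helper name `find_removed_aliases` is hypothetical, and whether DeepChem actually uses any of these aliases is not assumed here.

```python
import re

# A few aliases removed in NumPy 2.0, with their replacements (not exhaustive).
REMOVED_ALIASES = {
    "float_": "float64",
    "complex_": "complex128",
    "unicode_": "str_",
    "string_": "bytes_",
    "NaN": "nan",
    "Inf": "inf",
}

# Match `np.<alias>` or `numpy.<alias>` as a whole attribute name, so that
# names such as `np.float_power` are not reported.
PATTERN = re.compile(
    r"\b(?:np|numpy)\.(" + "|".join(map(re.escape, REMOVED_ALIASES)) + r")\b"
)


def find_removed_aliases(source):
    """Return (line_number, alias, replacement) for each hit in `source`."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            alias = match.group(1)
            hits.append((lineno, alias, REMOVED_ALIASES[alias]))
    return hits
```

A text scan like this only catches direct attribute access under the `np`/`numpy` names; changes in type promotion or C-API usage would need to be found by running the test suite against NumPy 2.0.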

I am excited about this opportunity and would love to discuss how I can best contribute—looking forward to your response!

Best regards,
Prabhat Kumar
Undergraduate Student, MNNIT Allahabad

Dear DeepChem Team & Mentors,

I’m Cyril Polisetty, a passionate machine learning and data science enthusiast. I’m excited to contribute to GSoC 2025 with DeepChem. This would be my FIRST OPEN-SOURCE CONTRIBUTION, and I’m particularly interested in working on SMILES ↔ IUPAC conversion to enhance its accuracy, efficiency, and integration within DeepChem.

I HAVE HANDS-ON EXPERIENCE WITH –

  • ML & Data Science Experience – I have worked on projects such as breast cancer detection, sentiment analysis, and wine quality prediction, utilizing Python and Scikit-learn.
  • Cheminformatics Interest – Familiar with RDKit, molecular representations, and data processing.
  • Project Plan – Improve conversion accuracy, ensure efficient handling of complex molecules, and provide well-documented, thoroughly tested solutions.

I’d love to discuss my approach and receive your feedback. Looking forward to contributing!

Best,
Cyril Polisetty

GSoC 2025 Proposal: pretrained-model Load

Name: Anderson Calero
Email: acalerob@unal.edu.co
Affiliation: Universidad Nacional de Colombia (Chemistry Student)
Time Commitment: Full time

Current implementations require models to fit on a single GPU, limiting scalability for larger architectures.

I am excited about this project because of my interest in AI applications in chemistry and my experience with Hugging Face Agents and LLMs. I look forward to contributing by drawing on what I have learned about Agents, Transformers, and NLP from Hugging Face Learn. I am entirely convinced that future researchers need it to be easier to scale up deep learning models in computational chemistry by making them accessible across multiple GPUs (open source).

I am passionate about AI for chemistry and have been actively working through Hugging Face’s LLM and Agent courses. My background in computational chemistry gives me insight into real-world applications for DeepChem’s models. For that reason, I am eager to contribute to DeepChem’s open-source community and believe that this project aligns perfectly with my interests and skills.

LinkedIn

Hello DeepChem Team!

My name is Sunag and I’m currently studying for an MSc in Bioinformatics at Saarland University, Germany. During my course, I became interested in computer-aided drug design and have since completed multiple ML/DL projects in drug discovery. I understand the need for implementing neural network models in a library such as DeepChem.

I’m really interested in implementing a Wishlist model (particularly Hamiltonian networks) in DeepChem. I have a solid mathematical understanding of how LLMs work from my experience implementing parameter-efficient fine-tuning methods from scratch in large chemical language models, which significantly increased their performance. Therefore, I believe I have the necessary skills for this project.

As a reference, here is my GitHub profile, where you can check out my previous work.

I would love to get a conversation going and discuss the exact use cases for these models. I look forward to hearing from you!

Hello DeepChem Team!

My name is Benson and I’m a fourth-year Data Science Specialist student at the University of Toronto, and I am excited about the possibility of contributing to the Improve Equivariance Support project at DeepChem. My experience is mainly with Python, TensorFlow, and building machine learning models.

My experience spans developing machine learning models and optimizing performance in various domains. As a research assistant at the CHAI Lab (University of Toronto, in collaboration with Google), I improved micro-gesture classification accuracy on smartwatches by 15% through domain adaptation and advanced modeling techniques, including GANs, Transformers, and Contrastive Learning. At the Far Data Lab, I second-authored a paper on Federated Learning’s effects on state-of-the-art models and designed an Audio Spectrogram Transformer in federated settings using the Flower framework. My industry experience includes building budget prediction microservices at IDK Wealth using FastAPI, Docker, and LangChain, while also improving transaction autocategorization accuracy to 85%. Additionally, I contributed to algorithmic fairness research within clinical applications, co-authored a tutorial paper, and developed an R package to facilitate fairness evaluation.

The Improve Equivariance Support project specifically caught my attention because equivariance was mentioned in my course, but I haven’t had the chance to explore it in depth. I’m therefore eager to learn how these models are implemented and to contribute to their implementation in DeepChem.

Please let me know if this is the right place to connect.
I’d love some guidance on how I can get started.

Thanks and Regards,

Benson Chou

Hi DeepChem Team,
My name is Alba Tortosa and I’m a Biology student at the University of Córdoba, Spain. I’m reaching out to express my enthusiasm for applying to GSoC 2025 under DeepChem. Although I have limited professional experience, I am highly motivated and eager to learn, and I believe this opportunity would be invaluable for my growth.
The projects that caught my attention are:

  • Layer Tutorial: I find this project particularly exciting as it would allow me to deepen my understanding of data integration while contributing to improving documentation and tutorials.

I’m eager to start contributing to DeepChem and would greatly appreciate any guidance on how to get involved, especially regarding beginner-friendly issues.
Thank you in advance for your support and guidance.
Best regards,
Alba Tortosa

Hi there!

I’m Galymzhan M., born and raised in Kazakhstan, currently a PhD student in the Grzybowski Lab at UNIST and CARS-IBS (Ulsan, South Korea).

I’ve been working on NN-based prediction of suitable reaction conditions for organic reactions, 3D featurization of transition metal catalyst structures, retrosynthesis prediction tools, and a little ligand-based drug discovery. I have 2 years (and counting) of experience running lab team projects, mentoring undergraduate students, and developing Python packages with Git version control.

I have familiarized myself with the potential projects for the upcoming GSoC 2025.
I am open to working on any of these intermediate projects: Improving Equivariance Support for DeepChem models, Conversion of SMILES to IUPAC and IUPAC to SMILES, or Improving support for drug formulations. My priority would be the first two.

Glad to join DeepChem community!

Dear DeepChem Team & Mentors,

I’m Arshi Khan, a Data Science Intern at an e-commerce unicorn in India with a strong background in AI/ML, deep learning, and software development. I’m excited to contribute to GSoC 2025 with DeepChem. This would be my first open-source contribution to DeepChem, and I’m particularly interested in working on Model-Parallel DeepChem Model Training to enable large-scale training across multiple GPUs.

My Expertise & Experience

  • Machine Learning & Deep Learning – Experience working on projects involving NLP, CNNs, and large-scale ML models.
  • Software Development – Proficient in Python, PyTorch, TensorFlow, and distributed computing techniques.
  • Parallel & Distributed Training – Familiar with multi-GPU training, model sharding, and performance optimization.

Project Plan

  • Implement basic model-parallel training support for DeepChem.
  • Adapt existing PyTorch frameworks to enable training for models that don’t fit on a single GPU.
  • Benchmark and test implementations to ensure performance improvements.

I’d love to discuss my approach and receive your feedback. Looking forward to contributing!

Best,
Arshi Khan

Title: Interest in GSoC 2025 - Improving New Drug Modality Support
Message:
Hi DeepChem Team,
I’m a student interested in contributing to the Improving support for drug formulations project for GSoC 2025. I would appreciate any guidance on how to get started and connect with mentors (Jose, David, Bharath).
Looking forward to your response!
Thanks!

Hello DeepChem team,

I am very excited to contribute more to open source projects, and I would love to work on the conversion of SMILES to IUPAC and IUPAC to SMILES.

I am a second-year student at the University of Toronto, where I am studying computer science and minoring in statistics and math.

I have done coursework at the University of Toronto and have created numerous projects relating to scientific computing, software testing, data structures and algorithms, and software development. I would really enjoy applying my skills to help build your project.

Feel free to contact me if you would like to discuss the project further. I look forward to your response.

You can learn more about me here. Thank you for your time!


https://www.linkedin.com/in/fiona-v/

Hi, it’s Abhinav Thakare here, a data science engineer with a huge passion for chemistry and biology, owing to my background in data science and my family’s medical influence. The chance to combine my skills with my passion is what first attracted me to this project.

In this project, we will be building a tool within DeepChem to enable accurate, bidirectional conversion between SMILES (Simplified Molecular Input Line Entry System) strings and IUPAC (International Union of Pure and Applied Chemistry) names. This project mainly aims to design a flexible, efficient, and comprehensive solution to facilitate molecular representation transformations for researchers.

The approach and approximate timeline for this would be:

  • Creating a domain-specific dataset for training and validation, and reviewing existing SMILES/IUPAC conversion methods.
  • Implementing baseline conversion models using existing DeepChem functionalities, followed by performance optimization.
  • Applying NLP techniques such as sequence-to-sequence models or transformer-based architectures to enhance accuracy.
  • Integrating the final model into DeepChem, ensuring robust API endpoints, extensive documentation, and rigorous testing.

Some of the project titles I have worked on are Classification with Neural Networks (Individual), Product Demand Prediction, Language Detection (Group), and Gold Price Prediction.

Hoping to hear from you soon. If you would like, you can give me a task to check my skill set. I look forward to your feedback to improve my proposal further.