Google Summer of Code 2024 Ideas

Please join the Discord at https://discord.gg/FKh47UEctV and we would be happy to discuss more with you there. You can also join the office hours; see “Announcing the DeepChem Office Hours”.

Hello,
My name is Priya Yadav, and I am currently a sophomore pursuing a degree in computer science and engineering. I am familiar with Python and have experience working with Jupyter Lab. While I don’t have professional experience, I have engaged in coding, particularly in areas such as data analysis, machine learning, and Android development. I would like to contribute to the beginner-friendly project, Improving Antibody Support.
I’d appreciate any advice on getting started and approaching the project. Excited about the possibility of contributing.
Thanks

Dear Bharath,

I am Shashank Shekhar Singh, a sophomore from IIT BHU, India, with a keen interest in AI and machine learning. Intrigued by the Protein Language Modeling project, I’m eager to contribute my expertise in Hugging Face, TensorFlow, PyTorch, and LLMs. Before diving in, I’d appreciate your guidance on getting started.

I’m also interested in the PyTorch Porting and ModularTorchModel Implementation Work projects. Could you advise on the best ways to proceed further? Excited to collaborate with you and @deep_chem!

Hi, I am Param Parekh, a second-year B.Tech student at VJTI, Mumbai, India (GMT +5:30). The project describing “Implementation of Wishlist Model” fascinates me with its extensive use of deep neural network models as well as its profound educational value. I have more than one year of experience with basic machine learning algorithms. I have developed and optimized deep learning models while implementing projects on object detection and segmentation using convolutional neural networks and time series forecasting using recurrent neural networks. I work on Ubuntu 22.04 and I am familiar with the VS Code, Jupyter Notebook and PyCharm IDEs. Kindly shed light on the desired project outcomes and how I should begin contributing.

Hi, I’m Arya, a third-year B.Tech CSE student from VIT Vellore, and a Research and Development Intern at IIT Kanpur, in the AI-ML field. I’m extremely intrigued by Improving Antibody Support, Protein Language Modeling, and Improving Equivariance Support. I am very familiar with Hugging Face, TensorFlow, PyTorch, LLMs and almost all existing Python machine learning libraries, using Jupyter Notebooks, IDEs and VS Code, with 3+ years of experience across projects, research papers at international conferences, and internships, including hands-on experience in live projects.

Would love to contribute to GSoC @DeepChem

For potential applications, please introduce yourselves in “The Introductions Thread!” and not on this thread. Let’s keep this focused on GSoC ideas and discussions.

Hello, my name is Amit Subhash Chejara and I am learning machine learning and PyTorch. I completed my BSc last year. I am interested in two ideas: protein language modeling, and torch compile and PyTorch. Since I am currently learning PyTorch and have some experience with linear regression and classification models in PyTorch, I want to ask whether I can contribute to this project or whether I need deeper knowledge of PyTorch first.
Please guide me!

Hello DeepChem Team,
I am Suraj Mahapatra, a third-year B.Tech student at SRM Institute of Science and Technology. Currently engaged in a research and development program focusing on Large Language Models (LLMs) at NIT Rourkela, I am eager to leverage my skills in the field of Deep Learning and develop successful models out of it. I am writing to express my strong interest in contributing to the development of the Protein Language Model and Improving Equivariance Support.

Previously I have worked with Hugging Face models, Vision Transformers, TensorFlow, and PyTorch, and I am excited to share my expertise by contributing to your project. Before taking the plunge, I would value your suggestions and mentoring on the project.

Hello! I am Divyanshu Rana, a first-year student at Graphic Era University, where I am pursuing a Master of Computer Applications (MCA) degree. I am really excited about Improving Antibody Support and Layer Tutorials. I am familiar with deep learning frameworks such as PyTorch, which would be essential for implementing antibody-specific models.

Thank you for your attention, consideration, and ongoing support. Together, let us continue to strive for excellence and make a positive difference in the lives of individuals

With warm regards,
Divyanshu Rana

Hello Bharath,

I am Sasidharan, a final-year B.E. student studying Computer Science. I am intrigued by Layers Tutorials, Improving Antibody Support, and Improving New Drug Modality Support, and I’m enthusiastic about contributing my expertise to @DeepChem. Before delving deeper, I would greatly appreciate your guidance on how to get started.

Thank you!

To repeat, folks: please don’t introduce yourselves on this channel. It’s not the right place. Please use the introductions channel, “The Introductions Thread!”

Hello @bharath,
I have been studying the DeepChem codebase recently. The community has made quite an advance on the software side of this domain, but I think there is a gap in the polymers domain. I could not find much code or many contributions in this field. Even in MoleculeNet I could not find much regarding polymers. If there is code for studying monomer or polymer behaviour, or any dataset, please help me find it. Otherwise, if it suits you, we can come up with a proposal for a GSoC’24 project. I have a few ideas we could discuss.

Hey there! :wave: I’m Aparna, a third year student at IIT BHU and I’m super excited about the DeepChem Layer Tutorials Enhancement project! I’ve been working on Jupyter Notebook and Colab projects, diving into different machine learning concepts. The project “Layer Tutorials” aligns perfectly with my interests, as I’m eager to improve technical communication skills and contribute to the community. Could you guide me on the best way to get started? I’m ready to dive in and make some meaningful contributions! :rocket:

A proposal about polymers could be very welcome. You should try to center it around applications to drug discovery. Come by office hours to discuss with us!

Come by office hours and we would be glad to give guidance!

I am thrilled to express my interest in the project “Protein Language Modeling” for GSOC 2024. My name is Awnish Singh, a fourth-year undergraduate student at BITS Pilani, where I have been deeply involved in research projects under the guidance of Dr. S. Murugesan, focusing on target drug prediction. My experience spans various domains, from computer vision to software development, and I have actively contributed to GSOC in the past.

I am particularly excited about the opportunity to extend DeepChem’s support for using language models with chemistry applications to include protein language modeling. Given the growing importance of protein machine learning in both academia and startups, I believe this project offers a unique opportunity to make a meaningful contribution to the field.

Currently, I am gaining valuable experience through an internship where I am working on power automation with MS Azure and the EasyOCR library for text extraction. My previous involvement in a Genetic Algorithm-based project focused on implementing deep learning models for identifying features of protein coding genes has provided me with a strong foundation in this area.

I am eager to collaborate with the DeepChem community and contribute to the development of a protein language model. I am committed to attending office hours and engaging in discussions to ensure the success of this project.

@awnish10-scs Please post in the introductions thread, “The Introductions Thread!”. This channel is only for general GSoC questions about topics.

Protein Language Models

Other models that are not available on Hugging Face but may be of interest: https://www.nature.com/articles/s41587-022-01618-2#code-availability
Other examples, for protein structure generation: https://huggingface.co/spaces/simonduerr/ProteinMPNN

Antibody support.
A couple of definitions to start the discussion:

  • Antibody: an immunoprotein responsible for specifically recognizing and binding to potentially pathogenic molecules.
  • Antigen: the molecule that the antibody targets.

Some problems that can be studied in antibody design are structural, for example the accurate modeling of antibody-antigen pairs, especially at the interaction sites: https://www.sciencedirect.com/science/article/pii/S0959440X22000586. For this kind of task, it is important to have structural databases such as:

New Emerging Drug Modalities.
For this type of functionality, datasets are crucial. For PROTACs and macrocycles, some featurizers already work. Therefore, some databases of interest are:
PROTAC-DB
http://cadd.zju.edu.cn/protacdb/help
Macrocycles.
I found this article with an analysis of existing literature:
https://pubs.acs.org/doi/epdf/10.1021/acs.jmedchem.3c00134


Hi, I’m Pranjal Verma, a third-year student from IIT Delhi. I have gone through the description of Protein Language Modeling and found it really fascinating. I have done projects with LLMs and Hugging Face, and I would be very excited to contribute to this project. I would highly appreciate your guidance. I am ready to get started. Please let me know if I can communicate with you all regarding this project.

A quick reminder: this thread is only for discussions about project ideas, not introductions. Please introduce yourselves on “The Introductions Thread!”