GSoC Final Report: Drug Target Selection and Validation

GSoC project: Drug Target Selection and Validation

Project goals

The primary objective of this project was to develop infrastructure for drug target selection and validation. The original plan was to integrate ConPLex for target selection and ESM for target validation within DeepChem. However, ConPLex and ESM could not be used on their own for drug target selection and validation, so the project evolved significantly from the initial proposal. While keeping to the project’s aim of creating a tutorial for drug target selection and validation, I developed multiple tutorials that test hypotheses for drug target selection using machine learning (ML) and large language models (LLMs). These hypotheses lay the groundwork for integrating a target selection and validation pipeline into DeepChem. The goals were achieved through several tutorials grounded in a review of the literature in the field.

Contributions

  1. Tutorial for predicting protein structure using ESM-2: I developed a tutorial focused on protein structure prediction, introducing the concept of protein folding and how to predict it using ESM-2. The tutorial also includes a method to visualize the resulting protein structures.

  2. Tutorial to introduce druggability and druggability prediction: This tutorial introduces the science of druggability and provides a comprehensive dataset for druggability prediction. It utilizes Fpocket and machine learning to classify proteins into druggable and non-druggable candidates.

  3. Tutorial to predict drug target and disease association for target selection: In this tutorial, I guide users through downloading and processing multiple datasets to construct a protein-disease association network. The datasets include Protein-Protein Interaction (PPI), UniProt sequences, MeSH, OMIM, and more. These are used to build a heterogeneous network to identify protein targets positively correlated with a specific disease. Support Vector Data Description (SVDD) and Random Forest algorithms are applied for target prediction.

  4. Tutorial for Drug Target Interaction: This tutorial focuses on predicting drug-target interactions from protein embeddings generated by ProtBert and small-molecule drugs. It serves as a method for testing interactions between the disease-associated targets identified in the previous tutorial and known drug molecules.

  5. Tutorial to introduce Drug Target Selection and Validation: This tutorial introduces the concepts of drug target selection and validation, integrating the methods discussed in the previous sections, such as druggability assessment and disease association analysis. By combining these approaches, we aim to demonstrate how in silico methods can be used effectively to identify and validate potential drug targets.
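
To make the heterogeneous-network idea from the third tutorial more concrete, here is a minimal sketch in plain Python. It is not the tutorial's code and it does not use SVDD or Random Forest: it swaps in a simple guilt-by-association score (the fraction of a protein's PPI neighbours already linked to a disease) purely for illustration. The protein IDs, disease IDs, edge lists, and function names are all hypothetical placeholders rather than data or APIs from PPI, UniProt, MeSH, OMIM, or DeepChem.

```python
from collections import defaultdict

# Toy, made-up edges. All identifiers here are hypothetical placeholders.
PPI_EDGES = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5")]
PROTEIN_DISEASE_EDGES = [("P1", "D1"), ("P3", "D1"), ("P5", "D2")]


def build_network(ppi_edges, protein_disease_edges):
    """Return (protein -> PPI neighbours, protein -> associated diseases)."""
    neighbours = defaultdict(set)
    for a, b in ppi_edges:
        # PPI edges are treated as undirected.
        neighbours[a].add(b)
        neighbours[b].add(a)
    diseases = defaultdict(set)
    for protein, disease in protein_disease_edges:
        diseases[protein].add(disease)
    return neighbours, diseases


def neighbour_association_score(protein, disease, neighbours, diseases):
    """Fraction of a protein's PPI neighbours already linked to the disease."""
    partners = neighbours.get(protein, set())
    if not partners:
        return 0.0
    linked = sum(1 for p in partners if disease in diseases.get(p, set()))
    return linked / len(partners)


def rank_candidates(disease, neighbours, diseases):
    """Rank proteins not yet linked to the disease, highest score first."""
    candidates = [p for p in neighbours if disease not in diseases.get(p, set())]
    scored = [
        (p, neighbour_association_score(p, disease, neighbours, diseases))
        for p in candidates
    ]
    # Sort by descending score, then by protein ID for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


neighbours, diseases = build_network(PPI_EDGES, PROTEIN_DISEASE_EDGES)
ranking = rank_candidates("D1", neighbours, diseases)
print(ranking)
```

In a fuller pipeline, a score like this would be one feature among many (sequence-derived features, ontology links, and so on) passed to a classifier, rather than a final ranking by itself.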

Lessons learned and future plans

This is my second term as a GSoC contributor, and the experience has been both more complex and more rewarding. The project demanded extensive reading of scientific papers, a deep understanding of concepts, and the formation of well-founded hypotheses. Documenting my insights in the tutorials was itself a challenge, but it helped me grow in structuring my thoughts, communicating complex ideas clearly, and developing hypotheses effectively. Although the official GSoC contribution period has concluded, I plan to continue working with DeepChem. My goal is to integrate some of the methods I explored into DeepChem’s infrastructure and develop additional tutorials that offer a more comprehensive analysis of drug target selection. Ultimately, I aim to compile my work into a paper showcasing the tutorials I’ve developed.

Conclusion

Being part of GSoC with DeepChem has been an incredible opportunity. This experience has greatly enhanced my understanding of scientific development. I’m extremely grateful to Bharath Ramsundar for his unwavering support and mentorship throughout the program. As I continue my research journey, I’m excited to contribute further to DeepChem. This journey has been highly rewarding, and I hope to expand my efforts to help realize DeepChem’s vision in the future.