Drug Target Selection and Validation Pipeline

Greetings Fellow Developers!

I’m Anamika, and I’m thrilled to be part of GSoC 2024, contributing to DeepChem. I’ll be working on introducing a drug target selection and validation pipeline in DeepChem. I’ll integrate ConPLex for target selection and ESM for target validation, addressing the need for such a pipeline in DeepChem. By enhancing DeepChem with these capabilities, my project aims to streamline the drug discovery process, offering high accuracy in drug-target interaction predictions and insightful structural and functional analyses of potential targets.

This forum thread serves as a hub to monitor the progress of this project.

Thank you for your interest! Drop by occasionally for updates!

Week 1 Update:

  • Studied the basic science behind drug target selection and validation
  • Looked into how Open Targets works
  • Updated the tutorial to reflect the basic science of target selection and validation
  • Built a dataset of top proteins and drug molecules to test the pipeline

Tutorial on target selection and validation: https://colab.research.google.com/drive/14xdoLXRMI7Kg_g8lpAmKkwurdB432jFT#scrollTo=axXgO2x81OP8 . My ongoing attempt to train the model: https://colab.research.google.com/drive/1AP_KXtSmczN5NeJtb0tO2Gno27A5Zlt5#scrollTo=UYyF7NT8gq2Z

Progress report Week 2:

I dived in deeper to understand the key concepts of drug target selection. I worked on implementing the Drug Target Interaction model I had proposed in my GSoC proposal. It can be found in the last section of this Colab notebook: https://colab.research.google.com/drive/14xdoLXRMI7Kg_g8lpAmKkwurdB432jFT#scrollTo=tJLc3zrrZ4qg . The Drug Target Interaction tutorial is incomplete, and I’ll revisit it later in the GSoC period to complete it and merge it into DeepChem.

Progress Report Week 3 and 4:

  • I tested my DTI model from the previous week on the protein dataset that I built in the first week.
  • Then I moved on to preparing a tutorial on the basic science of target selection and validation.
  • Got it reviewed by Dhuvi and Elisa.
  • Based on the reviews, I decided to split the tutorial into 5 parts:
  1. Protein structure prediction
  2. Druggability assessment
  3. Functional domain prediction of the target protein
  4. Drug target interaction
  5. Target selection
  • Started working on the first part of the tutorial, which is protein structure prediction with ESMFold.

Progress report week 5:

  • Raised the PR for protein structure prediction with ESMFold.
  • Started working on the druggability tutorial.

Progress report week 6:

  • I was unavailable for work
  • Got PR reviewed

Progress report week 7:

  • I got my protein structure prediction tutorial merged.

  • I started working on the druggability tutorial, which has 2 parts: the first part is a basic explanation of druggability and its relevance to target selection. The second part applies Fpocket and an ML model to identify pockets and classify them as druggable or undruggable.

  • I was stuck finding the right dataset for training my model.
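
To make the second part of the druggability tutorial more concrete, here is a minimal sketch of the "identify pockets, then classify them" step. It is not the tutorial's code. It assumes a per-pocket text report in the style of the info file that Fpocket writes (a `Pocket N :` header followed by `Name : value` lines), and it replaces the trained ML classifier with a plain threshold on the reported druggability score. The sample text, the function names `parse_pockets` and `label_pockets`, and the 0.5 threshold are all illustrative assumptions.

```python
import re

# Hypothetical excerpt in the style of an Fpocket per-pocket info report.
# The numbers are made up for illustration only.
SAMPLE_INFO = """\
Pocket 1 :
    Score : 0.412
    Druggability Score : 0.871
    Number of Alpha Spheres : 95

Pocket 2 :
    Score : 0.203
    Druggability Score : 0.114
    Number of Alpha Spheres : 41
"""


def parse_pockets(text):
    """Return {pocket_id: {field_name: value}} parsed from an info-style report."""
    pockets = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^Pocket\s+(\d+)\s*:", line)
        if header:
            current = int(header.group(1))
            pockets[current] = {}
            continue
        # Indented "Name : number" lines belong to the most recent pocket.
        field = re.match(r"^\s+(.+?)\s*:\s*(-?\d+(?:\.\d+)?)\s*$", line)
        if field and current is not None:
            pockets[current][field.group(1)] = float(field.group(2))
    return pockets


def label_pockets(pockets, threshold=0.5):
    """Label pockets by thresholding the druggability score (assumed cutoff)."""
    labels = {}
    for pocket_id, fields in pockets.items():
        score = fields.get("Druggability Score", 0.0)
        labels[pocket_id] = "druggable" if score >= threshold else "undruggable"
    return labels


pockets = parse_pockets(SAMPLE_INFO)
labels = label_pockets(pockets)
```

In a real workflow the parsed per-pocket fields would more likely serve as input features for a trained classifier than be compared against a fixed cutoff.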

Progress report week 8:

  • I finalised the druggability tutorial; with minor modifications it will be ready for a PR.
  • I started working on functional domain analysis of the drug target protein.

Progress report week 9:

  • I got my druggability PR reviewed by my mentor.
  • I completed my study of the science behind protein functional domains and their relevance to drug target selection and validation.
  • I started working on the use of PML and ML to predict the functional domain of the target protein.

Progress report week 10:

  • I addressed the requested changes on my druggability tutorial PR and got it reviewed again.
  • I worked on the functional domain analysis tutorial for the target protein and have the first draft ready.
  • I spent some time improving my scientific writing.

Progress report week 11:

  • I addressed the changes requested in the druggability tutorial.
  • Added more sections to the druggability tutorial to include the different applications of ML in druggability. I added a section to introduce multiple methods of assessing druggability.
  • I revisited the functional domain analysis tutorial in an attempt to include an ML model that identifies suitable drug targets based on predicted functional domains.

Progress report week 12:

  • I got the druggability tutorial merged.
  • I integrated PPI and functional domain similarity matrix datasets from Open Targets and UniProt to train an ML model that establishes disease associations for target proteins.
  • I also started working on the drug target interaction tutorial, as initially proposed in my proposal.

Progress report week 12-13:

  • Collected human protein-protein interaction, disease-protein association, and disease-disease similarity data from the HIPPIE, UniProt, and MeSH databases, respectively.
  • Constructed a heterogeneous relationship network based on these interactions and associations.
  • Applied the DeepWalk algorithm to the heterogeneous network.
  • Generated disease-protein association pairs.
  • Used the Random Forest algorithm to build a predictive model for identifying potential disease-protein associations.
  • Worked on a subsampled dataset so far. The full dataset is too large to run on Colab.
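
For readers unfamiliar with DeepWalk, the sketch below shows only its first stage, generating truncated uniform random walks over a graph, on a toy heterogeneous network. It is an illustration under stated assumptions, not the code used in the project: the node names, edge list, and parameter values are hypothetical placeholders, and the real network built from HIPPIE, UniProt, and MeSH is far larger.

```python
import random

# Toy heterogeneous network; all node names are hypothetical placeholders.
EDGES = [
    ("protein:P1", "protein:P2"),  # protein-protein interaction
    ("protein:P2", "protein:P3"),  # protein-protein interaction
    ("protein:P1", "disease:D1"),  # disease-protein association
    ("protein:P3", "disease:D2"),  # disease-protein association
    ("disease:D1", "disease:D2"),  # disease-disease similarity
]


def build_adjacency(edges):
    """Build an undirected adjacency list from (u, v) pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    return adjacency


def random_walks(adjacency, walks_per_node=10, walk_length=8, seed=0):
    """Generate truncated uniform random walks starting from every node."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


adjacency = build_adjacency(EDGES)
walks = random_walks(adjacency)
```

In the full DeepWalk procedure these walks are treated as sentences for a skip-gram model (for example gensim's `Word2Vec`) to obtain node embeddings. One plausible next step, assumed here rather than taken from the project, is to concatenate the embeddings of a disease node and a protein node into one feature vector per pair and train a classifier such as scikit-learn's `RandomForestClassifier` on those pairs.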

Progress report week 14:

  • I worked on the Drug Target Interaction tutorial.
  • I tested and added the DTI model, which uses PML-based embeddings and SMILES embeddings from RDKit to predict the drug-target score.
  • I added the concepts required to understand the drug target interaction
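
As a rough illustration of the idea behind such a DTI model, the sketch below combines a protein embedding and a drug embedding into a single interaction score. It is not the tutorial's model: the function `dti_score`, the concatenate-then-logistic scoring head, and all vectors and weights are made-up assumptions. In practice the protein vector would come from a protein model, the drug vector from an RDKit-derived representation of the SMILES string, and the weights would be learned from labelled drug-target pairs.

```python
import math


def dti_score(protein_embedding, drug_embedding, weights, bias=0.0):
    """Concatenate both embeddings and map them to a score in (0, 1).

    A single logistic layer stands in for whatever trained model is used.
    """
    features = list(protein_embedding) + list(drug_embedding)
    if len(features) != len(weights):
        raise ValueError("weights must match the concatenated feature length")
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Made-up, low-dimensional stand-ins for real embeddings and learned weights.
protein_embedding = [0.2, -0.1, 0.4]
drug_embedding = [1.0, 0.0, 1.0, 0.0]
weights = [0.5, -0.3, 0.8, 0.6, -0.2, 0.4, 0.1]

score = dti_score(protein_embedding, drug_embedding, weights)
```

A higher score would be read as a higher predicted likelihood of interaction. ConPLex-style models instead project both embeddings into a shared space and compare them there, so this concatenation approach is only one of several possible designs.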

Progress report week 15-16:

  • I got the protein-disease association tutorial and the DTI tutorial reviewed.
  • Addressed the comments on the protein-disease association tutorial.
  • Submitted the final report to GSoC.
  • Made some progress on the final tutorial, which introduces drug target selection and validation.