Hello DeepChem Community!
My name is Paulina, and this summer I was given the opportunity to contribute to DeepChem through GSoC 2022. I am currently a research associate at the Gladstone Institutes, where I work on ArchR, a package for analyzing scATAC-seq data. Over the past couple of months I have been studying applications of deep learning to genomics and drug discovery. The DeepChem codebase and mentors have been a great introduction to this space, and I am excited to continue learning this summer!
My project will expand DeepChem’s tools for working with genomic datasets in drug discovery, strengthening DeepChem’s new Bioinformatics initiatives. I will implement a state-of-the-art predictive model for regulatory genomics and add the relevant datasets for testing it. As part of the project, I will also write a tutorial on interpreting regulatory sequence data with deep learning. I will determine which loaders and featurizers to use to translate genomics data into numerical representations that machine learning models can understand. I will also implement gkm-SVM so that future models that depend on it are easier to develop. A big part of my project will be identifying how to apply DeepChem’s infrastructure to biomedical questions informed by genomics, as well as identifying areas for future development.
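As a rough illustration of what "translating genomics data into numerical representations" can mean, a common input for regulatory-genomics models is a one-hot encoding of DNA, where each base becomes a length-4 indicator vector. The sketch below uses plain NumPy. `one_hot_encode` is a hypothetical helper written for this post, not DeepChem's API, and mapping ambiguous bases such as `N` to all-zero rows is one convention among several.

```python
import numpy as np


# Hypothetical helper for illustration only; not part of DeepChem's API.
def one_hot_encode(sequences, alphabet="ACGT"):
    """Encode equal-length DNA strings as an array of shape
    (n_sequences, sequence_length, len(alphabet)).

    Characters outside the alphabet (e.g. 'N') become all-zero rows
    (an assumed convention).
    """
    index = {base: i for i, base in enumerate(alphabet)}
    seq_len = len(sequences[0])
    if any(len(s) != seq_len for s in sequences):
        raise ValueError("all sequences must have the same length")

    encoded = np.zeros((len(sequences), seq_len, len(alphabet)), dtype=np.float32)
    for n, seq in enumerate(sequences):
        for pos, base in enumerate(seq.upper()):
            j = index.get(base)
            if j is not None:  # skip unknown/ambiguous bases
                encoded[n, pos, j] = 1.0
    return encoded


X = one_hot_encode(["ACGTN", "TTGCA"])
print(X.shape)  # (2, 5, 4)
```

An array shaped like this (samples × positions × channels) is the kind of input that convolutional sequence models typically consume.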