Google Summer of Code 2021 Ideas

DeepChem is part of Open Chemistry, which will be applying for Google Summer of Code this summer. We don’t yet know if we will be selected, but I wanted to start a thread to brainstorm ideas for potential student projects:

  • JaxModel: DeepChem currently doesn’t have a good way to build models with JAX. This project would add a JaxModel wrapper, in the style of KerasModel and TorchModel, that allows for convenient wrapping of arbitrary JAX models in DeepChem. The project will involve implementing JaxModel, writing a suitable test suite, and putting together a good tutorial, as a Jupyter notebook, on how to use JAX with DeepChem.
  • PyTorch Lightning support: PyTorch Lightning is a popular framework for PyTorch. This project would look into enabling the easy construction of PyTorch Lightning based models for DeepChem.
  • Paper Implementation: This project is more open-ended. Pick a research paper that you like and implement it within DeepChem. To succeed in this project, you should reach out to me or other DeepChem community members for feedback and help in picking a suitable project.
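
To make the JaxModel idea a bit more concrete, here is a rough pure-Python sketch of the wrapper pattern: a model described by pure functions of its parameters, wrapped in a class with `fit` and `predict`. Everything in it is hypothetical and for illustration only (`FunctionalModelSketch`, `forward_fn`, `loss_fn`, `grad_fn`); it is not DeepChem's API and does not import JAX. In a real JAX version, the hand-written gradient function would presumably be replaced by something like `jax.grad(loss_fn)`.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Params = Dict[str, float]
Batch = Tuple[Sequence[float], Sequence[float]]  # (inputs, targets)


class FunctionalModelSketch:
    """Hypothetical wrapper around pure functions of (params, data)."""

    def __init__(
        self,
        forward_fn: Callable[[Params, float], float],
        loss_fn: Callable[[Params, Batch], float],
        grad_fn: Callable[[Params, Batch], Params],
        params: Params,
        learning_rate: float = 0.05,
    ) -> None:
        self.forward_fn = forward_fn
        self.loss_fn = loss_fn
        self.grad_fn = grad_fn
        self.params = dict(params)
        self.learning_rate = learning_rate

    def fit(self, batches: Sequence[Batch], nb_epoch: int = 1) -> float:
        """Run plain gradient descent and return the last batch loss."""
        last_loss = float("nan")
        for _ in range(nb_epoch):
            for batch in batches:
                grads = self.grad_fn(self.params, batch)
                # Build a new parameter dict instead of mutating in place,
                # mirroring the functional style used with JAX.
                self.params = {
                    name: value - self.learning_rate * grads[name]
                    for name, value in self.params.items()
                }
                last_loss = self.loss_fn(self.params, batch)
        return last_loss

    def predict(self, inputs: Sequence[float]) -> List[float]:
        return [self.forward_fn(self.params, x) for x in inputs]


def forward(params: Params, x: float) -> float:
    # Toy linear model: y = w * x + b
    return params["w"] * x + params["b"]


def mse_loss(params: Params, batch: Batch) -> float:
    xs, ys = batch
    return sum((forward(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(params: Params, batch: Batch) -> Params:
    # Hand-written gradient of mse_loss with respect to w and b.
    xs, ys = batch
    errors = [forward(params, x) - y for x, y in zip(xs, ys)]
    n = len(xs)
    return {
        "w": sum(2 * e * x for e, x in zip(errors, xs)) / n,
        "b": sum(2 * e for e in errors) / n,
    }


# Toy data generated from y = 2x + 1.
data = [([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])]
model = FunctionalModelSketch(forward, mse_loss, mse_grad, {"w": 0.0, "b": 0.0})
final_loss = model.fit(data, nb_epoch=1000)
print(model.params, final_loss)
```

The point of the sketch is only the shape of the interface: the user supplies pure functions and initial parameters, and the wrapper owns the training loop. The actual design would need to be worked out as part of the project.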

If you have other suggestions for GSoC projects, please post them here!


Perhaps we could use GSoC as the opportunity to dedicate attention to the protein engineering side? Thinking of implementing papers like UniRep, or recent promising proteins + transformer work. Just a shot in the dark idea! :slight_smile:


To add on one idea, I’d love to see better support for semiconductor design in DeepChem. This new paper uses graph neural networks to work towards designing new semiconductors:

I’ve been writing a series of articles about the semiconductor industry which might be a good source of background information if you’re interested in this topic: https://deepforest.substack.com/

I’ve discussed this with a few people, but one idea would be to create a Model Hub like the one Hugging Face provides: https://huggingface.co/models
A scientific deep learning model hub would be very powerful I think. It could be “seeded” with a few of DeepChem’s most popular models and then users could upload new models and use them for transfer learning. This could be integrated with MoleculeNet later on as well, so models could be automatically evaluated across MoleculeNet tasks. It also leverages DeepChem’s positioning across multiple sciences, since I don’t really know of any other projects that are well-suited for this kind of hub.


The official wiki is up at https://wiki.openchemistry.org/GSoC_Ideas_2021#DeepChem_Project_Ideas. I’ve ported over some of these ideas to the wiki!


This is a really cool idea but might be a little too complex to pull off within GSoC. The new GSoC program is shortened (about a month and change for students) and this project is probably too hard. I’m really interested in making this happen though!


I’m happy to announce that DeepChem, through Open Chemistry, will be supported in Google Summer of Code! This means we can indeed invite student applications to GSoC :slight_smile:

We also need more mentors! The more mentors we have, the more students we can support. If you’re a scientist who uses DeepChem and who is willing to help mentor a summer student, please get in touch with me. You can email me at bharath@deepforestsci (add the .com).


Adding on one more idea

  • DeepChem Retrosynthesis: DeepChem currently doesn’t have good support for retrosynthesis tooling. This project would involve improving MoleculeNet support for common retrosynthesis datasets, then picking and implementing a good retrosynthesis model within DeepChem, perhaps leveraging https://github.com/ASKCOS/ASKCOS. ASKCOS is open source but under the MPL (a copyleft license, though weaker than the GPL), so we would have to be careful with license issues if we use it.

Hello @seyonec, I’m also interested in this topic. Do you have any recommended reading on transformers for protein sequences? Thanks!


Awesome! Here is a more general intro to machine learning and computational biochemistry: https://www.notion.so/Computational-Biochemistry-e57c4194c4234a898ecf2db36bb74015

I’ll share some more specific papers as well:
UniRep - https://www.nature.com/articles/s41592-019-0598-1

TAPE (protein transfer learning benchmark) - https://www.biorxiv.org/content/10.1101/676825v1

@bharath Tagging Bharath in case he has any other recommended papers!


Hi @seyonec! I agree with you. Do you think that combining this with the EBI API (HTTP requests) could help in models with protein sequences, and maybe could be an idea for GSoC?


this is an awesome repository of papers! thanks for sharing!


DeepChem and GSoC??!?
What could be a better way to spend part of a summer :innocent: :cowboy_hat_face: :pray:

In the last dev-call the idea of expanding DeepChem’s hyperparameter tuning through Ray came up as a possible GSoC project, and I would love to participate in this as a mentor.

I have been working with Ray Tune extensively for my day job, and it feels like a soberly spec-ed implementation may well be within the scope of a GSoC project…
… and I would also say that hyper-parameters are really nice educationally due to their wide applicability and the degree to which novel maths pops up.

One modest goal that occurs to me would be to reach parity with the current HOpt (grid + Gaussian) and then expand with something a little bit more modern like HyperBand. I would be so happy to communicate with anyone else interested in these topics :smiley:
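
For anyone unfamiliar with HyperBand, its core building block is successive halving: evaluate many configurations on a small budget, keep the best fraction, and re-evaluate the survivors on a larger budget. Below is a minimal pure-Python sketch of a single successive-halving bracket. It is not the Ray Tune API and not DeepChem code; `successive_halving`, `toy_validation_loss`, and the `learning_rate` search space are made up for illustration, and the toy loss stands in for actually training a model for `budget` epochs. Full HyperBand runs several such brackets with different starting budgets.

```python
import math
from typing import Callable, Dict, List

Config = Dict[str, float]


def successive_halving(
    configs: List[Config],
    evaluate: Callable[[Config, int], float],
    min_budget: int = 1,
    eta: int = 3,
) -> Config:
    """Return the surviving config after one successive-halving bracket.

    `evaluate(config, budget)` returns a loss (lower is better) after
    spending `budget` units of training on `config`.
    """
    if not configs:
        raise ValueError("need at least one configuration")
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Rank all current survivors at the current budget.
        ranked = sorted(survivors, key=lambda config: evaluate(config, budget))
        # Keep the best 1/eta of them (at least one).
        survivors = ranked[: max(1, len(ranked) // eta)]
        # Give the survivors eta times more budget next round.
        budget *= eta
    return survivors[0]


def toy_validation_loss(config: Config, budget: int) -> float:
    # Hypothetical stand-in for "train for `budget` epochs, return val loss".
    # The loss is lowest at learning_rate = 1e-3 and shrinks with more budget.
    distance = math.log10(config["learning_rate"]) + 3.0
    return distance ** 2 + 1.0 / budget


# Nine learning rates spaced evenly in log space from 1e-5 to 1e-1.
candidates = [{"learning_rate": 10.0 ** (-5.0 + 0.5 * i)} for i in range(9)]
best = successive_halving(candidates, toy_validation_loss)
print(best)
```

With nine candidates and `eta=3`, the bracket evaluates nine configs at budget 1, three at budget 3, and returns the single survivor, so most of the compute goes to the promising configurations.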


Hi @stanleydrift,
The idea of adding functionality for hyper-parameter tuning sounds really interesting, and I would like to learn more about this project and discuss it. Could you share some resources or a plan, and could you also share your email (or any other preferred communication channel)?


Hi @bharath
I am highly interested in implementing support for PyTorch Lightning, and I would like to learn more about this project and discuss it. Could you share some resources or a plan to get started?

Can you join one of the developer calls? See “Joining the DeepChem Developer Calls”.

At a high level, the goal of the project is to make a LightningModel class that can be used like TorchModel to build DeepChem models in PyTorch Lightning, along with tests, documentation, and a tutorial. Glad to chat more on the call :slight_smile:

Hi @bharath, I have been actively practicing deep learning in PyTorch (personal bias: pytorch >>> tf :upside_down_face:) and liked the ideas of the Lightning implementation and protein language modelling. The semiconductor modeling support is also something I can’t get my head around. I would like to discuss these and get some planning and a timeline sorted :smile:.

What would the purpose of that class be? TorchModel and pytorch_lightning.LightningModule both perform roughly the same role. They wrap a torch.nn.Module and provide an API for training, logging, validation, etc. If you already have one of them, what would you gain by wrapping it in the other?

Hi @bharath, I tried to build the PyTorch Lightning class for torch models. I would like to show you my work, and I have some questions to discuss.

Great question! Primarily, PyTorch Lightning appears to be emerging as a new standard for model building in the PyTorch community, so it seems useful to have a wrapper. But you’re absolutely right that there’s overlap with TorchModel. Perhaps we should discuss proper design on this week’s developer call?