This is a really cool idea but might be a little too complex to pull off within GSoC. The new GSoC program is shortened (about a month and change for students) and this project is probably too hard. I’m really interested in making this happen though!
Google Summer of Code 2021 Ideas
I’m happy to announce that DeepChem, through Open Chemistry, will be supported in Google Summer of Code! This means we can indeed invite student applications to GSoC.
We also need more mentors! The more mentors we have, the more students we can support. If you’re a scientist who uses DeepChem and who is willing to help mentor a summer student, please get in touch with me. You can email me at bharath@deepforestsci (add the .com).
Adding on one more idea
- DeepChem Retrosynthesis: DeepChem currently doesn’t have good support for retrosynthesis tooling. This project would involve improving MoleculeNet support for common retrosynthesis datasets, then picking and implementing a good retrosynthesis model within DeepChem, perhaps leveraging https://github.com/ASKCOS/ASKCOS. ASKCOS is open source but licensed under the MPL (a copyleft license, though weaker than the GPL), so we would have to be careful with license issues if we use it.
Hello @seyonec, I’m also interested in this topic. Do you have any recommended reading on transformers for protein sequences? Thanks!
Awesome! Here is a more general intro to machine learning and computational biochemistry: https://www.notion.so/Computational-Biochemistry-e57c4194c4234a898ecf2db36bb74015
I’ll share some more specific papers as well:
UniRep - https://www.nature.com/articles/s41592-019-0598-1
TAPE (protein transfer learning benchmark) - https://www.biorxiv.org/content/10.1101/676825v1
@bharath Tagging Bharath if he has any other recommended papers!
Hi @seyonec! I agree with you. Do you think that combining this with the EBI API (via HTTP requests) could help models that work with protein sequences, and maybe could be an idea for GSoC?
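To make the EBI idea a bit more concrete, here is a minimal sketch of pulling a protein sequence over HTTP and turning it into a plain string that a sequence model could consume. The endpoint (the EBI Proteins API) and the `text/x-fasta` Accept header are assumptions on my part, and `fetch_fasta` and `parse_fasta` are hypothetical helper names, not anything that exists in DeepChem.

```python
import urllib.request

# Assumed endpoint: the EBI Proteins API. Check the EBI docs before relying on it.
EBI_PROTEINS_URL = "https://www.ebi.ac.uk/proteins/api/proteins/{accession}"


def fetch_fasta(accession, timeout=30):
    """Download one protein record as FASTA text (needs network access)."""
    request = urllib.request.Request(
        EBI_PROTEINS_URL.format(accession=accession),
        headers={"Accept": "text/x-fasta"},  # assumed content type
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_fasta(text):
    """Parse FASTA text into a dict mapping each header line to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {h: "".join(parts) for h, parts in records.items()}
```

Something like `parse_fasta(fetch_fasta(accession))` would then give sequences ready for tokenization; caching and rate limiting are left out of this sketch.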
This is an awesome repository of papers! Thanks for sharing!
DeepChem and GSoC??!?
What could be a better way to spend part of a summer
In the last dev-call the idea of expanding DeepChem’s hyperparameter tuning through Ray came up as a possible GSoC project, and I would love to participate in this as a mentor.
I have been working with Ray-Tune extensively for my day job, and it feels like a soberly spec-ed implementation may well be within the scope of a GSoC project…
… and I would also say that hyperparameters are really nice educationally because of their wide applicability and how often novel maths pops up.
One modest goal that occurs to me would be to reach parity with the current HOpt (grid + Gaussian) and then expand with something a little more modern like HyperBand. I would be so happy to communicate with anyone else interested in these topics.
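For anyone unfamiliar with HyperBand, its core building block is successive halving: evaluate many configurations on a small budget, keep the best fraction, and give the survivors more budget. Below is a minimal pure-Python sketch of that building block only; it is not the Ray Tune or DeepChem API, and `successive_halving` and `toy_loss` are hypothetical names used purely for illustration. HyperBand itself runs several such brackets with different starting budgets.

```python
import math
import random


def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the best 1/eta of the configs each round; survivors get eta times more budget."""
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        # Score every surviving config at the current budget (lower is better).
        scored = sorted(survivors, key=lambda cfg: evaluate(cfg, budget))
        keep = max(1, len(scored) // eta)
        survivors = scored[:keep]
        budget *= eta
    return survivors[0]


def toy_loss(config, budget):
    # Hypothetical stand-in for "train for `budget` epochs, return validation loss".
    # The optimum is placed at learning_rate = 1e-3 for illustration.
    return (math.log10(config["learning_rate"]) + 3) ** 2 + 1.0 / budget


rng = random.Random(0)
configs = [{"learning_rate": 10 ** rng.uniform(-5, -1)} for _ in range(27)]
best = successive_halving(configs, toy_loss)
```

With 27 starting configurations and `eta=3`, the rounds go 27 → 9 → 3 → 1, so most of the compute is spent on the configurations that look promising early.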
Hi @stanleydrift,
The idea of adding functionality for hyperparameter tuning sounds really interesting, and I would like to learn more about this project and discuss it. Could you share some resources or a plan, and also your email (or any other preferred communication channel)?
Hi @bharath
I am highly interested in implementing support for PyTorch Lightning, and I would like to learn more about this project and discuss it. Could you share some resources or a plan to get started?
Can you join one of the developer calls? See “Joining the DeepChem Developer Calls”.
At a high level, the goal of the project is to make a LightningModel class that can be used like TorchModel to enable the construction of DeepChem models in PyTorch Lightning, along with tests, documentation, and a tutorial. Glad to chat more on the call.
Hi @bharath, I have been actively practicing deep learning in PyTorch (personal bias: pytorch >>> tf) and liked the ideas of the Lightning implementation and protein language modelling; the semiconductor modeling support is also something I can’t get my head around. I would like to discuss these and get some planning and a timeline sorted.
What would the purpose of that class be? TorchModel and pytorch_lightning.LightningModule both perform roughly the same role. They wrap a torch.nn.Module and provide an API for training, logging, validation, etc. If you already have one of them, what would you gain by wrapping it in the other?
Hi @bharath, I tried to build the PyTorch Lightning class for torch models. I would like to show you my work, and I have some questions to discuss.
Great question! I think the main reason is that PyTorch Lightning appears to be emerging as a new standard for model building in the PyTorch community, so it seems useful to have a wrapper. But you’re absolutely right that there’s overlap with TorchModel. Perhaps we should discuss proper design on this week’s developer call?
Join our developer call for the week :). We can discuss design ideas there with @peastman too hopefully
Sure, that sounds fine. @JainSamyak8840 if you plan to join one of the developer calls, let me know which one. Both times are convenient for me, so I vary which one I call in for.
We’ve started getting a lot of interest from students on these projects, and in a few cases, have multiple students who’re interested in working on the same project. We want to make sure that students aren’t replicating work, so @peastman and I have suggested guidelines:
- If you are seriously interested in a project, put together a preliminary 1-2 page proposal for me and the other mentors to review. Don’t worry about whether this proposal conflicts with what another student might be working on to start.
- Once you’ve submitted the preliminary proposal, we’ll work with you to ensure your proposal doesn’t conflict with another serious proposal.
The idea is that we’re using the writing of the preliminary proposal as a marker of serious interest. If you are seriously interested in applying, we’ll work with you to make sure there are no conflicts, but otherwise we want to keep the field open for other serious students.
Maybe with Ray SGD?