GSoC Project: Dynamic DeepChem

nd-02110114 · March 19, 2020, 2:58pm

Hi, I’m really interested in participating in Project: Dynamic_DeepChem via GSoC.

I have some questions about this project, and it seems other people are interested too, judging by the following Gitter comments:

out of curiosity, why do you want to migrate to JAX? I thought migration from TF1 > TF2 would be enough. Then I remember there were some mentions of using PyTorch, and now JAX. Isn’t it going to be a massive pain to reimplement every Keras/TF/Pytorch model out there in JAX, as well as adding new ones?
I’m strongly pro-PyTorch in PyTorch vs. JAX, especially seeing how popular the former is in the research community and how new JAX is.

So I’m posting this topic. I mostly share the opinions in the comments above.

My main question is: why do you want to migrate to JAX or PyTorch?

As mentioned above, I also think TF2 may be enough. Many GCN libraries, like DGL, are written in PyTorch, so I think TF users will appreciate DeepChem if it stays written in TF.

On the other hand, I know many new GCN models are written in PyTorch. I can understand the need to reimplement in PyTorch if DeepChem wants to attract more attention, especially from the research community.

As for JAX, I think it is a good framework. It is fast and readable thanks to its Define-and-Run style and NumPy-like API. However, JAX seems to be at an early stage. Even installation is difficult, and neural network frameworks built on JAX, such as Flax, are still under development.

In summary, I just want to know your opinions:

Why do you want to migrate? (Isn’t TF2 enough?)
Which is better for migration, JAX or PyTorch?

bharath · March 19, 2020, 4:29pm

My apologies, I think there’s been a lot of confusion over the wording of this project. I think JAX has a unique angle that’s worth exploring more deeply for molecular machine learning. This doesn’t necessarily mean that the core deepchem repo will be migrated to JAX. We are putting a lot of effort into adding PyTorch capabilities. This is happening in github.com/deepchem/torchchem

Over time, my goal is to move deepchem from a monolithic package into a collection of sturdy, smaller packages. We’re splitting out moleculenet into github.com/deepchem/moleculenet
There’s torchchem (for PyTorch), which I mentioned above. There’s the core deepchem package github.com/deepchem/deepchem, which will likely remain TF2.X.

If JAX plays well in practice, most likely we’ll start a separate repo for JAX chemistry models.

peastman · March 19, 2020, 5:00pm

Our goal is to support as many machine learning packages as possible. The MoleculeNet package will include just about everything that isn’t TensorFlow specific. It will have the canned datasets, of course, but also the core Dataset class and everything that’s useful for processing datasets: featurizers, transformers, etc. Once you have a transformed, featurized dataset, it will let you access that in lots of different forms: as a TensorFlow dataset, as a PyTorch dataset, as Numpy arrays, as a Pandas dataframe, etc.
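
As a rough illustration of the "one featurized dataset, many access forms" idea, here is a minimal sketch. It is not the actual MoleculeNet or DeepChem API: the `FeaturizedDataset` class and its method names are hypothetical, and plain Python lists stand in for TensorFlow, PyTorch, NumPy, or Pandas objects so the example runs with the standard library only.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass
class FeaturizedDataset:
    """Hypothetical container; NOT the real DeepChem/MoleculeNet Dataset class."""

    X: List[List[float]]  # one feature vector per sample
    y: List[float]        # one label per sample
    ids: List[str]        # one identifier per sample (e.g. a SMILES string)

    def __post_init__(self) -> None:
        if not (len(self.X) == len(self.y) == len(self.ids)):
            raise ValueError("X, y and ids must have the same length")

    def __len__(self) -> int:
        return len(self.ids)

    def as_rows(self) -> List[Tuple[str, List[float], float]]:
        """Row-oriented view: one (id, features, label) tuple per sample."""
        return list(zip(self.ids, self.X, self.y))

    def as_columns(self) -> Dict[str, list]:
        """Column-oriented view, the shape a dataframe constructor could take."""
        return {"ids": list(self.ids), "X": list(self.X), "y": list(self.y)}

    def iter_batches(self, batch_size: int) -> Iterator["FeaturizedDataset"]:
        """Yield consecutive mini-batches, as a framework data loader would."""
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        for start in range(0, len(self), batch_size):
            stop = start + batch_size
            yield FeaturizedDataset(
                self.X[start:stop], self.y[start:stop], self.ids[start:stop]
            )


# Toy data; the feature values and labels are made up for illustration.
dataset = FeaturizedDataset(
    X=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    y=[1.0, 0.0, 1.0],
    ids=["CCO", "CCN", "CCC"],
)
batches = list(dataset.iter_batches(batch_size=2))
```

The point of the design is that featurization happens once, and each framework-specific view is only a thin conversion on top of the same underlying data.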

The other big piece of DeepChem is the models. All our current ones are written in TensorFlow, and that’s how they’ll stay. But we want to let new models be written with whatever framework people want to use, so we’ll add a separate package for models written with PyTorch, and possibly a package for ones with JAX.

nd-02110114 · March 25, 2020, 2:41am

@bharath @peastman
Thank you for your reply! I now understand the goal of DeepChem and why you are trying to implement models with JAX.

I’ve written up my proposal for GSoC 2020. If you have time, please check it and give me your feedback.

bharath · March 24, 2020, 6:32pm

Just responded to you over Gitter! As a quick heads up, this forum is publicly accessible, so everyone (not just me and Peter) can read your proposal. If you’d prefer to keep it private, you can just share it with me over email (X.Y@gmail.com, where X=bharath, Y=ramsundar).

nd-02110114 · May 6, 2020, 2:59pm

I found a blog post about GNNs with JAX. I think it is really helpful as we get started with GSoC.