Writing DeepChemic Code

One challenge for new DeepChem developers can be figuring out how to write “DeepChemic” code; that is, code which matches DeepChem’s styles and conventions. I’m writing up this issue to attempt to document some of DeepChem’s underlying conventions:

  • Follow the Abstract Superclasses: DeepChem uses abstract superclasses to organize its codebase. These are classes like Featurizer, Transformer, Dataset, Model, etc. These abstract superclasses encapsulate common APIs that are followed across all featurizers/transformers/etc. When adding new functionality to DeepChem, first see if your functionality matches one of our existing abstract classes. If so, make a concrete subclass of the abstract superclass. For example, suppose you have a new fluid mechanics featurizer. You’d then want to make a FluidFeaturizer subclass of the abstract Featurizer. Note that sometimes we have a hierarchy of abstract superclasses. MolecularFeaturizer inherits from Featurizer and concrete classes like CircularFingerprint inherit from MolecularFeaturizer.
  • Use DeepChem Data Structures: DeepChem has a few core data structure classes (DiskDataset, NumpyDataset, ImageDataset) that tie the library together. Most DeepChem APIs operate on these core data structures. Learn these data structures carefully and use them to good effect in your code.
  • Avoid Adding New Conventions: Each new convention you add to DeepChem creates additional cognitive burden for users. Ask if your new feature is really so unique that it can’t be massaged into an existing DeepChem abstraction. If necessary, work to expand the DeepChem abstract superclasses to become more general rather than making a special one-off subclass. As a concrete example, suppose you are working with another library (such as HuggingFace) which follows conventions that don’t match DeepChem’s. In this case, we would work to expand DeepChem abstractions to encompass HF’s abstractions rather than adding HF abstractions/conventions to the codebase as a one-off.
  • Unity of Design: The broad goal of DeepChem is to enable AI-powered science (see Making DeepChem a Better Framework for AI-Driven Science). Ask how your feature fits into this broader goal. Ideally, functionality you add should be broad and multipurpose, suitable for addressing multiple classes of problems.
  • Avoid Breaking APIs: To the degree possible, avoid breaking an existing API. If you absolutely need to change an API, you must deprecate the changed feature and maintain backwards compatibility for at least one major release cycle.
  • Follow Good Engineering Practices: DeepChem code is intended to be production stable and usable by a broad range of users. To meet this goal, we follow Python engineering best practices including type annotations, extensive unit tests, flake8, detailed docstrings and more. Imagine that a maintainer 5 years from now who has never met you has to understand your code. How can you help them?
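
To make a few of these points concrete, here is a minimal sketch of the subclassing and deprecation patterns described above. It is illustrative only and uses just the standard library: the Featurizer class below is a simplified stand-in rather than DeepChem's actual class, and FluidFeaturizer, the viscosity parameter, and the old mu parameter name are all hypothetical names made up for this example.

```python
import warnings
from abc import ABC, abstractmethod
from typing import Any, Iterable, List, Optional


class Featurizer(ABC):
    """Simplified stand-in for an abstract featurizer superclass.

    This is NOT DeepChem's real Featurizer; it only mirrors the idea of a
    shared public API backed by a per-datapoint hook.
    """

    def featurize(self, datapoints: Iterable[Any]) -> List[List[float]]:
        """Shared public API: featurize each datapoint in turn."""
        return [self._featurize(point) for point in datapoints]

    @abstractmethod
    def _featurize(self, datapoint: Any) -> List[float]:
        """Per-datapoint logic that every concrete subclass must supply."""


class FluidFeaturizer(Featurizer):
    """Hypothetical concrete featurizer for fluid mechanics inputs.

    Parameters
    ----------
    viscosity: float
        Dynamic viscosity used to compute the Reynolds number.
    mu: Optional[float]
        Hypothetical old name for `viscosity`, kept so existing callers
        keep working while they migrate.
    """

    def __init__(self, viscosity: float = 1.0, mu: Optional[float] = None) -> None:
        if mu is not None:
            # Deprecate the old name instead of removing it outright.
            warnings.warn(
                "`mu` is deprecated; use `viscosity` instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            viscosity = mu
        self.viscosity = viscosity

    def _featurize(self, datapoint: Any) -> List[float]:
        density, velocity, length = datapoint
        # Reynolds number = density * velocity * length / viscosity
        return [density * velocity * length / self.viscosity]


features = FluidFeaturizer(viscosity=2.0).featurize([(1.0, 4.0, 3.0)])
print(features)  # [[6.0]]
```

The subclass only fills in the per-datapoint hook and inherits the public featurize method, so callers see the same API as for every other featurizer. The old keyword still works but emits a DeprecationWarning, which is one way to keep backwards compatibility while an API changes.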

If you have more thoughts on what constitutes DeepChemic code, please chime in on this thread! I’ve written up some of my first thoughts, but there are likely a lot of DeepChemic concepts I’m still missing given the size and breadth of the codebase.