Convert Custom Dataset (Graph) to SMILES


I have a custom dataset, a list of graphs, that I'd like to convert into SMILES representations. I was checking out RawFeaturizer but I don't understand how to use it. I also checked some posts in the forum, but I mostly see posts about converting SMILES to graphs using RDKit or ConvMolFeaturizer. Can someone please help me with the reverse direction? Thank you.

Is it related to converting a molecule represented as an adjacency matrix into SMILES? Can you share an example of the graph structure which you have?

I have the node features and the edge index for the molecule. There are more than 100 nodes in the graph which makes the plot difficult to understand.

edge_index shape = [2, 85520]
node features shape = [2815, 50]

In general, it's not straightforward to go back from a graph to a SMILES string. One reason is that a molecule can be encoded as a graph in many different ways and formats. You'd need to make sure the graph still holds the information needed to reconstruct the full molecular structure (element types, bonds, and ideally bond orders and charges), which isn't always the case.

My two cents would be that it's easiest to write something custom for your application that manually reconstructs the structure of your compounds. We don't currently have a good way to do this in DeepChem.

Thank you very much, I’ll follow that route.

If you can extract the list of atoms and bonds from the graph, you could do it with RDKit. Build up a molecule object, then call MolToSmiles().
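As a minimal sketch of that idea: the helper name `graph_to_smiles` and the input layout (a list of element symbols plus `(i, j, order)` bond tuples) are assumptions for illustration, not anything your dataset is known to provide. It only handles single, double, and triple bonds, and sanitization will raise an error if the valences don't make chemical sense.

```python
from rdkit import Chem

# Assumed mapping from an integer bond order to an RDKit bond type.
BOND_TYPES = {
    1: Chem.BondType.SINGLE,
    2: Chem.BondType.DOUBLE,
    3: Chem.BondType.TRIPLE,
}

def graph_to_smiles(atom_symbols, bonds):
    """Build an RDKit molecule from atoms and bonds, then return its SMILES.

    atom_symbols: list of element symbols, one per node, e.g. ["C", "C", "O"]
    bonds: list of (i, j, order) tuples, each undirected bond listed once
    """
    mol = Chem.RWMol()  # editable molecule
    for symbol in atom_symbols:
        mol.AddAtom(Chem.Atom(symbol))
    for i, j, order in bonds:
        mol.AddBond(i, j, BOND_TYPES[order])
    # Checks valences and perceives aromaticity; raises if the graph is not a valid molecule.
    Chem.SanitizeMol(mol)
    return Chem.MolToSmiles(mol)

# Ethanol: C-C-O with two single bonds (hydrogens are implicit).
ethanol = graph_to_smiles(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
print(ethanol)
```

Note that each bond has to be added only once; if your edge list stores both directions, deduplicate it first, since adding the same bond twice will fail.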


Thank you, I was looking for something very similar.

You can check my approach to recreating molecules from a graph.
But as was said previously, it all depends on how the graph was created.
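To make that dependence concrete, here is a rough sketch of decoding a graph into an atom list and a bond list that could then be handed to RDKit as suggested above. Everything about the encoding is an assumption: that the first columns of each node feature row are a one-hot over a hypothetical `ELEMENTS` vocabulary, and that `edge_index` lists every bond in both directions. It also recovers no bond orders. With roughly 30 edge entries per node, as in the shapes posted earlier, the edges may well not be chemical bonds at all (for example, distance-based neighbors), in which case this decoding would not apply.

```python
# Hypothetical vocabulary: column k of the one-hot block stands for ELEMENTS[k].
ELEMENTS = ["C", "N", "O"]

def decode_graph(node_features, edge_index, elements=ELEMENTS):
    """Return (atom_symbols, bonds) from one-hot node features and a 2 x E edge index."""
    atom_symbols = []
    for row in node_features:
        one_hot = list(row[:len(elements)])
        # Position of the largest entry in the assumed one-hot block.
        atom_symbols.append(elements[one_hot.index(max(one_hot))])

    sources, targets = edge_index
    # Collapse (i, j) and (j, i) into one undirected bond and drop self-loops.
    bonds = sorted({(min(i, j), max(i, j)) for i, j in zip(sources, targets) if i != j})
    return atom_symbols, bonds

# Toy example: three nodes, two bonds, each stored in both directions.
node_features = [[1, 0, 0], [1, 0, 0], [0, 0, 1]]
edge_index = [[0, 1, 1, 2], [1, 0, 2, 1]]
atoms, bonds = decode_graph(node_features, edge_index)
print(atoms, bonds)
```

If the features were produced by a known featurizer, its documentation should tell you which columns encode the element, and whether bond orders are stored anywhere in the edge features.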