Graph Classification via Random Forest

Hello everybody!

I need your help again. I’m trying to build a classifier that labels anticancer drugs as active or inactive and then sorts the active ones into three classes, with each molecule described as a graph.

My supervisor suggested that I use a random forest classifier, but I can’t figure out how to use this algorithm with this data structure. Has anyone had any experience with this? Thanks!

Random forests don’t work on graphs directly. They work on fixed-length vectors. Maybe your supervisor meant to try a simple featurization, like an ECFP fingerprint, with a random forest? That’s useful as a baseline. It’s a very simple model, but it often still works reasonably well. That way, as you start trying out different types of graph models, you have something to compare them to: do the complicated models work any better than the simple one?

@peastman the idea is to use a random forest to classify the molecules based on their activity. I start from a molecular graph dataset like this:

Data(x=[9, 9], edge_index=[2, 18], edge_attr=[18, 2], y=[0])

where x, edge_index and edge_attr are torch tensors, and y is the label (0 = inactive, 1 = active, class one, 2 = active, class two…). To run a random forest classifier I think I must convert them into a vector such as an np.array, but I have no idea how to do it.
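One crude way to get a fixed-length vector out of tensors like these is to pool the node and edge features over the whole graph. This is only a sketch and not the ECFP route suggested earlier in the thread: the helper name graph_to_vector, the choice of sum/mean pooling, and the random stand-in graphs and labels are all assumptions made for illustration. It ignores edge_index entirely, so connectivity information is lost. For torch tensors on the CPU, something like data.x.numpy() and data.edge_attr.numpy() should give the arrays it expects.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def graph_to_vector(x, edge_attr):
    """Pool a variable-size graph into one fixed-length vector.

    x:         (num_nodes, num_node_features) array
    edge_attr: (num_edges, num_edge_features) array
    Connectivity (edge_index) is ignored, so this is a lossy summary.
    """
    x = np.asarray(x, dtype=float)
    edge_attr = np.asarray(edge_attr, dtype=float)
    return np.concatenate([
        x.sum(axis=0),          # summed node features
        x.mean(axis=0),         # averaged node features
        edge_attr.sum(axis=0),  # summed edge features
        [x.shape[0]],           # number of nodes (atoms)
    ])


# Synthetic stand-ins for real molecules: random graphs of different sizes
# with the same feature widths as in the post (9 node, 2 edge features).
rng = np.random.default_rng(0)
graphs = []
for _ in range(20):
    n_nodes = int(rng.integers(5, 12))
    n_edges = 2 * n_nodes
    graphs.append((rng.random((n_nodes, 9)), rng.random((n_edges, 2))))
labels = rng.integers(0, 3, size=20)  # placeholder labels, not activity data

# Every graph becomes a row of length 9 + 9 + 2 + 1 = 21.
X = np.stack([graph_to_vector(x, e) for x, e in graphs])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, labels)
```

Because every graph maps to the same 21 columns regardless of its size, the resulting matrix can be passed to scikit-learn directly. Whether such pooled features carry enough signal for activity prediction is a separate question.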

Do you have access to the molecular formulas/SMILES? The data you’ve given could be for any graph, not necessarily a molecular graph. To compute ECFP, you would need the SMILES string or a similar input so that DeepChem/RDKit can compute the molecular descriptors.

@bharath yes, I have both
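
With SMILES available, the ECFP plus random forest baseline mentioned above might look roughly like the sketch below. It assumes RDKit and scikit-learn are installed and uses RDKit's Morgan fingerprint generator with radius 2 (commonly treated as ECFP4) and 2048 bits. The helper name ecfp, the toy molecules, and the labels are made up for illustration and are not activity data.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator
from sklearn.ensemble import RandomForestClassifier

# Morgan fingerprints with radius 2 are the usual stand-in for ECFP4.
_generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)


def ecfp(smiles):
    """Return a fixed-length 0/1 fingerprint vector for one SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return _generator.GetFingerprintAsNumPy(mol)


# Toy molecules with made-up class labels (0 = inactive, 1 and 2 = active classes).
smiles_list = ["CCO", "CC(=O)O", "c1ccccc1", "CCN", "c1ccccc1O", "CC(C)O"]
labels = [0, 1, 2, 0, 2, 1]

# One row per molecule, 2048 columns each, whatever the molecule's size.
X = np.stack([ecfp(s) for s in smiles_list])

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X, labels)

# Predict the class of a new molecule from its fingerprint.
pred = clf.predict(ecfp("CCCO").reshape(1, -1))
```

A random forest handles several classes natively, so the inactive class and the active classes can be fit in a single model. On a real dataset, a held-out split or cross-validation would give the baseline score to compare graph models against.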