I have a question about how to properly process graphs whose nodes have multiple categorical features (e.g. atom type and hybridization). Normally, I would expect to learn a separate embedding for each categorical feature (e.g. using nn.Embedding) and then concatenate the embeddings. However, from what I can see in DeepChem and other libraries, the raw one-hot encoded features are concatenated and then passed directly through linear or graph convolution layers. Does anyone have comments on the differences between the two approaches? Concatenating two one-hot vectors and passing the result through a linear layer is, ignoring the bias, equivalent to embedding each feature separately and then summing the embeddings. That seems a bit odd to me, but perhaps it is fine?
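
To make the comparison concrete, here is a minimal NumPy sketch of what I mean. The vocabulary sizes, the hidden width, and all function names are made up for illustration, and the linear layer is assumed to have no bias. The rows of the linear layer's weight matrix play the role of two embedding tables, so the linear layer on the concatenated one-hot vector gives the sum of the two looked-up rows, whereas the approach I would normally expect concatenates them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
num_atom_types, num_hybridizations, hidden_dim = 5, 3, 4

# Weight of a bias-free linear layer applied to the concatenated one-hot vector.
W = rng.normal(size=(num_atom_types + num_hybridizations, hidden_dim))

# Splitting W by rows yields one embedding table per categorical feature.
atom_table = W[:num_atom_types]
hybrid_table = W[num_atom_types:]


def one_hot(index, size):
    """Return a one-hot vector of the given size with a 1 at `index`."""
    vec = np.zeros(size)
    vec[index] = 1.0
    return vec


def linear_on_concat(atom_idx, hybrid_idx):
    """Concatenate the two one-hot vectors, then apply the linear layer."""
    x = np.concatenate([
        one_hot(atom_idx, num_atom_types),
        one_hot(hybrid_idx, num_hybridizations),
    ])
    return x @ W


def sum_of_embeddings(atom_idx, hybrid_idx):
    """Look up each feature in its own table and add the results."""
    return atom_table[atom_idx] + hybrid_table[hybrid_idx]


def concat_of_embeddings(atom_idx, hybrid_idx):
    """Look up each feature in its own table and concatenate the results."""
    return np.concatenate([atom_table[atom_idx], hybrid_table[hybrid_idx]])


# The first two agree; the third has twice the width and keeps the features apart.
same = np.allclose(linear_on_concat(2, 1), sum_of_embeddings(2, 1))
print(same, concat_of_embeddings(2, 1).shape)
```

In the summed version both features share one hidden_dim-wide space, while in the concatenated version each feature keeps its own slice of the vector until a later layer mixes them.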