Hi,
I am trying to run a regression on the QM9 dataset, but I am stuck on how to use the wrappers for TensorFlow or Keras.
The QM9 dataset has 20 tasks, but I would like to do a regression for one task at a time. From the documentation, we have:
"
“mol_id” - Molecule ID (gdb9 index) mapping to the .sdf file
“A” - Rotational constant (unit: GHz)
“B” - Rotational constant (unit: GHz)
“C” - Rotational constant (unit: GHz)
“mu” - Dipole moment (unit: D)
“alpha” - Isotropic polarizability (unit: Bohr^3)
“homo” - Highest occupied molecular orbital energy (unit: Hartree)
“lumo” - Lowest unoccupied molecular orbital energy (unit: Hartree)
“gap” - Gap between HOMO and LUMO (unit: Hartree)
“r2” - Electronic spatial extent (unit: Bohr^2)
“zpve” - Zero point vibrational energy (unit: Hartree)
“u0” - Internal energy at 0K (unit: Hartree)
“u298” - Internal energy at 298.15K (unit: Hartree)
“h298” - Enthalpy at 298.15K (unit: Hartree)
“g298” - Free energy at 298.15K (unit: Hartree)
“cv” - Heat capacity at 298.15K (unit: cal/(mol*K))
“u0_atom” - Atomization energy at 0K (unit: kcal/mol)
“u298_atom” - Atomization energy at 298.15K (unit: kcal/mol)
“h298_atom” - Atomization enthalpy at 298.15K (unit: kcal/mol)
“g298_atom” - Atomization free energy at 298.15K (unit: kcal/mol)
"
train_dataset.y.shape gives (105984, 12)
train_dataset.w.shape gives (105984, 12)
train_dataset.X.shape gives (105984, 50)
I understand that 105984 is the number of molecules, but what do 12 and 50 mean?
Is there an example of a regression model for one task? Is it possible to do this? For example, only look at mu, and not use a MultiTaskRegressionModel.
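To make the question concrete, here is a rough sketch of what I have in mind. It uses random stand-in arrays with the same column counts as above, not the real data. It assumes that each column of y corresponds to one task and that the column order follows a list of task names; the `tasks` list and the `select_task` helper below are my own guesses for illustration, not something I found in the documentation. The least-squares fit is only a placeholder for whatever single-task model would be appropriate.

```python
import numpy as np

# ASSUMPTION: a guessed task order for the 12 columns of y.
# The real order should come from the task list returned by the loader.
tasks = ["mu", "alpha", "homo", "lumo", "gap", "r2",
         "zpve", "cv", "u0", "u298", "h298", "g298"]

# Stand-in arrays with the same number of columns as my dataset
# (random numbers, NOT actual QM9 values).
rng = np.random.default_rng(0)
n_samples = 200
X = rng.normal(size=(n_samples, 50))
y = rng.normal(size=(n_samples, len(tasks)))
w = np.ones_like(y)


def select_task(y, w, tasks, name):
    """Hypothetical helper: keep only the label and weight column for one task."""
    idx = tasks.index(name)
    # Slice with idx:idx + 1 so the result stays 2-D: (n_samples, 1).
    return y[:, idx:idx + 1], w[:, idx:idx + 1]


y_mu, w_mu = select_task(y, w, tasks, "mu")
print(y_mu.shape, w_mu.shape)

# Placeholder single-task regression: ordinary least squares on "mu" only.
coef, *_ = np.linalg.lstsq(X, y_mu, rcond=None)
pred = X @ coef
print(pred.shape)
```

If this is roughly the right idea, I would then train on X and y_mu only, instead of on all 12 columns at once.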
Thanks in advance.