Regression example for one task

dgodin19 · November 19, 2022, 8:03am

Hi,

I am looking to do a regression on the QM9 dataset, but am stuck on how to use the wrappers for TensorFlow or Keras.

The QM9 dataset has 20 tasks, but I would like to do a regression for one task at a time. From the documentation, we have:
"

“mol_id” - Molecule ID (gdb9 index) mapping to the .sdf file

“A” - Rotational constant (unit: GHz)

“B” - Rotational constant (unit: GHz)

“C” - Rotational constant (unit: GHz)

“mu” - Dipole moment (unit: D)

“alpha” - Isotropic polarizability (unit: Bohr^3)

“homo” - Highest occupied molecular orbital energy (unit: Hartree)

“lumo” - Lowest unoccupied molecular orbital energy (unit: Hartree)

“gap” - Gap between HOMO and LUMO (unit: Hartree)

“r2” - Electronic spatial extent (unit: Bohr^2)

“zpve” - Zero point vibrational energy (unit: Hartree)

“u0” - Internal energy at 0K (unit: Hartree)

“u298” - Internal energy at 298.15K (unit: Hartree)

“h298” - Enthalpy at 298.15K (unit: Hartree)

“g298” - Free energy at 298.15K (unit: Hartree)

“cv” - Heat capacity at 298.15K (unit: cal/(mol*K))

“u0_atom” - Atomization energy at 0K (unit: kcal/mol)

“u298_atom” - Atomization energy at 298.15K (unit: kcal/mol)

“h298_atom” - Atomization enthalpy at 298.15K (unit: kcal/mol)

“g298_atom” - Atomization free energy at 298.15K (unit: kcal/mol)

"
train_dataset.y.shape gives (105984, 12)
train_dataset.w.shape gives (105984, 12)
train_dataset.X.shape gives (105984, 50)

I get that 105984 is the number of molecules, but what do 12 and 50 mean?

Is there an example of a regression model for one task? Is it possible to do this? For example, to look only at mu, without using a MultiTaskRegressionModel.
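To make the question concrete, here is a minimal sketch of what "one task" could look like on arrays with the layout shown above. It is only an illustration under stated assumptions: small random NumPy arrays stand in for the real dataset, the task list and the position of mu in it are hypothetical (the real order is whatever the loader returns), and plain least squares stands in for an actual model. None of it is the library's own API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in arrays, assuming the usual layout: rows are molecules,
# X columns are features, and y/w columns are tasks.
n_molecules, n_features, n_tasks = 200, 50, 12
X = rng.normal(size=(n_molecules, n_features))
y = rng.normal(size=(n_molecules, n_tasks))
w = np.ones((n_molecules, n_tasks))

# Hypothetical task list; the real names and order come from the loader.
tasks = [f"task_{i}" for i in range(n_tasks)]
tasks[0] = "mu"  # assumption: "mu" is one of the loaded tasks

# Keep only the column for the chosen task, preserving the 2-D shape.
idx = tasks.index("mu")
y_single = y[:, [idx]]
w_single = w[:, [idx]]

# Plain least squares (with a bias column) as a stand-in for a
# single-task regressor.
X_bias = np.hstack([X, np.ones((n_molecules, 1))])
coef, *_ = np.linalg.lstsq(X_bias, y_single, rcond=None)
pred = X_bias @ coef

print(y_single.shape, w_single.shape, pred.shape)
```

Indexing with `[idx]` rather than `idx` keeps the labels and weights two-dimensional with a single task column, which is the shape a model built for one task would presumably expect.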

Thanks in advance.