Uncertainty estimation in regression

Hello,

I am trying to understand how variance (uncertainty) is estimated in regression tasks. I have looked at the notebook https://github.com/deepchem/deepchem/blob/master/examples/tutorials/25_Uncertainty_In_Deep_Learning.ipynb and referred to the graph_models.py script. Do the three lines below, starting at line 830 of graph_models.py, compute both aleatoric and epistemic uncertainty?
self.uncertainty_dense = Dense(n_tasks)
self.uncertainty_trim = TrimGraphOutput()
self.uncertainty_activation = Activation(tf.exp)
Do they compute the variance? Also, how does the line
self.uncertainty_dense = Dense(n_tasks)
differ from line 828, which I believe produces the predicted output?
self.regression_dense = Dense(n_tasks)

Would really appreciate a response.
Thank you!

Those lines are related to aleatoric uncertainty. The model is learning to predict uncertainty based on how well its predictions match the training data. Epistemic uncertainty is evaluated in a different way: by repeating each prediction with many different dropout masks and seeing how well they agree.
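As a rough illustration of the dropout idea, here is a toy sketch in plain Python. It is not DeepChem code: the linear model, the function names, and the default values are all made up for this example. It repeats one prediction under many random dropout masks and reports the spread of the results as the epistemic variance.

```python
import random
import statistics


def predict_with_dropout(x, weights, drop_prob, rng):
    """One stochastic forward pass of a toy linear model with dropout on its inputs."""
    keep_prob = 1.0 - drop_prob
    total = 0.0
    for xi, wi in zip(x, weights):
        # Keep each input with probability keep_prob and rescale the survivors
        # (inverted dropout), so the expected output matches the no-dropout output.
        if rng.random() < keep_prob:
            total += wi * xi / keep_prob
    return total


def epistemic_variance(x, weights, drop_prob=0.2, n_masks=200, seed=0):
    """Repeat the prediction with many dropout masks; return (mean, variance)."""
    rng = random.Random(seed)
    preds = [predict_with_dropout(x, weights, drop_prob, rng) for _ in range(n_masks)]
    return statistics.mean(preds), statistics.pvariance(preds)
```

If the masked predictions agree closely, the variance is small and the model is confident about that input. With drop_prob=0.0 every pass is identical, so the variance is exactly zero.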

In terms of the structure of the model, the two outputs (regression and uncertainty) are very similar. The difference comes in how they’re trained. The loss function treats them in very different ways, so the regression output learns to reproduce the labels and the uncertainty output learns to estimate the error in the regression output.
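To make the structural similarity concrete, here is a minimal sketch of two heads reading the same features. It uses plain Python instead of Keras, leaves out TrimGraphOutput, and the helper names (dense, two_headed_output) are hypothetical. The only structural difference is the exp applied to the uncertainty head, which mirrors Activation(tf.exp) and keeps the variance positive.

```python
import math


def dense(features, weights, bias):
    """A single-output dense layer: weighted sum of the features plus a bias."""
    return sum(f * w for f, w in zip(features, weights)) + bias


def two_headed_output(features, reg_params, unc_params):
    """Apply two structurally identical heads to the same feature vector.

    reg_params and unc_params are (weights, bias) pairs. The uncertainty head
    is read as a log-variance and exponentiated to give a positive variance.
    """
    prediction = dense(features, *reg_params)
    log_var = dense(features, *unc_params)
    variance = math.exp(log_var)
    return prediction, variance, log_var
```

Nothing in this structure tells the second head that it represents uncertainty. That meaning comes entirely from how the loss uses it during training.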

Thanks so much for explaining that!
I just have one follow-up question: could you point me to the Python files I should look at to see the difference in training that you mentioned? I was trying to connect the dots between the uncertainty paper, which defines a precision matrix and places a prior over the weights (similar to what is done for the predicted output), and the implementation in the DeepChem library.

Thanks again for the response!

Take a look at where the loss is defined:

If you aren’t using uncertainty, it just specifies loss = L2Loss(), a standard loss function to make the outputs match the labels. But if you’re using uncertainty, it uses a more complicated function that scales the actual difference (labels[0] - outputs[0]) by the predicted uncertainty (outputs[1]). This matches equation 5 in https://arxiv.org/pdf/1703.04977.pdf with outputs[1] = log(sigma**2), that is, log_var.
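The two cases can be sketched in plain Python as follows. This is an illustration of the idea rather than the DeepChem source: the function names are made up, the reduction is assumed to be a simple mean, and the constant factors of 1/2 from the paper are dropped, which does not change where the minimum lies.

```python
import math


def l2_loss(predictions, labels):
    """Mean squared difference between labels and predictions."""
    return sum((y - p) ** 2 for p, y in zip(predictions, labels)) / len(labels)


def uncertainty_loss(predictions, log_vars, labels):
    """Squared error scaled by the predicted variance, plus a log-variance penalty.

    predictions plays the role of outputs[0], log_vars the role of outputs[1].
    """
    total = 0.0
    for p, s, y in zip(predictions, log_vars, labels):
        diff = y - p
        # Dividing by exp(s) down-weights errors the model flags as uncertain;
        # adding s stops it from claiming huge uncertainty everywhere.
        total += diff * diff / math.exp(s) + s
    return total / len(labels)
```

For a fixed error diff, the per-sample term diff**2 / exp(s) + s is smallest at s = log(diff**2), which is why the uncertainty output learns to estimate the squared error of the regression output. With every log-variance set to 0, the function reduces to the plain L2 loss.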

Thank you! Really appreciate the help!