Bayesian Deep Learning: Approximating Uncertainty in Deep Neural Networks

Introduction

Over the last 10-15 years, deep learning has transformed from relatively small multi-layer perceptrons to multi-billion parameter models capable of convincing human-like content generation. Deep learning models are now crucial components in an ever-growing range of applications, including medical imaging tools, drug discovery pipelines, and self-driving vehicles. With these models taking on increasingly critical roles within our technology, it’s important to ensure that they are able to robustly handle unfamiliar scenarios. One way of ensuring robustness to novel situations is to use models capable of uncertainty estimation: models which know when - and by how much - they don’t know. The field of Bayesian Deep Learning gives us a means of achieving this through a variety of techniques for approximating Bayesian uncertainty estimates. This article provides an overview of the relevant literature and a complete toolset to design, implement, train, use and evaluate Bayesian Neural Networks.

The Importance of Uncertainty Estimation

Uncertainty estimation is a crucial aspect of decision-making processes, especially in areas such as healthcare and self-driving cars. Consider a scenario where a model is tasked with classifying a patient's scan for potential brain cancer. We want the model to flag when it is very unsure about its classification so that, in that case, the patient can be referred to a medical expert for further diagnosis. This highlights the need for models that not only provide predictions but also quantify the uncertainty associated with those predictions.

Traditional deep learning models, despite their power, often fall short in this regard. They operate as black boxes, making it challenging to quantify the uncertainty associated with their predictions. For example, ChatGPT can state false information with blatant confidence, and classifiers output probabilities that are often poorly calibrated. This lack of uncertainty awareness can lead to overconfident predictions, especially when the model encounters data that differs significantly from its training data.

Classical machine learning offers some uncertainty estimation methods, such as Gaussian processes and bootstrap-based ensembles. However, these methods may not be directly applicable or scalable to the complexity of deep learning models. Bayesian Deep Learning provides a framework for incorporating uncertainty estimation directly into deep learning models by leveraging Bayesian statistics.

Limitations of Traditional Deep Learning

One of the main limitations of traditional deep learning models is that, powerful as they are, they provide no measure of their own uncertainty. This limitation stems from the deterministic nature of standard neural networks. A feed-forward neural network (the simplest deep learning architecture) processes an input by multiplying it by a matrix of parameters; then a non-linear activation function (the true source of a neural net's power) is applied entry-wise to the result of this matrix multiplication. We will refer to the model's full set of parameters as w. In the extreme deterministic view, only a single set of parameters realizes the desired mapping between inputs and outputs, and once w is fixed, the same input always produces the same output.
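A minimal sketch of this deterministic forward pass (the layer sizes and the ReLU activation are arbitrary choices for the example):

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Deterministic two-layer feed-forward pass: an affine map, then an
    entry-wise non-linearity (ReLU), then a second affine map."""
    h = np.maximum(0.0, W1 @ x + b1)   # activation applied entry-wise
    return W2 @ h + b2

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)
x = np.array([1.0, -0.5, 2.0])

# The same parameters w = (W1, b1, W2, b2) always give the same output:
y1 = forward(x, W1, b1, W2, b2)
y2 = forward(x, W1, b1, W2, b2)
```

Nothing here carries any notion of uncertainty: the output is a single point, not a distribution.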


Another thing we need to consider is that the neural network is optimized using an algorithm known as gradient descent. The idea is that we start from a point in the parameter space and, as the name suggests, descend in the direction indicated by the negative gradient of the loss. The loss function can be full of local minima, so finding the true global minimum can be hard. One mitigation is to restart training from several different starting points and compare the resulting loss values.
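A tiny illustration of gradient descent with random restarts (the one-dimensional loss function here is made up for the example, chosen to have several local minima):

```python
import numpy as np

def loss(w):
    # A 1-D loss with several local minima.
    return np.sin(3 * w) + 0.1 * w**2

def grad(w):
    # Analytic derivative of the loss above.
    return 3 * np.cos(3 * w) + 0.2 * w

def gradient_descent(w0, lr=0.01, steps=500):
    """Descend along the negative gradient from starting point w0."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Restart from several starting points and keep the best final value.
starts = [-4.0, -1.0, 0.0, 2.0, 4.0]
results = [gradient_descent(w0) for w0 in starts]
best = min(results, key=loss)
```

Different starting points end up in different local minima; comparing their final losses lets us pick the best one found.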

Bayesian Neural Networks: Embracing Uncertainty

The idea of a Bayesian neural network is to add a probabilistic "sense" to a typical neural network. In essence, Bayesian Neural Networks (BNNs) treat the weights of the neural network as probability distributions rather than fixed values. This allows the model to represent the uncertainty in its parameters, which in turn leads to uncertainty-aware predictions.

In this case, the optimization goal is to find the best estimate of the posterior distribution p(w|D), the distribution over parameters given the data. By Bayes' rule, p(w|D) is proportional to p(D|w)p(w); the posterior itself is intractable to compute exactly, but the likelihood p(D|w) is something we can always evaluate. Once we have (an approximation of) this posterior, we are not getting a single machine learning model; we are effectively getting infinitely many models, one for each plausible setting of w. This means we can extract uncertainty bounds and other statistical information from the predictions.
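A minimal sketch of this "infinitely many models" idea. The one-parameter `predict` function and the Gaussian over w are toy assumptions standing in for a real posterior:

```python
import numpy as np

rng = np.random.default_rng(1)

def predict(x, w):
    # A toy one-parameter "model": y = w * x.
    return w * x

# Pretend these are samples from p(w | D): a distribution over parameters
# rather than a single point estimate.
w_samples = rng.normal(loc=2.0, scale=0.3, size=5000)

x = 1.5
preds = predict(x, w_samples)          # one prediction per sampled model
mean, std = preds.mean(), preds.std()  # predictive mean and uncertainty
lo, hi = np.percentile(preds, [2.5, 97.5])  # a 95% credible interval
```

Instead of a single number, the prediction is now a distribution we can summarize with a mean, a standard deviation, and credible intervals.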


Building Bayesian Deep Learning Models

Exact Bayesian inference is intractable for deep networks, so building Bayesian Deep Learning models means approximating it. This typically involves techniques like Variational Inference or Markov Chain Monte Carlo (MCMC) to approximate the posterior distribution over the network's weights.


One can implement a Bayesian Neural Network using a combination of Turing and Flux, a Julia suite of probabilistic-programming and machine learning tools. The probabilistic model specification creates a parameters variable of IID normal random variables, which serves as the prior over the network's weights. Inference is then performed by calling sample. Next, the nn_predict function predicts the value at a grid of points whose x1 and x2 coordinates range between -6 and 6, which lets us examine how the predictive power of the Bayesian neural network evolves between posterior samples.

The result is a two-layer feed-forward neural network with Bayesian layers, and the output of the model is probabilistic: if we run the model 10,000 times, we get 10,000 slightly different values.

The idea is not to optimize the loss of a single Neural Network but the expected loss over a whole distribution of Neural Networks. This is done with a loss function that combines the usual data-fit term with the Kullback-Leibler divergence between the approximate posterior and the prior. After optimizing this loss, we obtain a model that is probabilistic. This can be implemented using PyTorch together with a library named torchbnn.
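The KL term in such a loss has a closed form when both the prior and the approximate posterior are Gaussian. The following is an illustrative sketch of that term, not torchbnn's actual implementation:

```python
import math

def kl_gaussian(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) between two univariate Gaussians, in closed form:
    log(sigma_p / sigma_q) + (sigma_q^2 + (mu_q - mu_p)^2) / (2 sigma_p^2) - 1/2
    """
    return (math.log(sigma_p / sigma_q)
            + (sigma_q**2 + (mu_q - mu_p)**2) / (2 * sigma_p**2)
            - 0.5)

# KL of an approximate weight posterior q = N(0.5, 0.1^2) against a
# standard-normal prior p = N(0, 1): this is the regularizer added to
# the data-fit term during variational training, summed over weights.
kl = kl_gaussian(0.5, 0.1, 0.0, 1.0)
```

The term is zero exactly when the posterior equals the prior, and grows as the learned distribution moves away from it, trading data fit against complexity.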

Approximate Inference Techniques

Since exact Bayesian inference is often intractable for deep neural networks, approximate inference techniques are required. Two popular approaches are:

  • Variational Inference (VI): VI aims to approximate the posterior distribution with a simpler, tractable distribution. This involves defining a family of distributions and then finding the member of that family that is closest to the true posterior, typically as measured by the Kullback-Leibler divergence.


  • Markov Chain Monte Carlo (MCMC): MCMC methods construct a Markov chain whose stationary distribution is the target posterior distribution. By sampling from this chain, we can obtain samples that approximate the posterior. Examples include Metropolis-Hastings, Hamiltonian Monte Carlo (HMC), and the No-U-Turn Sampler (NUTS). MCMC methods can, however, suffer from the curse of dimensionality.
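To make the MCMC idea concrete, here is a minimal random-walk Metropolis-Hastings sampler targeting a standard-normal "posterior". The one-dimensional target is a toy assumption; real BNN posteriors live in very high dimensions, which is exactly where this approach struggles:

```python
import numpy as np

def log_post(w):
    # Unnormalized log-posterior: a standard normal target for illustration.
    return -0.5 * w**2

def metropolis_hastings(n_samples, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    w = 0.0
    samples = []
    for _ in range(n_samples):
        proposal = w + rng.normal(scale=step)  # symmetric random-walk proposal
        # Accept with probability min(1, p(proposal) / p(current)).
        if np.log(rng.uniform()) < log_post(proposal) - log_post(w):
            w = proposal
        samples.append(w)
    return np.array(samples)

samples = metropolis_hastings(20000)
```

The chain only needs the unnormalized posterior, which is why having access to p(D|w)p(w) is enough even when the normalizing constant is unknown.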

For high-dimensional problems, stochastic variational inference (SVI) can be used. SVI approximates the posterior with a parametric family, often under the mean-field assumption, in which each weight gets its own independent distribution. The goal is to find the parameters of the approximate posterior that best explain the training data.
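As a toy illustration of fitting a variational approximation: here the target posterior is assumed Gaussian so that the ELBO gradients have closed forms (in practice they are estimated from noisy samples, which is the "stochastic" in SVI):

```python
import math

# Target posterior (known here for illustration): N(3.0, 0.5^2).
TARGET_MU, TARGET_SIGMA = 3.0, 0.5

def elbo_grads(mu, log_sigma):
    """Gradients of the ELBO for a Gaussian q = N(mu, sigma^2) against a
    Gaussian target; both expectations are available in closed form here."""
    sigma = math.exp(log_sigma)
    v = TARGET_SIGMA**2
    d_mu = -(mu - TARGET_MU) / v
    # d/d(log_sigma) of [ -sigma^2 / (2v) + log(sigma) ]
    d_log_sigma = -sigma**2 / v + 1.0
    return d_mu, d_log_sigma

# Gradient ascent on the variational parameters (mu, log_sigma).
mu, log_sigma = 0.0, 0.0
for _ in range(2000):
    d_mu, d_ls = elbo_grads(mu, log_sigma)
    mu += 0.05 * d_mu
    log_sigma += 0.05 * d_ls
```

The variational parameters converge to the target's mean and standard deviation; with a mean-field BNN, the same optimization runs over one (mu, log_sigma) pair per weight.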

Evaluating Bayesian Deep Learning Models

Evaluating Bayesian Deep Learning models requires assessing not only the accuracy of the predictions but also the quality of the uncertainty estimates. Metrics such as calibration error and negative log-likelihood can be used to evaluate the model's ability to accurately quantify its uncertainty.
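For example, expected calibration error (ECE) bins predictions by confidence and compares average confidence with empirical accuracy in each bin. A sketch with synthetic predictions (the binning scheme and toy data are illustrative choices):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |confidence - accuracy| over equal-width bins."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in bin
    return ece

# A perfectly calibrated toy model: 75%-confident predictions that are
# correct exactly 75% of the time.
conf = np.full(1000, 0.75)
correct = np.zeros(1000)
correct[:750] = 1.0
ece = expected_calibration_error(conf, correct)
```

A well-calibrated model scores near zero; an overconfident one (say, 95% confidence with 75% accuracy) scores around 0.2.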

Probabilistic Programming

This article is a gentle introduction to the field; you only need a basic understanding of Deep Learning and Bayesian statistics. By the end of this article, you should have a basic understanding of the field, its applications, and how it differs from more traditional deep learning methods. If, like me, you have heard of Bayesian Deep Learning, and you guess it involves Bayesian statistics but don't know exactly how it is used, you are in the right place.

Bayesian Statistics

In the frequentist view, probability represents the long-term frequency of events; in the Bayesian view, it represents a degree of belief, for example about a model's parameters. A course of study here covers the advantages and disadvantages of each view, working through a number of examples from scratch and treating Bayesian inference as a general-purpose tool in machine learning. Topics include: understanding the computational challenges in Bayesian inference and how to overcome them; getting acquainted with the foundations of approximate Bayesian inference; taking first steps in probabilistic programming with Pyro; getting to know the model-based machine learning approach; learning how to build probabilistic models from domain knowledge and how to explore and iteratively improve a model; and seeing many prototypical probabilistic model types applied to a range of problems, from simple distributions to the Gaussian process model, such as fitting measurements from the Mauna Loa observatory.

MCMC and SVI

The MCMC and SVI portion covers: an introduction to MCMC, illustrated by the classic example of King Markov and his island kingdom; the Metropolis-Hastings, Hamiltonian Monte Carlo, and NUTS samplers, illustrated by an example on marriages, divorces and causality; and approximate inference with stochastic variational inference (SVI). Because MCMC methods suffer from the curse of dimensionality, for high-dimensional problems SVI can be used instead, approximating the posterior with parametric families (such as under the mean-field assumption) given the training data, and fitting the posterior approximation with a particular estimation technique.

The Roles of Physicists and Engineers

The approach of a physicist is more theoretical: the physicist looks at the world and tries to model it in the most accurate way possible. The approach of an engineer is far more practical: the engineer recognizes all the limits of the physicist's models and tries to make the experience in the laboratory as smooth as possible, even at the cost of cruder approximations. A scientist can discover a new star, but he cannot make one. In the everyday life of a researcher, it works much the same way: the physicist supplies a theory about a particular phenomenon, and the engineer turns it into something that works. If a model is deterministic, you can probe it by changing the initial conditions by a small delta and observing how much the output changes.

Now let's put this in the language of machine learning. If the machine learning model is deterministic, you can perturb its inputs by a small delta and measure how sensitive the outputs are. If the machine learning model is probabilistic, for some given inputs you can extract statistical information from the output, such as a mean, a variance, or credible intervals.

First question: do you actually need a Neural Network? Only if the answer is yes should you reach for these tools.

Prerequisites

Participants should be familiar with the basic concepts: we assume prior exposure to machine learning and deep learning and do not cover these in the course. Knowledge of Python is required in order to complete the exercises.

tags: #bayesian #deep #learning #tutorial
