Artificial Neural Networks in Machine Learning Explained
Introduction
Artificial neural networks (ANNs), inspired by the biological neural networks in the human brain, are computational models used for artificial intelligence. In the past 10 years, systems based on these networks have demonstrated impressive performance in various AI applications, such as speech recognition on smartphones and automatic translation. These networks are a means of doing machine learning, in which a computer learns to perform some task by analyzing training examples. Neural networks learn useful internal representations directly from data, capturing nonlinear structure that classical models miss.
The History of Neural Networks
The concept of neural networks, also referred to as "deep learning" in modern contexts, has a rich history spanning over seven decades. Initially proposed in 1943 by Warren McCulloch and Walter Pitts, neural nets became a major area of research in both neuroscience and computer science. However, their popularity waned in 1969, following the publication of "Perceptrons" by Marvin Minsky and Seymour Papert, which highlighted certain limitations. The technique then enjoyed a resurgence in the 1980s, fell into eclipse again in the first decade of the new century, and has returned like gangbusters in the second, fueled largely by the increased processing power of graphics chips.
Tomaso Poggio likens the cyclical nature of scientific ideas to viral epidemics, where initial enthusiasm wanes as researchers exhaust the possibilities, only for renewed interest to emerge in subsequent generations.
How Neural Networks Work
Neural networks are structured as interconnected groups of nodes, inspired by the simplification of neurons in a brain. They consist of connected units or nodes called artificial neurons, which loosely model the neurons in the brain.
Basic Structure
Modeled loosely on the human brain, a neural net consists of thousands or even millions of simple processing nodes that are densely interconnected. Most of today’s neural nets are organized into layers of nodes, and they’re “feed-forward,” meaning that data moves through them in only one direction. An individual node might be connected to several nodes in the layer beneath it, from which it receives data, and several nodes in the layer above it, to which it sends data.
Artificial neural networks are made up of three main components:
- An input layer that receives the data
- One or more hidden (inner) layers that process the information
- An output layer that transmits the result
The number of hidden layers in a neural network varies depending on the complexity of the problem it needs to solve: a simple problem may need only one hidden layer, while harder problems may require several. Neural networks use a feedforward process in which data passes from the input layer, through the hidden layers, to the output layer to make predictions or classify data.
Nodes, Weights, and Thresholds
To each of its incoming connections, a node will assign a number known as a “weight.” When the network is active, the node receives a different data item - a different number - over each of its connections and multiplies it by the associated weight. It then adds the resulting products together, yielding a single number. If that number is below a threshold value, the node passes no data to the next layer. If the number exceeds the threshold value, the node “fires,” which in today’s neural nets generally means sending the number - the sum of the weighted inputs - along all its outgoing connections.
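The weighted-sum-and-threshold behavior described above can be sketched as a small function (a minimal illustration; the function and variable names are hypothetical):

```python
def node_output(inputs, weights, threshold):
    """Weighted sum of inputs; the node 'fires' (passes the sum on) only above threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return total if total > threshold else None  # None: no data passed to the next layer

# A node with two incoming connections
print(node_output([0.5, 0.8], [1.0, 2.0], threshold=1.5))  # 2.1 — above the threshold, so it fires
print(node_output([0.1, 0.2], [1.0, 2.0], threshold=1.5))  # None — below the threshold
```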
Training the Network
When a neural net is being trained, all of its weights and thresholds are initially set to random values. Training data is fed to the bottom layer - the input layer - and it passes through the succeeding layers, getting multiplied and added together in complex ways, until it finally arrives, radically transformed, at the output layer. During training, the weights and thresholds are continually adjusted until training data with the same labels consistently yield similar outputs.
Key Components of Neural Networks
Artificial neurons function as building blocks for neural networks in the same way biological neurons do for the brain and nervous system. They transmit and process information in interconnected units. Every neuron processes data using a simple mathematical operation, similar to how biological neurons receive and send electrical signals.
Artificial Neurons
ANNs are composed of artificial neurons which are conceptually derived from biological neurons. Each artificial neuron has inputs and produces a single output which can be sent to multiple other neurons. The inputs can be the feature values of a sample of external data, such as images or documents, or they can be the outputs of other neurons. To find the output of the neuron, we take the sum of all the inputs, weighted by the weights of the connections from the inputs to the neuron, and add a bias term; this weighted sum is sometimes called the activation. It is then passed through a (usually nonlinear) activation function to produce the output.
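A minimal sketch of that computation — weighted sum, plus bias, passed through an activation function (names are illustrative, not from any particular library):

```python
import math

def neuron(inputs, weights, bias, activation):
    # The "activation" (pre-activation): weighted sum of inputs plus bias
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    # A (usually nonlinear) activation function produces the neuron's output
    return activation(s)

sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
out = neuron([1.0, 2.0], [0.5, -0.25], bias=0.1, activation=sigmoid)  # roughly 0.525
```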
Activation Functions
Every neuron takes the sum of its inputs and then applies an activation function to produce an output that is passed to the next layer. Weighted connections represent the strength of the links between neurons. When training a network, you adjust those weights to reduce the differences between its predictions and the target values.
Non-linearity refers to non-linear activation functions introduced to the individual nodes of a linear network. Activation functions determine the output of a neuron based on the weighted sum of its inputs. They allow the modeling of complex relationships within data. Examples of activation functions include:
- Sigmoid function, which maps inputs to a range between zero and one in traditional neural networks
- Rectified linear units (ReLU), which are used in deep learning to return the input for positive values or zero for negative values
- Hyperbolic tangent (tanh) functions, which map inputs to a range between negative one and one in a neural network
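The three activation functions listed above are one-liners (a sketch; Python's standard `math` module supplies `exp` and `tanh`):

```python
import math

def sigmoid(z):
    # Maps any real input to the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    # Returns the input for positive values, zero for negative values
    return max(0.0, z)

def tanh(z):
    # Maps any real input to the range (-1, 1)
    return math.tanh(z)
```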
Layers
A layer is the highest-level building block of a neural network. The first, middle, and last layers are called the input layer, hidden layer, and output layer respectively. The term hidden layer comes from its output not being visible as a network output. A simple three-layer neural net has one hidden layer, while the term deep neural net implies multiple hidden layers. Each layer contains neurons, or nodes, and the nodes of one layer are connected to those of the next. Each connection carries a weight that reflects the relationship between the nodes it joins; during training, these weights are adjusted to minimize the cost function by back-propagating errors through the layers.
A layer is a container that usually receives weighted input, transforms it with a set of mostly nonlinear functions and then passes these values as output to the next layer in the neural net. A layer is usually uniform, that is it only contains one type of activation function, pooling, convolution etc.
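Under these definitions, a fully connected layer can be sketched as one weighted transformation applied to the previous layer's outputs (pure-Python sketch; the names and sizes are hypothetical):

```python
def dense_layer(inputs, weights, biases, activation):
    """One layer: every output node takes a weighted sum over all inputs, plus a bias."""
    return [activation(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]

relu = lambda z: max(0.0, z)

# 3 inputs -> 2 output nodes (one weight row per output node)
hidden = dense_layer([1.0, 2.0, 3.0],
                     [[0.1, 0.2, 0.3], [-0.1, 0.0, 0.1]],
                     [0.0, 0.5],
                     relu)  # approximately [1.4, 0.7]
```

Stacking such calls, each feeding its output to the next, gives the feedforward pass described earlier.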
Training Neural Networks
The Learning Process
The neural net learns by varying the weights or parameters of the network so as to minimize the difference between its predictions and the desired values. The weights are adjusted so as to minimize the cost function by back-propagating the errors through the layers. The cost function is a measure of how close the output of the neural network is to the expected output. Minimizing the cost via error backpropagation is done using optimization algorithms such as stochastic, batch, or mini-batch gradient descent. Stochastic gradient descent approximates the true gradient from randomly chosen training examples rather than the full dataset. The size of the step taken in the direction of the negative gradient is referred to as the learning rate.
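As a toy illustration of gradient descent and the learning rate (one parameter, not the full backpropagation machinery; all names are hypothetical):

```python
# Minimize cost(w) = (w - 3)^2 by repeatedly stepping against the gradient.
def grad(w):
    return 2.0 * (w - 3.0)  # derivative of (w - 3)^2

w = 0.0
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * grad(w)  # step in the direction of the negative gradient

print(round(w, 4))  # 3.0 — converges to the minimum at w = 3
```

A learning rate that is too large overshoots the minimum; one that is too small makes convergence slow.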
Learning is the adaptation of the network to better handle a task by considering sample observations. Learning involves adjusting the weights (and optional thresholds) of the network to improve the accuracy of the result. This is done by minimizing the observed errors. Learning is complete when examining additional observations does not usefully reduce the error rate. Even after learning, the error rate typically does not reach 0, and if it remains too high, the network typically must be redesigned. Practically, this is done by defining a cost function that is evaluated periodically during learning; as long as its output continues to decline, learning continues. The cost is frequently defined as a statistic whose value can only be approximated. The outputs are numbers, so when the error is low, the difference between the output (say, a 0.99 probability of "cat") and the correct answer ("cat", i.e., 1.0) is small. Learning attempts to reduce the total of the differences across the observations.
Backpropagation
Backpropagation is an efficient application of the chain rule, derived by Gottfried Wilhelm Leibniz in 1673, to networks of differentiable nodes. The terminology "back-propagating errors" was introduced in 1962 by Rosenblatt, though he did not know how to implement it; Henry J. Kelley had developed a continuous precursor of backpropagation in 1960 in the context of control theory. In 1970, Seppo Linnainmaa published the modern form of backpropagation in his master's thesis. In 1986, David E. Rumelhart et al. popularized backpropagation as a general method for training neural networks.
Backpropagation is a method used to adjust the connection weights to compensate for each error found during learning. The error amount is effectively divided among the connections. Technically, backpropagation calculates the gradient (the derivative) of the cost function associated with a given state with respect to the weights.
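For a tiny two-weight chain, the gradient of the cost with respect to an early weight is obtained by multiplying local derivatives backwards through the network — exactly the chain rule (a toy sketch; all names and values are hypothetical):

```python
# Forward pass: x -> h = w1 * x -> y = w2 * h ; cost = (y - target)^2
x, target = 2.0, 10.0
w1, w2 = 1.0, 3.0

h = w1 * x                 # h = 2
y = w2 * h                 # y = 6
cost = (y - target) ** 2   # cost = 16

# Backward pass: one local derivative per step, multiplied together
dcost_dy = 2.0 * (y - target)   # -8
dy_dw2 = h                      # 2
dy_dh = w2                      # 3
dh_dw1 = x                      # 2

dcost_dw2 = dcost_dy * dy_dw2           # -16: gradient for the later weight
dcost_dw1 = dcost_dy * dy_dh * dh_dw1   # -48: error propagated back to the earlier weight
```

Each weight is then nudged against its gradient, as in the gradient-descent update described above.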
Supervised and Unsupervised Learning
Supervised learning uses a set of paired inputs and desired outputs. The learning task is to produce the desired output for each input. In this case, the cost function is related to eliminating incorrect deductions. A commonly used cost is the mean-squared error, which tries to minimize the average squared error between the network's output and the desired output. Tasks suited for supervised learning are pattern recognition (also known as classification) and regression (also known as function approximation). Supervised learning is also applicable to sequential data (e.g., for handwriting, speech and gesture recognition).
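The mean-squared error mentioned above can be written directly (a sketch; the example values are made up):

```python
def mean_squared_error(predictions, targets):
    # Average of squared differences between network outputs and desired outputs
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

mse = mean_squared_error([0.9, 0.2, 0.4], [1.0, 0.0, 0.0])  # 0.07
```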
In unsupervised learning, the model learns from input data without expected values, and the available dataset does not provide answers to the given task. Instead of labelling or predicting outputs, this algorithm focuses on grouping the data based on their characteristics.
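As a minimal illustration of grouping data by its characteristics, here is a one-dimensional k-means sketch (k-means is one common unsupervised algorithm, used here only as an example; names are hypothetical):

```python
def kmeans_1d(points, centers, steps=10):
    """Group points around the given centers by alternating assignment and averaging."""
    clusters = [[] for _ in centers]
    for _ in range(steps):
        clusters = [[] for _ in centers]
        for p in points:  # assign each point to its nearest center
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # move each center to the mean of its assigned points
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Two obvious groups emerge without any labels being provided
centers, clusters = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 10.0], [0.0, 5.0])
```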
Types of Neural Networks
Below is an overview of the most common types of neural networks currently in use.
Feedforward Neural Networks (FNNs)
FNNs, also called multi-layer perceptrons (MLPs), are characterized by a sequential flow of information that moves through neuron layers without relying on loops or cycles; networks where information is only fed forward from one layer to the next are called feedforward. They're typically suitable for regression and classification tasks. The simplest kind of feedforward neural network is a linear network, which consists of a single layer of output nodes with linear activation functions; the inputs are fed directly to the outputs via a series of weights. The sum of the products of the weights and the inputs is calculated at each node. The mean squared errors between these calculated outputs and the given target values are minimized by adjusting the weights. This technique has been known for over two centuries as the method of least squares or linear regression.
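For a single input, that least-squares case has a familiar closed form (pure-Python sketch fitting y = w·x + b; the data is made up):

```python
def least_squares(xs, ys):
    """Fit y = w * x + b by minimizing mean squared error (closed-form solution)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - w * mean_x
    return w, b

w, b = least_squares([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])  # recovers y = 2x + 1
```

Multi-layer networks with nonlinear activations generalize this, but the weights must then be found iteratively by gradient descent rather than in closed form.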
Convolutional Neural Networks (CNNs)
CNNs work with tasks using images, videos, and other grid-like data. They use convolutional layers to apply filters to input images. Those filters capture patterns and features, so you often see CNNs used in AI applications focused on image recognition, segmentation, and object detection. Kunihiko Fukushima's convolutional neural network (CNN) architecture of 1979 also introduced max pooling, a popular downsampling procedure for CNNs.
Recurrent Neural Networks (RNNs)
These neural networks introduce loops into the network architecture to maintain hidden states that persist information across time steps, letting RNNs process sequential data with a sense of memory. One origin of the RNN was statistical mechanics: in 1972, Shun'ichi Amari proposed modifying the weights of an Ising model with the Hebbian learning rule as a model of associative memory, adding in the component of learning. This was popularized as the Hopfield network by John Hopfield (1982). Another origin was neuroscience, where the word "recurrent" is used to describe loop-like structures in anatomy.
Applications of Neural Networks
Neural networks are used for various tasks, including predictive modeling, adaptive control, and solving problems in artificial intelligence. They underpin breakthroughs in computer vision, natural language processing (NLP), speech recognition and countless real-world applications ranging from forecasting to facial recognition. State-of-the-art neural networks can have from millions to well over one billion parameters to adjust via back-propagation. They also require a large amount of training data to achieve high accuracy, meaning hundreds of thousands to millions of input samples will have to be run through both a forward and backward pass.
Image and Speech Recognition
One of the most popular uses of neural networks with AI is building processes to locate and recognize patterns and relationships in data. You see this at work in image and speech recognition applications.
Natural Language Processing
Neural networks revolutionized natural language processing (NLP) by enabling models to understand and generate human language. GPT and BERT are examples of AI applications that use neural networks in that way.
Autonomous Systems
Autonomous systems like self-driving cars and drones use neural networks to perceive their environment, make decisions, and control the vehicle.
Other Applications
Other AI applications that rely on neural networks include:
- Medical imaging machines
- Algorithms for trading
- Content recommendation systems
- Gaming non-playable characters (NPCs)
- Equipment monitoring
- Social media content moderation
The Resurgence of Neural Networks: Deep Learning
The recent resurgence in neural networks - the deep-learning revolution - comes courtesy of the computer-game industry. The complex imagery and rapid pace of today’s video games require hardware that can keep up, and the result has been the graphics processing unit (GPU), which packs thousands of relatively simple processing cores on a single chip. It didn’t take long for researchers to realize that the architecture of a GPU is remarkably like that of a neural net.
Modern GPUs enabled the one-layer networks of the 1960s and the two- to three-layer networks of the 1980s to blossom into the 10-, 15-, even 50-layer networks of today. That’s what the “deep” in “deep learning” refers to - the depth of the network’s layers. And currently, deep learning is responsible for the best-performing systems in almost every area of artificial-intelligence research.
Addressing the Opacity of Neural Networks
The networks’ opacity is still unsettling to theorists, but there’s headway on that front, too. In addition to directing the Center for Brains, Minds, and Machines (CBMM), Poggio leads the center’s research program in Theoretical Frameworks for Intelligence. Recently, Poggio and his CBMM colleagues have released a three-part theoretical study of neural networks. The first part addresses the range of computations that deep-learning networks can execute and when deep networks offer advantages over shallower ones.