Representation Theory Applications in Machine Learning
Thinking in terms of representations, that is, in terms of how information is encoded and processed, is a productive approach to understanding and building machine learning models, and even to inventing new ones. This article aims to show that machine learning amounts to transforming information from one representational space to another, mirroring how brains and general intelligence function. This framework is then used to re-evaluate existing concepts and provide a fresh perspective.
Representation, Evaluation, and Optimization in Machine Learning
Machine learning can be viewed as a combination of representation, evaluation, and optimization.
- Representation: How a model perceives data; the mathematical space in which the information is encoded.
- Evaluation: How the model measures its performance (e.g., loss functions).
- Optimization: The strategy used to find better solutions (e.g., gradient descent).
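The trio can be made concrete with a minimal sketch: one-variable linear regression in plain Python, where the feature encoding is the representation, mean squared error is the evaluation, and gradient descent is the optimization. All numbers here are illustrative.

```python
# Representation: each example is one feature x; the model encodes the
# relationship as y_hat = w * x + b. (Toy data, chosen for illustration.)
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (x, y) pairs with y = 2x

# Evaluation: mean squared error measures how well the current fit performs.
def mse(w, b):
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

# Optimization: gradient descent searches for better parameters.
w, b, lr = 0.0, 0.0, 0.05
for _ in range(500):
    dw = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
    db = sum(2 * (w * x + b - y) for x, y in data) / len(data)
    w, b = w - lr * dw, b - lr * db

print(round(w, 2), round(b, 2))  # w should end up near 2.0, b near 0.0
```

Changing any one ingredient (features, loss, or optimizer) changes the model, which is why the article treats the three as a unit.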
This article will focus on representations as a crucial part of this trio.
The Importance of Feature Engineering and Representation Learning
Before deep learning became mainstream, feature engineering was critical to building successful machine learning models. Whatever the data, models perform better when the features are easier to process and highlight the essential aspects of the problem.
Today's deep learning models succeed by stacking multiple layers, each of which refines the representation of the data for the problem at hand. This is known as representation learning. Ultimately, learning amounts to building better representations of the data for solving problems. Some theories frame learning as compression; transforming the data into a more usable space is equally vital, and compression helps by extracting the essential parts.
Representations in Real Life
Representing information differently impacts both machines and humans. The way something is said is as important as what is being said. For example, adding two numbers in Roman numerals (CCVII + MIX = ?) requires decoding them into a more understandable format, like the decimal system (207 + 1009), which simplifies the addition process.
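The Roman-numeral example can be sketched in code: decoding into integers, a better representation for arithmetic, makes the addition trivial. This is a minimal converter for illustration, not a full validator of Roman numerals.

```python
ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def roman_to_int(s):
    # Subtractive pairs (e.g. IX) place a smaller value before a larger one,
    # so a symbol is subtracted whenever its successor is worth more.
    total = 0
    for ch, nxt in zip(s, s[1:] + " "):
        value = ROMAN[ch]
        total += -value if ROMAN.get(nxt, 0) > value else value
    return total

print(roman_to_int("CCVII") + roman_to_int("MIX"))  # 207 + 1009 = 1216
```

Once the numbers live in the decimal representation, the addition the original encoding made awkward is a single `+`.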
Personal perspective influences how we represent things in our brains. An art major and a casual viewer perceive different aspects of the same painting, and so process its information differently.
Representations in Computers: Data Structures
Consider a sequence of numbers. Storing them in an array facilitates random access (e.g., accessing the 42nd number). However, if frequent element removal is needed while maintaining order, a linked list might be a better representation. The choice depends on how the information is presented and what operations are prioritized.
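A small sketch of the trade-off: the same numbers stored as an array (a Python list) and as a hand-rolled singly linked list, each representation making a different operation cheap.

```python
# Array: a Python list gives O(1) random access by index.
numbers = list(range(100))
print(numbers[41])  # the 42nd number, fetched directly

# Linked list: removal is O(1) pointer surgery once the node is found,
# with no shifting of later elements.
class Node:
    def __init__(self, value, nxt=None):
        self.value, self.next = value, nxt

head = None
for v in (3, 2, 1, 0):          # build 0 -> 1 -> 2 -> 3
    head = Node(v, head)

head.next = head.next.next      # remove the node holding 1 by relinking

def to_list(node):
    out = []
    while node:
        out.append(node.value)
        node = node.next
    return out

print(to_list(head))  # [0, 2, 3]
```

Neither representation is "correct"; the right choice depends on which operations the workload prioritizes.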
Machine Learning Examples of Representational Thinking
Several machine-learning methods highlight the importance of how information is represented and how new representations are calculated.
Kernels
Kernels transform the representational space. They were widely used with Support Vector Machines (SVMs) because SVMs separate data with a linear boundary. If the data cannot be separated linearly in its current space, a kernel implicitly maps it into a higher-dimensional space where linear separation becomes possible: the added dimensions allow the classes to be split by a hyperplane. Deep learning performs a similar function, with each layer transforming the space into a more useful one.
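The idea can be sketched with the explicit feature map behind a simple polynomial kernel: points on a line that no single threshold can separate become linearly separable once a squared coordinate is added. The data here is made up for illustration.

```python
# Class A sits near the origin, class B far from it: no threshold on x alone
# separates them in one dimension.
inner = [-1.0, 0.0, 1.0]  # class A
outer = [-3.0, 3.0]       # class B

def lift(x):
    # Explicit feature map behind a degree-2 polynomial kernel: x -> (x, x^2).
    return (x, x * x)

# In the lifted 2-D space, the horizontal line x2 = 2 separates the classes.
print(all(lift(x)[1] < 2 for x in inner))  # True
print(all(lift(x)[1] > 2 for x in outer))  # True
```

Kernelized SVMs get this effect without computing the lifted coordinates explicitly, by evaluating inner products in the lifted space directly.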
Autoencoders
Autoencoders learn better representations even when labeled data is scarce or absent. They transform information so that the original input can be reconstructed accurately. By forcing the data through a smaller representational space, the bottleneck, autoencoders squeeze and summarize the data meaningfully. This compressed representation often improves performance on downstream tasks.
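A minimal illustration of the bottleneck idea, with hand-set encoder and decoder weights rather than learned ones: 2-D points that lie on the line y = 2x pass through a 1-D code and are reconstructed exactly. A real autoencoder would learn such mappings by minimizing reconstruction error.

```python
def encode(p):
    # Encoder: 2-D point -> 1-D code (projection onto the direction (1, 2)).
    x, y = p
    return (x + 2 * y) / 5

def decode(c):
    # Decoder: 1-D code -> reconstructed 2-D point back on the line y = 2x.
    return (c, 2 * c)

points = [(1.0, 2.0), (-0.5, -1.0), (3.0, 6.0)]
for p in points:
    print(p, "->", decode(encode(p)))  # exact reconstruction on this line
```

The 1-D code is a lossless summary only because the data actually lives on a 1-D structure; that is exactly the kind of structure an autoencoder's bottleneck forces it to discover.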
Word Embeddings
Word embeddings, similar to autoencoders, learn representations from a word's relation to its neighbors, using methods like Continuous Bag-of-Words and Skip-gram. The alternative is the bag-of-words representation, which describes a document by the count of each word it contains. Word embeddings are smaller and denser, and they capture semantic relations, making them a better representational space for processing word meanings rather than the words themselves.
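The bag-of-words baseline is easy to sketch: a document becomes a vector of word counts over a vocabulary, one dimension per word, sparse and blind to meaning.

```python
from collections import Counter

doc = "the cat sat on the mat"
counts = Counter(doc.split())

vocab = sorted(set(doc.split()))            # fixed vocabulary order
vector = [counts[word] for word in vocab]   # one count per vocabulary word
print(vocab)   # ['cat', 'mat', 'on', 'sat', 'the']
print(vector)  # [1, 1, 1, 1, 2]
```

In this representation "cat" and "kitten" are as unrelated as "cat" and "mat"; a learned embedding instead places related words near each other in a dense, low-dimensional space.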
Convolutional Neural Networks (CNNs)
CNNs learn a new representation of an image at each layer. Early layers interpret borders and pattern changes, while later layers represent more complex, high-level patterns, such as the presence of an eye in a specific location when detecting human faces. This creates a superior representational space compared to pixel intensity, enabling computer vision applications.
A fully connected network that links every pixel of the image to every node, stacked into multiple layers, theoretically has the capacity to do the same processing. In practice, however, it does not work nearly as well. One obvious reason is that CNNs use data more efficiently: by sliding the same small filters across the whole image, they share weights and learn far fewer parameters.
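What an early convolutional layer computes can be sketched in one dimension: a difference filter slid along a signal responds only where the intensity changes, i.e., at an edge. Real CNNs learn many such filters in two dimensions, and the sliding is exactly the weight sharing that keeps the parameter count small.

```python
signal = [0, 0, 0, 1, 1, 1]   # a step "edge" in pixel intensity
kernel = [-1, 1]              # a hand-set difference filter (illustrative)

def conv1d(xs, k):
    # Slide the same small kernel along the signal: the two weights are
    # reused at every position instead of learning one weight per pixel.
    return [sum(k[j] * xs[i + j] for j in range(len(k)))
            for i in range(len(xs) - len(k) + 1)]

print(conv1d(signal, kernel))  # [0, 0, 1, 0, 0] -- nonzero only at the edge
```

The output is itself a new representation of the image, one in which "where the edges are" is explicit rather than implicit in raw intensities.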
Transfer Learning
Transfer learning leverages pre-trained models for new tasks. A model trained on large datasets like ImageNet or COCO for general object detection can be adapted to detect new objects using less data. By removing the prediction head and using the pre-trained model to generate a smaller, useful representation of an image, the learned properties of the image can be leveraged. This reduces the effort needed to warp the space for a better representation of the new problem.
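The recipe can be sketched with a stand-in backbone: a frozen function plays the role of the pre-trained model with its prediction head removed, and only a tiny new head (here just a hand-set threshold on one feature) does the task-specific work. Everything in this sketch is illustrative, not a real pre-trained network.

```python
def backbone(x):
    # Frozen stand-in for a pre-trained feature extractor: raw input
    # -> a small, useful representation. Its "weights" are never updated.
    return (x, x * x)

# New task data: label is 1 when the input lies far from the origin.
data = [(-2.0, 1), (-0.5, 0), (0.5, 0), (2.0, 1)]

def head(features, threshold=1.0):
    # Small task-specific head fit on top of the frozen features.
    return 1 if features[1] > threshold else 0

print(all(head(backbone(x)) == y for x, y in data))  # True
```

Because the backbone already maps inputs into a space where the new classes are easy to separate, the head can stay small and needs little data, which is the whole point of transfer learning.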
Unifying Theory of Unsupervised Learning
The human mind makes sense of raw information without being taught how to see or hear. A unifying theory describes how algorithms can learn and discover structure in complex signals such as natural images, audio, language, and video, without human input. This class of algorithms can extend our understanding of the world by helping us see previously unseen patterns in nature and science. At the core of this unified theory is the notion that relationships between deep network representations hold the key to discovering the structure of the world without human input. The principle has been applied in practice: discovering hidden connections that span cultures and millennia in the visual arts, discovering visual objects in large image corpora, classifying every pixel of our visual world, and rediscovering the meaning of words from raw audio, all without human labels.
Relationships between deep features can rediscover the semantic structure of the natural world by connecting model explainability, cooperative game theory, and deep feature relationships. Relationships between representations can be used to unify over 20 common machine learning algorithms spanning 100 years of progress in the field of machine learning. In particular, a single equation unifies classification, regression, large language modeling, dimensionality reduction, clustering, contrastive learning, and spectral methods. This unified equation is the basis for a “periodic table of representation learning” that predicts the existence of new types of algorithms. One of these predicted algorithms is a state-of-the-art unsupervised image classification technique.
Representation Theory and Neural Networks
Neural networks can also be described using the mathematical theory of quiver representations. A neural network is a quiver representation equipped with activation functions, a mathematical object modeled by a network quiver. Network quivers accommodate common neural network concepts such as fully connected layers, convolution operations, residual connections, batch normalization, pooling operations, and even randomly wired neural networks. Because this description matches the network exactly rather than approximating it, it can be studied with algebraic methods. The quiver representation model explains how a neural network builds representations from data: the network encodes the data as quiver representations and maps them to a geometric space called the moduli space, which is defined in terms of the network's underlying oriented graph, i.e., its quiver. This follows from the definitions involved and from understanding, combinatorially and algebraically, how the network computes a prediction.
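For context, the standard definition of a quiver representation (conventional notation, supplied here for illustration rather than taken from the work described above) is compact:

```latex
% A quiver Q = (Q_0, Q_1) is a directed graph with vertex set Q_0 and arrow
% set Q_1, where an arrow a runs from its source s(a) to its target t(a).
% A representation W of Q assigns a vector space W_i to each vertex and a
% linear map to each arrow:
W = \left( (W_i)_{i \in Q_0},\; \big( f_a \colon W_{s(a)} \to W_{t(a)} \big)_{a \in Q_1} \right)
```

In the neural-network reading, vertices play the role of neurons and arrows the role of weighted connections, so a forward pass composes the linear maps along paths through the quiver, interleaved with the activation functions.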
Representation Theory: A Broader Perspective
Representation theory is a branch of mathematics focused on studying abstract algebraic structures by representing their elements as linear transformations of vector spaces. It is a fundamental tool for understanding symmetry and transformational properties in diverse phenomena across different disciplines.
Core Concepts
Central to representation theory is the notion of irreducibility, which involves decomposing representations into their simplest non-trivial components. Irreducible representations are those with no proper non-trivial subspace that is invariant under the group action. These irreducible representations are akin to prime numbers in integer factorization.
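The factorization analogy can be stated precisely for finite groups over the complex numbers, where Maschke's theorem guarantees complete reducibility (standard notation, added here for illustration):

```latex
% Every finite-dimensional complex representation V of a finite group G
% decomposes into irreducible representations V_i with multiplicities m_i,
% uniquely up to isomorphism and reordering of the summands:
V \;\cong\; \bigoplus_i V_i^{\oplus m_i}
```

Just as an integer is determined by its prime factors and their exponents, such a representation is determined by which irreducibles appear and how many times.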
Applications
Representation theory has diverse applications across various fields:
- Physics: Describing elementary particles and understanding quantum mechanics.
- Chemistry: Analyzing molecular symmetry and spectroscopy. Group representations help predict molecular properties and guide the design of new materials and compounds.
- Computer Science and Cryptography: Designing error detection and correction codes with desirable properties.
- Mathematics: Unifying seemingly disparate areas of mathematics, connecting algebra, geometry, analysis, and combinatorics. Lie groups and Lie algebras, which describe continuous symmetries and their infinitesimal behavior, are studied using representations on vector spaces, shedding light on geometric and analytic properties.
Current Research and Open Questions
Current research focuses on computing invariants of representations and extracting meaningful information from representations with large dimensions. Key areas include:
- Understanding the representation theory of the general linear group over finite fields.
- Understanding and classifying the representations of wild quivers.
These questions pose theoretical and technical challenges, driving further developments in representation theory, harmonic analysis, and mathematical physics. Collaboration between mathematicians, physicists, chemists, and computer scientists is essential to uncover hidden connections and explore new avenues in quantum computing, coding theory, and machine learning.
The application of representation theory to machine learning and artificial intelligence is an area of active research, aiming to develop robust and interpretable machine learning algorithms by understanding symmetry and transformation in diverse contexts.
tags: #representation #theory #applications #machine #learning

