Where do feature quantities come from?


From “Artificial Intelligence: From the Mysteries of the Mind to the Science of the Mind” by Geoffrey Hinton, published by Iwanami Shoten (Cognitive Science 38:1078–1101).

There are two major ways in which the human brain can represent objects and events in the external world (the shapes of objects, the layout of scenes, the meanings of words and sentences, and so on) as spatio-temporal patterns of neural activity.

One is to prepare a large pool of neurons, with one neuron per entity, and represent each entity by activating that single dedicated neuron. The other is to represent each entity by the joint activity of many neurons, each of which participates in representing many different entities (a distributed representation).

In the former approach, the cerebral cortex contains only about 10 billion cells, so even if every cell were used, this would clearly be far too few to represent everything in the world. In the latter approach, even if each cell takes only binary values, a pool of 10 billion cells can represent up to 2 to the power of 10 billion distinct patterns, so it is reasonable to assume that the brain uses distributed representations.
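The gap between the two schemes can be seen with a toy calculation. This is only an illustration: the pool size below is a made-up small number, not a brain estimate.

```python
# Compare the representational capacity of a localist code
# (one neuron per entity) with a distributed binary code
# (one activity pattern over the whole pool per entity).

n_neurons = 10  # tiny illustrative pool

# Localist: at most one entity per neuron.
localist_capacity = n_neurons

# Distributed: every binary activity pattern over the pool is a code word.
distributed_capacity = 2 ** n_neurons

print(localist_capacity)     # 10 entities
print(distributed_capacity)  # 1024 entities
```

The exponential on the distributed side is the whole argument: with 10 billion binary neurons the number of patterns is astronomically larger than the number of neurons.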

If the information in the human brain is made up of distributed representations, where do these “features” come from? The simplest hypothesis is the “innate specification of features” hypothesis, which says that features are imprinted in our DNA from birth. This hypothesis is rejected for two reasons. First, the roughly 10^12 bits that the feature-carrying synapses could hold (assuming there are 10^14 synapses, that 1% of them are used for feature representation, and that each takes only binary values) cannot fit into the information that DNA can hold. Second, the world changes far too quickly for innately specified information to keep up with it.
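The capacity side of this argument is simple arithmetic. The synapse numbers are the ones stated in the text; the genome figure (about 3 billion base pairs at 2 bits each) is a standard rough estimate I am adding for comparison, not a number from the source.

```python
# Order-of-magnitude check of the "features cannot fit in DNA" argument.
synapses = 10 ** 14        # assumed total synapse count (from the text)
feature_fraction = 0.01    # assumed fraction used for features (from the text)
bits_per_synapse = 1       # binary-valued synapses (from the text)

feature_bits = synapses * feature_fraction * bits_per_synapse  # 10**12 bits

# Rough genome capacity: ~3e9 base pairs, ~2 bits per base pair.
genome_bits = 3 * (10 ** 9) * 2

print(feature_bits > genome_bits)  # the features cannot all be innate
```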

Next, consider the hypothesis that features are acquired by learning; the key question is then the mechanism by which this learning takes place. Hinton first considered a deterministic feed-forward network trained with the then state-of-the-art backpropagation algorithm. Because this method requires a large amount of labeled data, and he could think of no mechanism by which such labels would be supplied naturally, he concluded that it was not a valid account.

Hinton next considered training a network so that its output reconstructs all or part of its input, which lets the hidden activities serve as features without any external labels. For static data this is the deep autoencoder. For dynamic data, he studied algorithms based on probability distributions, known as generative models. The model in question is the undirected graphical model (or Markov random field), which was being studied at the time in statistical physics and quantum mechanics, where it is known as the Ising model of magnetic spins.
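The autoencoder idea can be sketched in a few lines of NumPy. This is a minimal linear sketch under assumed sizes and learning rate, not Hinton’s actual model: the network is trained only to reconstruct its input, so the 3-unit hidden layer is forced to discover features without any labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data lying in a 3-dimensional subspace of a 6-dimensional space,
# so a 3-unit hidden layer can in principle reconstruct it perfectly.
Z = rng.standard_normal((200, 3))
A = rng.standard_normal((3, 6))
X = Z @ A                                  # 200 samples, 6 dimensions

W_enc = rng.standard_normal((6, 3)) * 0.1  # encoder: input -> 3 features
W_dec = rng.standard_normal((3, 6)) * 0.1  # decoder: features -> reconstruction
lr = 0.01

for _ in range(2000):
    H = X @ W_enc            # hidden features
    X_hat = H @ W_dec        # reconstruction of the input
    err = X_hat - X          # reconstruction error
    # Gradient descent on the mean squared reconstruction error.
    W_dec -= lr * (H.T @ err) / len(X)
    W_enc -= lr * (X.T @ (err @ W_dec.T)) / len(X)

mse = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
print(mse)  # should end up well below the variance of X
```

The only training signal is the input itself, which is exactly why this line of work mattered for the “where do features come from” question.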

The learning algorithm for this model is called a Boltzmann machine, after the physicist Ludwig Boltzmann. It estimates the connections between nodes by computing the probabilistic parameters of a complete graph (a graph in which every node is connected to every other), as shown on the left side of the figure below.
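Concretely, a Boltzmann machine assigns every global state s of its binary units an energy E(s) = −Σ w_ij s_i s_j − Σ b_i s_i and a probability proportional to exp(−E(s)). The tiny example below enumerates all states of a 3-unit machine; the weights and biases are arbitrary illustrative values.

```python
import itertools
import math

w = {(0, 1): 2.0, (0, 2): -1.0, (1, 2): 0.5}  # symmetric pairwise weights
b = [0.1, -0.2, 0.3]                          # unit biases
n = 3

def energy(s):
    """E(s) = -(sum_{i<j} w_ij s_i s_j + sum_i b_i s_i) for binary s."""
    pair = sum(w[i, j] * s[i] * s[j] for (i, j) in w)
    bias = sum(b[i] * s[i] for i in range(n))
    return -(pair + bias)

states = list(itertools.product([0, 1], repeat=n))
Z = sum(math.exp(-energy(s)) for s in states)      # partition function
P = {s: math.exp(-energy(s)) / Z for s in states}  # Boltzmann distribution

best = max(P, key=P.get)                           # lowest-energy state
print(best, P[best])
```

Note that the partition function Z sums over all 2^n states; this exhaustive sum is exactly what blows up in large networks, which motivates the restricted models discussed next.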

Boltzmann machine example (1) *1)

Boltzmann machine example (2) *1)

The weakness of this model is that it requires an enormous amount of computation: in large networks a computational explosion occurs, making it intractable on current computers. Quantum computing is a hardware approach that attempts to compute such models directly. On the model side, imposing strong constraints on the graph structure (two layers, hidden and visible, with no connections within a layer) yields the Restricted Boltzmann Machine (RBM), and stacking RBMs yields the Deep Belief Network (DBN).
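The bipartite restriction makes the RBM trainable in practice, because all hidden units become conditionally independent given the visible units (and vice versa). Below is a minimal sketch of an RBM trained with one step of contrastive divergence (CD-1); the sizes, learning rate, and toy data are my own illustrative choices, and a practical RBM would add mini-batches, momentum, and monitoring.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden = 6, 3
W = rng.standard_normal((n_visible, n_hidden)) * 0.1
b_v = np.zeros(n_visible)  # visible biases
b_h = np.zeros(n_hidden)   # hidden biases
lr = 0.1

# Toy binary data: two repeating patterns the RBM should learn to model.
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]] * 50, dtype=float)

for _ in range(500):
    v0 = data
    # Positive phase: hidden probabilities given the data, then a sample.
    ph0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step down to the visible layer and up again.
    pv1 = sigmoid(h0 @ W.T + b_v)
    ph1 = sigmoid(pv1 @ W + b_h)
    # CD-1 update: data-driven minus model-driven pairwise statistics.
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
    b_v += lr * (v0 - pv1).mean(axis=0)
    b_h += lr * (ph0 - ph1).mean(axis=0)

recon = sigmoid(sigmoid(data @ W + b_h) @ W.T + b_v)
err = float(np.mean((recon - data) ** 2))
print(err)  # reconstruction error should be small after training
```

A DBN would then treat the learned hidden activities as data for a second RBM, stacking layer by layer; that greedy layerwise scheme is what made these models trainable at depth.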

Restricted Boltzmann machine *1)

Deep belief network *1)

Hinton’s paper discusses these RBMs and DBNs and states that with such Boltzmann machines, “multiple stability (as seen in the Necker cube illusion) and top-down effects in perceptual inference can be observed.” I think this points to the possibility of extracting complex features from simple information.

*1)AI-MASTER WIKI
