Geometric approach to data

There are various approaches that treat information geometrically. One is "soft" geometry, which deals with the topology of information and includes topological data analysis. The other is "hard" geometry, namely information geometry, which studies statistical models whose elements are probability distributions by means of differential geometry, drawing on approaches such as Riemannian geometry, symplectic geometry, and complex geometry.

Applications of information geometry extend beyond statistical inference, such as the EM algorithm, to statistical physics, learning theory, and information thermodynamics, and further developments in quantum information geometry, Wasserstein geometry, and Ruppeiner geometry are also expected. Information geometry is also beginning to be applied in the field of artificial intelligence, in interpreting information in neural networks and neural firing patterns, and in the academic area connecting superstring theory and quantum information.

The following is a summary of articles on information geometry in this blog.

Geometry

    Mathematics is at the root of computer science. For example, machine learning, which underlies deep learning and natural language processing, starts from functions and relies on optimization calculations based on differentiation and integration, while the symbolic approaches used in artificial intelligence build on set theory to evaluate expressions. Before considering their applications to digital transformation and IT systems, it is important to organize our knowledge of the basic elements of each.

    "Introduction to Mathematics" by Hiroyuki Kojima is, as its title suggests, an introduction to mathematics, starting with the Pythagorean theorem and covering geometry, the functions that frequently appear in machine learning, differentiation, algebra, integration, and finally sets, the foundation of basic mathematics.

    • Exploring the Origins of Geometry: The World of Non-Euclidean Geometry

    “Soft” geometry

    In this article, we will discuss Topological Data Analysis.

    Topological data analysis is a method of analyzing a set of data using the "soft" geometry called topology. Machine learning is the task of finding a model that fits given data well, where a model is a space expressed in terms of some parameters; the essence of machine learning is therefore to find a projection (function) from the data points into the space of the model.

    Topology, on the other hand, is captured by the oft-used coffee cup and doughnut example: if a coffee cup were made of an unbreakable, clay-like material, we could deform it little by little and eventually transform it into a doughnut.
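
    As a minimal illustration of this idea (a sketch added here, not code from the original article; all function names are hypothetical), the following computes 0-dimensional persistent homology of a point cloud: it tracks when connected components merge as balls around the points grow, using a union-find structure over edges sorted by length.

```python
import numpy as np
from itertools import combinations

def zero_dim_persistence(points):
    """0-dimensional persistence: track when connected components
    merge as the connectivity radius grows (Vietoris-Rips, H0 only)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (np.linalg.norm(points[i] - points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []  # each merge kills one component (all born at radius 0)
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
    # Persistence pairs (birth=0, death); one component never dies.
    return [(0.0, d) for d in deaths]

# Two well-separated clusters: one long-lived component shows up
# as a large death value in the diagram.
rng = np.random.default_rng(0)
cloud = np.vstack([rng.normal(0, 0.1, (10, 2)),
                   rng.normal(5, 0.1, (10, 2))])
for birth, death in zero_dim_persistence(cloud)[-3:]:
    print(f"component born {birth:.2f}, dies {death:.2f}")
```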

    Information geometry

    • What is information geometry?

      Information geometry first appeared in statistics. By considering the space of probability distributions geometrically, it added new views and insights to conventional mathematical statistics. For example, the way the space of probability distributions in question bends (its curvature) is related to the performance of parameter estimators. This is a beautiful result unique to geometry.

      These results are derived from a certain dual structure on the space of probability distributions. The word "duality" is used in a variety of ways; here, "dual structure" evokes a structure in which applying an operation twice returns something to its original state, or in which two things cooperate to support something. Such "information geometrical structures" equipped with duality are not unique to the space of probability distributions: "duality" in optimization, or "free energy" and "entropy" in thermodynamics, can also be regarded as examples. Thus, information geometry extends beyond the framework of statistical science and finds applications in various fields.
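
      As a concrete instance of this geometric picture (a standard worked example added here for illustration, not part of the original article): the Fisher information metric on the family of normal distributions N(μ, σ²) turns out to be a hyperbolic metric, a space of constant negative curvature.

```latex
% Fisher information metric g_{ij}(\theta) = E[\partial_i \log p \,\partial_j \log p]
% for the normal family p(x; \mu, \sigma) = N(\mu, \sigma^2):
ds^2 = \frac{d\mu^2 + 2\,d\sigma^2}{\sigma^2}
% Up to scaling this is the Poincare half-plane metric, so the family has
% constant negative curvature -- exactly the kind of curvature information
% that relates to the performance of parameter estimators.
```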

      In this article, I will introduce the Gaussian graphical model, a basic model in causal inference, and describe the information geometry of positive definite symmetric matrices that appears in the model.
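
      To make the dual structure on positive definite matrices concrete, here is a hedged numerical sketch (not the article's code; names are illustrative): a zero-mean Gaussian can be parametrized either by its covariance matrix or, dually, by its precision matrix (the inverse covariance, whose zero pattern defines the Gaussian graphical model), and the KL divergence between two such Gaussians is half the Bregman divergence of the log-det potential in the dual coordinates.

```python
import numpy as np

def kl_gaussian(S1, S2):
    """KL( N(0,S1) || N(0,S2) ) for covariance matrices S1, S2."""
    d = S1.shape[0]
    S2inv = np.linalg.inv(S2)
    return 0.5 * (np.trace(S2inv @ S1) - d
                  + np.log(np.linalg.det(S2) / np.linalg.det(S1)))

def bregman_logdet(J1, J2):
    """Bregman divergence of phi(J) = -log det J at precision matrices."""
    d = J1.shape[0]
    return (-np.log(np.linalg.det(J1)) + np.log(np.linalg.det(J2))
            + np.trace(np.linalg.inv(J2) @ J1) - d)

rng = np.random.default_rng(1)
A, B = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
S1, S2 = A @ A.T + np.eye(3), B @ B.T + np.eye(3)   # SPD covariances
J1, J2 = np.linalg.inv(S1), np.linalg.inv(S2)       # dual (precision) coords

# The KL divergence equals half the log-det Bregman divergence
# evaluated in the dual (precision) coordinates.
print(kl_gaussian(S1, S2), 0.5 * bregman_logdet(J2, J1))
```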

      Statistics is well known as one of the fields where information geometry plays an active role. In this article, I will discuss semidefinite programming problems and their information geometry as a typical example of information geometry appearing in another field. In particular, I will focus on the fact that the number of iterations of the interior point method, the main solution method for semidefinite programming problems, can be expressed as an integral of curvature in terms of information geometry.
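
      For reference (a standard formulation added here, not taken from the article), the standard-form semidefinite program and the log-det barrier whose central path the interior point method follows:

```latex
% Standard-form semidefinite program over a symmetric matrix X:
\min_{X \succeq 0} \; \langle C, X \rangle
\quad \text{s.t.} \quad \langle A_i, X \rangle = b_i , \quad i = 1, \dots, m .
% Interior point methods trace the central path: for \mu > 0,
X(\mu) = \arg\min_{X \succ 0} \; \langle C, X \rangle - \mu \log \det X
\quad \text{s.t.} \quad \langle A_i, X \rangle = b_i ,
% which converges to an optimum as \mu \to 0. The curvature of this path
% is the quantity the article relates to the number of iterations.
```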

      In this section, the belief propagation (probability propagation) algorithm is applied to the non-tree case to compute approximate marginal probability distributions. This can be understood as the Bethe approximation in terms of the variational method.

      In the previous section, we discussed the belief propagation algorithm on trees, where the marginal probability distributions are computed efficiently by message passing. On a graph with cycles, the same algorithm can be applied as in Algorithm 1 to perform approximate computation.
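
      As a hedged sketch of the exact, tree-structured case (illustrative code, not from the article; the potentials are made up), the following runs sum-product message passing on a chain of binary variables and checks the resulting marginals against brute-force enumeration:

```python
import numpy as np

# Minimal sum-product belief propagation on a chain of binary variables
# x_1 - x_2 - ... - x_n with pairwise potentials psi[i].
# On a tree (here, a chain) the computed marginals are exact.

n = 4
rng = np.random.default_rng(2)
psi = [rng.uniform(0.5, 2.0, size=(2, 2)) for _ in range(n - 1)]

# Forward messages m_f[i](x_i) and backward messages m_b[i](x_i).
m_f = [np.ones(2) for _ in range(n)]
m_b = [np.ones(2) for _ in range(n)]
for i in range(n - 1):                      # left-to-right pass
    m_f[i + 1] = psi[i].T @ m_f[i]
    m_f[i + 1] /= m_f[i + 1].sum()          # normalize for stability
for i in range(n - 2, -1, -1):              # right-to-left pass
    m_b[i] = psi[i] @ m_b[i + 1]
    m_b[i] /= m_b[i].sum()

marginals = [m_f[i] * m_b[i] / (m_f[i] * m_b[i]).sum() for i in range(n)]

# Brute-force check against the exact marginal of x_0.
joint = np.ones((2,) * n)
for idx in np.ndindex(*joint.shape):
    for i in range(n - 1):
        joint[idx] *= psi[i][idx[i], idx[i + 1]]
joint /= joint.sum()
print(marginals[0], joint.sum(axis=tuple(range(1, n))))
```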

      SVMs and related derived methods rely only on inner products of the feature vectors as input. Thanks to this property, complex models can be realized without explicitly computing Φ(x), by replacing the inner product of the feature vectors Φ(x) mapped into the feature space F with a kernel function K(x_i, x_j) = Φ(x_i)^T Φ(x_j). However, not just any function can be used as a kernel function.

      In this article, we will discuss what kind of functions can be used as kernel functions and what kind of operations can be performed on kernel functions. Kernel functions are also closely related to regularization in learning.

      The kernel functions discussed in this article include general kernel functions (the linear kernel, polynomial kernel, and RBF kernel) and kernel functions for probabilistic, string, and graph data (the p-spectrum kernel, all-subsequences kernel, gap-weighted kernel, Fisher kernel, graph Laplacian, commute-time kernel, diffusion kernel, regularized Laplacian, and random walk kernel).
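
      The kernel trick itself fits in a few lines; here is a hedged numerical sketch (illustrative, not the article's code): for the homogeneous quadratic kernel K(x, z) = (x^T z)^2 the implicit feature map lists all pairwise products of coordinates, and a valid kernel must produce positive semidefinite Gram matrices (Mercer's condition), checked here for the RBF kernel.

```python
import numpy as np

def phi(x):
    return np.outer(x, x).ravel()          # explicit feature map, dim d^2

def k_quad(x, z):
    return (x @ z) ** 2                    # same value, no Phi computed

rng = np.random.default_rng(3)
x, z = rng.normal(size=5), rng.normal(size=5)
print(k_quad(x, z), phi(x) @ phi(z))       # identical values

# A valid kernel must yield positive semidefinite Gram matrices
# (Mercer's condition); check the RBF kernel on random points.
X = rng.normal(size=(20, 5))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
G = np.exp(-0.5 * sq)                      # RBF Gram matrix
print(np.linalg.eigvalsh(G).min() >= -1e-10)
```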

      In the previous article, we discussed the effectiveness and limitations of the FTL strategy; the lesson learned is that a simple, greedy forecasting strategy may not work. In this article, we will discuss strategies that compensate for the weaknesses of the FTL strategy.

      One of the most important concepts in machine learning is regularization. Many successful machine learning methods, such as support vector machines, do not simply learn hypotheses that minimize empirical losses (e.g., training errors), but rather learn hypotheses that simultaneously minimize the empirical losses and some function called a regularization term. This two-fold approach can be seen as preparing not only for past empirical losses but also for losses that may appear in the future, by taking the regularization term into account. Indeed, in the field of statistical learning theory, the generalization error of regularization-based methods can be evaluated under reasonable assumptions.

      Something similar holds in the context of online forecasting. What we will discuss here is the Follow The Regularized Leader (FTRL) strategy, which incorporates the idea of regularization into the FTL strategy.
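
      As a minimal sketch of the FTRL update (illustrative code, not from the article; the loss sequence is made up): with linear losses f_t(w) = <g_t, w> and an L2 regularizer (lam/2)||w||^2, the FTRL minimizer over all past losses has a closed form, the negated running gradient sum scaled by 1/lam.

```python
import numpy as np

# FTRL with L2 regularization on online linear losses:
#     w_{t+1} = argmin_w  sum_{s<=t} <g_s, w> + (lam/2) ||w||^2
#             = -(1/lam) * (sum of gradients seen so far).

rng = np.random.default_rng(4)
d, T, lam = 3, 200, 10.0
w = np.zeros(d)
grad_sum = np.zeros(d)
loss_vs_zero = 0.0

for t in range(T):
    g = rng.normal(size=d) + 0.5           # adversary's loss gradient
    loss_vs_zero += g @ w                  # loss of w minus loss of 0
    grad_sum += g
    w = -grad_sum / lam                    # FTRL closed-form update

# FTL is the same update in the limit lam -> 0 and can oscillate wildly;
# the regularizer keeps successive iterates close (stability).
print("final w:", w, "cumulative loss vs zero:", loss_vs_zero)
```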

      Tensor

      Tensor decomposition (TD) is a method for approximating high-dimensional tensor data by low-rank tensors. This technique is used for data dimensionality reduction and feature extraction and is a useful approach in a variety of machine learning and data analysis applications. Applying tensor decomposition methods to dynamic module detection is relevant to tasks such as module detection in time-series and dynamic data.
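
      As a hedged sketch of one basic decomposition (a truncated higher-order SVD giving a Tucker approximation; illustrative code, not the article's method), the following factors a noisy 3-way tensor of multilinear rank (2, 2, 2) and reports the reconstruction error:

```python
import numpy as np

def unfold(T, mode):
    """Matricize tensor T along the given mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    """Truncated HOSVD: factor matrices from each unfolding's SVD,
    then the core tensor by projecting T onto those factors."""
    U = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        U.append(u[:, :r])
    core = T
    for mode, u in enumerate(U):
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, mode)), 0, mode)
    return core, U

def reconstruct(core, U):
    T = core
    for mode, u in enumerate(U):
        T = np.moveaxis(np.tensordot(u, T, axes=(1, mode)), 0, mode)
    return T

rng = np.random.default_rng(5)
# Build a tensor that is exactly multilinear rank (2,2,2) plus small noise.
core = rng.normal(size=(2, 2, 2))
U = [np.linalg.qr(rng.normal(size=(s, 2)))[0] for s in (6, 7, 8)]
T = reconstruct(core, U) + 0.01 * rng.normal(size=(6, 7, 8))

core_hat, U_hat = hosvd(T, (2, 2, 2))
err = np.linalg.norm(T - reconstruct(core_hat, U_hat)) / np.linalg.norm(T)
print(f"relative reconstruction error: {err:.3f}")  # small, ~noise level
```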

      Practical use

      • Physics and Information Geometry
      • Artificial Intelligence and Information Geometry
      • Quantum Information and Information Geometry
