Submodular Optimization and Machine Learning


Overview of Machine Learning with Submodular Optimization

Submodular functions are a concept corresponding to convex functions for discrete variables, and are an important concept in combinatorial optimization problems. Here, "combinatorial" refers to the procedure of selecting a part of some selectable collection, and to the various computational properties associated with it. A submodular function is a set function in a combinatorial optimization problem that has the following properties:

  • Diminishing Returns: The marginal value of adding an element decreases (or stays the same) as the selected set grows. That is, the more elements have already been selected, the smaller the gain from adding one more (see the formulas just below).
  • Submodularity: For any two sets A and B, the function satisfies \(f(A) + f(B) \geq f(A \cup B) + f(A \cap B)\); for set functions, this condition is equivalent to the diminishing-returns property above.
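Formally, a set function \(f: 2^V \rightarrow \mathbb{R}\) on a ground set V is submodular if and only if either of the following equivalent conditions holds:

\[
f(A \cup \{e\}) - f(A) \;\geq\; f(B \cup \{e\}) - f(B) \qquad \text{for all } A \subseteq B \subseteq V,\; e \in V \setminus B,
\]

\[
f(A) + f(B) \;\geq\; f(A \cup B) + f(A \cap B) \qquad \text{for all } A, B \subseteq V.
\]

The first condition expresses diminishing returns; the second is the submodularity inequality.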

Submodularity is used to find efficient solutions to optimization problems such as minimization and maximization, and has a wide range of applications in information theory, machine learning, economics, social sciences, and other fields.

Submodular functions are used as an important tool for finding efficient solutions in optimization and decision-making problems. Specific applications include the following

  • Social Network Analysis: In social network analysis, submodular functions are applied to problems such as information propagation and influence maximization. For example, to determine the optimal advertising strategy for a product, they are used in the problem of selecting users so as to maximize the spread of information within a limited budget.
  • Image Segmentation: In image processing, the submodular function is applied to optimize image segmentation (dividing an image into multiple regions). For example, when an image is segmented into different regions, the problem is to find a segmentation that minimizes the boundaries between the regions.
  • Query Optimization: In information retrieval and database query optimization, submodular functions are applied to problems such as ranking search results and query optimization. For example, they are used to determine the order in which results are presented so that the ranking of search results conveys information most effectively.
  • Combinatorial Optimization: Submodular functions are widely used in combinatorial optimization problems to find efficient solutions. Examples include the following.
    • Minimum coverage problem: the problem of finding the smallest set of elements satisfying certain conditions, such as finding the smallest set of sensors that efficiently covers a given sensor network (a code sketch of the underlying cover function follows this list).
    • Maximum weighted set coverage problem: the problem of maximizing the total weight covered by a selection of weighted sets, such as the ad placement problem of maximizing the effect of advertising within a limited budget.
    • Maximum spanning tree problem: network optimization and clustering of communication networks, power networks, social networks, transportation networks, etc.
    • Optimal scheduling: job scheduling, project scheduling, resource allocation, etc.
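As a concrete illustration, the cover function underlying the coverage problems above is monotone and submodular. Below is a minimal Python sketch; the names `coverage` and `cover` are illustrative, not from any library.

```python
# Minimal sketch of a (submodular) coverage function: each element i of the
# ground set covers a set of items cover[i]; f(S) counts the items covered by S.

def coverage(S, cover):
    """f(S) = |union of cover[i] for i in S| -- monotone and submodular."""
    covered = set()
    for i in S:
        covered |= cover[i]
    return len(covered)

# Example: three sensors, each covering some locations.
cover = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d", "e"}}
print(coverage({0, 1}, cover))      # 3
print(coverage({0, 1, 2}, cover))   # 5
```

Note the diminishing returns: adding sensor 1 to {0} gains 1 item, while adding it to the empty set would gain 2.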

Common algorithms for optimizing submodular functions include the following

  • Greedy Method: The greedy method is a simple and effective algorithm for optimizing submodular functions. It builds a solution by making the locally optimal choice at each step, selecting the element with the greatest marginal gain. Thanks to the properties of submodular functions, this yields a provably good approximate solution efficiently (see the sketch after this list).
  • Naive Algorithm: The naive algorithm is the basic algorithm for optimizing a submodular function, which is a method to find the optimal solution by enumerating all possible solutions. It is computationally expensive because it does not take advantage of the properties of the submodular function and tries all combinations, but it is able to find the optimal solution.
  • Solver-based methods: Solver-based optimization methods are also commonly used for the problem of optimizing a submodular function. For example, optimization solvers such as integer linear programming (ILP) and mixed integer linear programming (MILP) provide powerful algorithms for optimizing submodular functions.
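A minimal sketch of the greedy method for maximizing a monotone submodular function under a cardinality constraint, reusing the illustrative `coverage` function and `cover` data from the previous sketch:

```python
def greedy_max(f, V, k):
    """Greedy maximization of a monotone submodular set function f.

    At each step, add the element with the largest marginal gain
    f(S + {e}) - f(S).  For a normalized monotone submodular f, this
    achieves at least (1 - 1/e) of the optimal value under a
    cardinality constraint (Nemhauser-Wolsey-Fisher).
    """
    S = set()
    for _ in range(k):
        best, best_gain = None, float("-inf")
        for e in V - S:
            gain = f(S | {e}) - f(S)
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:  # no elements left to add
            break
        S.add(best)
    return S

V = {0, 1, 2}
print(greedy_max(lambda S: coverage(S, cover), V, 2))  # {0, 2}
```

Here the greedy method first picks sensor 2 (gain 3), then sensor 0 (gain 2), covering all five locations with two sensors.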

In this blog, the theory and specific applications of these submodular functions are presented below.

Implementation

Submodular optimization is a type of combinatorial optimization that solves the problem of maximizing or minimizing a submodular function, a function with specific properties. This section describes various algorithms, their applications, and their implementations for submodular optimization.

  • Overview of Submodular Diversification and examples of algorithms and implementations

Submodular Diversification is a method for selecting the top k items with diversity in information retrieval and ranking tasks. The basis of Submodular Diversification is the submodular function, also described in "Submodular Optimisation and Machine Learning", which is a set function \( f: 2^V \rightarrow \mathbb{R} \) satisfying the diminishing-returns property described above.
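As one concrete instance, a facility-location-style objective is often used for diversified top-k selection. Below is a minimal sketch, assuming a precomputed pairwise similarity matrix `sim`; all names here are illustrative.

```python
# Facility-location objective: f(S) = sum_i max_{j in S} sim[i][j].
# Adding an item very similar to one already in S yields little extra
# gain, so greedy maximization naturally favors diverse selections.

def facility_location(S, sim):
    if not S:
        return 0.0
    return sum(max(sim[i][j] for j in S) for i in range(len(sim)))
```

Maximizing this objective under a cardinality constraint k, for instance with the greedy sketch shown earlier, returns a top-k set that is both relevant and diverse.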

Theory

The theory of submodular functions and their optimization is a central topic in the field of combinatorial optimization.
The concept of a submodular function, which corresponds to a convex function on discrete variables, dates back to a 1970 paper by Edmonds.

Submodular optimization is an approach to systematically extracting a potentially useful subset of data, and the computation of selecting a subset from such a collection of selectable objects can be viewed as optimizing a discrete function called a set function.

Submodular functions have both good properties when considering optimization and a wide range of applicability when considering applications. In this section, we discuss the concept of submodular optimization and introduce some basic concepts that are important in submodular optimization. In addition, we discuss the relationship between submodular functions and convex functions.

Convex functions and concave functions are concepts that play an important role in optimization when the function takes continuous values.

Here, we discuss three specific submodular functions: (1) the cover function, (2) the cut function of a graph, and (3) functions generated by a concave function.
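In formulas, these three examples can be written as follows, where \(A_1, \dots, A_n\) are given sets, E is the edge set of a graph with nonnegative weights \(c\), and \(g\) is a concave function:

\[
f_{\text{cover}}(S) = \Bigl|\,\bigcup_{i \in S} A_i\,\Bigr|, \qquad
f_{\text{cut}}(S) = \sum_{\substack{(u,v) \in E \\ u \in S,\; v \notin S}} c(u,v), \qquad
f_{\text{conc}}(S) = g(|S|).
\]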

In submodular optimization, it is very important to know what properties, in addition to submodularity, the function f at hand has. Depending on the properties of f, the computational efficiency of exact optimization and the accuracy that can be guaranteed by approximation algorithms can differ greatly. In this section, we discuss the basic concepts that are important when categorizing submodular functions and define related concepts, such as monotonicity and modular functions.

The three such properties are (1) normalized, (2) nonnegative, and (3) symmetric; we also describe the algorithms and computation times for minimizing and maximizing the cut functions of directed and undirected graphs.

For a normalized submodular function, two polyhedra are defined by the function: the submodular polyhedron and the base polyhedron. These polyhedra play an important role in discussing the submodular function minimization problem and the convexity of submodular functions. In this section, we describe an algorithm for the submodular function minimization problem that utilizes the minimum-norm point of the base polyhedron.
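Concretely, for a normalized submodular function \(f\) (i.e., \(f(\emptyset) = 0\)) on \(V = \{1, \dots, n\}\), writing \(x(S) = \sum_{i \in S} x_i\), the two polyhedra are:

\[
P(f) = \{\, x \in \mathbb{R}^n \mid x(S) \le f(S) \ \text{for all } S \subseteq V \,\}, \qquad
B(f) = \{\, x \in P(f) \mid x(V) = f(V) \,\}.
\]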

The problem of optimizing (maximizing or minimizing) a linear function over a polyhedron is generally called a linear optimization problem or linear programming problem. A linear optimization problem whose feasible region is a bounded polyhedron always has an extreme-point (vertex) optimal solution.

Also, if the L2-norm minimization problem over the base polyhedron can be solved, then the submodular function minimization problem can be solved. The minimum-norm-point algorithm, known as a practically fast submodular function minimization algorithm compared with the polynomial-time combinatorial algorithms, is based on this property.

On submodular functions and convexity, let \(V = \{1, \dots, n\}\) and \(f: 2^V \rightarrow \mathbb{R}\) be a normalized submodular function. A certain equivalence holds between submodularity and convexity, and this convexity also plays an important role in the design of optimization algorithms. To state precisely in what sense a submodular function can be regarded as a convex function on the discrete domain \(\{0,1\}^n\), it is necessary to introduce the concept of the Lovász extension.

A related concept, the multilinear extension, a continuous extension different from the Lovász extension, is also often used. The domain of the multilinear extension \(\tilde{f}:[0,1]^n\rightarrow\mathbb{R}\) is the n-dimensional unit hypercube \([0,1]^n\). While the Lovász extension \(\hat{f}\) extends a submodular function f as a convex function, the multilinear extension \(\tilde{f}\) extends f to a "partially concave" function.
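For reference, both extensions can be written explicitly. Given \(x \in [0,1]^n\), sort its coordinates as \(x_{\sigma(1)} \ge \cdots \ge x_{\sigma(n)}\) and set \(S_i = \{\sigma(1), \dots, \sigma(i)\}\) with \(S_0 = \emptyset\); then

\[
\hat{f}(x) = \sum_{i=1}^{n} x_{\sigma(i)} \bigl( f(S_i) - f(S_{i-1}) \bigr), \qquad
\tilde{f}(x) = \sum_{S \subseteq V} f(S) \prod_{i \in S} x_i \prod_{i \notin S} (1 - x_i).
\]

The multilinear extension equals the expectation \(\mathbb{E}[f(R_x)]\), where the random set \(R_x\) includes each element i independently with probability \(x_i\).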

Here we discuss the problem of maximizing a submodular function, usually for the case where \(f: 2^V \rightarrow \mathbb{R}\) is a monotone submodular function, under the constraint that the number of elements in the selected subset is at most k (> 0). As mentioned above, submodular functions often represent some gain in various fields, and formulations that maximize them therefore arise in many situations. Thus, the problem of maximizing a submodular function is very important in applications.

The problem of maximizing a submodular function has been applied to various problems in machine learning and other fields. Here we discuss document summarization, the sensor placement problem, and active learning as examples.

As an algorithm, we also describe the greedy method, a simple approximation algorithm.
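For the cardinality-constrained maximization of a normalized monotone submodular function, the classical result of Nemhauser, Wolsey, and Fisher guarantees that the greedy solution \(S_k\) satisfies

\[
f(S_k) \;\ge\; \left(1 - \left(1 - \frac{1}{k}\right)^{k}\right) \max_{|S| \le k} f(S) \;\ge\; \left(1 - \frac{1}{e}\right) \max_{|S| \le k} f(S).
\]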

As the next application example, consider the problem of placing several sensors so that the observation error is as small as possible in a situation where some physical quantity (e.g., room temperature) is observed using sensors in a certain space. In the previous section, a similar situation was studied using a cover function, but here the problem is modeled by Gaussian process regression and finally leads to submodular function maximization.

As another example, we consider the formulation of active learning as a submodular function maximization. Active learning is a method for selecting and labeling only important samples when the cost of labeling is high in supervised learning, which is one of the key problems in machine learning.

In this section, we consider the setting generally referred to as pool-based active learning, which deals with the problem of selecting, from a given set of unlabeled samples at hand, the samples whose labels would be most useful for learning.

Submodular functions can be minimized efficiently. In particular, minimization of the cut functions of directed graphs can be performed by the maximum flow algorithm, which is theoretically and practically fast. In this section, we discuss the basics of the maximum flow algorithm; as an example of a problem reduced to such a computation, we discuss inference over Markov random fields, and describe a class of submodular functions that can be minimized quickly and are often used in practical applications.

Graph cut is known to be applicable to problems on the order of tens of thousands to millions of variables in practical time; in terms of submodular functions, it can be interpreted as minimizing the cut function, which is roughly a second-order (pairwise) submodular function. A theoretically and practically fast algorithm, the maximum flow algorithm, can be applied to the minimization of the cut function. Whether or not a problem can be described as such a cut function minimization is one way to gauge whether the combinatorial computation can be performed practically.

In order to understand this relationship, we will first look at the relationship between the minimization of the cut function and the calculation of the maximum flow in a network, and then look at a concrete method for actually performing the maximum flow calculation.

Here we discuss the augmenting-path method, a typical algorithm for finding the maximum flow; we also discuss how to compute the minimum s-t cut using the maximum flow algorithm, together with a proof of the max-flow min-cut theorem. We also describe the preflow-push method, a fast maximum flow algorithm based on a different principle from the augmenting-path method.
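A minimal sketch of computing a maximum flow and the corresponding minimum s-t cut with the networkx library; the graph and its capacities are illustrative.

```python
import networkx as nx

# Small directed network with edge capacities (illustrative values).
G = nx.DiGraph()
G.add_edge("s", "a", capacity=3.0)
G.add_edge("s", "b", capacity=2.0)
G.add_edge("a", "t", capacity=2.0)
G.add_edge("a", "b", capacity=1.0)
G.add_edge("b", "t", capacity=3.0)

# Max-flow value equals min s-t cut capacity (max-flow min-cut theorem).
flow_value, flow_dict = nx.maximum_flow(G, "s", "t")
cut_value, (S, T) = nx.minimum_cut(G, "s", "t")
print(flow_value, cut_value)  # both 5.0
print(S, T)                   # the two sides of a minimum s-t cut
```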

When dealing with multivariate data, if the variables in the data have some structure, various inference algorithms can be described efficiently using that structure. The Markov random field is a typical such model, which uses an undirected graph to represent Markovianity among variables. In this section, we discuss the relationship between maximum a posteriori (MAP) estimation of a Markov random field and submodular optimization. Furthermore, we discuss how graph cuts, which are widely used in computer vision and other fields, essentially minimize s-t cuts.
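For a binary Markov random field, MAP estimation amounts to minimizing an energy of the form

\[
E(x) = \sum_{i \in V} \theta_i(x_i) + \sum_{(i,j) \in E} \theta_{ij}(x_i, x_j), \qquad x_i \in \{0, 1\},
\]

and the condition under which this reduces to an s-t min-cut computation is precisely that every pairwise term is submodular: \(\theta_{ij}(0,0) + \theta_{ij}(1,1) \le \theta_{ij}(0,1) + \theta_{ij}(1,0)\).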

As mentioned above, the s-t cut function of a directed graph is a special class of submodular functions that can be minimized quickly. However, of course, this function alone is not sufficient in various practical situations due to its limited descriptive power. Therefore, it would be extremely useful in practical situations if a function could be devised that is more descriptive than the s-t cut function for directed graphs and can be minimized as fast as this function. Here, we discuss graph-representable submodular functions as one answer to this question. Graph-representable submodular functions are functions that can be represented as s-t cut functions of directed graphs by adding some auxiliary vertices, and are a class of practically useful functions as described above.

Next, we discuss the preflow-push method as a method to compute the maximum flow for minimizing the s-t cut function of a directed graph. As mentioned above, an \(O(nm)\)-time algorithm by Orlin has recently been proposed for the maximum flow problem. The preflow-push method has computational complexity \(O(nm \log(n^2/m))\), but it remains an important algorithm for applications because fast practical implementations are known. The method can also be extended to parametric optimization, which plays an important role in the following section.

A feasible flow is a flow \(\xi = (\xi_e)_{e \in \mathcal{E}}\) that satisfies both the capacity constraint and the flow conservation constraint. In the preflow-push method, instead of a feasible flow, one considers a preflow, obtained by relaxing the flow conservation constraints; the maximum flow is computed by iteratively updating this preflow so that it gradually satisfies flow conservation, finally yielding a feasible flow.
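In formulas, a preflow keeps the capacity constraint \(0 \le \xi_e \le c_e\) for every edge but relaxes flow conservation: each vertex other than the source s may temporarily hold a nonnegative excess

\[
e(v) \;=\; \sum_{u:(u,v) \in \mathcal{E}} \xi_{(u,v)} \;-\; \sum_{w:(v,w) \in \mathcal{E}} \xi_{(v,w)} \;\ge\; 0 \qquad (v \in V \setminus \{s\}).
\]

The algorithm repeatedly pushes excess along admissible edges until \(e(v) = 0\) holds at every vertex other than s and t, at which point the preflow is a feasible, and in fact maximum, flow.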

Structural regularization learning is a framework for exploiting the structure among data variables in supervised learning. In this section, we describe the relationship between structural regularization and submodular functions and an algorithm for structural regularization through submodular optimization using this relationship.

When we have data at hand to which we want to apply machine learning methods, it is unlikely that each variable in the data is completely unrelated to each other. For example, when dealing with image data, the variables in the model correspond to pixels in the image, so it can be said that the adjacency structure of pixels exists as a relationship between variables. Also, when dealing with genetic data, each variable corresponds to a gene, so there are known gene interactions and so on. In learning, it is expected that taking these relationships into account will improve accuracy and produce models that are easy to interpret.

Here, we describe structural regularization, a machine learning method that takes the structure among data variables into account. Since such structure is, so to speak, prior information for learning, it can be used effectively to reduce the effective dimension of the problem and improve learnability. Therefore, structural regularization can be regarded as an approach to so-called sparse modeling.

Recently, structural regularization has attracted attention because of its close relationship with submodular optimization. This relationship is not only theoretically interesting but also practically important, since structural regularization learning can be reduced to network-flow computations that can be performed extremely fast.

We now discuss the structural sparsity obtained from submodular functions. The aforementioned regularization by the \(\ell_p\) norm treats each variable \(w_1, \dots, w_d\) uniformly, as can be seen from its form. However, this uniformity is not essential to regularization. In this section, we describe structured regularization, a method that extends regularization to explicitly use relationships among variables in the data.

There are two main types of sparsity obtained by regularization. The first is sparsity in which many parameters are driven to the value 0, as in the \(\ell_1\) regularization described above; here the unit of sparsity is each variable, but as described below, this can be extended to groups of variables given in advance. The other type is sparsity in which multiple parameters with similar properties tend to take the same value. The former is called group-type regularization, and the latter coupling-type regularization.
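Typical instances of the two types are the group regularizer and the coupling (fused) regularizer, where \(\mathcal{G}\) is a family of variable groups and E is a set of variable pairs (e.g., adjacent pixels):

\[
\Omega_{\text{group}}(w) = \sum_{g \in \mathcal{G}} \| w_g \|_2, \qquad
\Omega_{\text{couple}}(w) = \sum_{(i,j) \in E} | w_i - w_j |.
\]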

Finally, we describe the application of submodular optimization, an optimization method for discrete structures, to structural regularization problems, and its formulation via submodular optimization (linear approximation and the steepest descent method, the accelerated proximal gradient method, FISTA, parametric submodular minimization, and the partition algorithm).

Implementation

Various libraries have been developed for problems related to submodular functions.

A site that lists available libraries, with information on 23 libraries.
