Overview of the Quasi-Newton Method
The quasi-Newton method is an iterative method for solving nonlinear optimization problems. It is a generalization of Newton's method that searches for a minimum of the objective function without computing second derivatives (the Hessian matrix). Because it works with an approximation of the Hessian rather than requiring its exact computation, the quasi-Newton method is relatively easy to implement. See also "Hessian Matrix and Regularity" for more information on the Hessian matrix.
The main features and procedures of the quasi-Newton method are described below.
1. Approximation of the Hessian matrix:
The quasi-Newton method maintains an approximation of the Hessian matrix. Instead of computing the Hessian exactly, it starts from an initial approximate Hessian (usually the identity matrix) and updates this approximation at each iteration.
2. Search direction computation:
The approximate Hessian is used to compute the search direction for the next iteration step. Typically, this direction is obtained from the gradient vector and the inverse of the approximate Hessian.
3. Step size determination:
A line search is typically used to determine the step size, which controls how far to move along the search direction in order to decrease the objective function.
4. Convergence check:
At each iteration, check whether the algorithm has converged. Convergence is usually judged from the change in the objective function or from the norm of the gradient vector. A minimal sketch of one such iteration is shown after this list.
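As a rough illustration of steps 1 to 4, the following is a minimal sketch of a BFGS-style quasi-Newton loop written in plain NumPy. The function and variable names (quasi_newton_minimize, objective, grad) are chosen here for illustration only and are not from any particular library.
import numpy as np

def quasi_newton_minimize(objective, grad, x0, tol=1e-6, max_iter=200):
    # Step 1: start from the identity matrix as the initial approximation
    # of the inverse Hessian.
    n = x0.size
    H = np.eye(n)
    x = x0.astype(float)
    g = grad(x)
    for _ in range(max_iter):
        # Step 4: convergence check on the norm of the gradient.
        if np.linalg.norm(g) < tol:
            break
        # Step 2: search direction from the approximate inverse Hessian.
        p = -H @ g
        # Step 3: simple backtracking (Armijo) line search for the step size.
        alpha, c = 1.0, 1e-4
        fx = objective(x)
        while objective(x + alpha * p) > fx + c * alpha * (g @ p):
            alpha *= 0.5
            if alpha < 1e-12:
                break
        x_new = x + alpha * p
        g_new = grad(x_new)
        # BFGS-style update of the inverse-Hessian approximation.
        s, y = x_new - x, g_new - g
        sy = s @ y
        if sy > 1e-10:  # skip the update if the curvature condition fails
            rho = 1.0 / sy
            I = np.eye(n)
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

# Hypothetical usage: minimize f(x, y) = (x - 1)^2 + 10 * (y - 2)^2
f = lambda z: (z[0] - 1) ** 2 + 10 * (z[1] - 2) ** 2
df = lambda z: np.array([2 * (z[0] - 1), 20 * (z[1] - 2)])
print(quasi_newton_minimize(f, df, np.array([0.0, 0.0])))  # roughly [1.0, 2.0]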
The main advantage of the quasi-Newton method is that it does not require the computation of second derivatives, which makes it relatively efficient for problems with complex objective functions or high dimensionality. In addition, with an appropriate choice of the initial approximate Hessian, fast convergence can be expected. However, the quasi-Newton method does not guarantee a globally optimal solution and may converge to a local optimum for some problems, so it is sometimes combined with a multi-start method or other algorithms.
Algorithms used in the quasi-Newton method
The quasi-Newton method solves nonlinear optimization problems using an approximation of the Hessian matrix. The following two algorithms are the most widely used.
1. The DFP method (Davidon-Fletcher-Powell method):
The DFP method maintains an approximation of the inverse of the Hessian matrix, which is used to compute the search direction. It starts from a positive-definite initial approximation and updates this matrix at each iteration. The DFP method is guaranteed to converge when the Hessian matrix is positive definite, making it effective in many situations; see "About the DFP Method (Davidon-Fletcher-Powell Method)" for more information.
2. The BFGS method (Broyden-Fletcher-Goldfarb-Shanno method):
The BFGS method also approximates the inverse of the Hessian matrix, but uses a different update rule. It typically starts from the identity matrix as the initial approximation and updates the matrix at each iteration. Like the DFP method, BFGS is an effective algorithm for many problems. See "About the Broyden-Fletcher-Goldfarb-Shanno (BFGS) Method" for details.
Both algorithms use the approximation of the inverse Hessian to compute the minimization step of the objective function at each iteration, updating the approximate matrix according to their respective rules so that the iteration converges efficiently (both update rules are written out explicitly in the sketch below). The DFP and BFGS methods are suitable for many nonlinear optimization problems and form the core of most quasi-Newton implementations. Which of the two works better depends on the specific problem, and it is common to try both.
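To make the difference between the two update rules concrete, the following short sketch (illustrative code, not part of any library) writes both updates for the inverse-Hessian approximation H, with s denoting the step taken and y the change in the gradient.
import numpy as np

def dfp_update(H, s, y):
    # DFP update of the inverse-Hessian approximation H,
    # with s = x_{k+1} - x_k and y = grad f(x_{k+1}) - grad f(x_k).
    Hy = H @ y
    return H + np.outer(s, s) / (s @ y) - np.outer(Hy, Hy) / (y @ Hy)

def bfgs_update(H, s, y):
    # BFGS update of the inverse-Hessian approximation H.
    rho = 1.0 / (s @ y)
    I = np.eye(len(s))
    return (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) + rho * np.outer(s, s)

# Both updates assume the curvature condition s @ y > 0, which an
# appropriate line search is responsible for maintaining.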
Application of the Quasi-Newton Method
Quasi-Newton methods are widely applied to nonlinear optimization problems and are used in a variety of fields. Some of the applications of quasi-Newton methods are described below.
1. Machine learning:
In training machine learning algorithms, quasi-Newton methods are used to adjust model parameters. It is suitable for tuning the parameters of models such as logistic regression, support vector machines, and neural networks.
2. Statistical modeling:
In statistical modeling, quasi-Newton methods are used to optimize the parameters of models by maximum likelihood estimation and least squares methods. They have been applied to parameter estimation of linear and nonlinear models.
3. Image processing:
In image processing, quasi-Newton methods are used to optimize the parameters of image processing algorithms for problems such as image restoration, filtering, and feature selection.
4. Optimal control:
In optimal control problems, quasi-Newton methods are used to optimize the control inputs of the system. They are used in robotics, automotive control, aerospace control, etc.
5. Financial engineering:
In financial engineering, quasi-Newton methods are applied to problems such as option pricing, risk management, and portfolio optimization.
6. Structural mechanics:
In structural mechanics simulations, quasi-Newton methods are used for building design, material strength optimization, and finite element analysis.
7. Signal processing:
In signal processing, quasi-Newton methods are used to optimize the parameters of signal processing algorithms for problems such as filter design, signal restoration, and spectral estimation.
These are just some of the applications of quasi-Newton methods, which are used in many different areas dealing with nonlinear optimization problems. Quasi-Newton methods are suitable for a wide range of problems because they efficiently minimize the objective function while avoiding the computation of second derivatives.
Example implementation of the quasi-Newton method
As an example implementation of the quasi-Newton method, the following Python code solves a nonlinear optimization problem using the scipy.optimize module of the SciPy library.
import numpy as np
from scipy.optimize import minimize
# Define the objective function (using the Rosenbrock function as an example)
def rosenbrock(x):
    return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)
# Initial Solution Setting
initial_guess = np.array([0.5, 0.5])
# Perform optimization using quasi-Newton method
result = minimize(rosenbrock, initial_guess, method='BFGS')
# Output Results
print("optimal solution:", result.x)
print("optimum value:", result.fun)
This code calls the quasi-Newton method (BFGS) through the scipy.optimize.minimize function: the objective function and the initial guess are passed in, and the method parameter is set to 'BFGS'.
The code above minimizes the Rosenbrock function, but the quasi-Newton method can be applied in the same way to other objective functions and, with suitable variants, to constrained problems.
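As a small extension of the example above (reusing the rosenbrock function and initial_guess already defined), the gradient can also be supplied explicitly via the jac argument of scipy.optimize.minimize, which typically reduces the number of function evaluations compared with the finite-difference gradient used by default. The rosenbrock_grad function below is written here for illustration and is not part of SciPy.
def rosenbrock_grad(x):
    # Analytic gradient of the Rosenbrock function defined above.
    grad = np.zeros_like(x)
    grad[:-1] = -400.0 * x[:-1] * (x[1:] - x[:-1]**2) - 2.0 * (1 - x[:-1])
    grad[1:] += 200.0 * (x[1:] - x[:-1]**2)
    return grad

result = minimize(rosenbrock, initial_guess, method='BFGS', jac=rosenbrock_grad)
print("optimal solution:", result.x)
print("iterations:", result.nit)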
Challenges of Quasi-Newton Methods
Although quasi-Newton methods are very useful for nonlinear optimization problems, several challenges exist. The following are the main challenges of the quasi-Newton method.
1. Selection of the initial approximate Hessian matrix:
The quasi-Newton method requires an initial approximate Hessian matrix. The choice of this initial matrix has a significant impact on convergence; an inappropriate choice can slow convergence or increase the likelihood of converging to a locally optimal solution.
2. Handling of constraints:
The basic quasi-Newton method is not directly applicable to optimization problems with constraints; constraint handling is required, and the method must be combined with a constrained optimization algorithm.
3. Relation to exact second derivatives:
Quasi-Newton methods avoid computing second derivatives by relying on gradient information and an approximate Hessian. When exact second derivatives are available and cheap to compute, Newton-type methods that use them can converge faster, so a quasi-Newton method is not always the best choice.
4. Convergence to a locally optimal solution:
Since the quasi-Newton method does not guarantee a globally optimal solution, it may converge to a local optimum depending on the choice of the initial solution and the initial approximate Hessian.
5. Computational cost:
Storing and updating the dense approximate Hessian (or its inverse) requires memory and per-iteration work that grow quadratically with the number of variables, which makes the method expensive for large problems (a limited-memory workaround is sketched at the end of this section).
6. Slow convergence:
Quasi-Newton methods are not always fast-converging; in particular, convergence can be slow for strongly nonlinear or ill-conditioned problems.
To cope with these issues, an appropriate initial approximate Hessian, a strategy for handling constraints, the possible use of exact second derivatives, and a sound convergence criterion are needed. In addition, multi-start methods from different initial solutions may be considered to reduce the risk of converging to a local optimum. While quasi-Newton methods are useful in many cases, these challenges must be kept in mind.
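One common response to the cost issue (point 5) is the limited-memory variant L-BFGS, which stores only a handful of recent step and gradient-difference pairs instead of a full approximate Hessian. The following is a minimal sketch using SciPy's built-in L-BFGS-B implementation on a higher-dimensional Rosenbrock problem, assuming the rosenbrock function defined in the implementation example above.
import numpy as np
from scipy.optimize import minimize

# 100-dimensional Rosenbrock problem: a dense BFGS approximation would
# require a 100 x 100 matrix, whereas L-BFGS-B keeps only a few vectors.
x0 = np.full(100, 0.5)
result = minimize(rosenbrock, x0, method='L-BFGS-B')
print("optimum value:", result.fun)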
Addressing the Challenges of Quasi-Newton Methods
The following methods and measures can be used to address the challenges of the quasi-Newton method.
1. Selection of the initial approximate Hessian matrix:
The choice of the initial approximate Hessian is important; an inappropriate choice degrades convergence. A common strategy is to start from the identity matrix, but other choices can be considered depending on the problem.
2. Constraint handling:
When constraints are present, the quasi-Newton method can be combined with a constrained optimization algorithm. Incorporating constraints directly into the quasi-Newton update can be very complex, so it is common to use an algorithm that matches the specific constraints (simple bound constraints are illustrated in the sketch after this list).
3. Use of exact second derivatives:
If second derivatives can be computed accurately, convergence can be improved by exploiting them. This is especially appropriate when the objective function is smooth and its second derivatives are informative.
4. Reducing the risk of convergence to a locally optimal solution:
Using a multi-start method, in which the optimization is run from several different initial solutions and the best solution is kept, reduces the risk of converging to a poor local optimum (see the sketch after this list).
5. Customization and tuning:
The parameters and convergence criteria of the quasi-Newton method should be tuned for the specific problem; experimenting with different parameter choices is useful.
6. Integration with active set methods:
When constraints are present, the quasi-Newton method may be used in combination with an active set method. Active set methods are specialized for constrained optimization problems and can handle constraints effectively.
7. Searching for the global optimum:
Quasi-Newton methods can also be combined with other algorithms to search for the global optimum; combining them with global optimization methods such as genetic algorithms or particle swarm optimization can be effective.
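As a concrete, illustrative sketch of points 2 and 4 (again reusing the rosenbrock function from the implementation example), the loop below runs the optimizer from several random initial points within simple box constraints and keeps the best result. Bound constraints are handled here by SciPy's L-BFGS-B method; more general constraints would call for methods such as SLSQP or trust-constr.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
bounds = [(-2.0, 2.0)] * 2            # simple box constraints on each variable
best = None
for _ in range(10):                    # multi-start: 10 random initial points
    x0 = rng.uniform(-2.0, 2.0, size=2)
    res = minimize(rosenbrock, x0, method='L-BFGS-B', bounds=bounds)
    if best is None or res.fun < best.fun:
        best = res
print("best solution:", best.x, "best value:", best.fun)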
Reference Information and Reference Books
For more information on optimization in machine learning, see also "Optimization for the First Time Reading Notes", "Sequential Optimization for Machine Learning", "Statistical Learning Theory", and "Stochastic Optimization".
Reference books include "Optimization for Machine Learning", "Machine Learning, Optimization, and Data Science", and "Linear Algebra and Optimization for Machine Learning: A Textbook".
For beginner to intermediate readers:
"Numerical Optimization"
Authors: Jorge Nocedal and Stephen J. Wright
Publisher: Springer
Quasi-Newton methods (in particular the BFGS and L-BFGS methods) are treated in detail. It covers a wide range of topics, from mathematical foundations to algorithm implementation, and is highly regarded as a classic book on optimization theory.
"An Introduction to Optimization"
Authors: Edwin K. P. Chong and Stanislaw H. Zak
Publisher: Wiley
A general introduction to optimization, with a concise explanation of quasi-Newton methods. Ideal for students of applied mathematics and engineering.
"Practical Methods of Optimization"
Author: R. Fletcher
Publisher: Wiley
This book provides a good balance of theory and practice, with a particularly in-depth look at the mathematical background of quasi-Newton methods.
Theory-focused books:
"Optimization by Vector Space Methods"
Author: David G. Luenberger
Publisher: Wiley
Deals with optimization methods based on vector spaces and is suitable for an in-depth understanding of the mathematical theory behind quasi-Newton methods.
"Convex Optimization"
Authors: Stephen Boyd and Lieven Vandenberghe
Publisher: Cambridge University Press
Specializes in convex optimization, but provides important insights into quasi-Newton methods and related techniques.
For application and implementation:
"Python for Data Analysis"
Author: Wes McKinney
Publisher: O'Reilly Media
Describes libraries (such as SciPy) that help in implementing numerical optimization, including quasi-Newton methods, in Python.
"Introduction to Algorithms"
Authors: Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein
Publisher: MIT Press
A comprehensive book on algorithms in general, touching on the design and analysis of quasi-Newton methods and related optimization techniques.
"Optimization Algorithms on Matrix Manifolds"
Authors: P.-A. Absil, R. Mahony, R. Sepulchre
Publisher: Princeton University Press
Covers variants of quasi-Newton methods and their applications to problems with matrix structure.
Practical applications:
"Data Science from Scratch"
Author: Joel Grus
Publisher: O'Reilly Media
Describes practical applications of optimization techniques, including quasi-Newton methods, in the field of data science.
"Numerical Recipes: The Art of Scientific Computing"
Authors: William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery
Publisher: Cambridge University Press
This best-selling book on scientific computing includes examples of implementations and applications of quasi-Newton methods.
Online resources:
SciPy Optimization Documentation
Explains how to use quasi-Newton methods (BFGS and L-BFGS) in Python, with implementation examples.
MOSEK Optimization Toolbox
A commercial optimization toolbox that provides implementations of optimization methods, including quasi-Newton-type methods.