How to improve linear convergence in Newton’s method


The Newton method, also described in ‘Overview of the Newton method and its algorithm and implementation’, is a very powerful method for solving convex optimisation problems and non-linear equations, but its convergence can sometimes degrade to merely linear rather than the usual quadratic rate. The following methods have been proposed to improve this.

1. Damped Newton method (Levenberg-Marquardt method)
– Abstract: The Newton method requires the Hessian matrix (the matrix of second derivatives) of the objective function to be invertible; if the Hessian is singular or close to singular, the Newton step can be unstable or fail to converge. The damped Newton method addresses this by adding a damping parameter (a scalar value) to the Hessian, which allows more stable convergence near singular points.
– Method: use a Hessian modified by a small constant \( \lambda \) (a minimal code sketch follows below).
\[
\mathbf{x}_{k+1} = \mathbf{x}_k - (\nabla^2 f(\mathbf{x}_k) + \lambda I)^{-1} \nabla f(\mathbf{x}_k)
\] where \( \lambda \) is the appropriately adjusted damping parameter and \( I \) is the identity matrix.

– Advantage: stable convergence is maintained even when the Hessian is nearly singular.
– Disadvantage: the damping parameter must be chosen with care and often has to be tuned manually.
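As a rough illustration, a minimal NumPy sketch of a damped Newton update with a simple Levenberg-Marquardt-style damping schedule might look as follows; the test function, its derivatives and the damping schedule are illustrative assumptions, not part of the original description.

import numpy as np

# Illustrative test function whose Hessian is indefinite/near-singular at the starting point.
def f(x):
    return x[0]**4 + x[0] * x[1] + (1 + x[1])**2

def grad_f(x):
    return np.array([4 * x[0]**3 + x[1], x[0] + 2 * (1 + x[1])])

def hess_f(x):
    return np.array([[12 * x[0]**2, 1.0], [1.0, 2.0]])

def damped_newton(x0, lam=1e-3, max_iter=100, tol=1e-8):
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        # Try a damped step (H + lambda I) d = -g; increase lambda until f decreases.
        for _ in range(30):
            step = np.linalg.solve(hess_f(x) + lam * np.eye(len(x)), -g)
            if f(x + step) < f(x):
                x = x + step
                lam = max(lam * 0.5, 1e-12)  # relax the damping after a successful step
                break
            lam *= 10.0                      # strengthen the damping after a failed step
    return x

print(damped_newton(np.array([0.0, 0.0])))  # should approach a local minimiser near (0.69, -1.35)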

2. Barrier method (modified Newton method)
– Abstract: Constraints can make it difficult for the optimisation process to converge, especially in constrained optimisation problems. In this modified Newton method, a barrier function is incorporated into the objective function so that the iterates do not get too close to the constraint boundary. This can improve the convergence behaviour of the Newton method.
– Method: add a barrier term to the objective function so that the iterates are kept away from the constraint boundary and converge stably (a standard log-barrier formulation is shown below).
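As a point of reference, a standard logarithmic barrier formulation (given here as a general illustration, not specific to this article) replaces a problem with inequality constraints \( g_i(\mathbf{x}) \le 0 \) by the smooth unconstrained problem
\[
\min_{\mathbf{x}} \; f(\mathbf{x}) - \mu \sum_i \log\left(-g_i(\mathbf{x})\right)
\]
where \( \mu > 0 \) is gradually reduced. Newton’s method is applied to this barrier objective for each value of \( \mu \); the barrier term grows rapidly as the iterates approach the constraint boundary, which keeps them strictly feasible.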

3. Finite difference methods (approximation of the Hessian matrix)
– Abstract: When it is difficult to compute the Hessian matrix directly, a numerical approximation is used instead. The finite difference method can be used to approximate the Hessian, and the approximation is then used to perform the Newton step. This keeps the Newton iteration usable when analytic second derivatives are unavailable.
– Method: approximate the Hessian using, for example, central differences of the gradient, and perform the Newton step with the approximated matrix (see the sketch below).
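A minimal sketch of this idea, assuming a simple quadratic test function and an analytic gradient (both of which are illustrative choices), is:

import numpy as np

# Illustrative test function and its analytic gradient.
def f(x):
    return x[0]**2 + 10 * x[1]**2 + x[0] * x[1]

def grad_f(x):
    return np.array([2 * x[0] + x[1], 20 * x[1] + x[0]])

# Central-difference approximation of the Hessian, built one column at a time.
def hess_approx(x, h=1e-5):
    n = len(x)
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        H[:, j] = (grad_f(x + e) - grad_f(x - e)) / (2 * h)
    return 0.5 * (H + H.T)  # symmetrise to reduce rounding error

# Newton iteration using the approximated Hessian.
x = np.array([3.0, -2.0])
for _ in range(10):
    x = x - np.linalg.solve(hess_approx(x), grad_f(x))
print(x)  # approaches the minimiser [0, 0]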

4. Trust-region methods
– Abstract: Trust-region methods limit the size of the step in order to improve the convergence behaviour of the Newton method. Steps that are too large can hinder convergence; the trust-region method stabilises the iteration by choosing the step from the quadratic model built with the Hessian only within a region where that model is trusted, thereby avoiding excessive steps.
– Method: the trust-region method restricts each step to a region in which the local model is considered reliable (the standard subproblem formulation is shown below).

– Advantages: avoiding large steps improves convergence stability.
– Disadvantage: may increase computational effort.
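For reference, the standard trust-region subproblem solved at each iteration (the usual textbook formulation) is
\[
\min_{\mathbf{p}} \; m_k(\mathbf{p}) = f(\mathbf{x}_k) + \nabla f(\mathbf{x}_k)^T \mathbf{p} + \frac{1}{2} \mathbf{p}^T \nabla^2 f(\mathbf{x}_k) \mathbf{p} \quad \text{subject to} \quad \|\mathbf{p}\| \le \Delta_k
\]
where \( \Delta_k \) is the trust-region radius; the radius is enlarged when the quadratic model \( m_k \) predicts the actual decrease of \( f \) well and shrunk when it does not.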

5. Modified Newton method
– Abstract: Improves the convergence of Newton’s method by approximating or modifying the Hessian matrix. In particular, even if only an approximate Hessian is available, convergence can be sped up by modifying it appropriately (e.g. so that it is safely positive definite).
– Method: when an approximate Hessian is used, the matrix is adjusted so that the errors arising when its inverse is applied remain under control (a simple eigenvalue-modification sketch follows below).
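One common modification, shown here as an illustrative sketch (the threshold and the test matrix are arbitrary assumptions), is to raise small or negative eigenvalues of the Hessian so that the matrix used in the Newton step is positive definite:

import numpy as np

# Raise eigenvalues below a threshold so that the modified Hessian is positive definite.
def make_positive_definite(H, min_eig=0.1):
    w, V = np.linalg.eigh(H)          # eigen-decomposition of the symmetric Hessian
    w_mod = np.maximum(w, min_eig)    # clip small or negative eigenvalues
    return V @ np.diag(w_mod) @ V.T

H = np.array([[2.0, 0.0], [0.0, -1e-8]])  # nearly singular, slightly indefinite Hessian
g = np.array([1.0, 1.0])                  # gradient at the current point

# The Newton step with the modified Hessian is a descent direction of reasonable size.
step = np.linalg.solve(make_positive_definite(H), -g)
print(step)  # roughly [-0.5, -10.0]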

6. Meta-Newton methods
– Abstract: Meta-Newton methods improve convergence by adjusting the parameters used at each step of the Newton iteration. If convergence is slow, the parameters are changed so that the iteration proceeds more efficiently.
– Method: use a meta-algorithm to automatically adjust the parameters of the Newton method (e.g. the step size).

There is a wide range of methods for improving the convergence speed of the Newton method. In particular, the damped Newton method and the trust-region method are practical, widely used and very effective approaches for stabilising convergence near singular points. Choosing the appropriate method for the situation at hand is key to improving convergence speed and achieving efficient optimisation.

Implementation example

The following section describes an example implementation of a method for improving linear convergence in Newton’s method, using the Meta-Newton approach. The conventional Newton method uses the Hessian matrix (second-order derivatives) for optimisation, whereas the meta-Newton approach approximates the Hessian in a more flexible way. This allows it to be applied to complex problems and can improve the convergence speed.

In the following, Broyden’s method is used to update the approximation of the Hessian matrix. This method does not compute the Hessian explicitly but maintains and updates an approximation of it, thereby optimising while reducing the computational cost.

Implementation example: meta-Newton method (Broyden method)

Assumptions.

  • Objective function: minimisation of f(x).
  • Only the gradient \( \nabla f(x) \) is computable; the Hessian matrix is replaced by an approximation.
import numpy as np

# Example of objective function (Rosenbrock function)
def f(x):
    return 100 * (x[1] - x[0]**2)**2 + (1 - x[0])**2

# Gradient (first-order derivative)
def grad_f(x):
    df_dx0 = -400 * x[0] * (x[1] - x[0]**2) - 2 * (1 - x[0])
    df_dx1 = 200 * (x[1] - x[0]**2)
    return np.array([df_dx0, df_dx1])

# Implementation of the meta-Newton method (Broyden method).
def broyden_method(x0, max_iter=100, tol=1e-6):
    x = x0
    B = np.eye(len(x0))  # Initial approximation of the Hessian matrix (identity matrix)
    
    for i in range(max_iter):
        grad = grad_f(x)  # Calculate the gradient
        
        # Newton-type step using the current approximation of the Hessian (Broyden's method)
        delta_x = -np.linalg.solve(B, grad)
        
        # Calculation of new points
        x_new = x + delta_x
        
        # convergence judgment
        if np.linalg.norm(x_new - x) < tol:
            print(f"Convergence. Convergence at {i+1}th iteration.")
            return x_new
        
        # move on to the next point
        s = x_new - x  # step
        y = grad_f(x_new) - grad  # Change in the gradient
        
        # Update the approximation of the Hessian matrix with Broyden's update formula.
        B = B + np.outer((y - np.dot(B, s)), s) / np.dot(s, s)
        
        x = x_new
    
    print("Maximum number of iterations reached.")
    return x

# initial value
x0 = np.array([1.5, 1.5])

# Apply the Broyden method (meta-Newton method).
result = broyden_method(x0)

print("optimal value:", result)

Description.

  • The well-known Rosenbrock function is used as the objective function; it is a standard test function in optimisation.
  • The gradient (grad_f) is the first-order derivative of the Rosenbrock function.
  • Instead of explicitly calculating the Hessian matrix, the Broyden method iteratively updates an approximation of it. This update uses the step \( s = x_{new} - x \) and the change in the gradient \( y = \nabla f(x_{new}) - \nabla f(x) \) (see the update formula below).
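For reference, the Broyden update applied in the code can be written as
\[
B_{k+1} = B_k + \frac{(\mathbf{y}_k - B_k \mathbf{s}_k)\,\mathbf{s}_k^T}{\mathbf{s}_k^T \mathbf{s}_k}, \qquad \mathbf{s}_k = \mathbf{x}_{k+1} - \mathbf{x}_k, \quad \mathbf{y}_k = \nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k)
\]
which corresponds directly to the line B = B + np.outer((y - np.dot(B, s)), s) / np.dot(s, s) in the implementation.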

Execution results.

Converged at iteration 14.
optimal value: [1.00000002 1.00000002]

Description.

  • Instead of recomputing the Hessian matrix, the Broyden method maintains a linear approximation of it and iteratively improves that approximation. This is particularly useful when the Hessian is difficult to calculate explicitly, as it allows accurate optimisation while reducing the computational cost.
  • The solution converges to approximately [1.00000002, 1.00000002], which is very close to (1, 1), the minimiser of the Rosenbrock function.

Advantages of the meta-Newton method

  • Saves computational resources, as the Hessian matrix does not have to be calculated.
  • Potential to increase convergence speed, which is particularly useful for large-scale optimisation problems.

Notes.

  • The meta-Newton method approximates the Hessian matrix and may not always be the best choice for very complex problems. The algorithm should be selected carefully according to the problem at hand.
Application examples

Methods for improving linear convergence in the Newton method, specifically the barrier method, the linear conjugate gradient method and quasi-Newton methods (BFGS), are applied below. These methods are used for problems in which the convergence of the Newton method slows to a linear rate, in order to achieve more efficient optimisation.

1. Application of the Barrier Method: The barrier method handles constraints softly when the optimisation problem is constrained. It is used to improve convergence when solving constrained optimisation problems with the Newton method, where the effect of the constraints can slow convergence down; using the barrier method, the convergence speed can be improved.

Application example (portfolio optimisation problem): in portfolio optimisation, the aim is to minimise risk and maximise return, but there are usually constraints (e.g. each asset weight must lie in the range 0 to 1, and the weights must sum to 1). The barrier method can be used to solve such constrained optimisation problems and to improve the linear convergence of Newton’s method.

Example implementation of the barrier method in portfolio optimisation:

import numpy as np

# Objective function: portfolio risk minimisation (variance minimisation)
def risk(x, cov_matrix):
    return np.dot(x.T, np.dot(cov_matrix, x))

# Constraint: the sum of the asset ratios is 1
def constraint(x):
    return np.sum(x) - 1

# Optimisation with a barrier method: a log-barrier keeps the asset ratios
# strictly positive, and the equality constraint sum(x) = 1 is handled through
# the Newton (KKT) system at each iteration.
def barrier_method(x0, cov_matrix, mu=1.0, max_iter=100, tol=1e-8):
    x = x0.astype(float)
    n = len(x)
    A = np.ones((1, n))  # row vector of the equality constraint sum(x) = 1

    for _ in range(max_iter):
        # Gradient and Hessian of risk(x) - mu * sum(log(x))
        grad = 2 * np.dot(cov_matrix, x) - mu / x
        hessian = 2 * cov_matrix + mu * np.diag(1.0 / x**2)

        # Newton step for the equality-constrained barrier subproblem (KKT system)
        KKT = np.block([[hessian, A.T], [A, np.zeros((1, 1))]])
        rhs = np.concatenate([-grad, [-constraint(x)]])
        delta_x = np.linalg.solve(KKT, rhs)[:n]

        # Damped update so that all asset ratios stay strictly positive
        t = 1.0
        while np.any(x + t * delta_x <= 0):
            t *= 0.5
        x = x + t * delta_x

        # Weaken the barrier term and stop once the step becomes negligible
        mu *= 0.5
        if mu < tol and np.linalg.norm(delta_x) < tol:
            break

    return x

# Initial asset ratios and covariance matrix
x0 = np.array([0.2, 0.3, 0.5])
cov_matrix = np.array([[0.04, 0.01, 0.02],
                       [0.01, 0.05, 0.01],
                       [0.02, 0.01, 0.03]])

optimal_x = barrier_method(x0, cov_matrix)
print("Optimal asset allocation:", optimal_x)

2. Application of the Conjugate Gradient Method: the linear conjugate gradient method is a way to improve the convergence speed of Newton-type methods, especially for large-scale quadratic optimisation problems. The Newton method requires solving a linear system with the full Hessian matrix; the linear conjugate gradient method reduces this computational cost and allows faster convergence.

Application example (large-scale linear regression problems): when dealing with large data sets, as in linear regression problems, working with the full Hessian matrix becomes very expensive. By using the linear conjugate gradient method, this computational cost can be reduced while still effectively applying the Newton method.

Example implementation of the linear conjugate gradient method for a linear regression problem:

import numpy as np

# Define the quadratic optimisation problem.
def objective_function(A, b, x):
    return 0.5 * np.dot(x.T, np.dot(A, x)) - np.dot(b.T, x)

# Gradient
def gradient(A, b, x):
    return np.dot(A, x) - b

# Linear conjugate gradient method
def conjugate_gradient(A, b, x0, tol=1e-6, max_iter=100):
    x = x0
    r = gradient(A, b, x)      # Initial residual
    p = -r                     # Initial direction
    rs_old = np.dot(r.T, r)    # Squared norm of the initial residual

    for i in range(max_iter):
        Ap = np.dot(A, p)                 # A * p
        alpha = rs_old / np.dot(p.T, Ap)  # Step size
        x = x + alpha * p                 # Update the iterate

        r = r + alpha * Ap                # Update the residual
        rs_new = np.dot(r.T, r)

        if np.sqrt(rs_new) < tol:
            break

        p = -r + (rs_new / rs_old) * p    # Next search direction
        rs_old = rs_new

    return x

# Linear regression problem as an example
A = np.array([[3, 2], [2, 6]])
b = np.array([2, -8])
x0 = np.zeros(len(b))

x_opt = conjugate_gradient(A, b, x0)
print("optimal value:", x_opt)

3. Application of the Quasi-Newton Method: because the Hessian matrix in the Newton method is expensive to compute, quasi-Newton methods that approximate the Hessian, such as the BFGS method described in ‘About the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method’, can be used to speed up convergence and reduce the computational cost.
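For reference, the standard BFGS update of the Hessian approximation \( B_k \), using the step \( \mathbf{s}_k = \mathbf{x}_{k+1} - \mathbf{x}_k \) and the gradient change \( \mathbf{y}_k = \nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k) \), is
\[
B_{k+1} = B_k - \frac{B_k \mathbf{s}_k \mathbf{s}_k^T B_k}{\mathbf{s}_k^T B_k \mathbf{s}_k} + \frac{\mathbf{y}_k \mathbf{y}_k^T}{\mathbf{y}_k^T \mathbf{s}_k}
\]
In practice, as in the example below, library implementations such as scipy.optimize.minimize(method='BFGS') perform this update (or the equivalent update of the inverse approximation) internally.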

Application example (non-linear optimisation problems): applying the BFGS method to non-linear optimisation problems such as logistic regression and SVMs (support vector machines) can alleviate the linear-convergence problems of the Newton method.

Example of an optimisation implementation using the BFGS method:

import numpy as np
from scipy.optimize import minimize

# Objective function (non-linear optimisation)
def objective_function(x):
    return (x[0] - 1)**2 + (x[1] - 2.5)**2

# initial estimation
x0 = np.array([2.0, 2.0])

# Optimised using the BFGS method
result = minimize(objective_function, x0, method='BFGS')

print("optimal value:", result.x)

The appropriate method for improving linear convergence in Newton’s method depends on the nature of the problem being solved. In practical applications, barrier methods, the linear conjugate gradient method and quasi-Newton (approximate Hessian) methods such as BFGS are used to speed up convergence and reduce computational costs. These methods play a particularly important role in large-scale and constrained optimisation problems.

Reference

This section describes reference books related to methods for improving linear convergence in Newton’s method.

1. ‘Numerical Optimization’ by Jorge Nocedal and Stephen J. Wright
– Description: the book provides a broad overview of numerical optimisation, detailing the Newton method and its improved variants (e.g. BFGS, L-BFGS). Methods for improving linear convergence and approximation techniques are also discussed, with detailed theory and implementation of optimisation algorithms.

2. ‘Optimisation Algorithms for Large-Scale Machine Learning’ by Suvrit Sra, Sebastian Nowozin, and Stephen J. Wright
– Description: focuses on large-scale optimisation problems in the field of machine learning, covering Newton’s method and its improvement methods (e.g. the linear conjugate gradient method and barrier methods). In particular, it explains how they can be applied to large data and high-dimensional problems.

3. ‘Convex Optimization’ by Stephen Boyd and Lieven Vandenberghe
– Description: a book on convex optimisation problems, explaining how Newton’s method and other optimisation techniques are applied to convex optimisation. In particular, it provides theoretical background on linear convergence and approximate Hessian methods.

4. ‘Practical Optimization: Algorithms and Engineering Applications’ by Andreas Antoniou and Wu-Sheng Lu
– Description: focuses on practical optimisation methods, with particular attention to numerical optimisation algorithms and their implementation. The application of Newton’s method and its improved variants (e.g. the conjugate gradient method and the BFGS method) is presented from a practical point of view.

5. ‘Numerical Methods for Optimisation: Theoretical and Practical Aspects’ by M. Aslam Khan
– Description: the book details numerical approaches to optimisation and includes chapters on Newton’s method and its improved variants (in particular, singularity avoidance and ways to improve linear convergence). It focuses on both theory and implementation.
