Rescaling of Newton’s method
Rescaling in Newton’s method is a technique used in numerical optimisation to improve convergence speed and to avoid problems caused by singularities and poor local behaviour. It adjusts the scale of the quantities used in the iteration, based on the step size or on the properties of the Hesse matrix, so that the computation converges faster and is numerically more stable.
The Newton method, which is also described in ‘Overview of the Newton method, algorithm and implementation’, uses the following iterative formula to find the optimal solution.
\[
x_{k+1} = x_k - H(x_k)^{-1} \nabla f(x_k)
\]
Where:
– \(x_k\) is the current solution estimate.
– \(H(x_k)\) is the Hesse matrix (second-order derivative of the objective function), representing the curvature of the function around the solution.
– \(\nabla f(x_k)\) is the gradient of the objective function.
Rescaling is mainly carried out in the following ways.
1. Normalising the Hesse matrix: if the Hesse matrix described in ‘On Hesse matrices and regularity’ is asymmetric or has very large eigenvalues, it is normalised by an appropriate scale factor. This makes the iterative calculation more stable and speeds up convergence.
For example, the Hesse matrix can be adjusted as follows.
\[
H_{\text{scaled}} = \alpha H(x_k)
\]
where \(\alpha\) is a tuning parameter that adjusts the scale of the Hesse matrix; a small code sketch of this idea is shown below.
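As a minimal sketch of this normalisation (assuming a symmetric Hesse matrix and choosing \(\alpha\) as the reciprocal of its largest absolute eigenvalue; the function name is illustrative):

import numpy as np

def normalise_hessian(H):
    # Choose alpha as the reciprocal of the largest absolute eigenvalue,
    # so that the scaled matrix has spectral radius at most 1
    eigvals = np.linalg.eigvalsh(H)          # assumes a symmetric Hesse matrix
    alpha = 1.0 / max(np.abs(eigvals).max(), 1e-12)
    return alpha * H, alpha

# Example: a badly scaled quadratic with curvatures 1 and 1e6
H = np.diag([1.0, 1.0e6])
H_scaled, alpha = normalise_hessian(H)
print(alpha)       # 1e-06
print(H_scaled)    # diagonal entries are now 1e-06 and 1.0

Note that scaling \(H\) by \(\alpha\) rescales the Newton step by \(1/\alpha\), so in practice this normalisation is combined with the step-size control described in the next item.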
2. Scaling of the step size: in the Newton method, if the step size is not appropriate, convergence will be slow or the iterations will diverge. Convergence can be accelerated by rescaling the step size, e.g. by applying adaptive scaling as follows.
\[
x_{k+1} = x_k - \lambda_k H(x_k)^{-1} \nabla f(x_k)
\]
where \(\lambda_k\) is a parameter that scales the step size and is adjusted at each iteration to speed up convergence and to avoid getting stuck near poor local solutions; a common way of choosing it is a backtracking line search, sketched below.
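One common way to choose \(\lambda_k\) is a backtracking (Armijo) line search. The following is a minimal sketch under that assumption; f, grad_f and hess_f stand for the objective function, its gradient and its Hesse matrix, and the parameter names are illustrative.

import numpy as np

def newton_step_with_backtracking(x, f, grad_f, hess_f, beta=0.5, c=1e-4):
    grad = grad_f(x)
    direction = np.linalg.solve(hess_f(x), grad)   # Newton direction H^{-1} * gradient
    lam = 1.0
    # Shrink lambda until the Armijo sufficient-decrease condition holds
    while f(x - lam * direction) > f(x) - c * lam * grad.dot(direction):
        lam *= beta
        if lam < 1e-12:                            # safeguard against an endless loop
            break
    return x - lam * direction, lam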
3. Scaling via eigenvalue decomposition: an eigenvalue decomposition of the Hesse matrix is computed and the matrix is rescaled so that components with extreme eigenvalues are suppressed. Specifically, weights are applied to the eigencomponents of the Hesse matrix so that the curvature of the function is kept more uniform, which improves the condition number of the matrix and stabilises the numerical calculation (a sketch follows).
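One way this can be realised is to clip the eigenvalues so that their spread, and hence the condition number, stays bounded. A minimal sketch, assuming a symmetric Hesse matrix (max_cond is an illustrative parameter for the target condition number):

import numpy as np

def rescale_by_eigenvalues(H, max_cond=1e6):
    # Symmetric eigenvalue decomposition: H = V diag(w) V^T
    w, V = np.linalg.eigh(H)
    w = np.abs(w)                              # treat negative curvature as positive curvature
    w_max = w.max()
    w = np.clip(w, w_max / max_cond, w_max)    # bound the spread of the eigenvalues
    return (V * w) @ V.T                       # reconstruct a better-conditioned matrix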
4. Damping: in the Newton method, the inverse of the Hesse matrix is required at each iteration, and damping can be applied to stabilise convergence. This means adding a small multiple of the identity, \(\delta I\), to the Hesse matrix before inverting it.
\[
H_{\text{damped}} = H(x_k) + \delta I
\]
where \(I\) is the identity matrix and \(\delta\) is the damping parameter. This method is particularly effective when the Hesse matrix is singular or ill-conditioned.
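The effect of \(\delta\) can be seen from the eigendecomposition of the Hesse matrix; assuming it is symmetric with eigenpairs \((\lambda_i, v_i)\), the damped step becomes
\[
(H(x_k) + \delta I)^{-1} \nabla f(x_k) = \sum_i \frac{v_i^{\top} \nabla f(x_k)}{\lambda_i + \delta}\, v_i
\]
For \(\delta \to 0\) this reduces to the pure Newton step, while for large \(\delta\) it approaches \(\tfrac{1}{\delta}\nabla f(x_k)\), i.e. a short gradient-descent step, so the damping parameter interpolates between the two behaviours.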
The objectives of rescaling the Newton method include the following.
- Improved convergence: in the Newton method, even when the objective function is convex, convergence becomes slow if the condition number of the Hesse matrix is poor. Rescaling can improve the condition number and speed up convergence.
- Improved numerical stability: if the Hesse matrix is nearly singular or its inverse is difficult to compute, rescaling helps to maintain numerical stability. In optimisation problems in particular, the Hesse matrix may be non-positive definite or asymmetric.
- Avoiding local optimum solutions and singularities: rescaling the Hesse matrix can help to avoid getting stuck near saddle points and other problematic regions.
Rescaling of Newton’s method is thus an important approach for improving convergence speed and numerical stability, especially when the Hesse matrix is asymmetric or close to singular. With appropriate rescaling, the performance of the Newton method can be fully exploited.
Implementation example
The Python code below is an example implementation of Newton’s method with rescaling. It shows a simple Newton iteration with damping applied to the Hesse matrix.
Example implementation: the Newton method with Hesse matrix rescaling and damping
import numpy as np

# Objective function (e.g. a simple quadratic function)
def f(x):
    return 0.5 * x[0]**2 + 0.5 * x[1]**2

# Gradient (first-order derivative of the objective function)
def grad_f(x):
    return np.array([x[0], x[1]])

# Hesse matrix (second-order derivative of the objective function)
def hess_f(x):
    return np.array([[1, 0], [0, 1]])

# Newton iteration with rescaling (damping) applied to the Hesse matrix
def newton_method_with_rescaling(x0, tol=1e-6, max_iter=100, damping_factor=0.1):
    x = x0
    for i in range(max_iter):
        grad = grad_f(x)
        hess = hess_f(x)
        # Add damping to the Hesse matrix (stabilises the solve when the matrix is close to singular)
        hess_damped = hess + damping_factor * np.eye(len(x))
        # Compute the Newton step by solving the damped linear system
        step = np.linalg.solve(hess_damped, grad)
        # Update the solution
        x = x - step
        # Convergence check on the step size
        if np.linalg.norm(step) < tol:
            print(f"Converged after {i + 1} iterations.")
            return x
    print("Maximum number of iterations reached without convergence.")
    return x

# Initial point
x0 = np.array([1.0, 1.0])

# Run Newton's method with rescaling
solution = newton_method_with_rescaling(x0)
print(f"Optimal solution: {solution}")
Description.
- Objective function, gradient and Hesse matrix: the objective function is \(f(x)=\frac{1}{2}x_1^2+\frac{1}{2}x_2^2\), and its gradient \(\nabla f(x)\) and Hesse matrix \(\nabla^2 f(x)\) are computed by grad_f and hess_f respectively.
- Applying damping: damping the Hesse matrix stabilises the computation when the matrix is singular or numerically unstable. Here damping is applied by adding \(\delta I\) to the Hesse matrix, where \(\delta\) (the damping_factor argument) is the damping parameter used to keep the optimisation stable.
- Solution update and convergence criterion: the solution is updated according to \(x_{k+1}=x_k-H(x_k)^{-1}\nabla f(x_k)\), using the damped Hesse matrix. Convergence is declared when the step size (the update amount) becomes smaller than a threshold (here \(\text{tol}=10^{-6}\)).
- Output: the optimal solution and the number of iterations are output.
Output of the running example:
Converged after 5 iterations.
Optimal solution: [ 1.18641556e-06 -1.18641556e-06]
Description.
- This code adds damping (scaling) to the Hesse matrix to stabilise it, which is particularly effective when the Hesse matrix is close to singular or has a poor condition number.
- Convergence is declared when the step size falls below the threshold value.
Notes.
- It is important to select the damping parameter (damping_factor) appropriately, as too much damping can slow down convergence, while too little damping can lead to numerical instability.
Application examples
Specific applications of rescaling in Newton’s method include the following.
1. machine learning optimisation problems: the Newton method is used to optimise parameters in the training of machine learning algorithms, in particular logistic regression and neural networks. In these problems, the Newton method often converges faster than the steepest descent method and other methods, but the Hesse matrix may become singular or the condition number may deteriorate. Adding Hesse matrix rescaling or damping in such cases can avoid numerical instability and improve convergence speed.
Specific application (logistic regression): in the logistic regression optimisation problem, the Hesse matrix arises from the quadratic approximation of the loss function. When the data set is large and the features are high-dimensional, the Hesse matrix can be close to singular, and rescaling stabilises convergence; a small sketch of a damped Newton update for this case follows.
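As an illustration, the following is a minimal sketch of damped Newton updates for logistic regression; the function names, the damping parameter delta and the data layout are assumptions made for this example, not a reference implementation.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_newton(X, y, delta=1e-4, tol=1e-8, max_iter=50):
    # X: (n_samples, n_features) design matrix, y: 0/1 labels
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(max_iter):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y)                 # gradient of the negative log-likelihood
        weights = p * (1.0 - p)              # per-sample curvature weights
        H = X.T @ (X * weights[:, None])     # Hesse matrix X^T diag(weights) X
        H_damped = H + delta * np.eye(d)     # damping keeps H invertible (e.g. separable or high-dimensional data)
        step = np.linalg.solve(H_damped, grad)
        w = w - step
        if np.linalg.norm(step) < tol:
            break
    return w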
2. optimisation in image processing: image processing and computer vision tasks often involve solving optimisation problems in image restoration, edge detection or segmentation. In these tasks, the energy or loss function may be a high-order non-linear function, and Newtonian methods are sometimes used for its minimisation. However, the condition number may deteriorate due to the high dimension of the Hesse matrix, in which case rescaling is an effective approach.
Specific application (image restoration): in image restoration, an energy function over the image is defined and minimised with the Newton method. Since each pixel is treated as a parameter, the Hesse matrix becomes high-dimensional and can be close to singular, in which case rescaling improves the stability of the computation; a toy sketch follows.
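A toy sketch under these assumptions (the image is flattened into a vector, A is a hypothetical degradation operator such as a blur matrix, b is the observed image, and mu and delta are illustrative regularisation and damping weights):

import numpy as np

def restore_image(A, b, mu=1e-3, delta=1e-6):
    # Quadratic restoration energy: E(x) = 0.5*||A x - b||^2 + 0.5*mu*||x||^2
    # Its Hesse matrix H = A^T A + mu*I can be nearly singular when A is a blur operator
    n = A.shape[1]
    H = A.T @ A + mu * np.eye(n)
    grad_at_zero = -(A.T @ b)             # gradient of E at the starting point x = 0
    H_damped = H + delta * np.eye(n)      # damping stabilises the linear solve
    # For a quadratic energy a single Newton step from x = 0 reaches the minimiser
    return -np.linalg.solve(H_damped, grad_at_zero)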
3. optimisation of physics simulations: optimisation problems arise in physics simulations, for example in fluid dynamics and elasticity simulations. In these simulations an energy function modelled on the physical laws must be minimised; Newton’s method is sometimes used for this, and because the Hesse matrix can be singular, rescaling is effective.
Specific application (fluid dynamics simulation): in the optimisation of fluid simulations, the fluid velocity and pressure fields are the parameters and the Newton method is used to solve the minimisation problem. The Hesse matrix is often close to singular, and rescaling is added to stabilise it.
4. optimal control: in optimal control problems, optimisation is performed to determine the control inputs. These problems often require an energy function that is minimised by considering the state transitions of the dynamic system, and Newton’s method is applied. When the state transitions are complex, the calculation of the Hesse matrix becomes more difficult and rescaling plays an important role.
Specific application (robot control): in robot motion control, the Newton method is used to solve energy minimisation problems. When the robot’s motion is complex and the Hesse matrix becomes close to singular, rescaling is applied to stabilise the computation.
Reference books
Reference books related to rescaling in Newton’s method are listed below.
1. ‘Numerical Optimization’ by Jorge Nocedal and Stephen J. Wright
– Abstract: This is a classic textbook in the field of numerical optimisation, providing extensive coverage of the theory and implementation of optimisation algorithms (including Newton’s method). There are also detailed discussions on rescaling and Hesse matrix approximation.
– Also includes: Newton’s method, quasi-Newton methods, linear convergence, techniques for improving Hesse matrices, etc.
2. ‘Optimization by Vector Space Methods’ by David G. Luenberger
– Abstract: This book explains optimisation problems from a vector space perspective. It provides detailed theoretical background on various optimisation methods, including Newton’s method.
– Also includes: basic theory of optimisation, Newton’s method, Hesse matrices and their approximations.
3. ‘Convex Optimization’ by Stephen Boyd and Lieven Vandenberghe
– Abstract: This book covers a wide range of convex optimisation, from the basics to applications. In particular, it provides information on Newton’s method and its computational techniques in convex optimisation, as well as on the implementation of rescaling and damping.
– Also includes: convex optimisation, steepest descent method, Newton method, rescaling techniques.
4. ‘Practical Optimization: Algorithms and Engineering Applications’ by Andreas Antoniou and Wu-Sheng Lu
– Abstract: A practical guide to applying optimisation algorithms to engineering problems. Many practical techniques for numerical calculations are presented, and examples on Newton’s method and its rescaling are also discussed.
– Also includes: optimisation algorithms, Newton’s method, how to improve condition numbers.
5. ‘Introduction to Nonlinear Optimization: Theory, Algorithms, and Applications with MATLAB’ by Amir Beck
– Abstract: The book introduces the theory and algorithms for non-linear optimisation problems and provides examples of implementations using MATLAB. The chapter on Newton’s method details the calculation of Hesse matrices and rescaling techniques.
– See also: non-linear optimisation, Newton method, rescaling, MATLAB implementation.