Alternative methods of numerical differentiation in the calculation of derivatives of Newton’s method


Newton's method uses the derivative \(f'(x)\) to find the roots of a function \(f(x)\). If the derivative is difficult to obtain analytically, or if the function is only available numerically, alternative methods based on numerical differentiation need to be considered.

A standard method is to approximate the derivative using finite differences, and the following methods are often used.

(1) Forward difference: \[f'(x)\approx\frac{f(x+h)-f(x)}{h}\] The computational cost is low, but the accuracy is limited because the error is \(O(h)\).

(2) Backward difference: \[f'(x)\approx\frac{f(x)-f(x-h)}{h}\] As with the forward difference, the error is \(O(h)\).

(3) Central difference: \[f'(x)\approx\frac{f(x+h)-f(x-h)}{2h}\] The error is \(O(h^2)\), making it more accurate than the forward and backward differences.

The key points when choosing a scheme are as follows.

  • Central differencing is generally more accurate and is recommended if the extra function evaluation per step is acceptable.
  • The step size \(h\) should balance truncation error against rounding error (typically \(h\approx\sqrt{\epsilon}\) for the forward/backward differences and \(h\approx\epsilon^{1/3}\) for the central difference, where \(\epsilon\) is machine epsilon); a small comparison sketch follows below.
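
As a quick illustration of the error orders above, the following is a minimal sketch (the test function \(f(x)=\sin x\) and the step sizes are chosen arbitrarily) comparing the forward and central differences with the exact derivative.

import numpy as np

f = np.sin
exact = np.cos(1.0)  # exact derivative of sin(x) at x = 1

for h in (1e-2, 1e-4, 1e-6):
    fwd = (f(1.0 + h) - f(1.0)) / h            # forward difference, error O(h)
    ctr = (f(1.0 + h) - f(1.0 - h)) / (2 * h)  # central difference, error O(h^2)
    print(f"h={h:.0e}  forward error={abs(fwd - exact):.2e}  central error={abs(ctr - exact):.2e}")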

Automatic Differentiation (AD) can also be used: it is more accurate than numerical differentiation and requires far less manual work than deriving derivatives analytically.

  • Forward Mode AD: propagates derivatives alongside the forward evaluation of the computational graph. Efficient for functions with few inputs, but inefficient when there are many input variables.
  • Reverse Mode AD: propagates derivatives backwards through the graph; mainly used for functions with high-dimensional inputs, such as neural networks, in combination with gradient descent methods. It must store intermediate values, but is effective for high-dimensional problems (see the JAX sketch below).

Advantages and disadvantages of this method include the following.

  • Advantages:
    • Unlike numerical differentiation, there is no truncation error; derivatives are exact up to floating-point rounding.
    • No need to calculate analytical derivatives by hand.
  • Disadvantages:
    • Requires a supporting library (TensorFlow, PyTorch, JAX, autograd, etc. provide it).
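
As an illustration of the two modes, the following is a minimal sketch using JAX (one of the libraries listed above; the example function is arbitrary): jax.grad evaluates the gradient in reverse mode, while jax.jacfwd evaluates the same quantity in forward mode.

import jax
import jax.numpy as jnp

def g(x):
    # example scalar function of a 3-dimensional input
    return jnp.sum(jnp.sin(x) ** 2)

x = jnp.array([0.5, 1.0, 1.5])

grad_reverse = jax.grad(g)(x)    # reverse mode: one backward pass yields the full gradient
grad_forward = jax.jacfwd(g)(x)  # forward mode: one pass per input direction

print(grad_reverse)
print(grad_forward)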

The following methods are also available for finding roots without directly obtaining derivatives.

  • Secant Method:
    • Approximates the derivative in Newton's method by a finite difference through the two most recent iterates.
    • Convergence is slightly slower (superlinear, of order about 1.618), but no derivatives need to be calculated.
    • Update formula: \[x_{n+1}=x_n-\frac{f(x_n)(x_n-x_{n-1})}{f(x_n)-f(x_{n-1})}\]
  • Brent's Method (see the sketch after this list):
    • Combines the bisection method, the secant method and inverse quadratic interpolation.
    • Does not require derivatives and is very robust.
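
SciPy's brentq implements Brent's root-finding method; the following is a minimal usage sketch applied to the same example function used in the implementation section below, assuming the bracket [2, 3] contains a sign change.

from scipy.optimize import brentq

def f(x):
    return x**3 - 2*x - 5

# Brent's method requires a bracketing interval [a, b] with f(a) and f(b) of opposite sign
root = brentq(f, 2, 3)
print(f"approximate solution (Brent): {root:.6f}")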

Quasi-Newton methods do not require (second) derivatives to be supplied explicitly, but instead build up an approximation of the Hessian matrix as the iteration proceeds. Typical algorithms include the following.

  • BFGS (Broyden-Fletcher-Goldfarb-Shanno) method:
    • The solution is obtained by iteratively updating an approximation of the Hessian matrix (or its inverse).
    • The limited-memory variant L-BFGS (and the bound-constrained L-BFGS-B) is common for large-scale optimisation problems; a minimal SciPy sketch follows below.
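
The following is a minimal sketch of BFGS via SciPy's minimize (the quadratic test objective is chosen arbitrarily for illustration; since no gradient is supplied, minimize approximates it numerically).

import numpy as np
from scipy.optimize import minimize

def objective(x):
    # simple smooth test objective with minimiser (1, -2)
    return (x[0] - 1) ** 2 + (x[1] + 2) ** 2

result = minimize(objective, x0=np.array([0.0, 0.0]), method='BFGS')
print(result.x)    # approximate minimiser
print(result.fun)  # objective value at the minimiser

For large-scale or bound-constrained problems, method='L-BFGS-B' can be used in the same way.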

These can be summarised as follows.

  • For high-precision numerical differentiation, use central differences.
  • To avoid the errors of numerical differentiation and the need for hand-derived derivatives, use automatic differentiation.
  • For convergence without any derivatives, use the secant method or Brent's method.
  • For large-scale optimisation, use quasi-Newton methods such as BFGS and L-BFGS.
Implementation examples

For each method, examples of Python implementations are given.

1. Numerical differentiation by difference approximation

(1) Newton’s method using central difference

import numpy as np

def f(x):
    return x**3 - 2*x - 5  # example: f(x) = x^3 - 2x - 5

def derivative(f, x, h=1e-5):  # central difference
    return (f(x + h) - f(x - h)) / (2 * h)

def newton_method(f, x0, tol=1e-6, max_iter=100):
    x = x0
    for _ in range(max_iter):
        f_x = f(x)
        df_x = derivative(f, x)  # numerical differentiation
        if abs(f_x) < tol:
            return x
        x -= f_x / df_x
    return x

x0 = 2  # initial value
root = newton_method(f, x0)
print(f"approximate solution: {root:.6f}")

2. Automatic differentiation (Autograd)

Use Python’s autograd library to perform automatic differentiation.

import autograd.numpy as np
from autograd import grad

def f(x):
    return x**3 - 2*x - 5

df = grad(f)  # automatic differentiation

def newton_autograd(f, df, x0, tol=1e-6, max_iter=100):
    x = x0
    for _ in range(max_iter):
        f_x = f(x)
        df_x = df(x)  # Calculating derivatives with automatic differentiation.
        if abs(f_x) < tol:
            return x
        x -= f_x / df_x
    return x

x0 = 2
root = newton_autograd(f, df, x0)
print(f"Approximate solution (automatic differentiation): {root:.6f}")

3. The secant method

def secant_method(f, x0, x1, tol=1e-6, max_iter=100):
    for _ in range(max_iter):
        f_x0, f_x1 = f(x0), f(x1)
        if abs(f_x1) < tol:
            return x1
        x_new = x1 - f_x1 * (x1 - x0) / (f_x1 - f_x0)  # Secant method update formula
        x0, x1 = x1, x_new
    return x1

x0, x1 = 2, 3
root = secant_method(f, x0, x1)
print(f"Approximate solution (broken line method): {root:.6f}")

4. fsolve in SciPy (no analytical derivatives required; internally based on MINPACK's modified Powell hybrid method)

SciPy’s fsolve can find solutions without the need to provide analytical derivatives.

from scipy.optimize import fsolve

root = fsolve(f, 2)[0]
print(f"approximate solution (SciPy fsolve): {root:.6f}")
Application examples

Replacing the derivative calculation in Newton's method with numerical differentiation or other techniques is particularly useful in the following situations.

1. Root finding from experimental data (numerical differentiation)

Example: finding the zero point of data acquired by a sensor

Situation:

  • Measured data from physical sensors (e.g. temperature, pressure, flow rate) are available, but the analytical function is unknown.
  • On the basis of the experimental data, a specific value (e.g. the point at which the pressure is zero) is to be found.

Application method:

  • Create an interpolating function from the experimental data and apply the Newton method using numerical differentiation (central difference).

Example implementation (NumPy + SciPy)

import numpy as np
from scipy.interpolate import interp1d
from scipy.optimize import newton

# Experimental data (example of pressure sensor data)
x_data = np.array([0, 1, 2, 3, 4, 5])
y_data = np.array([-5, -3, 1, 6, 14, 25])  # Value of f(x)

# Create interpolating functions
f_interp = interp1d(x_data, y_data, kind='cubic')

# Numerical differentiation (central difference)
def derivative(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)

# Application of Newton's method
root = newton(f_interp, x0=1.5, fprime=lambda x: derivative(f_interp, x))

print(f"Approximate solution for zero point: {root:.6f}")

Point:

  • Converts experimental data into interpolating functions and applies Newton’s method using numerical differentiation.
  • Zero point can be obtained without knowing analytical derivatives.

2. Computation of the optimal policy (automatic differentiation)

Example: optimising a policy in reinforcement learning

Situation:

  • When updating the parameters \(\theta\) of a policy function \(\pi_{\theta}(x)\) in reinforcement learning (RL), a gradient calculation is required.
  • Due to the complexity of the function, it is difficult to obtain derivatives analytically.

Application method:

  • Apply Newton's method using PyTorch's automatic differentiation (autograd).

Example implementation (PyTorch)

import torch

# Example of a scalar loss for the policy parameter (softplus form)
def policy(theta):
    return torch.log(1 + torch.exp(-theta))  # softplus(-theta), used here as an illustrative loss

# Automatic differentiation in PyTorch
theta = torch.tensor(2.0, requires_grad=True)

for _ in range(10):  # Newton iterations
    loss = policy(theta)
    loss.backward()  # compute d(loss)/d(theta) by automatic differentiation
    with torch.no_grad():
        theta -= loss / theta.grad  # Newton-style update: theta <- theta - f(theta)/f'(theta)
    theta.grad.zero_()  # reset the accumulated gradient

print(f"Optimal θ: {theta.item():.6f}")

Point:

  • Optimisation without analytical differentiation using PyTorch autograd.
  • Widely used method in reinforcement learning and machine learning.

3. Load flow analysis of power systems (secant method)

Case study: voltage analysis of a power grid

Situation:

  • In power systems, a non-linear equation \(f(V)=0\) for the voltage \(V\) needs to be solved.
  • Because the derivatives are difficult to calculate analytically, the secant method is used.

Application method:

  • Use the secant method to find the solution for the voltage \(V\).

Implementation example.

def power_flow(V):
    return V**3 - 10*V + 5  # Example: non-linear equations of a power system

def secant_method(f, x0, x1, tol=1e-6, max_iter=100):
    for _ in range(max_iter):
        f_x0, f_x1 = f(x0), f(x1)
        if abs(f_x1) < tol:
            return x1
        x_new = x1 - f_x1 * (x1 - x0) / (f_x1 - f_x0)  # Secant method update formula
        x0, x1 = x1, x_new
    return x1

x0, x1 = 0, 2  # initial value
root = secant_method(power_flow, x0, x1)
print(f"Approximate solution for voltage: {root:.6f}")

Point:

  • Can solve non-linear equations without derivatives.
  • Practical in power system analysis.

4. Inverse problem analysis of AI models (fsolve in SciPy)

Example: finding an input that drives the output of a neural network to a specific value.

Situation:

  • Inverse problem of finding an input \(x\) that makes the output of a neural network \(f(x)\) equal to \(y=0.5\).
  • Use scipy.optimize.fsolve because differentiation is difficult.

How to apply:

  • Solve using fsolve.

Example implementation.

from scipy.optimize import fsolve
import numpy as np

def neural_network_output(x):
    return np.tanh(x) - 0.5  # Example: find x for which the output of the tanh function is 0.5

root = fsolve(neural_network_output, 1.0)[0]
print(f"Input sought.: {root:.6f}")

Key point:

  • Using fsolve, it is easy to solve inverse problems for complex functions.
  • It can be used to control and tune neural networks.
Reference books

This section describes reference books on alternative methods of calculating the derivatives in Newton's method (numerical differentiation, automatic differentiation, the secant method, etc.).

1. Books on numerical analysis and numerical differentiation

Numerical Calculation: Common Sense and Insane Sense

Fundamentals of Numerical Computation

Numerical Analysis

Author(s): Richard L. Burden, J. Douglas Faires
Publisher: Cengage Learning
Description: Detailed coverage of numerical differentiation, Newton's method, the secant method and the solution of non-linear equations. A global standard textbook.

2. Books on automatic differentiation (Autograd)

Deep Learning for Natural Language Processing: A Gentle Introduction

Mathematics for Machine Learning

Authors: Marc Peter Deisenroth, A. Aldo Faisal, Cheng Soon Ong
Publisher: Cambridge University Press
Description: includes theoretical background on automatic differentiation and its application to machine learning.

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow

Author: Aurélien Géron
Publisher: O’Reilly Media
Description: includes a description of optimisation and gradient descent methods using autograd in TensorFlow.

3. Optimisation and solving non-linear equations

Introduction to Mathematical Optimisation

Introduction to Numerical Methods

Convex Optimisation

4. Numerical computation and analysis using Python

Numerical Computing with Python

Computational Mathematics: An introduction to Numerical Analysis and Scientific Computing with Python
