Optimization for the First Time Reading Notes


From Optimization for Beginners

This book explains the material in detail and in an easy-to-understand way, making extensive use of graphical explanations so that the reader can grasp concrete problems. Its goal is to promote an intuitive understanding of each problem and its solution method, and to enable the reader to solve specific problems. The text is kept concise, with side notes for supplementary explanations. Practice problems are provided at the end of each chapter, and answers to all of them are included at the end of the book. Because it starts with a mathematical review, it is an accessible first book for students learning optimization.

Chapter 1: Mathematical Preparation

1.1 Multivariable Functions

A function whose value is determined by two variables x and y, such as f(x, y) = x² + y², is a bivariate function.
A function with multiple variables is called a “multivariable function”.
Graph of f(x, y) = x² + y²
Contour line
A curve along which the function takes the same value
Graph and contour lines of f(x, y) = -x³ - 3xy² + y³ + 3x
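As a small supplementary sketch (my own, not from the book), the graph and contour lines of f(x, y) = x² + y² can be drawn with numpy and matplotlib (a recent matplotlib is assumed for the 3D axes):

```python
# Sketch: surface and contour lines of f(x, y) = x^2 + y^2.
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-2, 2, 100)
y = np.linspace(-2, 2, 100)
X, Y = np.meshgrid(x, y)
Z = X**2 + Y**2                      # the bivariate function f(x, y)

fig = plt.figure(figsize=(10, 4))
ax1 = fig.add_subplot(1, 2, 1, projection="3d")
ax1.plot_surface(X, Y, Z)            # the graph: a paraboloid
ax2 = fig.add_subplot(1, 2, 2)
ax2.contour(X, Y, Z, levels=10)      # contour lines: concentric circles
ax2.set_aspect("equal")
plt.show()
```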

1.2 Differentiation

1.2.1 Differentiation and Integration

1.2.1.1 Differentiation

Average Rate of Change
For a one-variable function f(x), the quantity Δf/Δx = (f(a+h) - f(a))/h is called the “average rate of change” of f when x changes from a to a+h.
Example
When f(x) = x²
Average rate of change near x = 1
(f(1+h) - f(1))/h = ((1+h)² - 1)/h
= (2h + h²)/h = 2 + h (h ≠ 0)
Value
Its “limit” as h → 0
Definition of the limit
When x approaches a while taking values different from a, f(x) approaches a certain value α. This α is called the “limit” of f(x) as x → a
When f(x) does not approach a constant value as x → a, f(x) is said to “diverge”
Instantaneous rate of change and differential coefficient
The limit of the average rate of change as h → 0 is called the “instantaneous rate of change” or “differential coefficient” of f at x = a
The function f'(x) that gives the differential coefficient at any point x is called the “derivative” of f
Basic Formula
Tangent line
The instantaneous rate of change of the function f(x) at x = a represents the slope of the tangent line at the point x = a
Equation of tangent
One-sided limit and one-sided derivative
When x changes from x = a to a+h in the average rate of change of f, x increases if h > 0 and decreases if h < 0
Right-hand derivative
When h → 0 while satisfying h > 0
The limit lim_{h→0+}
Right-hand limit
Left-hand derivative
When h → 0 while satisfying h < 0
The limit lim_{h→0-}
Left-hand limit
These are called “one-sided derivatives” and “one-sided limits”, respectively.
Proposition

1.2.1.2 Partial Differentiation

Overview
Consider the derivative of a two-variable function
When (x, y) changes from (a, b) to (a+h, b)
Average rate of change of f(x, y)
(f(a+h, b) - f(a, b)) / h
x-partial derivative of f
Only one variable changes
Graphical meaning of partial differentiation
x-partial derivative of f(x, y) at (x, y) = (a, b)
Image of x-partial derivative
Slope of curve z=f(x, b) at x=a
y-partial derivative of f(x, y) at (x, y) = (a, b)
Image of y-partial derivative
Slope of curve z=f(a, y) at y=b

1.2.1.3 Differentiation of Composite Function

When (x, y) is varied parallel to the x-axis or the y-axis
the rate of change is obtained by differentiating a 1-variable function
What if (x, y) varies diagonally?
Example
f(x, y) = x² + y²
[a b] → [a b] + t[2 3]
z(t) = f(a + 2t, b + 3t)
Average rate of change as the parameter changes from 0 to t
In the limit, 2fx(a, b) + 3fy(a, b)
Proposition
For a 2-variable function f(x, y) and 1-variable functions x(t), y(t)
if z(t) = f(x(t), y(t)), then
dz(t)/dt = fx(x(t), y(t))x'(t) + fy(x(t), y(t))y'(t)
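The proposition is easy to sanity-check symbolically. Below is a minimal sketch with sympy, reusing the example above (f(x, y) = x² + y², x(t) = a + 2t, y(t) = b + 3t); at t = 0 both sides reduce to 2fx(a, b) + 3fy(a, b):

```python
# Check dz/dt = fx(x(t), y(t)) x'(t) + fy(x(t), y(t)) y'(t) with sympy.
import sympy as sp

t, a, b, x, y = sp.symbols("t a b x y")
f = x**2 + y**2
xt, yt = a + 2*t, b + 3*t

# Left-hand side: differentiate the composite z(t) = f(x(t), y(t)) directly.
lhs = sp.diff(f.subs({x: xt, y: yt}), t)
# Right-hand side: the chain-rule formula.
rhs = (sp.diff(f, x).subs({x: xt, y: yt}) * sp.diff(xt, t)
       + sp.diff(f, y).subs({x: xt, y: yt}) * sp.diff(yt, t))

print(sp.simplify(lhs - rhs))        # 0: the two sides agree
print(lhs.subs(t, 0))                # 4*a + 6*b = 2*fx(a,b) + 3*fy(a,b)
```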

1.2.2 Tangent planes

Overview
For 1-variable functions, the derivative gives the equation of the tangent line
For 2-variable functions, the partial derivatives give the equation of the tangent plane
Graph of a two-variable linear function
z=ax+by
Generalization of the equation of a plane
αx + βy + γz + δ = 0
[α β γ] is a vector perpendicular to this plane
Normal vector
Plane perpendicular to the vector [α β γ] and passing through (a, b, c)
α(x-a) + β(y-b) + γ(z-c) = 0
Equation of the tangent plane
h(x, y) = f(a, b) + fx(a, b)(x-a) + fy(a, b)(y-b)
f(a, b), fx(a, b), fy(a, b) are constants, so h(x, y) is a linear function of two variables
The plane contains the point (a, b, f(a, b))
h(x, y) is the plane tangent to the surface z = f(x, y) at (a, b, f(a, b))
Tangent plane

1.2.3 Taylor’s theorem

1.2.3.1 Higher-order derivatives

The derivative f'(x) of f(x), differentiated again, is written f''(x) and called the second derivative of f
Differentiating f''(x) once more is written f'''(x) and called the third derivative of f
2nd derivative of f
For a two-variable function f(x,y), the x-partial derivative fx(x,y) is differentiated again by x, and is written as fxx(x,y), etc.
For a two-variable function f(x,y), the y-partial derivative of the x-partial derivative fx(x,y) is written fxy(x,y), etc.

1.2.3.2 Taylor’s theorem (1-variable)

Taylor’s theorem (first and second order only)
Given points a and x, there exists a number c between a and x such that
f(x) = f(a) + f'(a)(x-a) + R2(x), R2(x) = f''(c)/2 · (x-a)²
For another c, f(x) = f(a) + f'(a)(x-a) + f''(a)/2 · (x-a)² + R3(x)
R3(x) = f'''(c)/3! · (x-a)³
Landau symbols
R2(x), R3(x) are remainder terms
Taylor’s theorem for the nth order case
f(x) = f(a) + f'(a)(x-a) + … + (f⁽ⁿ⁾(a)/n!)(x-a)ⁿ + o((x-a)ⁿ)
o: Landau symbol
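As an illustration (the function exp(x), the point a = 0, and the order are my own choices, not the book's), sympy reproduces exactly this kind of expansion, with the O term playing the role of the Landau-symbol remainder:

```python
# Taylor expansion f(a) + f'(a)(x-a) + ... + f^(n)(a)/n! (x-a)^n + o((x-a)^n).
import sympy as sp

x = sp.symbols("x")
print(sp.series(sp.exp(x), x, 0, 4))   # 1 + x + x**2/2 + x**3/6 + O(x**4)
```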

1.2.3.3 Taylor’s theorem (multivariate)

Taylor’s theorem (first and second order)
f(x, y) = f(a, b) + fx(a, b)(x-a) + fy(a, b)(y-b) + o(∥(x-a, y-b)∥)
∥(x-a, y-b)∥ = √((x-a)² + (y-b)²)
The above equation also holds
Surface z=f(x,y) and tangent plane
Tangent quadric surface
Equation for an n-variable function f

1.3 Integration

1.3.1 Integral and basic formulas

Indefinite integrals
For a function f(x), a function F(x) such that F'(x) = f(x) is called a primitive function (antiderivative) of f(x)
The function ∫f(x)dx, written ∫f(x)dx = F(x) + C using a primitive function F(x) of f(x) and a constant C, is called the “indefinite integral” of f(x), and C the “constant of integration”
Basic Formula
Theorem 1.8 Integration by parts
For the functions f(x) and g(x), the following holds
∫ f(x)g'(x)dx = f(x)g(x) – ∫ f'(x)g(x)dx
Theorem 1.10 Integration by substitution
If x = g(t) for the functions f(x), g(t), then the following holds
∫f(x)dx = ∫f(g(t))g'(t)dt
Definite Integral
For a primitive function F(x) of the function f(x) and an interval [a, b], the following equation is called the “definite integral” from x = a to b
When f(x) ≥ 0 on [a, b], the area of the region bounded by y = f(x), the x-axis, and the lines x = a and x = b is equal to the definite integral

1.4 Vectors and Matrices

1.4.1 Matrices and determinants

1.4.1.1 Matrix Operations

Matrix Operations
Vectors and matrices
Inner products and magnitudes of vectors
Transposed matrices
Inverse matrices
Unit Matrix
Inverse matrix X of A satisfying AX=XA=E

1.4.1.2 Determinants

Determinant
3×3 determinant
Cofactor
Definition 1.13 Determinant
Transformations of the determinant
Matrix products, transpositions and determinants

1.4.2 Linear Systems and Rank

1.4.2.1 Gaussian elimination

Example
If x=t, the solution is
Matrix representation of Gaussian elimination

1.4.2.2 Echelon matrices and rank

In general, using a matrix A and vectors x, b, the simultaneous equations can be expressed as Ax = b
Gaussian elimination corresponds to the following operations on the augmented coefficient matrix [A b]
Adding a constant multiple of one row to another row
Interchanging rows
Multiplying a row by a non-zero number
Performing these elementary row operations transforms the system so that the solution can be read off.
The transformed matrix is called an “echelon matrix” (row echelon form)
In general, any matrix A can be transformed into an echelon matrix by elementary row operations
The number of row vectors of the resulting echelon matrix that are not zero vectors is called the “rank of A” and is denoted “rank A”
Example
rankA = 2
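For experimenting with rank (the matrix below is my own example, not the one in the book), sympy's rref() carries out the elementary row operations and numpy reports the rank directly:

```python
# Rank = number of nonzero rows of the echelon form.
import numpy as np
import sympy as sp

A = sp.Matrix([[1, 2, 3],
               [2, 4, 6],     # twice the first row
               [1, 0, 1]])
print(A.rref())                # echelon form and pivot columns
print(np.linalg.matrix_rank(np.array(A, dtype=float)))  # 2
```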

1.4.2.3 Dimension Theorem

Linear independence and linear dependence
If a vector b can be written as b = α1a1 + … + αnan using vectors a1, …, an and real numbers α1, …, αn,
then b is called a “linear combination” of a1, …, an.
Definition of linear independence and dependence
Two vectors a and b are “linearly independent” if
a is not a constant multiple of b, and b is not a constant multiple of a.
Three vectors a, b, and c are “linearly independent” if
no vector can be written as a linear combination of the other two
When the vectors are not linearly independent,
they are “linearly dependent”.
Linear dependence is a special positional relationship of vectors
Linear independence is the general positional relation of vectors
Definition
n vectors a1, …, an are “linearly independent” if
no vector can be written as a linear combination of the other n-1 vectors
If they are not linearly independent, we say they are linearly dependent.
Fundamental Solutions and Dimensional Theorems
Proposition
For a matrix A and a vector b, if the system of linear equations Ax = b has a solution, then there exists a vector p and one of the following holds
The only solution is x = p
There exist linearly independent vectors q1, …, ql such that
x is a solution of Ax = b ⇔ x = p + t1q1 + … + tlql (t1, …, tl are real numbers)
p is called a particular solution
q1, …, ql are called fundamental solutions
Theorem on rank and the number of fundamental solutions
For a matrix A and a vector b, consider the n-variable system of linear equations Ax = b
If rank [A b] = rank A, then
Ax = b has a solution and the number of fundamental solutions = n - rank A
If rank [A b] ≠ rank A, then Ax=b has no solution
Example
rank [A b] = 3, rank A = 2
rank [A b] ≠ rank A
has no solution
Homogeneous systems of linear equations
A system of linear equations whose right-hand side is zero is called a “homogeneous system”
Ax = 0
rank A = r
If n - r linearly independent vectors v1, …, vn-r satisfy Ax = 0, then they are fundamental solutions

1.4.3 Inverse Matrices, Determinants and Rank

Inverse Matrices and Determinants
Theorem 1.22.
Matrix A is invertible (A is a regular matrix) ⇔ |A| ≠ 0
Properties of elementary row operations
Theorem 1.23.
The following three are equivalent
n x n matrix A is invertible (A is a regular matrix)
|A| ≠ 0
rank A = n
Theorem 1.24.
Let a1, …, an be the column vectors of the n×n matrix A
|A| ≠ 0 ⇔ a1, …, an are linearly independent
The same holds for row vectors
Theorem 1.26.
If rank A = r for a matrix A, then
every submatrix with non-zero determinant has size at most r×r
Furthermore, there exists an r×r submatrix whose determinant is not zero

1.5 Eigenvalues and Eigenvectors

1.5.1 Eigenvalues and eigenvectors

Overview
Example
Definition of eigenvalues and eigenvectors
For any square matrix A, we know that there is always a vector as above
Definition 1.28.
For a square matrix A, there exists a scalar λ and a vector u such that
Au = λu , u ≠ 0
Let λ be an eigenvalue of A
u is an eigenvector of eigenvalue λ
How to find eigenvectors
Failure Sequence
Success Sequence
How to find eigenvalues
How to find eigenvectors
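A small numerical sketch (the 2×2 matrix is an assumed example): numpy finds λ and u with Au = λu, u ≠ 0 directly:

```python
# Eigenvalues solve |A - lambda*E| = 0; eigenvectors solve Au = lambda*u.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
vals, vecs = np.linalg.eig(A)          # eigenvalues 3 and 1 (order may vary)
for lam, u in zip(vals, vecs.T):       # eigenvectors are the columns
    print(lam, np.allclose(A @ u, lam * u))   # True: Au = lambda*u
```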

1.5.1.1 Eigenvalues and eigenvectors of a 3×3 matrix

1.5.2 Diagonalization of a matrix

1.5.2.1 Diagonalizing a Matrix

Example
A matrix whose off-diagonal components are all zero is called a “diagonal matrix”
When P⁻¹AP = D for a regular matrix P and a diagonal matrix D
A is diagonalizable
There exist matrices that are not diagonalizable

1.5.2.2 Orthogonal Diagonalization of a Symmetric Matrix

Definition 1.33
Symmetric matrix
ᵗA = A
Orthogonal matrix
ᵗQQ = E
Theorem 1.34.
An n×n symmetric matrix has the following properties
All eigenvalues are real numbers
It has n linearly independent eigenvectors
Eigenvectors for different eigenvalues are orthogonal
Determinant and eigenvalues
Theorem 1.36.
For an n×n symmetric matrix A, let the eigenvalues be λ1, λ2, …, λn
Then |A| = λ1λ2…λn holds.

1.6 Implicit function theorem

When there is only one function
When there is more than one function

Chapter 2 Convex Functions

Aim
In the world of optimization, “convexity” vs. “non-convexity” is the dividing line between easy and hard problems
What is a convex function?

2.1 Properties of Convex Functions

2.1.1 Definition of a convex function

A function is sufficiently smooth
It is differentiable a sufficient number of times and its partial derivatives are continuous
Example
Quadratic function f(x) = x² + bx + c
The minimum can be easily obtained.
Definition 2.1
Let f be an n-variable function
For all λ satisfying 0 < λ < 1 and all u, v ∈ ℝⁿ
we call f a “convex function” if f((1-λ)u + λv) ≤ (1-λ)f(u) + λf(v)
When -f is a convex function, f is called a “concave function”
The same definitions with strict inequality give “strictly convex” and “strictly concave” functions.
1-variable convex functions and 2-variable convex functions
Explanation
Strictly convex functions
For 0 < λ < 1
(1-λ)u + λv is an internally dividing point between u and v
(1-λ)f(u) + λf(v) is the corresponding internally dividing point between the values f(u) and f(v)
Figure
Line segment connecting points (u,f(u)) and (v,f(v)) is above the graph of f
f is a convex function
Example of a convex function that is not strictly convex
AB is above the graph
CD is above the graph
The line segment connecting the points (u,f(u)) and (v,f(v)) is on or above the graph of f

2.1.2 Convex functions and tangent planes

Overview
When a tangent line is drawn to a 1-variable convex function, the tangent lies below the graph and does not cross it.
The same is true for tangent planes
If a bivariate function f is convex, then for all (a, b), (x, y) ∈ ℝ²
f(x, y) ≥ f(a, b) +fx(a, b)(x-a) + fy(a, b)(y-b)
The same holds for n-variable functions
Gradient Vector
Definition 2.2
2-variable gradient vector
n-variable gradient vector
Example
Function inequalities on convex functions and tangent planes
Proposition 2.4
For an n-variable function f, we have
f is a convex function ⇔ f(v) ≥ f(u) + ⟨∇f(u), v - u⟩ for all u, v

2.2 Determining Convex Functions

2.2.1 Convexity and differentiation

The one-variable function f(x) = x² is convex, but is f(x) = √(1+x²) a convex function?
Theorem 2.5.
For a 1-variable function h, the following holds
h is a convex function ⇔ h”(t) ≥ 0 for all t ∈ ℝ
Is the 2-variable function g(x, y) = x² - 2xy + 2y² a convex function?
We can examine the convexity of a function using the derivative
Hesse Matrix
In the multivariate case, the gradient vector plays the role of the first derivative of a one-variable function, and the Hesse matrix that of the second derivative
Definition 2.6.
For a 2-variable function f and (x,y), the above is called the Hesse matrix
If the function is smooth enough, then fxy=fyx
Example
When f(x, y) = x³ + 2xy + 3y²
The gradient vector and the Hesse matrix are
For a general n-variable function f and u = (x1, …, xn), the gradient vector and the Hesse matrix are
Condition on the Hesse matrix
The Hesse matrix provides the following method for determining convexity
Theorem 2.7.
For an n-variable function f, the following holds
(1) f is a convex function ⇔ at each point a ∈ ℝⁿ, ᵗu∇²f(a)u ≥ 0 holds for all u ∈ ℝⁿ
(2) f is a strictly convex function ⇐ at each point a ∈ ℝⁿ, ᵗu∇²f(a)u > 0 holds for all u ≠ 0
Example
Let f(x, y) = x² - 2xy + 2y²
The gradient vector and the Hesse matrix are
Let u = ᵗ(u1, u2)
ᵗu∇²f(x, y)u = 2u1² - 4u1u2 + 4u2² = 2(u1 - u2)² + 2u2²
This is non-negative for all vectors u (and positive for u ≠ 0)
Therefore, f is a convex function
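The same check can be reproduced symbolically; a sketch with sympy for f(x, y) = x² - 2xy + 2y²:

```python
# Gradient, Hesse matrix, and positive definiteness of f = x^2 - 2xy + 2y^2.
import sympy as sp

x, y = sp.symbols("x y")
f = x**2 - 2*x*y + 2*y**2
grad = sp.Matrix([sp.diff(f, x), sp.diff(f, y)])
H = sp.hessian(f, (x, y))
print(grad.T)                     # [2x - 2y, -2x + 4y]
print(H)                          # Matrix([[2, -2], [-2, 4]])
print(H.is_positive_definite)     # True, so f is (strictly) convex
```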

2.2.2 Positivity of the Matrix

Summary
Quadratic form
Matrix Approach
Definition 2.9
For a matrix A and a vector u = (u1, u2, …, un),
the polynomial p(u) = ᵗuAu in the variables u1, u2, …, un is called a “quadratic form”
Example
Let u=(x,y) for matrix A
The above is a quadratic form
For matrix B, let u = (x, y) as above
This is also a quadratic form
Definition of Positivity
Definition 2.11
Let matrix A be a symmetric matrix
(a matrix satisfying ᵗA = A)
If the quadratic form satisfies ᵗuAu ≥ 0 for all u ∈ ℝⁿ
A is called “positive semidefinite”
When the reverse inequality holds
A is called “negative semidefinite”
When ᵗuAu > 0 for all u ∈ ℝⁿ satisfying u ≠ 0
A is called “positive definite”
When the reverse inequality holds
A is said to be “negative definite”
A is “indefinite” when ᵗuAu takes both positive and negative signs depending on u ∈ ℝⁿ
Examples
Example 1
Quadratic form ᵗuAu = 2x² + y²
Since 2x² + y² > 0 for (x, y) ≠ (0, 0), A is positive definite
Example 2
Quadratic form = x² + 2xy - y²
Positive for u = (1, 0) but negative for u = (0, 1), so B is indefinite
Recapitulation of the Convex Function Determination Theorem (using positivity)
Theorem 2.13.
For a function f, the following holds
f is a convex function ⇔ the Hesse matrix ∇²f(a) is positive semidefinite at each point a ∈ ℝⁿ
f is a strictly convex function ⇐ the Hesse matrix ∇²f(a) is positive definite at each point a ∈ ℝⁿ

2.2.2.1 How to check positivity

Overview
A function is convex if its Hesse matrix is positive semidefinite at each point
How do we check whether a matrix is positive semidefinite?
General method using linear algebra
Determination using eigenvalues
If A is a symmetric matrix, the following hold
(1) A is positive semidefinite ⇔ all eigenvalues of A are greater than or equal to 0
(2) A is positive definite ⇔ all eigenvalues of A are positive
(3) A is indefinite ⇔ A has both positive and negative eigenvalues
Example
Let f(x, y) = 3x² - 2xy + 3y²
The gradient vector and the Hesse matrix are
The Hesse matrix is a constant matrix with eigenvalues 4 and 8
Therefore, the Hesse matrix is positive definite and f is a strictly convex function
Determination method using determinant
If the matrix to be checked for definiteness is a 2×2 symmetric matrix whose (1, 1) entry is a, the determinant can be used as follows
Theorem 2.16
(1) |A|>0 and a>0 ⇔ A is positive definite
(2) |A|>0 and a<0 ⇔ A is negative definite
(3) |A|<0 ⇔ A is indefinite
Example
For f(x, y) = 3x² - 2xy + 3y²
the definiteness of the Hesse matrix is checked using the determinant
|∇²f(x, y)| = 32 > 0 and fxx(x, y) = 6 > 0
Therefore, the Hesse matrix is positive definite.

Chapter 3 Optimization Problem

Aim
Methods for finding the minimum and maximum values of a multivariate function

3.1 What is an optimization problem?

Overview
An optimization problem is the problem of minimizing or maximizing a function
Example 3.1
Which rectangle has the largest area among rectangles for which the sum of the lengths of two adjacent sides is 4?
Maximize
f(x,y)=xy
Constraint
x+y=4,x≥0, y≥0
(Constrained) optimization problem
Example 3.2
Given four points (1,3),(2,5),(3,5),(4,7) in the plane, what is the straight line passing closest to these points?
The straight line is y=ax+b
The error between the line and the point (1, 3) is 3 - (a·1 + b)
Find the line that minimizes the sum of the squares of the errors
Least-squares method
Minimize
f(a, b) = {3-(a+b)}² + {5-(2a+b)}² + {5-(3a+b)}² + {7-(4a+b)}²
Constraints
Unconstrained
Unconstrained optimization
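Example 3.2 can be checked numerically; a sketch with numpy, whose polyfit minimizes the same sum of squared errors:

```python
# Least-squares line y = ax + b through (1,3), (2,5), (3,5), (4,7).
import numpy as np

xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = np.array([3.0, 5.0, 5.0, 7.0])
a, b = np.polyfit(xs, ys, 1)      # degree-1 fit: slope a, intercept b
print(a, b)                       # a = 1.2, b = 2.0
```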

3.1.1 Unconstrained optimization problem

Function to be minimized f(x): objective function
Definition of optimal solution
Definition of optimal solution for unconstrained minimization problem
Definition 3.3
If f(x) ≥ f(ẋ) for all x ∈ ℝⁿ, then f(ẋ) is called the global minimum value and ẋ a global minimum solution
In general, it is difficult to find a global minimum solution, so we use a method to find a local solution.
Definition 3.4
If f(x) ≥ f(ẋ) for all x ∈ ℝⁿ sufficiently close to ẋ, we call f(ẋ) a local minimum value and ẋ a local minimum solution
Example 3.5
Minimize f(x) = x² + 2x + 3
f(x) = x² + 2x + 3 = (x+1)² + 2
f(x) ≥ f(-1) = 2 (all x)
x = -1 gives the smallest value among all real x
x = -1 is the global minimum solution
Example 3.6
Minimize f(x) = x³ - x
There is no global minimum solution
Local minimum solution
Extreme value problems
Extreme value problems handled by differential and integral calculus are also optimization problems
Definition 3.7
When x ≠ ẋ and f(x) > f(ẋ) for all x sufficiently close to ẋ, we say that f takes a local minimum value at ẋ
When x ≠ ẋ and f(x) < f(ẋ) for all x sufficiently close to ẋ, we say that f takes a local maximum value at ẋ.
Local maxima and minima are collectively called “extreme values”.
Relationship between optimal values and extrema
Example 3.8 A global minimum value that is not an extreme value
Minimize f(x, y) = x² + 2xy + y²
From f(x, y) = (x+y)², f(x, y) ≥ 0 for all (x, y)
Since f(0, 0) = 0,
0 is the global minimum value of f and (0, 0) is a global minimum solution of f
f is 0 at every point (x, y) = (t, -t)
f does not take an extreme value at (0, 0) because arbitrarily close points take the same value as f(0, 0)
The minimum solution is not unique

3.2 Optimal Conditions

Overview
Relationship between the properties of the optimal solution and the graph of the function
The maximization problem becomes a minimization problem when the objective function is multiplied by (-1)

3.2.1 First-order optimality conditions

For a 1-variable function, the optimal solution can be obtained from the increasing/decreasing table
Theorem 3.9 for 1-variable functions
For a 1-variable function f(x), if a point a is a locally optimal solution, then f'(a) = 0

3.2.1.1 Multivariate functions

Increment/decrement tables cannot be written for multivariable functions.
The same properties as for one-variable functions hold as follows
Theorem 3.10 Optimality conditions of the first order
If point ẋ is a locally optimal solution, then ∇f(ẋ) = 0 holds
Theorem 3.11 If a point p satisfies ∇f(p) = 0, then p is called a “stationary point” of f
A locally optimal solution, if it exists, is always a stationary point
Therefore, if we find all the stationary points, the locally optimal solutions are always among them.

3.2.1.2 Geometric Image of a Stationary Point

Example
Minimize f(x, y) = x³ - 3xy + y³ with (x, y) = (a, b) as the local minimum solution
The gradient vector of the objective function is
Where is the local minimum solution (x, y, z) = (a, b, f(a, b)) in the graph?
The point (a, b, f(a, b)) is at the bottom of a depression in the graph.
If we draw a plane tangent to the bottom of the depression, we get a horizontal plane.
Tangent plane of f at the point (a, b)
z = f(a, b) + fx(a, b)(x-a) + fy(a, b)(y-b)
Since the plane is horizontal, the z-coordinate of the tangent plane is the constant value f(a, b) for any (x, y)
Therefore fx(a, b) = fy(a, b) = 0
∇f(a, b) = 0

3.2.2 Stopping points and local optimization

A locally optimal solution is always a stationary point, but a stationary point is not necessarily a locally optimal solution
Example
Minimize f(x, y) = x² - y²
∇f(0, 0) = 0, but (0, 0) is not a local minimum solution
f(0, 0) is not a minimum value

3.2.3 Second-order optimality conditions

How do we find the local optimum solution among the stopping points?
Guess from the graph
Minimize f(x, y) = x³ - 3xy + y³
Gradient vector of each point
Computing the stationary points of f, we get (0, 0) and (1, 1)
Looking at the graph, (0,0) is at the point where the graph is twisted and (1,1) is at the point where it dips down
(1,1) is the local minimum solution
Determination method using Hesse matrix
How can we tell from the expression for the objective function f whether each stationary point is a local minimum solution?
If the stationary point ẋ is a local minimum solution, then at ẋ the graph is at least hollowed downward
The graph is hollowed downward at ẋ ⇔ near the point ẋ the function f is “locally convex”
Theorem 3.13 Second-order optimality conditions
(1) (Necessity) ẋ is a local minimum solution ⇒ ∇f(ẋ) = 0 and ∇²f(ẋ) is positive semidefinite
(2) (Sufficiency) ∇f(ẋ) = 0 and ∇²f(ẋ) is positive definite ⇒ ẋ is a local minimum solution. Furthermore, f takes an extreme (local minimum) value there
(3) (Negation) If ∇²f(ẋ) is indefinite, then ẋ is not a locally optimal solution
Geometric image of the second-order optimality condition
Relation between the graph and the positive definiteness of the Hesse matrix
Relationship between optimality and second-order conditions
Optimal solutions not known by second-order optimality conditions
A minimum solution may exist even if condition (2) is not satisfied
Example
Minimize f(x, y) = x⁴ + y⁴
Gradient vector and Hesse matrix
The stationary point is (0, 0)
The Hesse matrix at (0, 0) is the zero matrix
Condition (2) is not satisfied
Since f(x, y) ≥ 0 for all (x, y), (0, 0) is the global minimum solution
Since f(x, y) > 0 when (x, y) ≠ (0, 0), f also takes an extreme (strict local minimum) value at (0, 0)

3.3 How to find the locally optimal solution

Example 1
Find the local optima of f(x, y) = x² + xy + y² - 9x - 9y + 27
Find the stationary points
Gradient vector
Solution of simultaneous equations
(x,y)=(3,3)
Examine second-order optimality conditions for this point
Compute the Hesse matrix
When (x,y)=(3,3)
Check the positivity of this matrix
Solution 1
The eigenvalues of the matrix ∇²f(3,3) are 1 and 3. Therefore, by the theorem, ∇²f(3,3) is positive definite.
Solution 2
Since fxx(3,3) > 0 and |∇²f(3,3)| > 0, the matrix ∇²f(3,3) is positive definite by the theorem
Therefore, (3, 3) is a local minimum solution and the local minimum value is f(3, 3) = 0.
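Example 1 can be reproduced symbolically; a sketch with sympy that finds the stationary point and classifies it via the Hesse matrix:

```python
# Stationary point and second-order check for f = x^2 + xy + y^2 - 9x - 9y + 27.
import sympy as sp

x, y = sp.symbols("x y")
f = x**2 + x*y + y**2 - 9*x - 9*y + 27
sols = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True)
print(sols)                        # [{x: 3, y: 3}]
H = sp.hessian(f, (x, y))
print(H.eigenvals())               # eigenvalues 1 and 3: positive definite
print(f.subs(sols[0]))             # local minimum value 0
```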
Example 2
Finding the local optima of f(x, y) = x³ - 3xy + y³
Find the gradient vector and Hesse matrix
Find the stationary points
The stationary points are (x, y) = (0, 0), (1, 1)
Check the definiteness of the Hesse matrix
At (0, 0), the determinant of the Hesse matrix satisfies |∇²f(0, 0)| < 0
The Hesse matrix is indefinite
So (0, 0) is not a locally optimal solution
For (1, 1), fxx(1, 1) = 6 > 0 and |∇²f(1, 1)| = 27 > 0
The Hesse matrix is positive definite
(1, 1) is a local minimum solution
The local minimum value is f(1, 1) = -1
Table of signs
Summary
Example 3
What are the local optima of f(x, y) = x³ + y³ - 6x - 6y?
The gradient vector and Hesse matrix are
Simultaneous equations
Stationary points (x, y) = (±√2, ±√2) (four sign combinations)
Relation between sign and optimality
When (x, y) = (√2, √2), the local minimum value is -8√2
When (x, y) = (-√2, -√2), the local maximum value is 8√2
Are all locally optimal solutions obtained?
In this example, four stationary points are obtained, and all locally optimal solutions, if they exist, are among these four.
Local Optimum and Extreme Values
Extreme values are a special kind of local optimum

3.4 Convexity and optimal solutions

Examining the second-order optimality conditions only tells us whether a point ẋ is a locally optimal solution
How can we find the global optimal solution?
We can easily find the global optimal solution if the function is convex
Theorem 3.18.
If an n-variable function f is convex, then the stationary points of f are global minimum solutions

3.5 Proof of Theorem 3.13

Chapter 4 Constrained Optimization Problems

Overview
Constrained optimization problems have a solution method called the Lagrange multiplier method
There is also the KKT condition used in the general case

4.1 Constrained optimization problems

Example
Maximize f(x,y) := xy
Constraints x+y=4, x≥0, y≥0
xy = x(4-x) = -x² + 4x = -(x-2)² + 4
A:(1,3)→B:(2,2)→C:(3,1)
Constrained Minimization Problem Definition
Constrained Minimization Problem
Minimize f(x)
Constraint x∈C
Let the set C be a subset of the vector space ℝⁿ
The set C is the “feasible region”
A point in C is a “feasible solution”
“x ∈ C” means “x is contained in C”
Definition 4.2
If ẋ ∈ C satisfies f(x) ≥ f(ẋ) for all x ∈ C sufficiently close to ẋ, then ẋ is called a local minimum solution and f(ẋ) a local minimum value
Definition 4.3 If ẋ ∈ C satisfies f(x) ≥ f(ẋ) for all x ∈ C, then ẋ is called the global minimum solution and f(ẋ) the global minimum value
Conceptual diagram

4.2 The case of a single equality constraint

Overview
The problem of minimizing a bivariate function f(x,y) over a circle

4.2.1 Increase-decrease tables along a curve

Use contours of the objective function to examine changes in the objective function over the feasible region
Optimizing a linear function over a circumference
Minimize or maximize a linear function over a circumference
Minimize or maximize f(x,y):=x+y
Constraint x² + y² = 1
Line ℓ2 is a straight line satisfying f(x,y)=x+y=-1
Contours of f
ℓ1 and ℓ3 are also contours of f for some values
Relationship between the circle and 𝓁2
When a point moves from A through B to C in the figure, the value of the function f changes in one direction
so the intersection of contour line 𝓁2 and the circle is not a locally optimal solution
The relationship between the circle and 𝓁3
Near the contact point between contour line 𝓁3 and the circle
f is smallest at (-1/√2, -1/√2)
(-1/√2, -1/√2) is the local minimum solution
The point of tangency with contour line 𝓁1 is the local maximum solution
Optimizing a general function over a circle
Optimization of a general function that is not linear
Relationship between contours and feasible region of the objective function (Part 1)
Contour line and feasible region intersect (cross)
Not a locally optimal solution
Contour line and feasible region are tangent
It is a locally optimal solution
Relationship between contours and feasible region of the objective function (Part 2)
Contour line and feasible region intersect (cross)
Not a locally optimal solution
Contour line and feasible region are tangent
Locally optimal solution

4.2.2 Lagrange multiplier method

Optimize a general function on a general curve
The contour line of the objective function and the feasible region are tangent at a locally optimal solution
Theorem 4.4 Lagrange multiplier method
Minimize or maximize f(x)
Constraint g(x)=0
Let ẋ be the locally optimal solution
If ∇g(ẋ)≠0, then there exists some number λ such that
∇f(ẋ) = λ∇g(ẋ), g(ẋ) = 0 holds
Gradient vectors and contours
Gradient vectors and contour lines are orthogonal to each other
What is meant by “orthogonal”
The gradient vector is orthogonal to the tangent line of the contour through the point
Consider the graph z = f(x, y) and the tangent plane z = f(a, b) + fx(a, b)(x-a) + fy(a, b)(y-b) of the function f
Section of the graph of f and tangent plane cut by a plane z=f(a,b) passing through the point (a,b,f(a,b)) and parallel to the xy-plane
Equation of tangent to contour line
fx(a,b)(x-a) + fy(a,b)(y-b) = 0
The gradient vector ∇f(a,b) and the tangent at (x,y) = (a,b) of the contour line are orthogonal
The gradient vector points in the direction of increasing function values
Constrained stationary points
Existence theorem of optimal solution

4.2.3 Solving Equality Constrained Minimization Problems

Example
Assumptions
Minimize f(x, y) := x - y
Constraint g(x, y) := 2x² + 3y² - 1 = 0
By the Lagrange multiplier method, the locally optimal solutions are among the points (x, y) obtained by solving the equations for (x, y) and λ
The gradient vectors of the objective function and the constraint
The equations are
If we find (x, y), λ satisfying the above equations, (x, y) is a stationary point
Stationary points (x, y) = (±√(3/10), ∓√(2/15))
Substitute the stationary points into f
Global minimum solution (-√(3/10), √(2/15))
Global minimum value -√(3/10) - √(2/15)
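The Lagrange conditions of this example can also be solved symbolically; a minimal sketch with sympy:

```python
# Solve grad f = lambda * grad g, g = 0 for f = x - y on 2x^2 + 3y^2 = 1.
import sympy as sp

x, y, lam = sp.symbols("x y lam", real=True)
f = x - y
g = 2*x**2 + 3*y**2 - 1
eqs = [sp.diff(f, x) - lam*sp.diff(g, x),   # 1 - 4*lam*x = 0
       sp.diff(f, y) - lam*sp.diff(g, y),   # -1 - 6*lam*y = 0
       g]
for s in sp.solve(eqs, [x, y, lam], dict=True):
    print(s[x], s[y], f.subs(s))   # the two stationary points and their values
```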
Example
Assumptions
Minimize f(x, y, z) := x - y + z²
Constraint g(x, y, z) := x² + y² + 2z² - 1 = 0
Gradient vector of objective function and constraint equation
The equations are
When z=0
(x,y,z)=(±1/√2, ∓1/√2, 0)
When z≠0
It is not a solution
Example
Assumptions
Objective function and gradient vector of constraints
The equation is
The stationary points coincide with the eigenvectors of A

4.3 When there are multiple equality constraints

4.3.1 The case of two constraints

Theorem 4.10 Minimization problem
Minimize f(x)
Constraints g1(x)=0, g2(x)=0
Let ẋ be the local minimum solution
If ∇g1(ẋ) and ∇g2(ẋ) are linearly independent, then there exist certain numbers λ1 and λ2 such that
∇f(ẋ) = λ1∇g1(ẋ) + λ2∇g2(ẋ)
g1(ẋ) = 0, g2(ẋ) = 0
feasible region diagram
Minimize
f(x,y,z) =αx + βy + γz +δ
Constraints
g1(x, y, z) = x² + y² + z² - 1 = 0
g2(x, y, z) = x + y + z - 1 = 0
The feasible region is the intersection of the sphere and the plane
a circle in space
Isosurfaces
The set of points (x, y, z) at which a 3-variable function takes the same value: f(x, y, z) = (constant)
Isosurface
Gradient vector and isosurface are orthogonal
The isosurface of the objective function is tangent to the feasible region at a locally optimal solution
Plane tangent to feasible region
What is the positional relationship between the contour plane of the objective function and the feasible region?
The normal vector of the plane tangent to the feasible region is λ1∇g1(a, b, c) + λ2∇g2(a, b, c)
(a, b, c) is a local minimum solution
⇒ the isosurface of the objective function and the feasible region are tangent at the local minimum solution
⇒ the gradient vector of the objective function satisfies ∇f(a, b, c) = λ1∇g1(a, b, c) + λ2∇g2(a, b, c)

4.3.2 The general case of the number of constraints

When the number of constraints is arbitrary, we have the following
Theorem 4.11 Minimization problem
Minimize f(x)
Constraints g1(x) = 0, g2(x) = 0, …, gm(x) = 0
Let ẋ be the local minimum solution
If ∇g1(ẋ), …, ∇gm(ẋ) are linearly independent, then there exist numbers λ1, …, λm such that
∇f(ẋ) = λ1∇g1(ẋ) + … + λm∇gm(ẋ)
g1(ẋ) = 0, g2(ẋ) = 0, …, gm(ẋ) = 0

4.4 Second-order optimality conditions

Overview.
Using the second-order derivative in a constrained optimization problem, consider the sufficient conditions for an optimal solution
Conditions for a point to be a locally optimal solution of minimizing f(x) subject to the constraint g(x) = 0
If the contour line of the objective function and the feasible region are tangent at the point B
B can become a local minimum solution
But even if the contour line of the objective function is tangent, if the graph is twisted at the point of tangency
B does not become a local minimum solution
Example of stationary points that do not become locally optimal solutions
First, a simple optimization condition theorem
Theorem 4.12 Optimization problem
Minimize f(x)
Constraints g1(x)=0,g2(x)=0,… ,gm(x)=0
If ẋ satisfies the above equations for some real numbers λ1, …, λm, then ẋ is a local minimum solution

4.4.1 Second-order optimality necessary condition

Theorem 4.13 Minimization problem
Minimize f(x)
Constraints g1(x)=0,g2(x)=0,… ,gm(x)=0
Let ẋ be the local minimum solution
If ∇g1(ẋ), …, ∇gm(ẋ) are linearly independent, then the above holds for some numbers λ1, …, λm.
Definition 4.14.
Let ẋ be a point of the set C.
For some ε > 0, if a vector-valued function u(t) = (u1(t), …, un(t)) on (-ε, ε) satisfies u(0) = ẋ and u(t) ∈ C, then
the vector u'(0) := (u1'(0), …, un'(0)) is called a “tangent vector” of C at ẋ.
Example
The set C satisfying x² + y² + z² = 1

4.5 Inequality Constraint Problem

Outline
Constraints on equality and inequality
Example 4.17 Projection problem
Plane 4x + y + 2z = 2
Inside the unit sphere x² + y² + z² ≤ 1
Find the point in the intersection of the above that is closest to the point (2, 3, 4)
Formulation
Minimize
f(x, y, z) := (x-2)² + (y-3)² + (z-4)²
Constraints
g1(x, y, z) := x² + y² + z² - 1 ≤ 0
g2(x, y, z) := 4x + y + 2z - 2 = 0

4.5.1 When there is only one inequality

When the constraint is a single inequality representing the circle and its interior
Minimize
f(x, y)
Constraint
g(x, y) := x² + y² - 1 ≤ 0
If (a, b) is a local minimum solution under the above conditions, the following two cases are possible
(a, b) is inside the circle (g(a, b) < 0)
∇f(a, b) = 0 holds (the usual unconstrained optimality condition)
(a, b) lies on the circumference (g(a, b) = 0)
There exists some real number λ such that
-∇f(a, b) = λ∇g(a, b) and λ ≥ 0
Orientation of the gradient vector and the inequality region
Theorem 4.18 Minimization problem
Minimize
f(x)
Constraint
g(x) ≤ 0
If ẋ is a local minimum solution and ∇g(ẋ) ≠ 0, then there exists some number λ such that
-∇f(ẋ) = λ∇g(ẋ)
λg(ẋ) = 0, λ≥0
g(ẋ) ≤ 0
Example
Minimize
f(x, y) = x² + 6xy + y²
Constraints
g(x, y) = x² + y² - 1 ≤ 0
From Theorem 4.18
solve for x, y, and λ
If x² + y² - 1 < 0
the solution satisfying the above equations is
λ = 0
(x, y) = (0, 0)
When x² + y² - 1 = 0
find the points satisfying the above equations
Solving the eigenvalue problem gives the answer
The ones satisfying λ > 0 are kept
Combining the two cases, the candidates are (x, y) = (0, 0), (±1/√2, ∓1/√2)
Find the value of the objective function f at each
The minimum value is f(±1/√2, ∓1/√2) = -2
At the points (±1/√2, ∓1/√2), -∇f(x, y) points outward along the normal of the circle
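As a numerical cross-check (a sketch using scipy, which is not part of the book's exposition): scipy's "ineq" convention is fun(x) ≥ 0, so the constraint g ≤ 0 is passed as 1 - x² - y² ≥ 0:

```python
# Minimize f(x, y) = x^2 + 6xy + y^2 subject to x^2 + y^2 - 1 <= 0.
import numpy as np
from scipy.optimize import minimize

f = lambda v: v[0]**2 + 6*v[0]*v[1] + v[1]**2
cons = [{"type": "ineq", "fun": lambda v: 1 - v[0]**2 - v[1]**2}]
res = minimize(f, x0=[0.5, -0.5], constraints=cons)
print(res.x, res.fun)    # approx. (1/sqrt(2), -1/sqrt(2)) and -2
```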

4.5.2 Multiple inequalities

Theorem 4.20 Minimization problems
Minimize
f(x)
Constraint
g1(x) ≤ 0, g2(x) ≤ 0
If ẋ is a local minimum solution and {∇g1(ẋ), ∇g2(ẋ)} is linearly independent,
there exist certain numbers λ1, λ2 such that
-∇f(ẋ) = λ1∇g1(ẋ) + λ2∇g2(ẋ)
λigi(ẋ) = 0, λi ≥ 0
gi(ẋ) ≤ 0 (i = 1, 2)

4.5.3 When the constraint has inequality and equality (KKT condition)

Theorem 4.21 Minimization problem
Minimize
f(x)
Constraint
g1(x)≤0, g2(x)≤0
h(x)=0
If ẋ is a local minimum solution and {∇g1(ẋ), ∇g2(ẋ), ∇h(ẋ)} is linearly independent,
there exist certain numbers λ1, λ2, μ such that
-∇f(ẋ) = λ1∇g1(ẋ) + λ2∇g2(ẋ) + μ∇h(ẋ)
λigi(ẋ) = 0, λi ≥ 0, gi(ẋ) ≤ 0
μ: arbitrary sign, h(ẋ) = 0
Example
Maximize
f(x,y,z)=αx + βy + γz +δ
Constraint
g1(x, y, z) = x² + y² + z² - 1 ≤ 0
g2(x,y,z)=-x-1/2≤0
h(x,y,z)=x+y+z-1=0
Example of feasible region

4.5.4 Examples

Example
Minimize
f(x, y, z) := (x-2)² + (y-3)² + (z-4)²
Constraints
x² + y² + z² ≤ 1
4x + y + 2z = 2
KKT condition

4.6 Convex Planning

A problem in which the objective function and the constraint functions are all convex is called “convex programming”.
Theorem 4.24
Minimize
f(x)
Constraint
gi(x) ≤ 0 (i = 1, …, m)
Let f and the gi be convex functions
If ẋ satisfies the KKT conditions, then ẋ is a global minimum solution

Chapter 5: Linear Programming Problems

Aim
A linear programming problem is an optimization problem in which the objective function and the constraints are all linear expressions.
Even when there are several thousand variables and constraints, a computer can find a solution quickly.

5.1 What is linear programming?

Linear programming is an optimization problem in which the objective function and constraint equations are all linear.
Example
A factory produces Product 1 and Product 2.
Product 1 uses 1 kg of raw material A and 3 kg of raw material B, and sells for 80,000 yen.
Product 2 uses 1 kg of raw material A and 1 kg of raw material B, and sells for 60,000 yen.
Given 4 kg of raw material A and 6 kg of raw material B, how many of each of Product 1 and Product 2 should be produced to maximize profit?
Summary table
Formulation by optimization problem
The number of product 1 is x1 and the number of product 2 is x2
Minimize
-8x1 - 6x2 (profit, in units of 10,000 yen)
This is a maximization problem, but multiplying the objective function by (-1) turns it into a minimization problem.
Constraints
x1 + x2 ≤ 4 (inventory of raw material A)
3x1 + x2 ≤ 6 (inventory of raw material B)
x1, x2 ≥ 0
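This formulation can be handed directly to a solver; a sketch with scipy.optimize.linprog (linprog minimizes, so the negated profit is used as-is):

```python
# Factory problem: minimize -8x1 - 6x2 s.t. x1 + x2 <= 4, 3x1 + x2 <= 6, x >= 0.
from scipy.optimize import linprog

c = [-8, -6]                    # negated profit
A_ub = [[1, 1], [3, 1]]         # raw material usage per product
b_ub = [4, 6]                   # inventories of A and B
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)          # [1. 3.] and profit 26
```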

5.2 Simplex method

Overview
Formulation of the optimization problem
Minimize
-x1 - 2x2 - 3x3
Constraints
x1 + x3 ≤ 2
2x1 + x2 + 2x3 ≤ 5
3x1 + x2 + 2x3 ≤ 6
x1, x2, x3 ≥ 0
Feasible region and isosurfaces

5.2.1 Overview of the simplex method

Approach that traverses the vertices of the feasible region
Simplex method
Main operations of the simplex method

1 Introduce slack variables

Introduce x4, x5, x6
Transform into an equivalent problem
Minimize
-x1 - 2x2 - 3x3
Constraints
x1 + x3 + x4 = 2
2x1 + x2 + 2x3 + x5 = 5
3x1 + x2 + 2x3 + x6 = 6
x1, x2, x3, x4, x5, x6 ≥ 0
Feasible solutions
Transformation of the constraint equations
From x4 = 0, x5 = 0
Points on the two planes x1 + x3 = 2, 2x1 + x2 + 2x3 = 5

2 Create a dictionary

Leave the slack variables x4, x5, and x6 on the left-hand side, transfer the rest to the right-hand side, and let z be the objective function
Dictionary 1
Minimize
z = -x1 - 2x2 - 3x3
Constraints
x4 = 2 - x1 - x3
x5 = 5 - 2x1 - x2 - 2x3
x6 = 6 - 3x1 - x2 - 2x3
x1, x2, x3, x4, x5, x6 ≥ 0
The variables x4, x5, x6 appearing on the left-hand side are the “basic variables”
The basic variables do not appear in the objective function
The variables x1, x2, x3 appearing on the right-hand side are the “nonbasic variables”

3 Find a basic feasible solution

The basic feasible solution is obtained by setting all the nonbasic variables x1, x2, x3 in the dictionary to 0.
Basic feasible solution
Objective function z = 0
The basic feasible solution is a vertex of the feasible polyhedron

4 Solution Update

Solution update rules
Among the nonbasic variables, select one whose coefficient in the objective function is negative, set its value to t, and set the values of the other nonbasic variables to 0.
Increase the value t of the selected nonbasic variable to the maximum extent that keeps all the basic variables non-negative.
Update rule 1
Set t for the nonbasic variable whose coefficient is negative, and set the other nonbasic variables to 0.
Update rule 2
The largest t keeping x4, x5, x6 non-negative is t = 2
New feasible solution
Substituting
The objective function decreases at the new feasible solution
Graphical meaning of the solution update rule

5 Updating the dictionary

Comparison of the original feasible solution and the new feasible solution
Create a new dictionary in which the nonbasic variable x3 becomes a basic variable and the basic variable x4 becomes a nonbasic variable
x3 = 2 - x1 - x4
Substitute the above equation into all the equations
Dictionary 2
Minimize
z = -6 + 2x1 - 2x2 + 3x4
Constraints
x3 = 2 - x1 - x4
x5 = 1 - x2 + 2x4
x6 = 2 - x1 - x2 + 2x4
x1, x2, x3, x4, x5, x6 ≥ 0
Dictionary 2 is the dictionary whose basic feasible solution is the new feasible solution

6 Iterating steps 4 and 5

Iteration 1
Nonbasic variables x1, x2, x4 in dictionary 2
Only x2 has a negative objective-function coefficient
x2 = t
Other nonbasic variables x1, x4 = 0
Increase the variable x2 = t
Update the dictionary so that x2 is a basic variable and x5 is a nonbasic variable
New dictionary 3
Second iteration
New feasible solution
New dictionary 4
Updating of the solution is finished because the coefficients of the variables in the objective function are all greater than or equal to 0

7 Termination of the simplex method

Because all variables are constrained to be greater than or equal to 0, the optimal value of dictionary 4 is -10, attained when x1 = x3 = x5 = 0
Optimal solution for dictionary 4
Optimal value -10
Optimal solution of the original problem
Optimal value -10
Consideration
Simple calculation is possible if the position of the vertex is known
Pivot
Simplified computation of the simplex method
Rewriting dictionary 1
Circle x3 in the expression with x4 on the left-hand side as the “pivot”
Eliminate x3 from the right-hand side of the second constraint equation
Create a new dictionary by substituting into the other expressions

5.2.2 Example

Minimize
-x1 - 2x2
Constraints
x1 + x2 ≤ 3
-2x1 + x2 ≤ 2
x1, x2 ≥ 0
Introduce slack variables to create a dictionary
Dictionary 1
Minimize
z = -x1 - 2x2
Constraints
x3 = 3 - x1 - x2
x4 = 2 + 2x1 - x2
x1, x2, x3, x4 ≥ 0
D1
Pivot
New dictionary
New dictionary
Since all coefficients of the objective function are greater than or equal to 0, updating of the solution is complete
The minimum solution for D3 is (1/3, 8/3, 0, 0)
The minimum solution of the original problem is (1/3, 8/3), and the minimum value is -17/3

5.2.3 Exception Handling

(1) The problem has no optimal solution
A linear programming problem fails to have an optimal solution in two cases
The feasible region is an empty set
The objective function can be made arbitrarily small (the problem is unbounded)
Example
Minimize
-x1
Constraint
x1 – x2 ≤ 1
-x1 + x2 ≤ 1
x1, x2 ≥ 0
Introduce slack variables to create a dictionary
The variable in the objective function with a negative coefficient is x2
Since the coefficients of x2 in the constraint equations are positive or 0, the value of x2 can be made as large as desired.
Since the objective function can thus be made as small as desired, there is no optimal solution
Even if the feasible region is unbounded, the problem is not necessarily unbounded
(2) Degeneracy of the dictionary
In contrast to an unbounded problem, there are cases where the value of a nonbasic variable cannot be increased at all.
Example
Minimization
-x1
Constraint
x1 – x2 ≤ 0
x1 + x2 ≤ 2
x1, x2 ≥ 0
To apply the simplex method, create a dictionary
The only variable in the objective function with a negative coefficient is x1
If x1 is increased even slightly from 0, x3 becomes negative, so x1 cannot be increased at all
Create a dictionary by interchanging x3 and x1
In D1, there is a nonbasic variable x2 that can be increased
All coefficients become greater than or equal to 0, so we are done
The optimal value is -1 and the optimal solution is (1, 1)
(3) Cycling of dictionaries
When updating a degenerate dictionary never leads out of degeneracy, updating the dictionary can cycle through the same degenerate dictionaries indefinitely
Cycling
To avoid cycling, use “Bland’s pivot selection rule” when updating solutions and dictionaries
When choosing the nonbasic variable in the solution update rule, select the one with the smallest subscript.
If there are several basic variables that become zero when the chosen nonbasic variable is increased, choose the one with the smallest subscript among them.
Bland’s pivot selection rule can make the algorithm very slow
(4) Initial point is not feasible
In the simplex method, assigning 0 to all the nonbasic variables in the dictionary gives a basic solution.
This basic solution may not be feasible
Example
Minimize
-x1
Constraint
x1 – x2 ≤ -1
x1 + x2 ≤ 2
x1, x2 ≥ 0
Dictionary
Assigning 0 to the nonbasic variables x1 and x2 gives (x1, x2, x3, x4) = (0, 0, -1, 2)
This is not a feasible solution.
In this case, we need to find a feasible solution (dictionary) first
Ignore the objective function of the original problem
Put a new artificial variable x5 into the constraint equations
Minimize
z = x5
Constraints
x5 = 1 + x1 - x2 + x3
x4 = 2 - x1 - x2
x1, x2, x3, x4, x5 ≥ 0
The objective function is x5 = x3 - (-1 - x1 + x2),
the difference between the two sides of the first constraint equation of D1

5.3 Generalization of the linear programming problem

Overview
Can be expressed using matrices and vectors

5.3.1 Transformations of linear programming problems

(1) The case of a maximization problem
(2) When the constraints are equations
(3) The case where the variables have no sign constraints

5.4 Dual Problem

5.4.1 Dual Problem

Every linear programming problem has an associated problem called the dual problem
Example: Nutrition Problem
Nutrients have a minimum daily intake
Foods 1 and 2 contain two main types of nutrients
Each has a minimum intake of each nutrient and a food price
The Problem
Consumer Perspective
What ratio of the two foods should be purchased to minimize food costs while providing the necessary nutrients?
Formulation (P0)
Minimize
3x1 + x2
Minimize food cost
Constraints
4x1 + 3x2 ≥ 7
Nutrient A intake
5x1 + 2x2 ≥ 8
Nutrient B intake
x1, x2 ≥ 0
Perspective of pharmaceutical companies that make vitamin supplements
To maximize sales while keeping the price of vitamins low enough to get nutrients at a lower cost than regular food, how should the price of the vitamins be set?
Formulation (D0)
Let vitamin pills A and B supply nutrients A and B, respectively.
Let y1 and y2 be the prices per unit of the vitamin pills
Maximize
7y1 + 8y2
sales
Constraint
4y1 + 5y2 ≤ 3
Keep the price at or below that of food 1
3y1 + 2y2 ≤ 1
Keep the price at or below that of food 2
Relationship between Problem P0 and Problem D0
Objective function of P0
3x1 + x2
≥ (4y1 + 5y2)x1 + (3y1 + 2y2)x2
= (4x1 + 3x2)y1 + (5x1 + 2x2)y2
≥ 7y1 + 8y2
Objective function of D0
“Consumer’s food cost (3x1 + x2)” ≥ “Pharmaceutical company’s vitamin sales (7y1 + 8y2)”
The “maximum value of the pharmaceutical company’s vitamin sales” cannot exceed the “minimum value of the consumer’s food cost”
Definition of the dual problem
The food-cost minimization problem (P0) can be written using vectors and matrices.
Replace the specific numerical values with symbols
Definition 5.3
For problem (P), the following problem (D) with interchanged coefficients is called a dual problem
For the dual problem (D), the original problem (P) is called the primal problem

5.4.2 Duality Theorem

The dual problem plays an important role not only in nutritional problems but also in general linear programming problems.
Examples
“Minimizing production cost” and “the purchase price of the factory”
Theorem 5.4 Duality Theorem
(P)
Minimize
ᵗcx
Constraint
Ax ≥ b
x ≥ 0
(D)
Maximize
ᵗby
Constraint
ᵗAy ≤ c
y ≥ 0
If the primal problem (P) and the dual problem (D) each have at least one feasible solution,
then there exist optimal solutions x* and y* to (P) and (D), respectively, and
ᵗcx* (minimum value of P) = ᵗby* (maximum value of D)
Interpretation of the duality theorem in nutrition problems
Applying this theorem to the nutrition problem, we obtain
“the maximum value of the pharmaceutical company’s vitamin sales” = “the minimum value of the consumers’ food cost”
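A sketch checking the duality theorem numerically on (P0) and (D0) with scipy; the ≥ constraints of (P0) are negated into linprog's ≤ form, and x, y ≥ 0 are linprog's default bounds:

```python
# Primal (P0) and dual (D0) of the nutrition problem have equal optimal values.
from scipy.optimize import linprog

# (P0): minimize 3x1 + x2  s.t. 4x1 + 3x2 >= 7, 5x1 + 2x2 >= 8.
p = linprog([3, 1], A_ub=[[-4, -3], [-5, -2]], b_ub=[-7, -8])
# (D0): maximize 7y1 + 8y2  s.t. 4y1 + 5y2 <= 3, 3y1 + 2y2 <= 1.
d = linprog([-7, -8], A_ub=[[4, 5], [3, 2]], b_ub=[3, 1])
print(p.fun, -d.fun)    # equal, as the duality theorem asserts
```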
Various Principal and Dual Problems
Example

5.4.3 Potential price

Example
A factory makes products 1 and 2.
The price of each product and the quantities of raw materials A and B required are as above
Q1: How many of each product should be made to maximize profit while taking inventory into account?
Let x1 be the number of product 1 and x2 be the number of product 2.
Formulation (P)
Maximize
8x1 + 6x2
Constraints
x1 + x2 ≤ 4
3x1 + x2 ≤ 6
x1, x2 ≥ 0
Applying the simplex method to (P), the solution is (x1, x2) = (1, 3) with optimal value 26
Q2: If you could increase the inventory of either raw material A or B by 1 kg, which would increase the maximum profit more?
Formulation (P1)
Maximization
8x1 + 6x2
Constraints
x1 + x2 ≤ 4 (+1)
Increase inventory by 1 kg
3x1 + x2 ≤ 6 (+1)
Increase inventory by 1 kg
x1, x2 ≥ 0
Role of the solution of the dual problem
With (P) as the primal problem, create the dual problem (D)
Minimize
4y1 + 6y2
Constraints
y1 + 3y2 ≥ 8
y1 + y2 ≥ 6
y1, y2 ≥ 0
Computing (D) by the simplex method gives (y1, y2) = (5, 1)
The dual problem (D) has the following properties
Increasing the right-hand side of the first constraint in (P) by 1 increases the optimal value by 5 (the value of y1).
Increasing the right-hand side of the second constraint in (P) by 1 increases the optimal value by 1 (the value of y2).
Increasing the inventory of raw material A by 1 kg therefore increases the maximum profit by 5, so raw material A should be chosen.
Potential price
The solution y1 of the dual problem represents the hidden value of raw material A
y1 is called the “potential price” (shadow price) of raw material A

5.5 Farkas’ Lemma

5.5.1 Farkas’ Lemma

Farkas’ lemma
A fundamental theorem on systems of linear inequalities
It can be used to prove the duality theorem.
Lemma 5.7 Farkas’ lemma
Geometric meaning of (F1)
Definition 5.8
Geometric meaning of (F2)
Geometric meaning of Farkas’ lemma

5.5.2 Proof of the duality theorem

Theorem 5.4

5.5.3 Fourier-Motzkin elimination

How to prove Farkas’ lemma
Fourier-Motzkin elimination
An algorithm for eliminating variables from systems of linear inequalities
Projection theorem for convex polyhedra
The number of inequalities increases

5.5.4 Proof of Farkas’ Lemma

Variants of Farkas’ lemma
Proof of Farkas’ lemma

Chapter 6. Variational Problems

Overview
In ordinary optimization problems, a point that minimizes a function is sought from a “set of points”
A variational problem is to find, from a “set of functions”, a function that minimizes the value of a function of functions called a functional
Example
Among the curves connecting two points, which has the smallest length?
The set of curves connecting two points is the set of functions
The length of a curve is the value of the functional

6.1 Variational Problems

Overview
The length from x = 0 to x = 1 of the curve represented by the function y(x)
The length of the graph is a function whose variable is itself a function
Functional
A function whose variable is a function
Variational problem
Minimize or maximize a functional
Variational Problem Examples
Example: the brachistochrone (curve of fastest descent)
What is the shape of the slide connecting two points down which a particle descends fastest?
In the xy-plane with the y-axis pointing down
A mass point moves from (a, 0) to (b, B) under gravity, which acts in the y-direction, and the normal force from the slide
What is the shape of the slide that reaches (b, B) the fastest?
Express the shape of the slide as the graph of a function y(x)
Let g be the acceleration due to gravity; the velocity v of a mass m that has dropped to y(x) satisfies 0 = mv²/2 - mgy(x) by conservation of energy
v = √(2gy(x))
The length of the curve over a small interval is √(1 + y'(x)²)Δx
Dividing by the velocity and summing, the descent time is ∫√(1 + y'(x)²)/√(2gy(x)) dx
The graph of the function that minimizes this functional is the shape of the fastest slide
Example: the hanging rope (catenary)
What shape does a rope hang in when tied to posts with both ends at the same height?
The height of both ends is h, the length of the rope is ℓ, and the mass per unit length is m
The x-coordinate is horizontal, the y-coordinate vertical
Let the shape of the rope be the graph of a function y(x), with a and b the x-coordinates of the two ends.
The rope takes the shape that minimizes its potential energy
Potential energy
is minimized subject to the constraint on the length
Example Manpower Planning Problem
We want a workforce plan that minimizes labor costs while maintaining the number of workers needed to handle a given amount of work.
Let s(x) be the workload at time x and y(x) the number of employees.
Define the labor cost as in the above equation.
Salaries are proportional to the number of employees, so they are linear in y(x).
Hiring and firing are costly, so their cost is assumed to be quadratic in the rate of change of headcount y'(x).
Optimal headcount planning

6.1.1 General Functions

Overview
Simple variational problem
Substitute concrete functions into the functional F and compute its value
When y1(x) = x
When y2(x) = x²
F(y2) > F(y1)
For what function y(x) does F(y) attain its smallest value? This is a variational problem
Finding the minimal solution heuristically
Using properties of the integral
The value of the functional F is greater than or equal to zero for all y(x)
A y(x) for which the value of F is zero is a minimum
Let ȳ(x) = 1 (constant function)
F(ȳ) = 0
Minimum
ȳ(x) = 1 is a global minimum solution
Functionals depending on the derivative of the function
Problem
Note that y'(x) appears in the integrand, the integration interval is [1, 2], and there are constraints
F is greater than or equal to 0 for all y
Find a function y(x) satisfying the constraints such that the value of F is zero
To make the value of F equal to 0,
we only need y'(x) = 1
Constraints: y(1) = 2, y(2) = 3
ȳ(x) = x + 1 satisfies everything
Global minimum solution
General solution to variational problem
General form
Minimize
F(y)
Objective function
Constraint
y ∈ C
The function y(x) is in the set C of functions
Definition 6.4.
A function ȳ(x) ∈ C satisfying F(y) ≥ F(ȳ) for all functions y(x) ∈ C
is called the “global minimum solution” of the problem.
Definition 6.5
A function ȳ(x) ∈ C satisfying F(y) ≥ F(ȳ) for all y(x) ∈ C sufficiently “close” to ȳ(x)
is called a “local minimum solution” of the problem
Function Closeness
A function that is “close” to the function ȳ(x) is
A function whose graph is close to that of ȳ(x)
Example
For function v(x) and sufficiently small ε
Consider ȳ(x) + εv(x)
ȳ(x) + εv(x) is a function “close” to ȳ(x) because its graph is a slight modification of the graph of ȳ(x)
Norm
One way to mathematically define the closeness of functions
The nature of the problem depends on which norm is used to measure the closeness of functions
A “close” function must also satisfy the constraints
Given the constraints y(0)=0, y(1)=1
Integrand
The functional is built from a 3-variable function f(x, y, z)
Substitute y(x) for the y variable and y'(x) for the z variable of f(x, y, z)
f is called the “integrand”
Example
The objective functional
has integrand f(x, y, z) = (y - 1)²
The objective functional
has integrand f(x, y, z) = y + (1/2)z²
Abbreviation for the functional
Write f(x, y(x), y'(x)) = f[y(x)]
Square brackets “[ ]” indicate substitution of a function
F(y) is then written F(y) = ∫ f[y(x)] dx

6.1.2 Directional Differentiation

Use the derivative of the functional to find the optimal solution, rather than finding it by guesswork
It is defined as the rate at which the functional changes when the function is changed slightly (an instantaneous rate of change)
Amount of change in the value of the functional
What does it mean to “slightly change” the function y(x)?
For a function v(x) and a small number ε
y(x) + εv(x).
How does the functional change?
The amount of change of the functional
F(y + εv) – F(y)
For the variables in the functional
F(y) ⟷ y(x)
F(y + εv) ⟷ y(x) + εv(x)
Variation of the functional for a concrete function
Example
y(x) = x²
v(x) = x³
y(x) + εv(x) = x² + εx³
F(y+εv) - F(y) = ∫₀¹(x² + εx³)² dx - ∫₀¹(x²)² dx = ∫₀¹(2εx⁵ + ε²x⁶) dx = (1/3)ε + (1/7)ε²
The change in the functional when the function y(x) changes from x² to x² + εx³
Pseudo rate of change of the functional
To simplify the calculation, divide both sides by ε and use a pseudo average rate of change that considers only the amount of change
The average rate of change in the direction v = (change of the functional F in the direction v) / (change in ε)
(F(y+εv) - F(y))/ε = 1/3 + (1/7)ε
Average rate of change in v-direction
The limit as ε → 0 gives the directional derivative 1/3
Definition 6.7
Image of the directional derivative
Example
Directional Differentiation
Calculation of the amount of change
Average rate of change in direction v
When ε → 0
Formula for the directional derivative
Proposition 6.8
For the functions y(x) and v(x)
Φ(ε) = F(y + εv) (a one-variable function)
(F(y+εv) - F(y)) / ε = (Φ(ε) - Φ(0)) / ε
The directional derivative is DF(y)(v) = Φ'(0)
Corollary
When the integrand f(x, y) of the functional F has no z variable
Examples of directional derivatives
Example 1
The integrand is
f(x, y, z) = y²
fy=2y, fz=0
Directional derivative
Example 2
The integrand is
f(x, y, z) = y + (1/2)z²
fy=1, fz=z
Directional derivative
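The worked computation above can be verified with sympy; a sketch using Φ(ε) = F(y + εv) with y(x) = x², v(x) = x³ on the interval [0, 1]:

```python
# Directional derivative of F(y) = integral_0^1 y(x)^2 dx at y = x^2 toward v = x^3.
import sympy as sp

x, eps = sp.symbols("x eps")
y, v = x**2, x**3
Phi = sp.integrate((y + eps*v)**2, (x, 0, 1))   # Phi(eps) = F(y + eps*v)
print(sp.expand(Phi - Phi.subs(eps, 0)))        # eps/3 + eps**2/7
print(sp.diff(Phi, eps).subs(eps, 0))           # directional derivative 1/3
```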

6.1.3 Convex functions

Consider convexity with respect to a functional
Definition 6.10.
Let F be a functional.
If, for any functions y(x) and v(x),
F(y + v) ≥ F(y) + DF(y)(v) holds,
F is called a “convex functional”
Convex Functional Diagram
Example
The above functional is a convex functional
Conditions for a functional to be convex
General Convexity Determination Methods
Definition 6.12
For a 3-variable function f(x,y,z), consider x to be a constant
Let g(y,z) = f(x,y,z) be a function of (y,z)
When g(y,z) is a convex function for all x
f(x,y,z) is convex with respect to the second and third variables.
Proposition 6.13.
For a three-variable function f(x,y,z)
Theorem 6.14.
For a functional F with integrand f
for any x ∈ [a, b],
if the integrand f(x, y, z) is convex with respect to the second and third variables,
then the functional F is also convex
Examples of Convex Functionals
Example 1
Integrand
Since the integrand f(x, y, z) = x + y² + z² is convex, F is a convex functional
Example 2
Integrand
The integrand f(x, y, z) = -x² + y² + z² is not convex with respect to (x, y, z)
but it is convex with respect to the second and third variables
Therefore F is a convex functional

6.2 Optimality conditions for variational problems (Euler-Lagrange equations)

Outline
Basic equations for variational problems
Problems with the constraints y(a)=A, y(b)=B for a function y
Fixed-endpoint problems
Example: the brachistochrone (fastest descent) problem
Equation
Explained in order, starting from the easiest case to discuss
Sufficient conditions for a globally optimal solution of a functional
Necessary conditions for a locally optimal solution of a functional
Second formula for the directional derivative
As preparation, express the formula for the directional derivative in another form
Corollary 6.17 Second formula for the directional derivative
For a functional
The directional derivative can be expressed by the above formula

6.2.1 Optimality Sufficiency Condition for Convex Functionals

When the objective function is a convex functional, we obtain sufficient conditions for a globally optimal solution
Theorem 6.18.
Residual function
Definition 6.19

6.2.2 Solution examples
6.2.3 Optimality Requisites for General Functionals
6.2.4 Solution examples

Chapter 7. Constrained Variational Problems

7.1 Constrained Variational Problems

7.1.1 Consideration of optimality conditions
7.1.2 Optimality conditions for constrained variational problems
7.1.3 Solution Examples
7.1.4 Optimality Sufficiency Conditions with Convexity
7.1.5 Linear Functionals

7.2 Solutions to Famous Variational Problems

7.2.1 The brachistochrone (curve of fastest descent)
7.2.2 The catenary (hanging rope)

Chapter 8 Computer use

8.1 Basic Operations

8.1.1 Drawing a graph
8.1.2 Calculating derivatives and integrals
8.1.3 Matrix calculations

8.2 Solving optimization problems

8.2.1 Solving unconstrained optimization problems
8.2.2 Solving constrained optimization problems
8.2.3 Solving linear programming problems
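As one concrete possibility for 8.2.1 (the book uses its own computing environment; scipy is assumed here), the Chapter 3 example can be minimized numerically:

```python
# Unconstrained minimization of f(x, y) = x^2 + xy + y^2 - 9x - 9y + 27.
import numpy as np
from scipy.optimize import minimize

f = lambda v: v[0]**2 + v[0]*v[1] + v[1]**2 - 9*v[0] - 9*v[1] + 27
res = minimize(f, x0=np.zeros(2))
print(res.x, res.fun)    # approx. (3, 3) and 0, matching the analytic result
```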

8.3 Solving Variational Problems

When solving variational problems on a computer, there are two methods: deriving the Euler-Lagrange equation and solving that differential equation, or directly computing an approximate solution.
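A sketch of the first method under assumed problem data: for the Section 6.1 integrand y + y'²/2 with y(0) = 0, y(1) = 1, the Euler-Lagrange equation d/dx f_z = f_y reduces to y'' = 1, a boundary value problem that scipy's solve_bvp can handle:

```python
# Solve y'' = 1 with y(0) = 0, y(1) = 1; exact solution y = (x^2 + x)/2.
import numpy as np
from scipy.integrate import solve_bvp

def ode(x, Y):                 # Y[0] = y, Y[1] = y'
    return np.vstack([Y[1], np.ones_like(x)])

def bc(Ya, Yb):                # boundary conditions y(0) = 0, y(1) = 1
    return np.array([Ya[0], Yb[0] - 1.0])

xs = np.linspace(0.0, 1.0, 11)
sol = solve_bvp(ode, bc, xs, np.zeros((2, xs.size)))
print(sol.sol(0.5)[0], (0.5**2 + 0.5) / 2)   # numeric vs exact value at x = 0.5
```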

8.3.1 Variational Problem Solving

8.3.1.1 Solving the Euler-Lagrange equation
8.3.1.2 Ritz’s method
8.3.1.3 How to approximate a variational problem by an optimization problem on a number vector space

8.3.2 Solving Constrained Variational Problems

8.3.2.1 Solving the Euler-Lagrange equations

Chapter 9: Solutions to Exercises
