Notes on Optimization

David K. Zhang
Last Modified 2024-01-01

One-Dimensional Optimization

In this section, we consider the problem of minimizing a continuous function f:RRf: \R \to \R. More specifically, we want to devise an algorithm to search for a minimizer xx^* of ff starting from one or more specified initial points. Different types of initial data (three-point brackets, two-point brackets, etc.) will lead to different classes of algorithms.

We will not assume any particular form for the function ff, and we will develop algorithms that only access ff as a black box. In other words, given a real number xRx \in \R, we can call a subroutine that returns the numerical value of f(x)f(x). This output value is the only feature of the subroutine that we can observe; we cannot otherwise look inside to see what it is doing.

Under these conditions, any algorithm that terminates can only perform a finite number of evaluations of ff. Hence, no algorithm of this type can provide any guarantees about finding a global minimizer or computing an exact result. Even in the one-dimensional case, the best we can hope to do is to approximate a local minimizer of ff.

Bracketing and Interpolation

Suppose we have a three-point bracket for ff, i.e., an ordered triple of real numbers x0,x1,x2Rx_0, x_1, x_2 \in \R such that

x0<x1<x2andf(x0)f(x1)f(x2). x_0 < x_1 < x_2 \quad \text{and} \quad f(x_0) \ge f(x_1) \le f(x_2).

Using this data, we can construct a quadratic interpolating polynomial pR[x]p \in \R[x] that passes through the points (x0,f(x0))(x_0, f(x_0)), (x1,f(x1))(x_1, f(x_1)), and (x2,f(x2))(x_2, f(x_2)).

p(x)Δ2Δ1h1+h2(xx0)2h1Δ2h2Δ12h1Δ1h1+h2(xx0)+f0 p(x) \coloneqq \frac{\Delta_2 - \Delta_1}{h_1 + h_2} (x - x_0)^2 - \frac{h_1 \Delta_2 - h_2 \Delta_1 - 2 h_1 \Delta_1}{h_1 + h_2} (x - x_0) + f_0

Here, we have defined h1h_1, h2h_2, Δ1\Delta_1, and Δ2\Delta_2 as follows:

h1x1x0>0h2x2x1>0 h_1 \coloneqq x_1 - x_0 > 0 \qquad h_2 \coloneqq x_2 - x_1 > 0 Δ1f(x1)f(x0)x1x00Δ2f(x2)f(x1)x2x10 \Delta_1 \coloneqq \frac{f(x_1) - f(x_0)}{x_1 - x_0} \le 0 \qquad \Delta_2 \coloneqq \frac{f(x_2) - f(x_1)}{x_2 - x_1} \ge 0

Now, if ff is well-approximated by pp in the interval [x0,x2][x_0, x_2], then we might expect a minimizer of pp to closely approximate a minimizer of ff. There is a simple formula for the exact (global) minimizer of pp:

xquadratic=x0+h1Δ2h2Δ12h1Δ12(Δ2Δ1) x_{\text{quadratic}}^* = x_0 + \frac{h_1 \Delta_2 - h_2 \Delta_1 - 2 h_1 \Delta_1}{2(\Delta_2 - \Delta_1)}

Observe that xquadraticx_{\text{quadratic}}^* is well-defined as long as Δ1\Delta_1 and Δ2\Delta_2 are not both zero. Indeed, if Δ1=Δ2=0\Delta_1 = \Delta_2 = 0, then this technique breaks down because pp is a constant polynomial. Moreover, if the bracket conditions x0<x1<x2x_0 < x_1 < x_2 and f(x0)f(x1)f(x2)f(x_0) \ge f(x_1) \le f(x_2) are satisfied, then xquadraticx_{\text{quadratic}}^* is guaranteed to lie in the interval [x0,x2][x_0, x_2], and p(xquadratic)0p''(x_{\text{quadratic}}^*) \le 0.

(x0,xquadratic,x1)(x_0, x_{\text{quadratic}}^*, x_1)

(x1,xquadratic,x2)(x_1, x_{\text{quadratic}}^*, x_2)

(x0,x1,xquadratic)(x_0, x_1, x_{\text{quadratic}}^*)

(xquadratic,x1,x2)(x_{\text{quadratic}}^*, x_1, x_2)

The following Mathematica code verifies the correctness of these claims.


p[x_] := (Δ₂ - Δ₁)/(h₁ + h₂)*(x - x₀)^2 -
    (h₁*Δ₂ - h₂*Δ₁ - 2h₁*Δ₁)/(h₁ + h₂)*(x - x₀) + f₀;

xQuadratic = x₀ + (h₁*Δ₂ - h₂*Δ₁ - 2h₁*Δ₁)/(2Δ₂ - 2Δ₁);

Assert[p[x₀] == f₀ // Simplify];
Assert[p[x₀ + h₁] == f₀ + h₁*Δ₁ // Simplify];
Assert[p[x₀ + h₁ + h₂] == f₀ + h₁*Δ₁ + h₂*Δ₂ // Simplify];
Assert[p'[xQuadratic] == 0 // Simplify];

Assuming[h₁ > 0 && h₂ > 0 && Δ₁ <= 0 && Δ₂ >= 0,
    Assert[p''[xQuadratic] >= 0 // Simplify];
    Assert[x₀ <= xQuadratic // Simplify];
    Assert[xQuadratic <= x₀ + h₁ + h₂ // Simplify];