Notes on Convex Analysis

David K. Zhang
Last Modified 2023-12-31

In these notes, we will develop a generalized form of convex analysis for vector spaces with an arbitrary (possibly infinite) number of dimensions over any ordered field (not just $\Q$ or $\R$ ).

Definition: Unit Interval, $[0_F, 1_F]$ , Open Unit Interval, $(0_F, 1_F)$

Let $F$ be an ordered field. The unit interval in $F$ , denoted by $[0_F, 1_F]$ , is the set

[0_F, 1_F] \coloneqq \{ t \in F: 0_F \le t \le 1_F \}

where $0_F$ and $1_F$ denote the additive and multiplicative identity elements of $F$ , respectively. Similarly, the open unit interval in $F$ , denoted by $(0_F, 1_F)$ , is the set

(0_F, 1_F) \coloneqq \{ t \in F: 0_F < t < 1_F \}.

Whenever we work in an unspecified ordered field $F$ , we will refer to the additive and multiplicative identity elements of $F$ as $0_F$ and $1_F$ , respectively, to remind ourselves that these may not be real numbers.

Definition: Line Segment, $[\vv, \vw]$ , $[\vv, t, \vw]$

Let $V$ be a vector space over an ordered field $F$ . Given two vectors $\vv, \vw \in V$ , the line segment connecting $\vv$ to $\vw$ is the set

[\vv, \vw] \coloneqq \{ t \vv + (1_F - t) \vw : t \in [0_F, 1_F] \}.

We also define the notation

[\vv, t, \vw] \coloneqq t \vv + (1_F - t) \vw

to refer to a particular point along this line segment.

Although the notation $[\vv, t, \vw]$ is well-defined for all values of $t \in F$ , we will only write $[\vv, t, \vw]$ when $0_F \le t \le 1_F$ .

Definition: Convex Set

Let $V$ be a vector space over an ordered field $F$ . A subset $C \subseteq V$ is convex if, for all pairs of points $\vv, \vw \in C$ , the line segment $[\vv, \vw]$ is contained in $C$ .

It immediately follows from this definition that:

Any singleton subset of $V$ is a convex set.
Any linear subspace of $V$ , including $V$ itself, is a convex set.
The intersection of any number of convex subsets of $V$ is a convex set.

Definition: Convex Function

Let $V$ be a vector space over an ordered field $F$ , and let $C \subseteq V$ be convex. A function $f: C \to F$ is convex if, for all $\vv, \vw \in C$ and $t \in [0_F, 1_F]$ , we have

f([\vv, t, \vw]) \le [f(\vv), t, f(\vw)].

On the right-hand side of this inequality, we regard $F$ itself as a vector space over $F$ in order to write $[f(\vv), t, f(\vw)]$ . Note that the domain of a convex function is required to be a convex set in order for $f([\vv, t, \vw])$ to be well-defined.

Definition: Strictly Convex Function

Let $V$ be a vector space over an ordered field $F$ , and let $C \subseteq V$ be convex. A function $f: C \to F$ is strictly convex if, for all $\vv, \vw \in C$ and $t \in (0_F, 1_F)$ , we have

f([\vv, t, \vw]) < [f(\vv), t, f(\vw)].

Definition: $\mu$ -Strongly Convex Function

Let $V$ be an inner product space over an ordered field $F$ , let $C \subseteq V$ be convex, and let $\mu \in F$ be positive ( $\mu > 0_F$ ). A function $f: C \to F$ is $\mu$ -strongly convex if, for all $\vv, \vw \in C$ and $t \in [0_F, 1_F]$ , we have

f([\vv, t, \vw]) \le [f(\vv), t, f(\vw)] - \frac{\mu}{2_F} t (1_F - t) \langle \vv - \vw, \vv - \vw \rangle.

Note that division by $2_F$ is well-defined because an ordered field cannot have positive characteristic.

Norm of Midpoint vs. Midpoint of Norms

Theorem: Let $V$ be a vector space over a field $F$ , and let $\langle \cdot, \cdot \rangle: V \times V \to F$ be a bilinear form on $V$ . (We do not assume $\langle \cdot, \cdot \rangle$ to be symmetric or positive-definite.) For any $\vv, \vw \in V$ and $t \in F$ , we have

[\langle \vv, \vv \rangle, t, \langle \vw, \vw \rangle] - \langle [\vv, t, \vw], [\vv, t, \vw] \rangle = t (1_F - t) \langle \vv - \vw, \vv - \vw \rangle.

If, in addition, $F$ is an ordered field, $\langle \cdot, \cdot \rangle$ is positive-semidefinite, and $0_F \le t \le 1_F$ , then

[\langle \vv, \vv \rangle, t, \langle \vw, \vw \rangle] \ge \langle [\vv, t, \vw], [\vv, t, \vw] \rangle.

Proof: Let $\vv, \vw \in V$ and $t \in F$ be given. We proceed by direct computation:

\begin{aligned} &[\langle \vv, \vv \rangle, t, \langle \vw, \vw \rangle] - \langle [\vv, t, \vw], [\vv, t, \vw] \rangle \\ &= t \langle \vv, \vv \rangle + (1_F - t) \langle \vw, \vw \rangle - \langle t \vv + (1_F - t) \vw, t \vv + (1_F - t) \vw \rangle \\ &= t \langle \vv, \vv \rangle + (1_F - t) \langle \vw, \vw \rangle - t^2 \langle \vv, \vv \rangle - t (1_F - t) \langle \vv, \vw \rangle - t (1_F - t) \langle \vw, \vv \rangle - (1_F - t)^2 \langle \vw, \vw \rangle \\ &= t (1_F - t) \langle \vv, \vv \rangle + t (1_F - t) \langle \vw, \vw \rangle - t (1_F - t) \langle \vv, \vw \rangle - t (1_F - t) \langle \vw, \vv \rangle \\ &= t (1_F - t) [\langle \vv, \vv \rangle - \langle \vv, \vw \rangle - \langle \vw, \vv \rangle + \langle \vw, \vw \rangle] \\ &= t (1_F - t) \langle \vv - \vw, \vv - \vw \rangle \end{aligned}

This establishes the desired equation. Now observe that if $\langle \cdot, \cdot \rangle$ is positive-semidefinite and $0_F \le t \le 1_F$ , then the right-hand side $t (1_F - t) \langle \vv - \vw, \vv - \vw \rangle$ is a product of three nonnegative quantities. This establishes the desired inequality. ∎

As a direct corollary of this result, it follows that the squared-norm function $\vx \mapsto \langle \vx, \vx \rangle$ is $2_F$ -strongly convex, or equivalently, that the scaled function $\vx \mapsto \frac{1_F}{2_F} \langle \vx, \vx \rangle$ is $1_F$ -strongly convex. It turns out that the squared-norm function completely characterizes the notion of $\mu$ -strong convexity, in the sense that the $\mu$ -strong convexity of any function can be determined by comparison to $\vx \mapsto \frac{\mu}{2_F} \langle \vx, \vx \rangle$ .

Characterizing $\mu$ -Strong Convexity

Theorem: Let $V$ be an inner product space over an ordered field $F$ , let $C \subseteq V$ be convex, and let $\mu \in F$ be positive. A function $f: C \to F$ is $\mu$ -strongly convex if and only if the function $g: C \to F$ defined by

g(\vx) \coloneqq f(\vx) - \frac{\mu}{2_F} \langle \vx, \vx \rangle

is convex.

Proof: By definition, $g$ is convex if and only if the following inequality holds:

\begin{aligned} g([\vv, t, \vw]) &\le [g(\vv), t, g(\vw)] \\ f([\vv, t, \vw]) - \frac{\mu}{2_F} \langle [\vv, t, \vw], [\vv, t, \vw] \rangle &\le [f(\vv), t, f(\vw)] - \frac{\mu}{2_F} [\langle \vv, \vv \rangle, t, \langle \vw, \vw \rangle] \end{aligned}

After rearranging this inequality, we can simplify it using our formula for $[\langle \vv, \vv \rangle, t, \langle \vw, \vw \rangle] - \langle [\vv, t, \vw], [\vv, t, \vw] \rangle$ :

\begin{aligned} f([\vv, t, \vw]) &\le [f(\vv), t, f(\vw)] - \frac{\mu}{2_F} \big( [\langle \vv, \vv \rangle, t, \langle \vw, \vw \rangle] - \langle [\vv, t, \vw], [\vv, t, \vw] \rangle \big) \\ &= [f(\vv), t, f(\vw)] - \frac{\mu}{2_F} t (1_F - t) \langle \vv - \vw, \vv - \vw \rangle \end{aligned}

This is precisely the definition of $\mu$ -strong convexity for $f$ . ∎

Definition: Subgradient, Subdifferential, $\partial f(\vv)$

Let $V$ be a vector space over an ordered field $F$ , let $D \subseteq V$ , and let $f: D \to F$ . A subgradient of $f$ at a point $\vv \in D$ is a linear functional $g \in V^*$ having the property that

f(\vw) - f(\vv) \ge g(\vw - \vv)

for all $\vw \in D$ . The set of all subgradients of $f$ at a point $\vv \in D$ , denoted by $\partial f(\vv)$ , is called the subdifferential of $f$ at $\vv$ .

\partial f(\vv) \coloneqq \{ g \in V^* : \forall \vw \in D,\ f(\vw) - f(\vv) \ge g(\vw - \vv) \}

Definition: Transpose, $\vg^T$ , Representable Linear Functional, Representable Dual Space, $V^T$

Let $V$ be an inner product space over an ordered field $F$ , and let $\vg \in V$ be a vector. The transpose of $\vg$ is the linear functional $\vg^T: V \to F$ defined by

\vg^T(\vx) \coloneqq \langle \vg, \vx \rangle.

A linear functional $g: V \to F$ is representable if there exists a vector $\vg \in V$ such that $g = \vg^T$ . The set of all representable linear functionals on $V$ , denoted by $V^T$ , is called the representable dual space of $V$ .

V^T \coloneqq \{ \vg^T : \vg \in V \} = \{ g \in V^* : \exists \vg \in V,\ \forall \vx \in V,\ g(\vx) = \langle \vg, \vx \rangle \}

The representable dual space $V^T$ is a linear subspace of the full dual space $V^*$ , and they coincide if and only if $V$ is finite-dimensional.

Subdifferentials are Convex Sets

Theorem: Let $V$ be a vector space over an ordered field $F$ , let $D \subseteq V$ , and let $f: D \to F$ . For any $\vv \in D$ , the subdifferential $\partial f(\vv)$ is a convex subset of the dual space $V^*$ .

Proof: Let $g_1, g_2 \in \partial f(\vv)$ be given. To show that $\partial f(\vv)$ is convex, we must prove that the line segment $[g_1, g_2]$ is contained in $\partial f(\vv)$ . By hypothesis, for all $\vw \in D$ , we have

f(\vw) - f(\vv) \ge g_1(\vw - \vv)

and

f(\vw) - f(\vv) \ge g_2(\vw - \vv).

Let $t \in [0_F, 1_F]$ be given. By multiplying the first equation by $t$ and the second equation by $1_F - t$ , we obtain

t f(\vw) - t f(\vv) \ge t g_1(\vw - \vv)

and

(1_F - t) f(\vw) - (1_F - t) f(\vv) \ge (1_F - t) g_2(\vw - \vv).

By adding these two inequalities together, we obtain

f(\vw) - f(\vv) \ge [t g_1 + (1_F - t) g_2](\vw - \vv)

which proves that $t g_1 + (1_F - t) g_2 = [g_1, t, g_2] \in \partial f(\vv)$ . Since $t \in [0_F, 1_F]$ was arbitrary, this proves that the line segment $[g_1, g_2]$ is contained in $\partial f(\vv)$ . ∎

Subgradient of Squared Norm

Theorem: Let $V$ be an inner product space over an ordered field $F$ , and consider the function $f: V \to F$ defined by $f(\vx) \coloneqq \frac{1_F}{2_F} \langle \vx, \vx \rangle$ . At any point $\vx \in V$ , the unique representable subgradient of $f$ at $\vx$ is $\vx^T$ .

\forall \vx \in V,\ \partial f(\vx) \cap V^T = \{ \vx^T \}

Proof: Let $\vg, \vx \in V$ be given. By definition, $\vg^T$ is a subdifferential of $f$ at $\vx$ if and only if

\frac{1_F}{2_F} \langle \vy, \vy \rangle - \frac{1_F}{2_F} \langle \vx, \vx \rangle \ge \langle \vg, \vy - \vx \rangle

for all $\vy \in V$ . We can rearrange this inequality into the equivalent form

\langle \vy - 2_F \vg, \vy \rangle - \langle \vx - 2_F \vg, \vx \rangle \ge 0_F.

Since this inequality must hold for all values of $\vy$ , in particular, it must hold when $\vy = \vg$ . Setting $\vy$ to $\vg$ simplifies the left-hand side of the inequality as follows:

\begin{aligned} \langle \vg - 2_F \vg, \vg \rangle - \langle \vx - 2_F \vg, \vx \rangle &= -\langle \vg, \vg \rangle - \langle \vx, \vx \rangle + 2_F \langle \vg, \vx \rangle \\ &= -\langle \vg - \vx, \vg - \vx \rangle \end{aligned}

It is impossible for the inequality $-\langle \vg - \vx, \vg - \vx \rangle \ge 0$ to hold whenever $\vg \ne \vx$ . This proves that $\vg^T \notin \partial f(\vx)$ whenever $\vg \ne \vx$ . To prove that converse, that $\vx^T \in \partial f(\vx)$ , we must show that

\frac{1_F}{2_F} \langle \vy, \vy \rangle - \frac{1_F}{2_F} \langle \vx, \vx \rangle \ge \langle \vx, \vy - \vx \rangle

for all $\vy \in V$ . We can simplify this inequality into the following equivalent forms:

\begin{aligned} \frac{1_F}{2_F} \langle \vy, \vy \rangle - \frac{1_F}{2_F} \langle \vx, \vx \rangle &\ge \langle \vx, \vy \rangle - \langle \vx, \vx \rangle \\ \frac{1_F}{2_F} \langle \vy, \vy \rangle + \frac{1_F}{2_F} \langle \vx, \vx \rangle &\ge \langle \vx, \vy \rangle \\ \frac{1_F}{2_F} \langle \vy - \vx, \vy - \vx \rangle &\ge 0_F \end{aligned}

It follows from the positive-definiteness of the inner product that this inequality holds for all $\vy \in V$ . ∎

We now introduce a notion that is dual to $\mu$ -strong convexity.

Definition: $L$ -Controlled Function

Let $V$ be an inner product space over an ordered field $F$ , let $C \subseteq V$ be convex, and let $L \in F$ be positive ( $L > 0_F$ ). A function $f: C \to F$ is $L$ -controlled if, for all $\vv, \vw \in C$ and $t \in [0_F, 1_F]$ , we have

f([\vv, t, \vw]) \ge [f(\vv), t, f(\vw)] - \frac{L}{2_F} t (1_F - t) \langle \vv - \vw, \vv - \vw \rangle.

The property of being $L$ -controlled becomes more restrictive as the value of $L$ decreases. For example, being $1_F$ -controlled is a stronger condition than being $2_F$ -controlled.

Characterizing $L$ -Control

Theorem: Let $V$ be an inner product space over an ordered field $F$ , let $C \subseteq V$ be convex, and let $L \in F$ be positive. A function $f: C \to F$ is $L$ -controlled if and only if the function $g: C \to F$ defined by

g(\vx) \coloneqq \frac{L}{2_F} \langle \vx, \vx \rangle - f(\vx)

is convex.

Proof: By definition, $g$ is convex if and only if the following inequality holds:

\begin{aligned} g([\vv, t, \vw]) &\le [g(\vv), t, g(\vw)] \\ \frac{L}{2_F} \langle [\vv, t, \vw], [\vv, t, \vw] \rangle - f([\vv, t, \vw]) &\le \frac{L}{2_F} [\langle \vv, \vv \rangle, t, \langle \vw, \vw \rangle] - [f(\vv), t, f(\vw)] \end{aligned}

After rearranging this inequality, we can simplify it using our formula for $[\langle \vv, \vv \rangle, t, \langle \vw, \vw \rangle] - \langle [\vv, t, \vw], [\vv, t, \vw] \rangle$ :

\begin{aligned} f([\vv, t, \vw]) &\ge [f(\vv), t, f(\vw)] - \frac{L}{2_F} \big( [\langle \vv, \vv \rangle, t, \langle \vw, \vw \rangle] - \langle [\vv, t, \vw], [\vv, t, \vw] \rangle \big) \\ &= [f(\vv), t, f(\vw)] - \frac{L}{2_F} t (1_F - t) \langle \vv - \vw, \vv - \vw \rangle \end{aligned}

This is precisely the definition of $f$ being $L$ -controlled. ∎

Let $V$ be an inner product space over an ordered field $F$ . Let $L \in F$ be a positive constant, and let $f: V \to F$ be an $L$ -controlled convex function. This means that both $f$ and the function $\gamma: V \to F$ defined by

\gamma(\vx) \coloneqq \frac{L}{2_F} \langle \vx, \vx \rangle - f(\vx)

are convex. Let $\vx \in V$ be given, and suppose that both $f$ and $\gamma$ have representable subgradients $\vg^T \in \partial f(\vx)$ and $\vh^T \in \partial \gamma(\vx)$ . It follows that $\vg^T + \vh^T$ is a representable subgradient of $(f + \gamma)(\vx) = \frac{L}{2_F} \langle \vx, \vx \rangle$ , which implies that $\vg^T + \vh^T = L\vx^T$ . By definition, for all $\vy \in V$ , we have the inequality

\left[ \frac{L}{2_F} \langle \vy, \vy \rangle - f(\vy) \right] - \left[ \frac{L}{2_F} \langle \vx, \vx \rangle - f(\vx) \right] \ge \vh^T(\vy - \vx)

which can be rearranged into the equivalent form

f(\vy) - f(\vx) \le \frac{L}{2_F} \left[ \langle \vy, \vy \rangle - \langle \vx, \vx \rangle \right] - \vh^T(\vy - \vx).

Since $\vg^T + \vh^T = L\vx^T$ , we can write

\begin{aligned} f(\vy) - f(\vx) &\le \frac{L}{2_F} \left[ \langle \vy, \vy \rangle - \langle \vx, \vx \rangle \right] - \vh^T(\vy - \vx) \\ &\le \frac{L}{2_F} \left[ \langle \vy, \vy \rangle - \langle \vx, \vx \rangle \right] - \vh^T(\vy - \vx) + \left( \vg^T + \vh^T \right) (\vy - \vx) - L\vx^T(\vy - \vx) \\ &= \frac{L}{2_F} \left[ \langle \vy, \vy \rangle - \langle \vx, \vx \rangle \right] + \vg^T (\vy - \vx) - L \langle \vx, \vy - \vx \rangle \\ &= \vg^T (\vy - \vx) + \frac{L}{2_F} \langle \vy - \vx, \vy - \vx \rangle \end{aligned}

which proves that

f(\vy) \le f(\vx) + \vg^T (\vy - \vx) + \frac{L}{2_F} \langle \vy - \vx, \vy - \vx \rangle

for all $\vy \in V$ . In particular, for any $\alpha \in F$ , we can set $\vy = \vx - \alpha \vg$ to obtain the inequality

f(\vx - \alpha \vg) \le f(\vx) - \alpha \left( 1_F - \frac{L \alpha}{2_F} \right) \langle \vg, \vg \rangle.

This inequality shows that a gradient descent step is guaranteed to decrease any convex objective function $f$ , as long as:

$f$ is $L$ -controlled for some positive constant $L \in F$ ;
the step size $\alpha$ is strictly between $0_F$ and $2_F/L$ ;
both $f$ and $\gamma$ have representable subgradients at the current point $\vx$ ;
the subgradient $\vg^T \in \partial f(\vx)$ used to take the step is not zero.