Notes on Convex Analysis

David K. Zhang
Last Modified 2023-12-31

In these notes, we will develop a generalized form of convex analysis for vector spaces with an arbitrary (possibly infinite) number of dimensions over any ordered field (not just Q\Q or R\R).

Definition: Unit Interval, [0F,1F][0_F, 1_F], Open Unit Interval, (0F,1F)(0_F, 1_F)

Let FF be an ordered field. The unit interval in FF, denoted by [0F,1F][0_F, 1_F], is the set

[0F,1F]{tF:0Ft1F} [0_F, 1_F] \coloneqq \{ t \in F: 0_F \le t \le 1_F \}

where 0F0_F and 1F1_F denote the additive and multiplicative identity elements of FF, respectively. Similarly, the open unit interval in FF, denoted by (0F,1F)(0_F, 1_F), is the set

(0F,1F){tF:0F<t<1F}. (0_F, 1_F) \coloneqq \{ t \in F: 0_F < t < 1_F \}.

Whenever we work in an unspecified ordered field FF, we will refer to the additive and multiplicative identity elements of FF as 0F0_F and 1F1_F, respectively, to remind ourselves that these may not be real numbers.

Definition: Line Segment, [v,w][\vv, \vw], [v,t,w][\vv, t, \vw]

Let VV be a vector space over an ordered field FF. Given two vectors v,wV\vv, \vw \in V, the line segment connecting v\vv to w\vw is the set

[v,w]{tv+(1Ft)w:t[0F,1F]}. [\vv, \vw] \coloneqq \{ t \vv + (1_F - t) \vw : t \in [0_F, 1_F] \}.

We also define the notation

[v,t,w]tv+(1Ft)w [\vv, t, \vw] \coloneqq t \vv + (1_F - t) \vw

to refer to a particular point along this line segment.

Although the notation [v,t,w][\vv, t, \vw] is well-defined for all values of tFt \in F, we will only write [v,t,w][\vv, t, \vw] when 0Ft1F0_F \le t \le 1_F.

Definition: Convex Set

Let VV be a vector space over an ordered field FF. A subset CVC \subseteq V is convex if, for all pairs of points v,wC\vv, \vw \in C, the line segment [v,w][\vv, \vw] is contained in CC.

It immediately follows from this definition that:

Definition: Convex Function

Let VV be a vector space over an ordered field FF, and let CVC \subseteq V be convex. A function f:CFf: C \to F is convex if, for all v,wC\vv, \vw \in C and t[0F,1F]t \in [0_F, 1_F], we have

f([v,t,w])[f(v),t,f(w)]. f([\vv, t, \vw]) \le [f(\vv), t, f(\vw)].

On the right-hand side of this inequality, we regard FF itself as a vector space over FF in order to write [f(v),t,f(w)][f(\vv), t, f(\vw)]. Note that the domain of a convex function is required to be a convex set in order for f([v,t,w])f([\vv, t, \vw]) to be well-defined.

Definition: Strictly Convex Function

Let VV be a vector space over an ordered field FF, and let CVC \subseteq V be convex. A function f:CFf: C \to F is strictly convex if, for all v,wC\vv, \vw \in C and t(0F,1F)t \in (0_F, 1_F), we have

f([v,t,w])<[f(v),t,f(w)]. f([\vv, t, \vw]) < [f(\vv), t, f(\vw)].

Definition: μ\mu-Strongly Convex Function

Let VV be an inner product space over an ordered field FF, let CVC \subseteq V be convex, and let μF\mu \in F be positive (μ>0F\mu > 0_F). A function f:CFf: C \to F is μ\mu-strongly convex if, for all v,wC\vv, \vw \in C and t[0F,1F]t \in [0_F, 1_F], we have

f([v,t,w])[f(v),t,f(w)]μ2Ft(1Ft)vw,vw. f([\vv, t, \vw]) \le [f(\vv), t, f(\vw)] - \frac{\mu}{2_F} t (1_F - t) \langle \vv - \vw, \vv - \vw \rangle.

Note that division by 2F2_F is well-defined because an ordered field cannot have positive characteristic.

Norm of Midpoint vs. Midpoint of Norms

Theorem: Let VV be a vector space over a field FF, and let ,:V×VF\langle \cdot, \cdot \rangle: V \times V \to F be a bilinear form on VV. (We do not assume ,\langle \cdot, \cdot \rangle to be symmetric or positive-definite.) For any v,wV\vv, \vw \in V and tFt \in F, we have

[v,v,t,w,w][v,t,w],[v,t,w]=t(1Ft)vw,vw. [\langle \vv, \vv \rangle, t, \langle \vw, \vw \rangle] - \langle [\vv, t, \vw], [\vv, t, \vw] \rangle = t (1_F - t) \langle \vv - \vw, \vv - \vw \rangle.

If, in addition, FF is an ordered field, ,\langle \cdot, \cdot \rangle is positive-semidefinite, and 0Ft1F0_F \le t \le 1_F, then

[v,v,t,w,w][v,t,w],[v,t,w].[\langle \vv, \vv \rangle, t, \langle \vw, \vw \rangle] \ge \langle [\vv, t, \vw], [\vv, t, \vw] \rangle.

Proof: Let v,wV\vv, \vw \in V and tFt \in F be given. We proceed by direct computation:

[v,v,t,w,w][v,t,w],[v,t,w]=tv,v+(1Ft)w,wtv+(1Ft)w,tv+(1Ft)w=tv,v+(1Ft)w,wt2v,vt(1Ft)v,wt(1Ft)w,v(1Ft)2w,w=t(1Ft)v,v+t(1Ft)w,wt(1Ft)v,wt(1Ft)w,v=t(1Ft)[v,vv,ww,v+w,w]=t(1Ft)vw,vw \begin{aligned} &[\langle \vv, \vv \rangle, t, \langle \vw, \vw \rangle] - \langle [\vv, t, \vw], [\vv, t, \vw] \rangle \\ &= t \langle \vv, \vv \rangle + (1_F - t) \langle \vw, \vw \rangle - \langle t \vv + (1_F - t) \vw, t \vv + (1_F - t) \vw \rangle \\ &= t \langle \vv, \vv \rangle + (1_F - t) \langle \vw, \vw \rangle - t^2 \langle \vv, \vv \rangle - t (1_F - t) \langle \vv, \vw \rangle - t (1_F - t) \langle \vw, \vv \rangle - (1_F - t)^2 \langle \vw, \vw \rangle \\ &= t (1_F - t) \langle \vv, \vv \rangle + t (1_F - t) \langle \vw, \vw \rangle - t (1_F - t) \langle \vv, \vw \rangle - t (1_F - t) \langle \vw, \vv \rangle \\ &= t (1_F - t) [\langle \vv, \vv \rangle - \langle \vv, \vw \rangle - \langle \vw, \vv \rangle + \langle \vw, \vw \rangle] \\ &= t (1_F - t) \langle \vv - \vw, \vv - \vw \rangle \end{aligned}

This establishes the desired equation. Now observe that if ,\langle \cdot, \cdot \rangle is positive-semidefinite and 0Ft1F0_F \le t \le 1_F, then the right-hand side t(1Ft)vw,vwt (1_F - t) \langle \vv - \vw, \vv - \vw \rangle is a product of three nonnegative quantities. This establishes the desired inequality.

As a direct corollary of this result, it follows that the squared-norm function xx,x\vx \mapsto \langle \vx, \vx \rangle is 2F2_F-strongly convex, or equivalently, that the scaled function x1F2Fx,x\vx \mapsto \frac{1_F}{2_F} \langle \vx, \vx \rangle is 1F1_F-strongly convex. It turns out that the squared-norm function completely characterizes the notion of μ\mu-strong convexity, in the sense that the μ\mu-strong convexity of any function can be determined by comparison to xμ2Fx,x\vx \mapsto \frac{\mu}{2_F} \langle \vx, \vx \rangle.

Characterizing μ\mu-Strong Convexity

Theorem: Let VV be an inner product space over an ordered field FF, let CVC \subseteq V be convex, and let μF\mu \in F be positive. A function f:CFf: C \to F is μ\mu-strongly convex if and only if the function g:CFg: C \to F defined by

g(x)f(x)μ2Fx,x g(\vx) \coloneqq f(\vx) - \frac{\mu}{2_F} \langle \vx, \vx \rangle

is convex.


Proof: By definition, gg is convex if and only if the following inequality holds:

g([v,t,w])[g(v),t,g(w)]f([v,t,w])μ2F[v,t,w],[v,t,w][f(v),t,f(w)]μ2F[v,v,t,w,w] \begin{aligned} g([\vv, t, \vw]) &\le [g(\vv), t, g(\vw)] \\ f([\vv, t, \vw]) - \frac{\mu}{2_F} \langle [\vv, t, \vw], [\vv, t, \vw] \rangle &\le [f(\vv), t, f(\vw)] - \frac{\mu}{2_F} [\langle \vv, \vv \rangle, t, \langle \vw, \vw \rangle] \end{aligned}

After rearranging this inequality, we can simplify it using our formula for [v,v,t,w,w][v,t,w],[v,t,w][\langle \vv, \vv \rangle, t, \langle \vw, \vw \rangle] - \langle [\vv, t, \vw], [\vv, t, \vw] \rangle:

f([v,t,w])[f(v),t,f(w)]μ2F([v,v,t,w,w][v,t,w],[v,t,w])=[f(v),t,f(w)]μ2Ft(1Ft)vw,vw \begin{aligned} f([\vv, t, \vw]) &\le [f(\vv), t, f(\vw)] - \frac{\mu}{2_F} \big( [\langle \vv, \vv \rangle, t, \langle \vw, \vw \rangle] - \langle [\vv, t, \vw], [\vv, t, \vw] \rangle \big) \\ &= [f(\vv), t, f(\vw)] - \frac{\mu}{2_F} t (1_F - t) \langle \vv - \vw, \vv - \vw \rangle \end{aligned}

This is precisely the definition of μ\mu-strong convexity for ff.

Definition: Subgradient, Subdifferential, f(v)\partial f(\vv)

Let VV be a vector space over an ordered field FF, let DVD \subseteq V, and let f:DFf: D \to F. A subgradient of ff at a point vD\vv \in D is a linear functional gVg \in V^* having the property that

f(w)f(v)g(wv) f(\vw) - f(\vv) \ge g(\vw - \vv)

for all wD\vw \in D. The set of all subgradients of ff at a point vD\vv \in D, denoted by f(v)\partial f(\vv), is called the subdifferential of ff at v\vv.

f(v){gV:wD, f(w)f(v)g(wv)} \partial f(\vv) \coloneqq \{ g \in V^* : \forall \vw \in D,\ f(\vw) - f(\vv) \ge g(\vw - \vv) \}

Definition: Transpose, gT\vg^T, Representable Linear Functional, Representable Dual Space, VTV^T

Let VV be an inner product space over an ordered field FF, and let gV\vg \in V be a vector. The transpose of g\vg is the linear functional gT:VF\vg^T: V \to F defined by

gT(x)g,x. \vg^T(\vx) \coloneqq \langle \vg, \vx \rangle.

A linear functional g:VFg: V \to F is representable if there exists a vector gV\vg \in V such that g=gTg = \vg^T. The set of all representable linear functionals on VV, denoted by VTV^T, is called the representable dual space of VV.

VT{gT:gV}={gV:gV, xV, g(x)=g,x} V^T \coloneqq \{ \vg^T : \vg \in V \} = \{ g \in V^* : \exists \vg \in V,\ \forall \vx \in V,\ g(\vx) = \langle \vg, \vx \rangle \}

The representable dual space VTV^T is a linear subspace of the full dual space VV^*, and they coincide if and only if VV is finite-dimensional.

Subdifferentials are Convex Sets

Theorem: Let VV be a vector space over an ordered field FF, let DVD \subseteq V, and let f:DFf: D \to F. For any vD\vv \in D, the subdifferential f(v)\partial f(\vv) is a convex subset of the dual space VV^*.


Proof: Let g1,g2f(v)g_1, g_2 \in \partial f(\vv) be given. To show that f(v)\partial f(\vv) is convex, we must prove that the line segment [g1,g2][g_1, g_2] is contained in f(v)\partial f(\vv). By hypothesis, for all wD\vw \in D, we have

f(w)f(v)g1(wv) f(\vw) - f(\vv) \ge g_1(\vw - \vv)

and

f(w)f(v)g2(wv). f(\vw) - f(\vv) \ge g_2(\vw - \vv).

Let t[0F,1F]t \in [0_F, 1_F] be given. By multiplying the first equation by tt and the second equation by 1Ft1_F - t, we obtain

tf(w)tf(v)tg1(wv) t f(\vw) - t f(\vv) \ge t g_1(\vw - \vv)

and

(1Ft)f(w)(1Ft)f(v)(1Ft)g2(wv). (1_F - t) f(\vw) - (1_F - t) f(\vv) \ge (1_F - t) g_2(\vw - \vv).

By adding these two inequalities together, we obtain

f(w)f(v)[tg1+(1Ft)g2](wv) f(\vw) - f(\vv) \ge [t g_1 + (1_F - t) g_2](\vw - \vv)

which proves that tg1+(1Ft)g2=[g1,t,g2]f(v)t g_1 + (1_F - t) g_2 = [g_1, t, g_2] \in \partial f(\vv). Since t[0F,1F]t \in [0_F, 1_F] was arbitrary, this proves that the line segment [g1,g2][g_1, g_2] is contained in f(v)\partial f(\vv).

Subgradient of Squared Norm

Theorem: Let VV be an inner product space over an ordered field FF, and consider the function f:VFf: V \to F defined by f(x)1F2Fx,xf(\vx) \coloneqq \frac{1_F}{2_F} \langle \vx, \vx \rangle. At any point xV\vx \in V, the unique representable subgradient of ff at x\vx is xT\vx^T.

xV, f(x)VT={xT} \forall \vx \in V,\ \partial f(\vx) \cap V^T = \{ \vx^T \}

Proof: Let g,xV\vg, \vx \in V be given. By definition, gT\vg^T is a subdifferential of ff at x\vx if and only if

1F2Fy,y1F2Fx,xg,yx \frac{1_F}{2_F} \langle \vy, \vy \rangle - \frac{1_F}{2_F} \langle \vx, \vx \rangle \ge \langle \vg, \vy - \vx \rangle

for all yV\vy \in V. We can rearrange this inequality into the equivalent form

y2Fg,yx2Fg,x0F. \langle \vy - 2_F \vg, \vy \rangle - \langle \vx - 2_F \vg, \vx \rangle \ge 0_F.

Since this inequality must hold for all values of y\vy, in particular, it must hold when y=g\vy = \vg. Setting y\vy to g\vg simplifies the left-hand side of the inequality as follows:

g2Fg,gx2Fg,x=g,gx,x+2Fg,x=gx,gx \begin{aligned} \langle \vg - 2_F \vg, \vg \rangle - \langle \vx - 2_F \vg, \vx \rangle &= -\langle \vg, \vg \rangle - \langle \vx, \vx \rangle + 2_F \langle \vg, \vx \rangle \\ &= -\langle \vg - \vx, \vg - \vx \rangle \end{aligned}

It is impossible for the inequality gx,gx0-\langle \vg - \vx, \vg - \vx \rangle \ge 0 to hold whenever gx\vg \ne \vx. This proves that gTf(x)\vg^T \notin \partial f(\vx) whenever gx\vg \ne \vx. To prove that converse, that xTf(x)\vx^T \in \partial f(\vx), we must show that

1F2Fy,y1F2Fx,xx,yx \frac{1_F}{2_F} \langle \vy, \vy \rangle - \frac{1_F}{2_F} \langle \vx, \vx \rangle \ge \langle \vx, \vy - \vx \rangle

for all yV\vy \in V. We can simplify this inequality into the following equivalent forms:

1F2Fy,y1F2Fx,xx,yx,x1F2Fy,y+1F2Fx,xx,y1F2Fyx,yx0F \begin{aligned} \frac{1_F}{2_F} \langle \vy, \vy \rangle - \frac{1_F}{2_F} \langle \vx, \vx \rangle &\ge \langle \vx, \vy \rangle - \langle \vx, \vx \rangle \\ \frac{1_F}{2_F} \langle \vy, \vy \rangle + \frac{1_F}{2_F} \langle \vx, \vx \rangle &\ge \langle \vx, \vy \rangle \\ \frac{1_F}{2_F} \langle \vy - \vx, \vy - \vx \rangle &\ge 0_F \end{aligned}

It follows from the positive-definiteness of the inner product that this inequality holds for all yV\vy \in V.

We now introduce a notion that is dual to μ\mu-strong convexity.

Definition: LL-Controlled Function

Let VV be an inner product space over an ordered field FF, let CVC \subseteq V be convex, and let LFL \in F be positive (L>0FL > 0_F). A function f:CFf: C \to F is LL-controlled if, for all v,wC\vv, \vw \in C and t[0F,1F]t \in [0_F, 1_F], we have

f([v,t,w])[f(v),t,f(w)]L2Ft(1Ft)vw,vw. f([\vv, t, \vw]) \ge [f(\vv), t, f(\vw)] - \frac{L}{2_F} t (1_F - t) \langle \vv - \vw, \vv - \vw \rangle.

The property of being LL-controlled becomes more restrictive as the value of LL decreases. For example, being 1F1_F-controlled is a stronger condition than being 2F2_F-controlled.

Characterizing LL-Control

Theorem: Let VV be an inner product space over an ordered field FF, let CVC \subseteq V be convex, and let LFL \in F be positive. A function f:CFf: C \to F is LL-controlled if and only if the function g:CFg: C \to F defined by

g(x)L2Fx,xf(x) g(\vx) \coloneqq \frac{L}{2_F} \langle \vx, \vx \rangle - f(\vx)

is convex.


Proof: By definition, gg is convex if and only if the following inequality holds:

g([v,t,w])[g(v),t,g(w)]L2F[v,t,w],[v,t,w]f([v,t,w])L2F[v,v,t,w,w][f(v),t,f(w)] \begin{aligned} g([\vv, t, \vw]) &\le [g(\vv), t, g(\vw)] \\ \frac{L}{2_F} \langle [\vv, t, \vw], [\vv, t, \vw] \rangle - f([\vv, t, \vw]) &\le \frac{L}{2_F} [\langle \vv, \vv \rangle, t, \langle \vw, \vw \rangle] - [f(\vv), t, f(\vw)] \end{aligned}

After rearranging this inequality, we can simplify it using our formula for [v,v,t,w,w][v,t,w],[v,t,w][\langle \vv, \vv \rangle, t, \langle \vw, \vw \rangle] - \langle [\vv, t, \vw], [\vv, t, \vw] \rangle:

f([v,t,w])[f(v),t,f(w)]L2F([v,v,t,w,w][v,t,w],[v,t,w])=[f(v),t,f(w)]L2Ft(1Ft)vw,vw \begin{aligned} f([\vv, t, \vw]) &\ge [f(\vv), t, f(\vw)] - \frac{L}{2_F} \big( [\langle \vv, \vv \rangle, t, \langle \vw, \vw \rangle] - \langle [\vv, t, \vw], [\vv, t, \vw] \rangle \big) \\ &= [f(\vv), t, f(\vw)] - \frac{L}{2_F} t (1_F - t) \langle \vv - \vw, \vv - \vw \rangle \end{aligned}

This is precisely the definition of ff being LL-controlled.

Let VV be an inner product space over an ordered field FF. Let LFL \in F be a positive constant, and let f:VFf: V \to F be an LL-controlled convex function. This means that both ff and the function γ:VF\gamma: V \to F defined by

γ(x)L2Fx,xf(x) \gamma(\vx) \coloneqq \frac{L}{2_F} \langle \vx, \vx \rangle - f(\vx)

are convex. Let xV\vx \in V be given, and suppose that both ff and γ\gamma have representable subgradients gTf(x)\vg^T \in \partial f(\vx) and hTγ(x)\vh^T \in \partial \gamma(\vx). It follows that gT+hT\vg^T + \vh^T is a representable subgradient of (f+γ)(x)=L2Fx,x(f + \gamma)(\vx) = \frac{L}{2_F} \langle \vx, \vx \rangle, which implies that gT+hT=LxT\vg^T + \vh^T = L\vx^T. By definition, for all yV\vy \in V, we have the inequality

[L2Fy,yf(y)][L2Fx,xf(x)]hT(yx) \left[ \frac{L}{2_F} \langle \vy, \vy \rangle - f(\vy) \right] - \left[ \frac{L}{2_F} \langle \vx, \vx \rangle - f(\vx) \right] \ge \vh^T(\vy - \vx)

which can be rearranged into the equivalent form

f(y)f(x)L2F[y,yx,x]hT(yx). f(\vy) - f(\vx) \le \frac{L}{2_F} \left[ \langle \vy, \vy \rangle - \langle \vx, \vx \rangle \right] - \vh^T(\vy - \vx).

Since gT+hT=LxT\vg^T + \vh^T = L\vx^T, we can write

f(y)f(x)L2F[y,yx,x]hT(yx)L2F[y,yx,x]hT(yx)+(gT+hT)(yx)LxT(yx)=L2F[y,yx,x]+gT(yx)Lx,yx=gT(yx)+L2Fyx,yx \begin{aligned} f(\vy) - f(\vx) &\le \frac{L}{2_F} \left[ \langle \vy, \vy \rangle - \langle \vx, \vx \rangle \right] - \vh^T(\vy - \vx) \\ &\le \frac{L}{2_F} \left[ \langle \vy, \vy \rangle - \langle \vx, \vx \rangle \right] - \vh^T(\vy - \vx) + \left( \vg^T + \vh^T \right) (\vy - \vx) - L\vx^T(\vy - \vx) \\ &= \frac{L}{2_F} \left[ \langle \vy, \vy \rangle - \langle \vx, \vx \rangle \right] + \vg^T (\vy - \vx) - L \langle \vx, \vy - \vx \rangle \\ &= \vg^T (\vy - \vx) + \frac{L}{2_F} \langle \vy - \vx, \vy - \vx \rangle \end{aligned}

which proves that

f(y)f(x)+gT(yx)+L2Fyx,yx f(\vy) \le f(\vx) + \vg^T (\vy - \vx) + \frac{L}{2_F} \langle \vy - \vx, \vy - \vx \rangle

for all yV\vy \in V. In particular, for any αF\alpha \in F, we can set y=xαg\vy = \vx - \alpha \vg to obtain the inequality

f(xαg)f(x)α(1FLα2F)g,g. f(\vx - \alpha \vg) \le f(\vx) - \alpha \left( 1_F - \frac{L \alpha}{2_F} \right) \langle \vg, \vg \rangle.

This inequality shows that a gradient descent step is guaranteed to decrease any convex objective function ff, as long as: