Notes on Convex Analysis
David K. Zhang
Last Modified 2023-12-31
In these notes, we will develop a generalized form of convex analysis for vector spaces with an arbitrary (possibly infinite) number of dimensions over any ordered field (not just Q or R).
Let F be an ordered field. The unit interval in F, denoted by [0F,1F], is the set
[0F,1F]:={t∈F:0F≤t≤1F},
where 0F and 1F denote the additive and multiplicative identity elements of F, respectively. Similarly, the open unit interval in F, denoted by (0F,1F), is the set
(0F,1F):={t∈F:0F<t<1F}.
Whenever we work in an unspecified ordered field F, we will refer to the additive and multiplicative identity elements of F as 0F and 1F, respectively, to remind ourselves that these may not be real numbers.
Let V be a vector space over an ordered field F. Given two vectors v,w∈V, the line segment connecting v to w is the set
[v,w]:={tv+(1F−t)w:t∈[0F,1F]}.
We also define the notation
[v,t,w]:=tv+(1F−t)w
to refer to a particular point along this line segment.
Although the notation [v,t,w] is well-defined for all values of t∈F, we will only write [v,t,w] when 0F≤t≤1F.
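These definitions can be carried out in exact arithmetic over the ordered field F = Q. The following sketch is my own illustration, not part of the notes: the helper name `seg` and the choice of Q are assumptions. Vectors in V = Qⁿ are represented as tuples of Python's `fractions.Fraction`; later examples repeat such helpers so that each snippet stays self-contained.

```python
from fractions import Fraction

# Vectors in V = Q^n are tuples of Fraction; scalars are elements of F = Q.

def seg(v, t, w):
    """Return [v, t, w] := t*v + (1_F - t)*w, a point on the line segment [v, w]."""
    assert 0 <= t <= 1, "we only write [v, t, w] when 0_F <= t <= 1_F"
    return tuple(t * vi + (1 - t) * wi for vi, wi in zip(v, w))

v = (Fraction(1), Fraction(2))
w = (Fraction(3), Fraction(-1))
print(seg(v, Fraction(1, 2), w))  # midpoint: (Fraction(2, 1), Fraction(1, 2))
```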
Let V be a vector space over an ordered field F. A subset C⊆V is convex if, for all pairs of points v,w∈C, the line segment [v,w] is contained in C.
It immediately follows from this definition that:
- Any singleton subset of V is a convex set.
- Any linear subspace of V, including V itself, is a convex set.
- The intersection of any number of convex subsets of V is a convex set.
Let V be a vector space over an ordered field F, and let C⊆V be convex. A function f:C→F is convex if, for all v,w∈C and t∈[0F,1F], we have
f([v,t,w])≤[f(v),t,f(w)].
On the right-hand side of this inequality, we regard F itself as a vector space over F in order to write [f(v),t,f(w)]. Note that the domain of a convex function is required to be a convex set in order for f([v,t,w]) to be well-defined.
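As a concrete illustration (mine, not from the notes), take V = F = Q, regarded as a one-dimensional vector space over itself, and f(x) = x². The sketch below spot-checks the defining inequality at a handful of rational values of t using exact arithmetic.

```python
from fractions import Fraction

# Regard V = F = Q as a one-dimensional vector space over itself.
def f(x):
    return x * x  # f(x) = x^2, a convex function on Q

def seg(a, t, b):
    return t * a + (1 - t) * b  # [a, t, b] for scalars

v, w = Fraction(-3), Fraction(5, 2)
for t in (Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(9, 10), Fraction(1)):
    lhs = f(seg(v, t, w))     # f([v, t, w])
    rhs = seg(f(v), t, f(w))  # [f(v), t, f(w)], with F viewed as a vector space over F
    assert lhs <= rhs, (t, lhs, rhs)
print("convexity inequality holds at all sampled t")
```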
Let V be a vector space over an ordered field F, and let C⊆V be convex. A function f:C→F is strictly convex if, for all v,w∈C and t∈(0F,1F), we have
f([v,t,w])<[f(v),t,f(w)].
Let V be an inner product space over an ordered field F, let C⊆V be convex, and let μ∈F be positive (μ>0F). A function f:C→F is μ-strongly convex if, for all v,w∈C and t∈[0F,1F], we have
f([v,t,w])≤[f(v),t,f(w)]−(μ/2F)t(1F−t)⟨v−w,v−w⟩.
Note that division by 2F:=1F+1F is well-defined because an ordered field has characteristic zero, so 2F≠0F.
Theorem: Let V be a vector space over a field F, and let ⟨⋅,⋅⟩:V×V→F be a bilinear form on V. (We do not assume ⟨⋅,⋅⟩ to be symmetric or positive-definite.) For any v,w∈V and t∈F, we have
[⟨v,v⟩,t,⟨w,w⟩]−⟨[v,t,w],[v,t,w]⟩=t(1F−t)⟨v−w,v−w⟩.
If, in addition, F is an ordered field, ⟨⋅,⋅⟩ is positive-semidefinite, and 0F≤t≤1F, then
[⟨v,v⟩,t,⟨w,w⟩]≥⟨[v,t,w],[v,t,w]⟩.
Proof: Let v,w∈V and t∈F be given. We proceed by direct computation:
[⟨v,v⟩,t,⟨w,w⟩]−⟨[v,t,w],[v,t,w]⟩
= t⟨v,v⟩+(1F−t)⟨w,w⟩−⟨tv+(1F−t)w, tv+(1F−t)w⟩
= t⟨v,v⟩+(1F−t)⟨w,w⟩−t²⟨v,v⟩−t(1F−t)⟨v,w⟩−t(1F−t)⟨w,v⟩−(1F−t)²⟨w,w⟩
= t(1F−t)⟨v,v⟩+t(1F−t)⟨w,w⟩−t(1F−t)⟨v,w⟩−t(1F−t)⟨w,v⟩
= t(1F−t)[⟨v,v⟩−⟨v,w⟩−⟨w,v⟩+⟨w,w⟩]
= t(1F−t)⟨v−w,v−w⟩.
This establishes the desired equation. Now observe that if ⟨⋅,⋅⟩ is positive-semidefinite and 0F≤t≤1F, then the right-hand side t(1F−t)⟨v−w,v−w⟩ is a product of three nonnegative quantities. This establishes the desired inequality. ∎
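Because every step of this computation is exact field arithmetic, the identity can also be checked mechanically over Q. The sketch below (helper names are mine) verifies it for sample vectors in Q³ and several values of t, including values outside [0F,1F], since the equation holds for all t∈F.

```python
from fractions import Fraction

def inner(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))  # standard inner product on Q^n

def seg(v, t, w):
    return tuple(t * vi + (1 - t) * wi for vi, wi in zip(v, w))  # [v, t, w]

v = (Fraction(1), Fraction(-2), Fraction(1, 3))
w = (Fraction(4), Fraction(0), Fraction(-5, 7))
d = tuple(vi - wi for vi, wi in zip(v, w))  # v - w
for t in (Fraction(-2), Fraction(0), Fraction(2, 5), Fraction(1), Fraction(7, 3)):
    lhs = t * inner(v, v) + (1 - t) * inner(w, w) - inner(seg(v, t, w), seg(v, t, w))
    rhs = t * (1 - t) * inner(d, d)
    assert lhs == rhs  # exact equality over Q
print("identity verified at all sampled t")
```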
As a direct corollary of this result, it follows that the squared-norm function x↦⟨x,x⟩ is 2F-strongly convex, or equivalently, that the scaled function x↦(1F/2F)⟨x,x⟩ is 1F-strongly convex. It turns out that the squared-norm function completely characterizes the notion of μ-strong convexity, in the sense that the μ-strong convexity of any function can be determined by comparison to x↦(μ/2F)⟨x,x⟩.
Theorem: Let V be an inner product space over an ordered field F, let C⊆V be convex, and let μ∈F be positive. A function f:C→F is μ-strongly convex if and only if the function g:C→F defined by
g(x):=f(x)−(μ/2F)⟨x,x⟩
is convex.
Proof: By definition, g is convex if and only if, for all v,w∈C and t∈[0F,1F], the following inequality holds:
g([v,t,w]) ≤ [g(v),t,g(w)],
that is,
f([v,t,w])−(μ/2F)⟨[v,t,w],[v,t,w]⟩ ≤ [f(v),t,f(w)]−(μ/2F)[⟨v,v⟩,t,⟨w,w⟩].
After rearranging this inequality, we can simplify it using our formula for [⟨v,v⟩,t,⟨w,w⟩]−⟨[v,t,w],[v,t,w]⟩:
f([v,t,w]) ≤ [f(v),t,f(w)]−(μ/2F)([⟨v,v⟩,t,⟨w,w⟩]−⟨[v,t,w],[v,t,w]⟩)
= [f(v),t,f(w)]−(μ/2F)t(1F−t)⟨v−w,v−w⟩.
This is precisely the definition of μ-strong convexity for f. ∎
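For example (an illustration of mine, not from the notes), f(x) = ⟨x,x⟩ + ⟨a,x⟩ on Q² is 2F-strongly convex, since g(x) = f(x) − (2F/2F)⟨x,x⟩ = ⟨a,x⟩ is linear and therefore convex. The following sketch spot-checks the defining inequality directly with exact arithmetic.

```python
from fractions import Fraction

def inner(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))

def seg(v, t, w):
    return tuple(t * vi + (1 - t) * wi for vi, wi in zip(v, w))

a = (Fraction(1), Fraction(-2))
f = lambda x: inner(x, x) + inner(a, x)  # a 2_F-strongly convex function on Q^2
mu = Fraction(2)

v = (Fraction(1, 2), Fraction(3))
w = (Fraction(-2), Fraction(1, 5))
d = tuple(vi - wi for vi, wi in zip(v, w))
for t in (Fraction(0), Fraction(1, 7), Fraction(1, 2), Fraction(5, 6), Fraction(1)):
    lhs = f(seg(v, t, w))
    rhs = t * f(v) + (1 - t) * f(w) - (mu / 2) * t * (1 - t) * inner(d, d)
    assert lhs <= rhs, (t, lhs, rhs)
print("mu-strong convexity inequality holds at all sampled t")
```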
Let V be a vector space over an ordered field F, let D⊆V, and let f:D→F. A subgradient of f at a point v∈D is a linear functional g∈V∗ having the property that
f(w)−f(v)≥g(w−v)
for all w∈D. The set of all subgradients of f at a point v∈D, denoted by ∂f(v), is called the subdifferential of f at v.
∂f(v):={g∈V∗:∀w∈D, f(w)−f(v)≥g(w−v)}
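On a finite sample of points, the subgradient condition can be tested directly. The sketch below (the helper `is_subgradient_on` and the specific numbers are mine) checks, over Q², that the representable functional gT with g = 2Fx satisfies the subgradient inequality for f(x)=⟨x,x⟩ at a chosen point x, consistent with the uniqueness theorem proved later for the scaled squared norm.

```python
from fractions import Fraction

def inner(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))

def is_subgradient_on(f, g, v, test_points):
    """Check f(w) - f(v) >= <g, w - v> for each w in a finite sample of D."""
    return all(
        f(w) - f(v) >= inner(g, tuple(wi - vi for wi, vi in zip(w, v)))
        for w in test_points
    )

f = lambda x: inner(x, x)                     # f(x) = <x, x> on Q^2
x = (Fraction(1), Fraction(-1, 2))
g = tuple(2 * xi for xi in x)                 # candidate subgradient vector g = 2x
sample = [(Fraction(a), Fraction(b, 3)) for a in range(-3, 4) for b in range(-6, 7)]
print(is_subgradient_on(f, g, x, sample))     # True
```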
Let V be an inner product space over an ordered field F, and let g∈V be a vector. The transpose of g is the linear functional gT:V→F defined by
gT(x):=⟨g,x⟩.
A linear functional φ:V→F is representable if there exists a vector g∈V such that φ=gT. The set of all representable linear functionals on V, denoted by VT, is called the representable dual space of V.
VT:={gT:g∈V}={φ∈V∗:∃g∈V, ∀x∈V, φ(x)=⟨g,x⟩}
The representable dual space VT is a linear subspace of the full dual space V∗, and they coincide if and only if V is finite-dimensional.
Theorem: Let V be a vector space over an ordered field F, let D⊆V, and let f:D→F. For any v∈D, the subdifferential ∂f(v) is a convex subset of the dual space V∗.
Proof: Let g1,g2∈∂f(v) be given. To show that ∂f(v) is convex, we must prove that the line segment [g1,g2] is contained in ∂f(v). By hypothesis, for all w∈D, we have
f(w)−f(v)≥g1(w−v)
and
f(w)−f(v)≥g2(w−v).
Let t∈[0F,1F] be given. By multiplying the first inequality by t and the second by 1F−t (both nonnegative, so the direction of each inequality is preserved), we obtain
tf(w)−tf(v)≥tg1(w−v)
and
(1F−t)f(w)−(1F−t)f(v)≥(1F−t)g2(w−v).
By adding these two inequalities together, we obtain
f(w)−f(v)≥[tg1+(1F−t)g2](w−v),
which proves that tg1+(1F−t)g2=[g1,t,g2]∈∂f(v). Since t∈[0F,1F] was arbitrary, this proves that the line segment [g1,g2] is contained in ∂f(v). ∎
Theorem: Let V be an inner product space over an ordered field F, and consider the function f:V→F defined by f(x):=(1F/2F)⟨x,x⟩. At any point x∈V, the unique representable subgradient of f at x is xT.
∀x∈V, ∂f(x)∩VT={xT}
Proof: Let g,x∈V be given. By definition, gT is a subgradient of f at x if and only if
(1F/2F)⟨y,y⟩−(1F/2F)⟨x,x⟩≥⟨g,y−x⟩
for all y∈V. We can rearrange this inequality into the equivalent form
⟨y−2Fg,y⟩−⟨x−2Fg,x⟩≥0F.
Since this inequality must hold for all values of y, in particular, it must hold when y=g. Setting y=g simplifies the left-hand side of the inequality as follows:
⟨g−2Fg,g⟩−⟨x−2Fg,x⟩=−⟨g,g⟩−⟨x,x⟩+2F⟨g,x⟩=−⟨g−x,g−x⟩.
When g≠x, positive-definiteness gives ⟨g−x,g−x⟩>0F, so the inequality −⟨g−x,g−x⟩≥0F cannot hold. This proves that gT∉∂f(x) whenever g≠x. To prove the converse, that xT∈∂f(x), we must show that
(1F/2F)⟨y,y⟩−(1F/2F)⟨x,x⟩≥⟨x,y−x⟩
for all y∈V. We can simplify this inequality into the following equivalent forms:
(1F/2F)⟨y,y⟩−(1F/2F)⟨x,x⟩ ≥ ⟨x,y⟩−⟨x,x⟩
(1F/2F)⟨y,y⟩+(1F/2F)⟨x,x⟩ ≥ ⟨x,y⟩
(1F/2F)⟨y−x,y−x⟩ ≥ 0F
It follows from the positive-definiteness of the inner product that this inequality holds for all y∈V. ∎
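Both halves of this argument can be spot-checked with exact arithmetic: for f(x) = (1F/2F)⟨x,x⟩ on Q², the subgradient inequality holds at every sampled y when g = x, and fails (in particular at y = g) when g ≠ x. A minimal sketch, with helper names and numbers of my own choosing:

```python
from fractions import Fraction

def inner(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))

def f(x):
    return Fraction(1, 2) * inner(x, x)  # f(x) = (1_F/2_F)<x, x>

def subgrad_holds(g, x, ys):
    """Check f(y) - f(x) >= <g, y - x> for each y in ys."""
    return all(f(y) - f(x) >= inner(g, tuple(yi - xi for yi, xi in zip(y, x))) for y in ys)

x = (Fraction(2), Fraction(-1, 3))
ys = [(Fraction(a, 2), Fraction(b, 2)) for a in range(-4, 5) for b in range(-4, 5)]

print(subgrad_holds(x, x, ys))      # True: x^T is a subgradient of f at x
g = (Fraction(2), Fraction(1))      # some g != x
print(subgrad_holds(g, x, ys))      # False: the inequality fails at y = g (among others)
```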
We now introduce a notion that is dual to μ-strong convexity.
Let V be an inner product space over an ordered field F, let C⊆V be convex, and let L∈F be positive (L>0F). A function f:C→F is L-controlled if, for all v,w∈C and t∈[0F,1F], we have
f([v,t,w])≥[f(v),t,f(w)]−(L/2F)t(1F−t)⟨v−w,v−w⟩.
The property of being L-controlled becomes more restrictive as the value of L decreases, since ⟨v−w,v−w⟩ is nonnegative and a smaller L subtracts less from the right-hand side. For example, being 1F-controlled is a stronger condition than being 2F-controlled.
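As a concrete example (mine, not from the notes), the identity established earlier shows that the squared norm x↦⟨x,x⟩ is 2F-controlled, and in fact the defining inequality holds with equality. A quick exact-arithmetic spot check over Q²:

```python
from fractions import Fraction

def inner(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))

def seg(v, t, w):
    return tuple(t * vi + (1 - t) * wi for vi, wi in zip(v, w))

f = lambda x: inner(x, x)  # the squared norm, which is 2_F-controlled
L = Fraction(2)

v = (Fraction(3), Fraction(-1, 2))
w = (Fraction(-1), Fraction(4, 3))
d = tuple(vi - wi for vi, wi in zip(v, w))
for t in (Fraction(0), Fraction(1, 4), Fraction(2, 3), Fraction(1)):
    lhs = f(seg(v, t, w))
    rhs = t * f(v) + (1 - t) * f(w) - (L / 2) * t * (1 - t) * inner(d, d)
    assert lhs == rhs  # the L-controlled inequality holds, here with equality
print("L-controlled inequality verified (with equality) at all sampled t")
```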
Theorem: Let V be an inner product space over an ordered field F, let C⊆V be convex, and let L∈F be positive. A function f:C→F is L-controlled if and only if the function g:C→F defined by
g(x):=(L/2F)⟨x,x⟩−f(x)
is convex.
Proof: By definition, g is convex if and only if, for all v,w∈C and t∈[0F,1F], the following inequality holds:
g([v,t,w]) ≤ [g(v),t,g(w)],
that is,
(L/2F)⟨[v,t,w],[v,t,w]⟩−f([v,t,w]) ≤ (L/2F)[⟨v,v⟩,t,⟨w,w⟩]−[f(v),t,f(w)].
After rearranging this inequality, we can simplify it using our formula for [⟨v,v⟩,t,⟨w,w⟩]−⟨[v,t,w],[v,t,w]⟩:
f([v,t,w]) ≥ [f(v),t,f(w)]−(L/2F)([⟨v,v⟩,t,⟨w,w⟩]−⟨[v,t,w],[v,t,w]⟩)
= [f(v),t,f(w)]−(L/2F)t(1F−t)⟨v−w,v−w⟩.
This is precisely the definition of f being L-controlled. ∎
Let V be an inner product space over an ordered field F. Let L∈F be a positive constant, and let f:V→F be an L-controlled convex function. This means that both f and the function γ:V→F defined by
γ(x):=(L/2F)⟨x,x⟩−f(x)
are convex. Let x∈V be given, and suppose that both f and γ have representable subgradients gT∈∂f(x) and hT∈∂γ(x). Since the sum of a subgradient of f and a subgradient of γ at x is a subgradient of f+γ at x, it follows that gT+hT is a representable subgradient of (f+γ)(x)=(L/2F)⟨x,x⟩; by the preceding uniqueness theorem (scaled by L), this implies that gT+hT=LxT. By definition, for all y∈V, we have the inequality
[(L/2F)⟨y,y⟩−f(y)]−[(L/2F)⟨x,x⟩−f(x)]≥hT(y−x),
which can be rearranged into the equivalent form
f(y)−f(x)≤(L/2F)[⟨y,y⟩−⟨x,x⟩]−hT(y−x).
Since gT+hT=LxT, we can write
f(y)−f(x) ≤ (L/2F)[⟨y,y⟩−⟨x,x⟩]−hT(y−x)
≤ (L/2F)[⟨y,y⟩−⟨x,x⟩]−hT(y−x)+(gT+hT)(y−x)−LxT(y−x)
= (L/2F)[⟨y,y⟩−⟨x,x⟩]+gT(y−x)−L⟨x,y−x⟩
= gT(y−x)+(L/2F)⟨y−x,y−x⟩,
which proves that
f(y)≤f(x)+gT(y−x)+(L/2F)⟨y−x,y−x⟩
for all y∈V. In particular, for any α∈F, we can set y=x−αg to obtain the inequality
f(x−αg)≤f(x)−α(1F−(Lα/2F))⟨g,g⟩.
This inequality shows that a gradient descent step is guaranteed to decrease any convex objective function f, as long as the following conditions hold (a concrete example follows the list):
- f is L-controlled for some positive constant L∈F;
- the step size α is strictly between 0F and 2F/L;
- both f and γ have representable subgradients at the current point x;
- the subgradient gT∈∂f(x) used to take the step is not zero.
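As a purely illustrative instance over F = Q (the helper names and specific numbers are mine): f(x) = ⟨x,x⟩ is convex and 2F-controlled, its representable subgradient at x is (2Fx)T, and 2F/L = 1F, so any step size strictly between 0F and 1F must decrease f. The sketch below checks both the bound derived above and the actual decrease with exact rational arithmetic.

```python
from fractions import Fraction

def inner(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))

# f(x) = <x, x> on Q^2: convex and L-controlled with L = 2_F, so any
# step size 0_F < alpha < 2_F/L = 1_F guarantees descent.
f = lambda x: inner(x, x)
L = Fraction(2)

x = (Fraction(3), Fraction(-2))
g = tuple(2 * xi for xi in x)   # the representable subgradient of f at x is (2x)^T
alpha = Fraction(1, 4)          # any alpha strictly between 0_F and 2_F/L works

x_new = tuple(xi - alpha * gi for xi, gi in zip(x, g))
bound = f(x) - alpha * (1 - (L * alpha) / 2) * inner(g, g)

assert f(x_new) <= bound        # the descent bound derived above
assert f(x_new) < f(x)          # genuine decrease, since g != 0 and 0 < alpha < 2/L
print(f(x), "->", f(x_new), "bound:", bound)
```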