凸函数,Lipschitz smooth, strongly convex 您所在的位置:网站首页 利普希茨条件推出一致连续证明 凸函数,Lipschitz smooth, strongly convex

凸函数,Lipschitz smooth, strongly convex

2024-07-05 21:31| 来源: 网络整理| 查看: 265

本文是自己学习凸优化的笔记和总结。挂在这里主要是方便自己查。当然,如果能帮到手滑点进来的人也是极好的。

本节关于凸函数和与之相关的一些性质。

下一集传送: 梯度下降法(无约束 + 光滑 + 凸函数)

Convex function 定义

fff is convex if

dom fff is convex and (Jensen's inequality) αf(x)+βf(y)≥f(αx+βy)\alpha f(x) + \beta f(y) \geq f(\alpha x + \beta y)αf(x)+βf(y)≥f(αx+βy) for all x,yx, yx,y, for all α,β≥0\alpha, \beta \geq 0α,β≥0 s.t. α+β=1\alpha + \beta = 1α+β=1. 一阶等价条件

For differentiable fff, fff is convex   ⟺  \iff⟺

dom fff is convex and f(y)≥f(x)+∇f(x)T(y−x)f(y) \geq f(x) + \nabla f(x)^T (y-x)f(y)≥f(x)+∇f(x)T(y−x) for all x,yx, yx,y.

一阶条件的几何意义是凸函数上任意点的切线都是凸函数的 under-estimator. 因为这个切线仿佛承接住了整个函数 fff, 所以又叫 supporting hyperplane.

二阶等价条件

For twice differentiable fff, fff is convex   ⟺  \iff⟺

dom fff is convex and ∇2f(x)⪰0\nabla^2 f(x) \succeq 0∇2f(x)⪰0 for all xxx.

二阶条件表明凸函数任意一点“曲度”都是凸的。也可以说梯度是单调的。

单调梯度等价条件

For differentiable fff, fff is convex   ⟺  \iff⟺

dom fff is convex and (∇f(x)−∇f(y))T(x−y)≥0\big (\nabla f(x) - \nabla f(y)\big )^T (x-y) \geq 0(∇f(x)−∇f(y))T(x−y)≥0.

证明

  ⟹  \implies⟹: 由一阶条件得

f(y)≥f(x)+∇f(x)T(y−x)f(y) \geq f(x) + \nabla f(x)^T (y-x) f(y)≥f(x)+∇f(x)T(y−x)

f(x)≥f(y)+∇f(y)T(x−y)f(x) \geq f(y) + \nabla f(y)^T (x-y) f(x)≥f(y)+∇f(y)T(x−y)

相加两式得证.   ⟸  \impliedby⟸: 定义 g(t)=f(x+t(y−x))g(t)=f(x+t(y-x))g(t)=f(x+t(y−x)), 那么 g′(t)=∇f(x+t(y−x))T(y−x)g'(t)=\nabla f(x+t(y-x))^T (y-x)g′(t)=∇f(x+t(y−x))T(y−x). 那么对于 t≥0t \geq 0t≥0, 由于 (∇f(x)−∇f(y))T(x−y)≥0\big (\nabla f(x) - \nabla f(y)\big )^T (x-y) \geq 0(∇f(x)−∇f(y))T(x−y)≥0 所以 g′(t)−g′(0)≥0g'(t)-g'(0)\geq 0g′(t)−g′(0)≥0. 那么 g(1)−g(0)=∫01g′(t)dt≥∫01g′(0)dt=g′(0)g(1)-g(0) = \int_0^1 g'(t) dt \geq \int_0^1 g'(0) dt = g'(0)g(1)−g(0)=∫01​g′(t)dt≥∫01​g′(0)dt=g′(0) 即 f(y)≥f(x)+∇f(x)T(y−x)f(y) \geq f(x) + \nabla f(x)^T (y-x)f(y)≥f(x)+∇f(x)T(y−x) for all x,yx, yx,y.

Lipschitz smooth 定义

fff is L-smooth if ∣∣∇f(x)−∇f(y)∣∣2≤L∣∣x−y∣∣2|| \nabla f(x) -\nabla f(y) || _2 \leq L || x-y || _2∣∣∇f(x)−∇f(y)∣∣2​≤L∣∣x−y∣∣2​ for all x,yx, yx,y.

L-smooth 表明一个函数的梯度的变化不会太突兀,或者说这个函数比较平滑。

等价条件 fff is convex and L-smooth. (∇f(x)−∇f(y))T(x−y)≤L∣∣x−y∣∣22\big(\nabla f(x) -\nabla f(y)\big)^T(x-y) \leq L || x-y || _2^2(∇f(x)−∇f(y))T(x−y)≤L∣∣x−y∣∣22​ for all x,yx, yx,y. g(x)=L2xTx−f(x)g(x)=\frac{L}{2} x^T x - f(x)g(x)=2L​xTx−f(x) is convex. (quadratic upper bound) 0≤f(y)−f(x)−∇f(x)T(y−x)≤L2∣∣x−y∣∣220 \leq f(y) - f(x) - \nabla f(x)^T (y-x) \leq \frac{L}{2} || x-y || _2^20≤f(y)−f(x)−∇f(x)T(y−x)≤2L​∣∣x−y∣∣22​ f(y)≥f(x)+∇f(x)T(y−x)+12L∣∣∇f(x)−∇f(y)∣∣22f(y) \geq f(x) + \nabla f(x)^T (y-x) + \frac{1}{2L} || \nabla f(x) - \nabla f(y) || _2^2f(y)≥f(x)+∇f(x)T(y−x)+2L1​∣∣∇f(x)−∇f(y)∣∣22​ (co-coercivity) (∇f(x)−∇f(y))T(x−y)≥1L∣∣∇f(x)−∇f(y)∣∣22\big (\nabla f(x) - \nabla f(y)\big )^T (x-y) \geq \frac{1}{L} || \nabla f(x) - \nabla f(y) || _2^2(∇f(x)−∇f(y))T(x−y)≥L1​∣∣∇f(x)−∇f(y)∣∣22​.

证明

1  ⟹  21 \implies 21⟹2:

∣∣∇f(x)−∇f(y)∣∣2≤L∣∣x−y∣∣2∣∣∇f(x)−∇f(y)∣∣2⋅∣∣x−y∣∣2≤L∣∣x−y∣∣22(Cauchy-Schwartz)(∇f(x)−∇f(y))T(x−y)≤L∣∣x−y∣∣22\begin{aligned} || \nabla f(x) -\nabla f(y) || _2 &\leq L || x-y || _2\\ ||\nabla f(x) -\nabla f(y) || _2 \cdot || x-y || _2 &\leq L || x-y || _2^2 \\ \text{(Cauchy-Schwartz)} \quad \big(\nabla f(x) -\nabla f(y)\big)^T(x-y) &\leq L || x-y || _2^2 \end{aligned} ∣∣∇f(x)−∇f(y)∣∣2​∣∣∇f(x)−∇f(y)∣∣2​⋅∣∣x−y∣∣2​(Cauchy-Schwartz)(∇f(x)−∇f(y))T(x−y)​≤L∣∣x−y∣∣2​≤L∣∣x−y∣∣22​≤L∣∣x−y∣∣22​​

2  ⟺  32 \iff 32⟺3:

(∇f(x)−∇f(y))T(x−y)≤L∣∣x−y∣∣22=L(x−y)T(x−y)\big(\nabla f(x) -\nabla f(y)\big)^T(x-y) \leq L || x-y || _2^2 = L (x-y)^T (x-y) (∇f(x)−∇f(y))T(x−y)≤L∣∣x−y∣∣22​=L(x−y)T(x−y)

(Lx−∇f(x)−Ly+∇f(y))T(x−y)≥0\big(Lx-\nabla f(x) - Ly+\nabla f(y) \big)^T(x-y) \geq 0 (Lx−∇f(x)−Ly+∇f(y))T(x−y)≥0

即 (∇g(x)−∇g(y))T(x−y)≥0\big (\nabla g(x) - \nabla g(y)\big )^T (x-y) \geq 0(∇g(x)−∇g(y))T(x−y)≥0, 由单调梯度条件, ggg 是 convex.

3  ⟺  43 \iff 43⟺4: 利用 ggg 为 convex 的一阶条件。

4  ⟹  54 \implies 54⟹5: 令 z=y+1L(∇f(x)−∇f(y))z=y+\frac{1}{L}(\nabla f(x)-\nabla f(y))z=y+L1​(∇f(x)−∇f(y)).

f(y)−f(x)=f(y)−f(z)+f(z)−f(x)(quadratic upper bound)≥−∇f(y)T(z−y)−L2∣∣y−z∣∣22+∇f(x)T(z−x)(substitute z)≥−1L∇f(y)T(∇f(x)−∇f(y))−12L∣∣∇f(x)−∇f(y)∣∣22+∇f(x)T(y−x)+1L∇f(x)T(∇f(x)−∇f(y))=∇f(x)T(y−x)+12L∣∣∇f(x)−∇f(y)∣∣22\begin{aligned} f(y)-f(x) &= f(y) - f(z) + f(z) - f(x) \\ \href{./# 等价条件}{\text{(quadratic upper bound)}} \quad &\geq -\nabla f(y)^T(z-y) - \frac{L}{2} ||y-z||_2^2 + \nabla f(x)^T (z-x) \\ \text{(substitute z)} \quad &\geq -\frac{1}{L}\nabla f(y)^T (\nabla f(x)-\nabla f(y)) \\ & \qquad - \frac{1}{2L} ||\nabla f(x)-\nabla f(y)||_2^2 + \nabla f(x)^T (y-x) \\ & \qquad + \frac{1}{L}\nabla f(x)^T (\nabla f(x)-\nabla f(y)) \\ &= \nabla f(x)^T (y-x) + \frac{1}{2L} ||\nabla f(x)-\nabla f(y)||_2^2 \end{aligned} f(y)−f(x)(quadratic upper bound)(substitute z)​=f(y)−f(z)+f(z)−f(x)≥−∇f(y)T(z−y)−2L​∣∣y−z∣∣22​+∇f(x)T(z−x)≥−L1​∇f(y)T(∇f(x)−∇f(y))−2L1​∣∣∇f(x)−∇f(y)∣∣22​+∇f(x)T(y−x)+L1​∇f(x)T(∇f(x)−∇f(y))=∇f(x)T(y−x)+2L1​∣∣∇f(x)−∇f(y)∣∣22​​

5  ⟹  65 \implies 65⟹6: 由 5,

f(y)≥f(x)+∇f(x)T(y−x)+12L∣∣∇f(x)−∇f(y)∣∣22f(y) \geq f(x) + \nabla f(x)^T (y-x) + \frac{1}{2L} || \nabla f(x) - \nabla f(y) || _2^2 f(y)≥f(x)+∇f(x)T(y−x)+2L1​∣∣∇f(x)−∇f(y)∣∣22​

f(x)≥f(y)+∇f(y)T(x−y)+12L∣∣∇f(x)−∇f(y)∣∣22f(x) \geq f(y) + \nabla f(y)^T (x-y) + \frac{1}{2L} || \nabla f(x) - \nabla f(y) || _2^2 f(x)≥f(y)+∇f(y)T(x−y)+2L1​∣∣∇f(x)−∇f(y)∣∣22​

两式相加得 6.

6  ⟹  16 \implies 16⟹1: 对 6 的左边使用 Cauchy-Schwartz 不等式可得 L-smooth. 6 的右边 ≥0\geq 0≥0, 使用单调梯度可得 convex.

从几何上理解 L-smooth 函数,fff 的下界为超平面,上界为二次函数。如下图所示。

L-smooth 函数

Strongly convex 定义

fff is μ\muμ-strongly convex if f(x)−μ2xTxf(x)-\frac{\mu}{2}x^T xf(x)−2μ​xTx is convex.

一个函数减去二次函数仍然是 convex, 说明它至少有这个二次函数这么凸。或者说这个二次函数是它的一个下界。如下图所示。

Strong convex 函数

等价条件 f​f​f​ is differentiable and μ​\mu​μ​-strongly convex. fff is differentiable and f(αx+βy)≤αf(x)+βf(y)−μαβ2∣∣x−y∣∣22f(\alpha x +\beta y) \leq \alpha f(x) + \beta f(y) - \frac{\mu \alpha \beta}{2} ||x-y||_2^2f(αx+βy)≤αf(x)+βf(y)−2μαβ​∣∣x−y∣∣22​ for all x,yx, yx,y, for all α,β≥0\alpha, \beta \geq 0α,β≥0 s.t. α+β=1\alpha + \beta = 1α+β=1. (quadratic lower bound) f(y)≥f(x)+∇f(x)T(y−x)+μ2∣∣y−x∣∣22f(y)\geq f(x) + \nabla f(x)^T (y-x) + \frac{\mu}{2} ||y-x||_2^2f(y)≥f(x)+∇f(x)T(y−x)+2μ​∣∣y−x∣∣22​. (strong monotonicity or coercivity) (∇f(x)−∇f(y))T(x−y)≥μ∣∣x−y∣∣22\big (\nabla f(x) - \nabla f(y)\big )^T (x-y) \geq \mu|| x - y || _2^2(∇f(x)−∇f(y))T(x−y)≥μ∣∣x−y∣∣22​.

证明

$1 \iff 2 $: 使用 convex 函数定义. $1 \iff 3 $: 使用 convex 函数的一阶条件. $1 \iff 4 $: 使用 convex 函数的单调梯度条件.

References

Lecture notes: Gradient method. EE236C - Optimization Methods for Large-Scale Systems (Spring 2016). Vandenberghe, UCLA.

Zhou, X. (2018). On the Fenchel Duality between Strong Convexity and Lipschitz Continuous Gradient. arXiv preprint arXiv:1803.06573.



【本文地址】

公司简介

联系我们

今日新闻

    推荐新闻

    专题文章
      CopyRight 2018-2019 实验室设备网 版权所有