凸函数,Lipschitz smooth, strongly convex | 您所在的位置:网站首页 › 利普希茨条件推出一致连续证明 › 凸函数,Lipschitz smooth, strongly convex |
本文是自己学习凸优化的笔记和总结。挂在这里主要是方便自己查。当然,如果能帮到手滑点进来的人也是极好的。 本节关于凸函数和与之相关的一些性质。 下一集传送: 梯度下降法(无约束 + 光滑 + 凸函数) Convex function 定义fff is convex if dom fff is convex and (Jensen's inequality) αf(x)+βf(y)≥f(αx+βy)\alpha f(x) + \beta f(y) \geq f(\alpha x + \beta y)αf(x)+βf(y)≥f(αx+βy) for all x,yx, yx,y, for all α,β≥0\alpha, \beta \geq 0α,β≥0 s.t. α+β=1\alpha + \beta = 1α+β=1. 一阶等价条件For differentiable fff, fff is convex ⟺ \iff⟺ dom fff is convex and f(y)≥f(x)+∇f(x)T(y−x)f(y) \geq f(x) + \nabla f(x)^T (y-x)f(y)≥f(x)+∇f(x)T(y−x) for all x,yx, yx,y.一阶条件的几何意义是凸函数上任意点的切线都是凸函数的 under-estimator. 因为这个切线仿佛承接住了整个函数 fff, 所以又叫 supporting hyperplane. 二阶等价条件For twice differentiable fff, fff is convex ⟺ \iff⟺ dom fff is convex and ∇2f(x)⪰0\nabla^2 f(x) \succeq 0∇2f(x)⪰0 for all xxx.二阶条件表明凸函数任意一点“曲度”都是凸的。也可以说梯度是单调的。 单调梯度等价条件For differentiable fff, fff is convex ⟺ \iff⟺ dom fff is convex and (∇f(x)−∇f(y))T(x−y)≥0\big (\nabla f(x) - \nabla f(y)\big )^T (x-y) \geq 0(∇f(x)−∇f(y))T(x−y)≥0.证明 ⟹ \implies⟹: 由一阶条件得 f(y)≥f(x)+∇f(x)T(y−x)f(y) \geq f(x) + \nabla f(x)^T (y-x) f(y)≥f(x)+∇f(x)T(y−x) f(x)≥f(y)+∇f(y)T(x−y)f(x) \geq f(y) + \nabla f(y)^T (x-y) f(x)≥f(y)+∇f(y)T(x−y) 相加两式得证. ⟸ \impliedby⟸: 定义 g(t)=f(x+t(y−x))g(t)=f(x+t(y-x))g(t)=f(x+t(y−x)), 那么 g′(t)=∇f(x+t(y−x))T(y−x)g'(t)=\nabla f(x+t(y-x))^T (y-x)g′(t)=∇f(x+t(y−x))T(y−x). 那么对于 t≥0t \geq 0t≥0, 由于 (∇f(x)−∇f(y))T(x−y)≥0\big (\nabla f(x) - \nabla f(y)\big )^T (x-y) \geq 0(∇f(x)−∇f(y))T(x−y)≥0 所以 g′(t)−g′(0)≥0g'(t)-g'(0)\geq 0g′(t)−g′(0)≥0. 那么 g(1)−g(0)=∫01g′(t)dt≥∫01g′(0)dt=g′(0)g(1)-g(0) = \int_0^1 g'(t) dt \geq \int_0^1 g'(0) dt = g'(0)g(1)−g(0)=∫01g′(t)dt≥∫01g′(0)dt=g′(0) 即 f(y)≥f(x)+∇f(x)T(y−x)f(y) \geq f(x) + \nabla f(x)^T (y-x)f(y)≥f(x)+∇f(x)T(y−x) for all x,yx, yx,y. Lipschitz smooth 定义fff is L-smooth if ∣∣∇f(x)−∇f(y)∣∣2≤L∣∣x−y∣∣2|| \nabla f(x) -\nabla f(y) || _2 \leq L || x-y || _2∣∣∇f(x)−∇f(y)∣∣2≤L∣∣x−y∣∣2 for all x,yx, yx,y. L-smooth 表明一个函数的梯度的变化不会太突兀,或者说这个函数比较平滑。 等价条件 fff is convex and L-smooth. (∇f(x)−∇f(y))T(x−y)≤L∣∣x−y∣∣22\big(\nabla f(x) -\nabla f(y)\big)^T(x-y) \leq L || x-y || _2^2(∇f(x)−∇f(y))T(x−y)≤L∣∣x−y∣∣22 for all x,yx, yx,y. g(x)=L2xTx−f(x)g(x)=\frac{L}{2} x^T x - f(x)g(x)=2LxTx−f(x) is convex. (quadratic upper bound) 0≤f(y)−f(x)−∇f(x)T(y−x)≤L2∣∣x−y∣∣220 \leq f(y) - f(x) - \nabla f(x)^T (y-x) \leq \frac{L}{2} || x-y || _2^20≤f(y)−f(x)−∇f(x)T(y−x)≤2L∣∣x−y∣∣22 f(y)≥f(x)+∇f(x)T(y−x)+12L∣∣∇f(x)−∇f(y)∣∣22f(y) \geq f(x) + \nabla f(x)^T (y-x) + \frac{1}{2L} || \nabla f(x) - \nabla f(y) || _2^2f(y)≥f(x)+∇f(x)T(y−x)+2L1∣∣∇f(x)−∇f(y)∣∣22 (co-coercivity) (∇f(x)−∇f(y))T(x−y)≥1L∣∣∇f(x)−∇f(y)∣∣22\big (\nabla f(x) - \nabla f(y)\big )^T (x-y) \geq \frac{1}{L} || \nabla f(x) - \nabla f(y) || _2^2(∇f(x)−∇f(y))T(x−y)≥L1∣∣∇f(x)−∇f(y)∣∣22.证明 1 ⟹ 21 \implies 21⟹2: ∣∣∇f(x)−∇f(y)∣∣2≤L∣∣x−y∣∣2∣∣∇f(x)−∇f(y)∣∣2⋅∣∣x−y∣∣2≤L∣∣x−y∣∣22(Cauchy-Schwartz)(∇f(x)−∇f(y))T(x−y)≤L∣∣x−y∣∣22\begin{aligned} || \nabla f(x) -\nabla f(y) || _2 &\leq L || x-y || _2\\ ||\nabla f(x) -\nabla f(y) || _2 \cdot || x-y || _2 &\leq L || x-y || _2^2 \\ \text{(Cauchy-Schwartz)} \quad \big(\nabla f(x) -\nabla f(y)\big)^T(x-y) &\leq L || x-y || _2^2 \end{aligned} ∣∣∇f(x)−∇f(y)∣∣2∣∣∇f(x)−∇f(y)∣∣2⋅∣∣x−y∣∣2(Cauchy-Schwartz)(∇f(x)−∇f(y))T(x−y)≤L∣∣x−y∣∣2≤L∣∣x−y∣∣22≤L∣∣x−y∣∣22 2 ⟺ 32 \iff 32⟺3: (∇f(x)−∇f(y))T(x−y)≤L∣∣x−y∣∣22=L(x−y)T(x−y)\big(\nabla f(x) -\nabla f(y)\big)^T(x-y) \leq L || x-y || _2^2 = L (x-y)^T (x-y) (∇f(x)−∇f(y))T(x−y)≤L∣∣x−y∣∣22=L(x−y)T(x−y) (Lx−∇f(x)−Ly+∇f(y))T(x−y)≥0\big(Lx-\nabla f(x) - Ly+\nabla f(y) \big)^T(x-y) \geq 0 (Lx−∇f(x)−Ly+∇f(y))T(x−y)≥0 即 (∇g(x)−∇g(y))T(x−y)≥0\big (\nabla g(x) - \nabla g(y)\big )^T (x-y) \geq 0(∇g(x)−∇g(y))T(x−y)≥0, 由单调梯度条件, ggg 是 convex. 3 ⟺ 43 \iff 43⟺4: 利用 ggg 为 convex 的一阶条件。 4 ⟹ 54 \implies 54⟹5: 令 z=y+1L(∇f(x)−∇f(y))z=y+\frac{1}{L}(\nabla f(x)-\nabla f(y))z=y+L1(∇f(x)−∇f(y)). f(y)−f(x)=f(y)−f(z)+f(z)−f(x)(quadratic upper bound)≥−∇f(y)T(z−y)−L2∣∣y−z∣∣22+∇f(x)T(z−x)(substitute z)≥−1L∇f(y)T(∇f(x)−∇f(y))−12L∣∣∇f(x)−∇f(y)∣∣22+∇f(x)T(y−x)+1L∇f(x)T(∇f(x)−∇f(y))=∇f(x)T(y−x)+12L∣∣∇f(x)−∇f(y)∣∣22\begin{aligned} f(y)-f(x) &= f(y) - f(z) + f(z) - f(x) \\ \href{./# 等价条件}{\text{(quadratic upper bound)}} \quad &\geq -\nabla f(y)^T(z-y) - \frac{L}{2} ||y-z||_2^2 + \nabla f(x)^T (z-x) \\ \text{(substitute z)} \quad &\geq -\frac{1}{L}\nabla f(y)^T (\nabla f(x)-\nabla f(y)) \\ & \qquad - \frac{1}{2L} ||\nabla f(x)-\nabla f(y)||_2^2 + \nabla f(x)^T (y-x) \\ & \qquad + \frac{1}{L}\nabla f(x)^T (\nabla f(x)-\nabla f(y)) \\ &= \nabla f(x)^T (y-x) + \frac{1}{2L} ||\nabla f(x)-\nabla f(y)||_2^2 \end{aligned} f(y)−f(x)(quadratic upper bound)(substitute z)=f(y)−f(z)+f(z)−f(x)≥−∇f(y)T(z−y)−2L∣∣y−z∣∣22+∇f(x)T(z−x)≥−L1∇f(y)T(∇f(x)−∇f(y))−2L1∣∣∇f(x)−∇f(y)∣∣22+∇f(x)T(y−x)+L1∇f(x)T(∇f(x)−∇f(y))=∇f(x)T(y−x)+2L1∣∣∇f(x)−∇f(y)∣∣22 5 ⟹ 65 \implies 65⟹6: 由 5, f(y)≥f(x)+∇f(x)T(y−x)+12L∣∣∇f(x)−∇f(y)∣∣22f(y) \geq f(x) + \nabla f(x)^T (y-x) + \frac{1}{2L} || \nabla f(x) - \nabla f(y) || _2^2 f(y)≥f(x)+∇f(x)T(y−x)+2L1∣∣∇f(x)−∇f(y)∣∣22 f(x)≥f(y)+∇f(y)T(x−y)+12L∣∣∇f(x)−∇f(y)∣∣22f(x) \geq f(y) + \nabla f(y)^T (x-y) + \frac{1}{2L} || \nabla f(x) - \nabla f(y) || _2^2 f(x)≥f(y)+∇f(y)T(x−y)+2L1∣∣∇f(x)−∇f(y)∣∣22 两式相加得 6. 6 ⟹ 16 \implies 16⟹1: 对 6 的左边使用 Cauchy-Schwartz 不等式可得 L-smooth. 6 的右边 ≥0\geq 0≥0, 使用单调梯度可得 convex. 从几何上理解 L-smooth 函数,fff 的下界为超平面,上界为二次函数。如下图所示。 L-smooth 函数 Strongly convex 定义fff is μ\muμ-strongly convex if f(x)−μ2xTxf(x)-\frac{\mu}{2}x^T xf(x)−2μxTx is convex. 一个函数减去二次函数仍然是 convex, 说明它至少有这个二次函数这么凸。或者说这个二次函数是它的一个下界。如下图所示。 Strong convex 函数 等价条件 fff is differentiable and μ\muμ-strongly convex. fff is differentiable and f(αx+βy)≤αf(x)+βf(y)−μαβ2∣∣x−y∣∣22f(\alpha x +\beta y) \leq \alpha f(x) + \beta f(y) - \frac{\mu \alpha \beta}{2} ||x-y||_2^2f(αx+βy)≤αf(x)+βf(y)−2μαβ∣∣x−y∣∣22 for all x,yx, yx,y, for all α,β≥0\alpha, \beta \geq 0α,β≥0 s.t. α+β=1\alpha + \beta = 1α+β=1. (quadratic lower bound) f(y)≥f(x)+∇f(x)T(y−x)+μ2∣∣y−x∣∣22f(y)\geq f(x) + \nabla f(x)^T (y-x) + \frac{\mu}{2} ||y-x||_2^2f(y)≥f(x)+∇f(x)T(y−x)+2μ∣∣y−x∣∣22. (strong monotonicity or coercivity) (∇f(x)−∇f(y))T(x−y)≥μ∣∣x−y∣∣22\big (\nabla f(x) - \nabla f(y)\big )^T (x-y) \geq \mu|| x - y || _2^2(∇f(x)−∇f(y))T(x−y)≥μ∣∣x−y∣∣22.证明 $1 \iff 2 $: 使用 convex 函数定义. $1 \iff 3 $: 使用 convex 函数的一阶条件. $1 \iff 4 $: 使用 convex 函数的单调梯度条件. ReferencesLecture notes: Gradient method. EE236C - Optimization Methods for Large-Scale Systems (Spring 2016). Vandenberghe, UCLA. Zhou, X. (2018). On the Fenchel Duality between Strong Convexity and Lipschitz Continuous Gradient. arXiv preprint arXiv:1803.06573. |
CopyRight 2018-2019 实验室设备网 版权所有 |