Linear Regression

2024-07-17 14:15 | Source: compiled from the web

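I watched the Stanford open course on machine learning (Professor Andrew Ng) and took some notes, written up here for future reference. If you find mistakes in these notes, please let me know. Other notes in this series: Linear Regression; Classification and logistic regression; Generalized Linear Models; Generative Learning algorithms.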

1 Basics

Features

$$
\begin{aligned}
x &\rightarrow \text{input features of a sample}, \quad y \rightarrow \text{output};\\
(x, y) &\rightarrow \text{a training example};\\
(x^{(i)}, y^{(i)}) &\rightarrow \text{the } i\text{-th training example};\\
\lbrace (x^{(i)}, y^{(i)});\ i = 1, \cdots, m \rbrace &\rightarrow \text{the training set}
\end{aligned}
$$

Hypothesis function

In supervised learning, we fit a given training set with a hypothesis function $h_\theta$. For linear regression with a single input feature it is:

$$h_\theta(x) = \theta_0 + \theta_1 x$$

If each training example has 2 input features, i.e. $(x_1^{(i)}, x_2^{(i)}, y^{(i)})$, then:

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$

If each training example has $n$ input features, i.e. $(x_1^{(i)}, x_2^{(i)}, \cdots, x_n^{(i)}, y^{(i)})$, we adopt the convention $x_0 = 1$ to simplify the formula:

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n = \sum_{i=0}^{n} \theta_i x_i = \theta^T x$$
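As a small illustration of the hypothesis with the $x_0 = 1$ convention, here is a minimal Python sketch. The function name `hypothesis` and the numbers are made up for this example and do not come from the course.

```python
def hypothesis(theta, features):
    """Return h_theta(x) = theta^T x, with x_0 = 1 prepended to the features."""
    x = [1.0] + list(features)  # x_0 = 1 pairs with the intercept theta_0
    if len(theta) != len(x):
        raise ValueError("theta must have exactly one more entry than features")
    return sum(t * xi for t, xi in zip(theta, x))


# Hypothetical parameters and one example with n = 2 features.
theta = [1.0, 2.0, 3.0]
print(hypothesis(theta, [4.0, 5.0]))  # computes 1*1 + 2*4 + 3*5
```

Prepending $x_0 = 1$ lets the intercept be handled by the same sum as every other parameter, which is exactly what the $\theta^T x$ form expresses.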

Cost function

Given a training set, how do we choose, or learn, the parameters $\theta$? We want $h_\theta(x)$ to be as close to $y$ as possible, at least on the training examples we have. We measure this with the cost function:

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

The factor $\frac{1}{2}$ is there only to simplify the derivative later on. Here $x^{(i)}$ is the input features of the $i$-th example, $y^{(i)}$ is the corresponding output, and $h_\theta(x^{(i)})$ is the hypothesis evaluated on that example.

The goal is to find a $\theta$ that minimizes $J(\theta)$.
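To make the cost function concrete, the following sketch evaluates $J(\theta)$ on a tiny made-up training set. The helper names and the data are assumptions for illustration only; each row of `X` already starts with $x_0 = 1$.

```python
def hypothesis(theta, x):
    """h_theta(x) = sum_i theta_i * x_i, where x already includes x_0 = 1."""
    return sum(t * xi for t, xi in zip(theta, x))


def cost(theta, X, y):
    """J(theta) = 1/2 * sum over examples of (h_theta(x^(i)) - y^(i))^2."""
    return 0.5 * sum((hypothesis(theta, xi) - yi) ** 2 for xi, yi in zip(X, y))


# Toy training set (made up): the outputs follow y = 2 * x_1 exactly.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.0, 4.0, 6.0]

print(cost([0.0, 2.0], X, y))  # parameters that fit the data exactly
print(cost([0.0, 1.0], X, y))  # a worse choice of parameters
```

The first call gives a cost of zero because every residual vanishes; the second gives a larger value, which is what minimizing $J(\theta)$ is meant to penalize.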

2 LMS algorithm

Gradient Descent

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) \qquad (j = 0, \dots, n)$$

where $\alpha$ is called the learning rate.

When there is only one training example:

$$
\begin{aligned}
\frac{\partial}{\partial \theta_j} J(\theta)
&= \frac{\partial}{\partial \theta_j} \frac{1}{2} \left( h_\theta(x) - y \right)^2 \\
&= 2 \cdot \frac{1}{2} \cdot \left( h_\theta(x) - y \right) \cdot \frac{\partial}{\partial \theta_j} \left( h_\theta(x) - y \right) \\
&= \left( h_\theta(x) - y \right) \cdot \frac{\partial}{\partial \theta_j} \left( \sum_{i=0}^{n} \theta_i x_i - y \right) \\
&= \left( h_\theta(x) - y \right) x_j
\end{aligned}
$$
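One way to sanity-check the single-example derivative $(h_\theta(x) - y)\,x_j$ is to compare it with a central finite difference of the cost. This is only a sketch: the function names, the step size `eps`, and the numbers are assumptions chosen for illustration.

```python
def hypothesis(theta, x):
    """h_theta(x) = sum_i theta_i * x_i, where x already includes x_0 = 1."""
    return sum(t * xi for t, xi in zip(theta, x))


def cost_single(theta, x, y):
    """Cost for one example: 1/2 * (h_theta(x) - y)^2."""
    return 0.5 * (hypothesis(theta, x) - y) ** 2


def analytic_partial(theta, x, y, j):
    """The derived result: (h_theta(x) - y) * x_j."""
    return (hypothesis(theta, x) - y) * x[j]


def numeric_partial(theta, x, y, j, eps=1e-6):
    """Central finite-difference estimate of dJ/dtheta_j."""
    plus = list(theta)
    minus = list(theta)
    plus[j] += eps
    minus[j] -= eps
    return (cost_single(plus, x, y) - cost_single(minus, x, y)) / (2 * eps)


# Made-up parameters and a single example with x_0 = 1.
theta = [0.5, -1.0, 2.0]
x = [1.0, 3.0, 2.0]
y = 4.0
for j in range(len(theta)):
    print(j, analytic_partial(theta, x, y, j), numeric_partial(theta, x, y, j))
```

The two columns should agree up to floating-point rounding, since the cost is quadratic in each $\theta_j$.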

So the update rule for a single example is:

$$\theta_j := \theta_j - \alpha \left( h_\theta(x) - y \right) x_j$$
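The single-example update can be sketched as one function that updates every $\theta_j$ simultaneously. The name `lms_update`, the learning rate, and the example values are hypothetical.

```python
def hypothesis(theta, x):
    """h_theta(x) = sum_i theta_i * x_i, where x already includes x_0 = 1."""
    return sum(t * xi for t, xi in zip(theta, x))


def lms_update(theta, x, y, alpha):
    """Apply theta_j := theta_j - alpha * (h_theta(x) - y) * x_j for all j."""
    error = hypothesis(theta, x) - y  # computed once, before any theta_j changes
    return [t - alpha * error * xj for t, xj in zip(theta, x)]


# Made-up starting point and a single training example.
theta = [0.0, 0.0]
x = [1.0, 2.0]
y = 3.0

new_theta = lms_update(theta, x, y, alpha=0.1)
print(new_theta)
print(abs(hypothesis(theta, x) - y), abs(hypothesis(new_theta, x) - y))
```

Computing the error once before building the new list keeps the update simultaneous across all $j$. With this step size the prediction error on the example shrinks after the update; a learning rate that is too large could instead overshoot.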

When there are $m$ training examples:

$$
\begin{aligned}
\frac{\partial}{\partial \theta_j} J(\theta)
&= \frac{\partial}{\partial \theta_j} \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 \\
&= \frac{1}{2} \frac{\partial}{\partial \theta_j} \left\lbrace \left( h_\theta(x^{(1)}) - y^{(1)} \right)^2 + \cdots + \left( h_\theta(x^{(m)}) - y^{(m)} \right)^2 \right\rbrace \\
&= 2 \cdot \frac{1}{2} \cdot \left( h_\theta(x^{(1)}) - y^{(1)} \right) \cdot \frac{\partial}{\partial \theta_j} \left( h_\theta(x^{(1)}) - y^{(1)} \right) + \cdots + 2 \cdot \frac{1}{2} \cdot \left( h_\theta(x^{(m)}) - y^{(m)} \right) \cdot \frac{\partial}{\partial \theta_j} \left( h_\theta(x^{(m)}) - y^{(m)} \right) \\
&= \left( h_\theta(x^{(1)}) - y^{(1)} \right) \cdot \frac{\partial}{\partial \theta_j} \left( \sum_{i=0}^{n} \theta_i x_i^{(1)} - y^{(1)} \right) + \cdots + \left( h_\theta(x^{(m)}) - y^{(m)} \right) \cdot \frac{\partial}{\partial \theta_j} \left( \sum_{i=0}^{n} \theta_i x_i^{(m)} - y^{(m)} \right) \\
&= \left( h_\theta(x^{(1)}) - y^{(1)} \right) x_j^{(1)} + \cdots + \left( h_\theta(x^{(m)}) - y^{(m)} \right) x_j^{(m)} \\
&= \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
\end{aligned}
$$
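Substituting this derivative into the gradient descent rule gives $\theta_j := \theta_j - \alpha \sum_{i=1}^{m} ( h_\theta(x^{(i)}) - y^{(i)} )\, x_j^{(i)}$. The sketch below runs that update repeatedly on a tiny made-up data set. The function names, the learning rate, the iteration count, and the zero initialization are all assumptions for illustration, not values from the course.

```python
def hypothesis(theta, x):
    """h_theta(x) = sum_i theta_i * x_i, where x already includes x_0 = 1."""
    return sum(t * xi for t, xi in zip(theta, x))


def gradient(theta, X, y):
    """dJ/dtheta_j = sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i), for every j."""
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        error = hypothesis(theta, xi) - yi
        for j, xij in enumerate(xi):
            grad[j] += error * xij
    return grad


def batch_gradient_descent(X, y, alpha=0.05, iterations=2000):
    """Repeat theta_j := theta_j - alpha * dJ/dtheta_j, starting from zeros."""
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        grad = gradient(theta, X, y)
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Toy data (made up): the outputs follow y = 1 + 2 * x_1 exactly.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

theta = batch_gradient_descent(X, y)
print(theta)  # should be close to [1.0, 2.0]
```

Because $J(\theta)$ here is a sum rather than an average over the $m$ examples, the gradient grows with the size of the training set, so a learning rate that works for this toy data may need to be reduced for larger sets.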
