Monday, September 18, 2017

Week 2

Multiple Features

  • Linear regression with multiple variables is called multivariate linear regression.
  • Notation is as follows:
\begin{align*}x_j^{(i)} &= \text{value of feature } j \text{ in the }i^{th}\text{ training example} \newline x^{(i)}& = \text{the input (features) of the }i^{th}\text{ training example} \newline m &= \text{the number of training examples} \newline n &= \text{the number of features} \end{align*}
  • The hypothesis function takes the form:
\begin{align*}h_\theta (x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \cdots + \theta_n x_n\end{align*}
  • It can also be expressed in matrix form:
\begin{align*}h_\theta(x) =\begin{bmatrix}\theta_0 \hspace{2em} \theta_1 \hspace{2em} ... \hspace{2em} \theta_n\end{bmatrix}\begin{bmatrix}x_0 \newline x_1 \newline \vdots \newline x_n\end{bmatrix}= \theta^T x\end{align*}
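A minimal sketch of this matrix form in Python (all names and values here are assumed for illustration, not from the course):

import numpy as np

# Hypothetical parameters and one training example (n = 2 features)
theta = np.array([50.0, 0.1, 20.0])   # [theta_0, theta_1, theta_2]
x = np.array([1.0, 2104.0, 3.0])      # x_0 = 1 prepended as the bias term

h = theta @ x                         # h_theta(x) = theta^T x
print(h)                              # 50 + 0.1*2104 + 20*3 = 320.4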


Gradient Descent For Multiple Variables

  • The algorithm is as follows (a vectorized sketch in Python follows the update rules):

\begin{align*} & \text{repeat until convergence:} \; \lbrace \newline \; & \theta_0 := \theta_0 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_0^{(i)}\newline \; & \theta_1 := \theta_1 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_1^{(i)} \newline \; & \theta_2 := \theta_2 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_2^{(i)} \newline & \cdots \newline \rbrace \end{align*}
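A vectorized Python sketch of the update rules above (function and variable names are assumptions); X is the m × (n+1) design matrix with the x_0 = 1 column already prepended:

import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """theta := theta - (alpha/m) * X^T (X theta - y), updating all theta_j simultaneously."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        error = X @ theta - y               # h_theta(x^(i)) - y^(i) for every example i
        theta -= (alpha / m) * (X.T @ error)
    return theta

The product X^T (X theta - y) computes every per-feature sum in the update rule at once, which is why all theta_j can be updated simultaneously.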


  • Feature Scaling
    • When features have very different ranges, gradient descent converges slowly.
    • Ideal range for each input variable:
      • −1 ≤ x_i ≤ 1
      • This is only a rough guideline; ranges far larger or far smaller are both undesirable.
    • Mean normalization (see the sketch after this list):
      • μ_i is the average value of feature i
      • s_i is the range of values (max − min), or the standard deviation
      • \begin{align*}x_i := \dfrac{x_i - \mu_i}{s_i}\end{align*}
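A minimal sketch of mean normalization (names are assumptions; if a bias column x_0 = 1 has already been added, it should be excluded from scaling):

import numpy as np

def mean_normalize(X):
    """x_i := (x_i - mu_i) / s_i for every feature column."""
    mu = X.mean(axis=0)                   # mu_i: average value of feature i
    s = X.max(axis=0) - X.min(axis=0)     # s_i: range (max - min); X.std(axis=0) also works
    return (X - mu) / s, mu, s

X = np.array([[2104.0, 3.0], [1416.0, 2.0], [852.0, 1.0]])
X_norm, mu, s = mean_normalize(X)

Note that mu and s must be kept, so that inputs at prediction time can be scaled with the same parameters.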

Normal Equation

  • To minimize the cost function J, take the partial derivative with respect to each θ_i, set it to zero, and solve for θ; the optimal solution is the following (a numeric sketch is given at the end of this section):
\begin{align*}\theta = (X^T X)^{-1}X^T y\end{align*}


  • An example of X and y:
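(The values below are assumed for illustration, in the spirit of the course's housing example; the first column of X is the bias feature x_0 = 1.)

\begin{align*}X = \begin{bmatrix}1 & 2104 \newline 1 & 1416 \newline 1 & 1534 \newline 1 & 852\end{bmatrix}, \quad y = \begin{bmatrix}460 \newline 232 \newline 315 \newline 178\end{bmatrix}\end{align*}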


  • With the normal equation, feature scaling is not necessary.
  • A comparison of gradient descent and the normal equation:

Gradient Descent             | Normal Equation
Need to choose alpha         | No need to choose alpha
Needs many iterations        | No need to iterate
O(kn²)                       | O(n³), needs to compute the inverse of X^T X
Works well when n is large   | Slow if n is very large

  • In practice, once n exceeds about 10,000, gradient descent is preferred over the normal equation.
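A minimal numeric sketch of the closed-form solution, reusing the illustrative X and y from above; numpy.linalg.pinv is used rather than a plain inverse so it also tolerates the non-invertible case discussed next:

import numpy as np

X = np.array([[1, 2104], [1, 1416], [1, 1534], [1, 852]], dtype=float)
y = np.array([460.0, 232.0, 315.0, 178.0])

# theta = (X^T X)^(-1) X^T y, with no feature scaling required
theta = np.linalg.pinv(X.T @ X) @ X.T @ y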

Normal Equation Noninvertibility

  • If X^T X is non-invertible, the likely causes are:
    • Redundant features, i.e. linearly dependent ones (e.g., the same length expressed in ft and in m; see the sketch below)
    • Too many features (m ≤ n)
      • Fix: delete some features, or use regularization
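A brief sketch (assumed data) showing that a linearly dependent feature makes X^T X singular, and that the pseudo-inverse still returns a usable θ:

import numpy as np

size_ft = np.array([2104.0, 1416.0, 1534.0, 852.0])
# Two linearly dependent features: the same size in ft and in m (1 ft = 0.3048 m)
X = np.column_stack([np.ones(4), size_ft, size_ft * 0.3048])
y = np.array([460.0, 232.0, 315.0, 178.0])

A = X.T @ X
print(np.linalg.matrix_rank(A))        # 2 < 3: A is singular (non-invertible)
theta = np.linalg.pinv(A) @ X.T @ y    # pinv still yields a minimum-norm solution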






