Multiple Features
- Linear regression with more than one input variable is called multivariate linear regression.
- Notation:
- x_j^(i) = value of feature j in the i-th training example; x^(i) = the input features of the i-th training example
- m = the number of training examples; n = the number of features
- The hypothesis function takes the form: \begin{align*}h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n\end{align*}
- Using the convention x_0 = 1, it can also be written in matrix form: \begin{align*}h_\theta(x) = \theta^T x\end{align*}
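The matrix form above can be sketched in a few lines of numpy; the function and data names here are illustrative, not from the original notes:

```python
import numpy as np

# Sketch of h_theta(x) = theta^T x for a whole training set at once.
# X is the m x (n+1) design matrix whose first column is all ones (x_0 = 1);
# theta is the (n+1)-vector of parameters.
def hypothesis(X, theta):
    return X @ theta

# Two training examples with n = 2 features each.
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 5.0]])
theta = np.array([1.0, 0.5, 0.5])
print(hypothesis(X, theta))  # [3.5 5.5]
```

Stacking the examples as rows of X lets one matrix product evaluate the hypothesis for every example simultaneously.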
Gradient Descent For Multiple Variables
- The algorithm is as follows:
\begin{align*} & \text{repeat until convergence:} \; \lbrace \newline \; & \theta_0 := \theta_0 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_0^{(i)}\newline \; & \theta_1 := \theta_1 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_1^{(i)} \newline \; & \theta_2 := \theta_2 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_2^{(i)} \newline & \cdots \newline \rbrace \end{align*}
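The update rule above can be written in vectorized form as a short sketch; `alpha`, `num_iters`, and the toy data below are illustrative choices, not values from the notes:

```python
import numpy as np

# Batch gradient descent for multivariate linear regression.
# Each step performs the simultaneous update
#   theta_j := theta_j - alpha * (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i)
# for all j at once via a single matrix product.
def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        gradient = (X.T @ (X @ theta - y)) / m
        theta -= alpha * gradient
    return theta

# Toy data generated from y = 1 + 2*x1, so theta should approach [1, 2].
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = gradient_descent(X, y, alpha=0.1, num_iters=5000)
print(theta)  # close to [1. 2.]
```

The `X.T @ (X @ theta - y)` expression computes all n+1 partial-derivative sums in one pass, which is why every theta_j is updated simultaneously as the algorithm requires.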
- Feature Scaling
- When the ranges of the features differ greatly, gradient descent converges slowly.
- The ideal range for each input variable is roughly −1 ≤ x_i ≤ 1. This is only a rough guideline; ranges that are much larger or much smaller both hurt convergence.
- Mean normalization
- μ_i is the average value of feature i
- s_i is the range of values (max − min) or the standard deviation \begin{align*}x_i := \dfrac{x_i - \mu_i}{s_i}\end{align*}
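Mean normalization can be sketched as follows, here using the standard deviation for s_i (max − min would work the same way); the function name and sample values are illustrative:

```python
import numpy as np

# Mean normalization: x_i := (x_i - mu_i) / s_i, applied per feature column.
# Returns mu and s as well, so the same transform can be applied to new data.
def mean_normalize(X):
    mu = X.mean(axis=0)
    s = X.std(axis=0)   # using std as s_i; (max - min) is the alternative
    return (X - mu) / s, mu, s

# Features with very different ranges: house size vs. number of bedrooms.
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0]])
X_norm, mu, s = mean_normalize(X)
print(X_norm.mean(axis=0))  # approximately [0, 0]
print(X_norm.std(axis=0))   # approximately [1, 1]
```

After the transform, every feature has zero mean and comparable scale, which is exactly the condition that speeds up gradient descent.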
Normal Equation
- To minimize the cost function J, take the partial derivative with respect to each θ_j and set it to 0; solving yields the optimal θ: \begin{align*}\theta = (X^T X)^{-1} X^T y\end{align*}
- With the Normal Equation there is no need for feature scaling.
- Gradient Descent and the Normal Equation compare as follows:
| Gradient Descent | Normal Equation |
|---|---|
| Need to choose alpha | No need to choose alpha |
| Needs many iterations | No need to iterate |
| O(kn^2) | O(n^3), need to calculate the inverse of X^T X |
| Works well when n is large | Slow if n is very large |
- In practice, once n exceeds about 10,000, gradient descent becomes the preferred choice.
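The normal equation can be sketched directly in numpy; note this uses `np.linalg.solve` on the system (X^T X)θ = X^T y rather than forming the inverse explicitly, which is a standard numerical choice rather than anything from the notes, and the data is illustrative:

```python
import numpy as np

# Normal equation: theta = (X^T X)^{-1} X^T y, computed by solving the
# linear system (X^T X) theta = X^T y instead of explicitly inverting.
def normal_equation(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

# Same toy data as before: y = 1 + 2*x1, so theta should be [1, 2].
# First column of X is the bias column x_0 = 1.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
print(normal_equation(X, y))  # [1. 2.]
```

Unlike gradient descent, this gives the answer in one step with no alpha and no iteration, matching the comparison table above.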
Normal Equation Noninvertibility
- If X^T X is non-invertible, the likely causes are:
- Redundant features, i.e. two features that are linearly dependent (e.g. the same length expressed in ft and in m)
- Too many features (m ≤ n)
- Remedies: delete some features, or apply regularization
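The redundant-feature case can be demonstrated with a pseudoinverse; this is a sketch where the third column is deliberately made an exact multiple of the second, so X^T X is singular, and the data is made up for illustration:

```python
import numpy as np

# Column 3 = 2 * column 2, a linearly dependent (redundant) feature,
# so X^T X has rank 2 out of 3 and an ordinary inverse does not exist.
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 4.0],
              [1.0, 3.0, 6.0]])
y = np.array([3.0, 5.0, 7.0])  # y = 1 + 2*x1, so y lies in X's column space

print(np.linalg.matrix_rank(X.T @ X))  # 2, i.e. singular

# The Moore-Penrose pseudoinverse still yields a usable least-squares
# solution even though (X^T X)^{-1} does not exist.
theta = np.linalg.pinv(X.T @ X) @ (X.T @ y)
print(X @ theta)  # reproduces y despite the singular X^T X
```

This is why course implementations typically use a pseudoinverse routine: it behaves sensibly even when redundant features make X^T X non-invertible, though removing the redundant feature or regularizing remains the cleaner fix.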