安東尼的筆記屋 (Anthony's Notebook): Week 3

Classification

Binary classification problem

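The label $y$ takes only the values 0 or 1, i.e. $y \in \{0, 1\}$.

0 is also called the negative class and can be written as "-".
1 is also called the positive class and can be written as "+".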

Hypothesis Representation

Rewrite $h_\theta(x)$ in a form whose output satisfies:

\begin{align*}0 \leq h_\theta (x) \leq 1 \end{align*}

This new form is called the Sigmoid Function or Logistic Function:

\begin{align*}& h_\theta (x) = g ( \theta^T x ) \newline \newline& z = \theta^T x \newline& g(z) = \dfrac{1}{1 + e^{-z}}\end{align*}

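[Figure: plot of the Sigmoid Function $g(z)$ against $z$. The curve is S-shaped: $g(z)$ approaches 0 as $z \to -\infty$, equals 0.5 at $z = 0$, and approaches 1 as $z \to +\infty$.]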

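$h_\theta(x)$ is the probability that the output is 1, i.e. $h_\theta(x) = P(y = 1 \mid x ; \theta)$.

For example, $h_\theta(x)=0.7$ means there is a 70% probability that the output is 1,
and the probability that the output is 0 is therefore 1 - 70% = 30%.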
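
As a minimal Octave sketch of the hypothesis above: the values of `theta` and `x` are made up for illustration, and `x` is assumed to include the intercept term $x_0 = 1$.

```octave
% Sigmoid (logistic) function, applied element-wise to scalars, vectors, or matrices.
sigmoid = @(z) 1 ./ (1 + exp(-z));

theta = [-1; 0.5];   % hypothetical parameters, chosen only for illustration
x = [1; 4];          % feature vector; the first entry is the intercept term x0 = 1

h = sigmoid(theta' * x);   % estimated probability that y = 1
p0 = 1 - h;                % estimated probability that y = 0
```

Here `theta' * x` equals 1, so `h` is $g(1) \approx 0.73$ and `p0` is about 0.27.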

Decision Boundary

To classify each result as 0 or 1, the output of the hypothesis function can be interpreted as follows:

\begin{align*}& h_\theta(x) \geq 0.5 \rightarrow y = 1 \newline& h_\theta(x) < 0.5 \rightarrow y = 0 \newline\end{align*}

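From the Sigmoid Function plot above, $g(z) \geq 0.5$ whenever $z \geq 0$.
Taking $z = \theta^T x$, this means:

\begin{align*}& h_\theta(x) = g(\theta^T x) \geq 0.5 \newline& \text{when} \; \theta^T x \geq 0\end{align*}

so

\begin{align*}& \theta^T x \geq 0 \Rightarrow y = 1 \newline& \theta^T x < 0 \Rightarrow y = 0 \newline\end{align*}

The Decision Boundary is the line that separates the region where y=0 from the region where y=1.

It does not have to be a straight line; it can be a circle or any other shape, for example when the input to $g$ contains polynomial terms such as $x_1^2 + x_2^2$.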
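
A small Octave sketch of a non-linear Decision Boundary, under assumptions that are not in the notes: the features are taken to be $[1, x_1, x_2, x_1^2, x_2^2]$ and `theta` is a made-up vector that places the boundary on the unit circle $x_1^2 + x_2^2 = 1$.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical parameters: theta' * x = -1 + x1^2 + x2^2,
% so the boundary theta' * x = 0 is the circle x1^2 + x2^2 = 1.
theta = [-1; 0; 0; 1; 1];

points = [0 0; 2 0; 0.5 0.5; 1 1];                   % made-up (x1, x2) pairs
X = [ones(size(points, 1), 1), points, points .^ 2]; % add intercept and squared terms

% Predict y = 1 when h_theta(x) >= 0.5, which is the same as theta' * x >= 0.
predictions = sigmoid(X * theta) >= 0.5;
```

Points inside the circle are predicted as 0 and points outside it as 1.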

Cost Function

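If Logistic Regression used the Cost Function of Linear Regression, $J(\theta)$ would be wavy (non-convex) with many Local Optima, so Gradient Descent would not be guaranteed to reach the global minimum.
The Cost Function for Logistic Regression is instead:

\begin{align*}& J(\theta) = \dfrac{1}{m} \sum_{i=1}^m \mathrm{Cost}(h_\theta(x^{(i)}),y^{(i)}) \newline & \mathrm{Cost}(h_\theta(x),y) = -\log(h_\theta(x)) \; & \text{if y = 1} \newline & \mathrm{Cost}(h_\theta(x),y) = -\log(1-h_\theta(x)) \; & \text{if y = 0}\end{align*}

When y=1, plotting $J(\theta)$ against $h_\theta (x)$ gives:

[Figure: curve of $-\log(h_\theta(x))$. The cost is 0 at $h_\theta(x) = 1$ and grows without bound as $h_\theta(x) \to 0$.]

When y=0, plotting $J(\theta)$ against $h_\theta (x)$ gives:

[Figure: curve of $-\log(1 - h_\theta(x))$. The cost is 0 at $h_\theta(x) = 0$ and grows without bound as $h_\theta(x) \to 1$.]

In summary:

\begin{align*}& \mathrm{Cost}(h_\theta(x),y) = 0 \text{ if } h_\theta(x) = y \newline & \mathrm{Cost}(h_\theta(x),y) \rightarrow \infty \text{ if } y = 0 \; \mathrm{and} \; h_\theta(x) \rightarrow 1 \newline & \mathrm{Cost}(h_\theta(x),y) \rightarrow \infty \text{ if } y = 1 \; \mathrm{and} \; h_\theta(x) \rightarrow 0 \newline \end{align*}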

Simplified Cost Function and Gradient Descent

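The two cases of Cost() above can be combined into a single expression (when y=1 the second term vanishes, and when y=0 the first term vanishes):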

$\mathrm{Cost}(h_\theta(x),y) = - y \; \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$

The full Cost Function is then:

$J(\theta) = - \frac{1}{m} \displaystyle \sum_{i=1}^m [y^{(i)}\log (h_\theta (x^{(i)})) + (1 - y^{(i)})\log (1 - h_\theta(x^{(i)}))]$

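A vectorized implementation is:

\begin{align*} & h = g(X\theta)\newline & J(\theta) = \frac{1}{m} \cdot \left(-y^{T}\log(h)-(1-y)^{T}\log(1-h)\right) \end{align*}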
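
The vectorized cost can be sketched in Octave as follows. The data is made up for illustration; `X` is assumed to be an m-by-(n+1) matrix whose first column is all ones, and `y` an m-by-1 vector of 0/1 labels.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Toy data (made up): intercept column plus one feature.
X = [1 1; 1 2; 1 3; 1 4];
y = [0; 0; 1; 1];
theta = zeros(2, 1);

m = size(X, 1);        % number of training examples
h = sigmoid(X * theta);  % h = g(X * theta), one prediction per example

% J(theta) = (1/m) * ( -y' * log(h) - (1 - y)' * log(1 - h) )
J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));
```

With `theta` all zeros every prediction is 0.5, so `J` equals $\log 2 \approx 0.693$, which is a handy sanity check.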

Gradient Descent

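The Gradient Descent update has the same form as in Linear Regression; only the definition of $h_\theta(x)$ differs:

\begin{align*} & \text{Repeat} \; \lbrace \newline & \; \theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} \newline & \rbrace \end{align*}

A vectorized implementation is:

$\theta := \theta - \frac{\alpha}{m} X^{T} (g(X \theta ) - \vec{y})$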
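
A rough Octave sketch of the vectorized update in a loop. The data, the learning rate `alpha`, and the iteration count `numIters` are all made-up values for illustration, not settings from the course.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));
% Vectorized logistic cost, used here only to observe progress.
cost = @(theta, X, y) (1 / size(X, 1)) * ...
    (-y' * log(sigmoid(X * theta)) - (1 - y)' * log(1 - sigmoid(X * theta)));

% Toy data (made up): intercept column plus one feature.
X = [1 1; 1 2; 1 3; 1 4];
y = [0; 0; 1; 1];
m = size(X, 1);

theta = zeros(2, 1);
alpha = 0.1;      % hypothetical learning rate
numIters = 1000;  % hypothetical number of iterations

costBefore = cost(theta, X, y);
for iter = 1:numIters
  % theta := theta - (alpha / m) * X' * (g(X * theta) - y), all theta_j updated at once
  theta = theta - (alpha / m) * X' * (sigmoid(X * theta) - y);
end
costAfter = cost(theta, X, y);
```

Comparing `costBefore` with `costAfter` is a simple way to confirm that the chosen `alpha` is small enough for the cost to go down.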

Advanced Optimization

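Besides Gradient Descent, there are more sophisticated optimization algorithms, which are often faster and do not require choosing $\alpha$ by hand: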

Conjugate gradient
BFGS
L-BFGS

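However, it is better not to implement these complex algorithms yourself and to use an already optimized Library instead.
We need to provide a Function that computes the following two results for a given $\theta$:

\begin{align*} & J(\theta) \newline & \dfrac{\partial}{\partial \theta_j}J(\theta)\end{align*}

First, write a single Function that returns both results: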

function [jVal, gradient] = costFunction(theta)
  jVal = [...code to compute J(theta)...];
  gradient = [...code to compute derivative of J(theta)...];
end

Then pass it to the function below to find the optimal $\theta$:

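% 'GradObj', 'on': costFunction also returns the gradient; 'MaxIter': cap on iterations.
options = optimset('GradObj', 'on', 'MaxIter', 100);
% Initial guess for theta (a 2-by-1 vector here).
initialTheta = zeros(2,1);
% Returns the optimal theta, the cost at that theta, and a convergence flag.
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);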
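
To make the template above concrete, here is a sketch that fills in the two placeholders for Logistic Regression. The name `costFunctionLogistic`, the extra `X` and `y` arguments, and the toy data are assumptions for illustration; the leading `1;` is an Octave convention that lets a function be defined inside a script file.

```octave
1;  % Octave only: marks this file as a script so the function below can be defined inline

% Hypothetical cost function: returns J(theta) and its gradient for logistic regression.
function [jVal, gradient] = costFunctionLogistic(theta, X, y)
  m = size(X, 1);
  h = 1 ./ (1 + exp(-X * theta));                           % h = g(X * theta)
  jVal = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));  % J(theta)
  gradient = (1 / m) * X' * (h - y);                        % partial derivatives of J(theta)
end

% Toy data (made up): intercept column plus one feature; not linearly separable.
X = [1 1; 1 2; 1 3; 1 4; 1 5; 1 6];
y = [0; 0; 1; 0; 1; 1];

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2, 1);
% The anonymous function fixes X and y so that fminunc only varies theta.
[optTheta, functionVal, exitFlag] = ...
    fminunc(@(t) costFunctionLogistic(t, X, y), initialTheta, options);
```

Wrapping the call in `@(t) ...` is one way to hand the training data to the cost function without global variables.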

Multiclass Classification: One-vs-all

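When there are more than two classes, define $y \in \{0, 1, \dots, n\}$.
Split the problem into n+1 Binary Classification Problems, one per class, each estimating the probability that $y$ belongs to that class. The class whose classifier returns the largest value is taken as the prediction:

\begin{align*}& h_\theta^{(i)}(x) = P(y = i \mid x ; \theta) \quad (i = 0, 1, \dots, n) \newline& \mathrm{prediction} = \arg\max_i \; h_\theta^{(i)}(x)\end{align*}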
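
The prediction step of One-vs-all can be sketched in Octave as below. `allTheta` is a hypothetical matrix holding one already-trained parameter vector per row (classes 0, 1, 2); its values and the inputs in `X` are made up for illustration.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical trained parameters: row k+1 belongs to the classifier for class k.
allTheta = [ 2  -1;     % class 0
             0   0.1;   % class 1
            -4   1];    % class 2

% Made-up inputs: intercept column plus one feature.
X = [1 0; 1 3; 1 8];

% probs(i, k+1) is the estimated probability that example i belongs to class k.
probs = sigmoid(X * allTheta');

% Pick the class with the largest value; subtract 1 because classes start at 0.
[~, idx] = max(probs, [], 2);
prediction = idx - 1;
```

Training the n+1 classifiers themselves is not shown; each would be an ordinary binary Logistic Regression with the labels recoded as "class k" versus "everything else".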

Regularization

To avoid overfitting, a penalty on the weights can be added to the cost function. The cost function then becomes:

$\min_\theta\ \dfrac{1}{2m}\ \left[ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda\ \sum_{j=1}^n \theta_j^2 \right]$

Regularized Linear Regression

Gradient Descent

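The algorithm is rewritten as follows. Note that $\theta_0$ is not penalized:

\begin{align*} & \text{Repeat}\ \lbrace \newline & \ \ \ \ \theta_0 := \theta_0 - \alpha\ \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_0^{(i)} \newline & \ \ \ \ \theta_j := \theta_j - \alpha\ \left[ \left( \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} \right) + \frac{\lambda}{m}\theta_j \right] &\ \ \ \ j \in \lbrace 1,2,\dots,n\rbrace\newline & \rbrace \end{align*}

Rearranging the update for $\theta_j$ gives:

$\theta_j := \theta_j(1 - \alpha\frac{\lambda}{m}) - \alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}$
Since $1 - \alpha\frac{\lambda}{m}$ is always less than 1 (for $\alpha > 0$ and $\lambda > 0$), every update first shrinks $\theta_j$ by this factor. The second term is identical to the update without regularization, so the amount by which the weight shrinks is determined by $1 - \alpha\frac{\lambda}{m}$.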
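
One regularized Gradient Descent step for Linear Regression, sketched in Octave with made-up data and made-up values for `alpha` and `lambda`. It uses the rearranged form, where each $\theta_j$ with $j \geq 1$ is first multiplied by $1 - \alpha\frac{\lambda}{m}$ and $\theta_0$ is left unshrunk.

```octave
% Toy data (made up): intercept column plus one feature.
X = [1 1; 1 2; 1 3];
y = [1; 2; 3];
m = size(X, 1);

theta = [0.5; 0.5];  % hypothetical current parameters
alpha = 0.1;         % hypothetical learning rate
lambda = 1;          % hypothetical regularization strength

% Gradient of the unregularized linear regression cost (h = X * theta).
grad = (1 / m) * X' * (X * theta - y);

% Shrink factor per parameter: 1 for theta_0, (1 - alpha * lambda / m) for the rest.
shrink = [1; (1 - alpha * lambda / m) * ones(numel(theta) - 1, 1)];

% theta_j := theta_j * (1 - alpha * lambda / m) - alpha * grad_j   (j >= 1)
thetaNew = theta .* shrink - alpha * grad;
```

Setting `lambda = 0` makes every entry of `shrink` equal to 1, which recovers the ordinary update.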

Normal Equation

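The equation is rewritten as:

\begin{align*}& \theta = \left( X^TX + \lambda \cdot L \right)^{-1} X^Ty \newline& \text{where}\ \ L = \begin{bmatrix} 0 & & & & \newline & 1 & & & \newline & & 1 & & \newline & & & \ddots & \newline & & & & 1 \newline\end{bmatrix}\end{align*}

$L$ is an $(n+1) \times (n+1)$ matrix; the 0 in its top-left corner leaves $\theta_0$ unpenalized.
As mentioned earlier, if m<n then $X^TX$ is non-invertible, but after the rewrite $X^TX + \lambda \cdot L$ is invertible (for $\lambda > 0$).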
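
A short Octave sketch of the regularized Normal Equation on a made-up case with fewer examples than features, where $X^TX$ alone is singular. The data and `lambda` are illustrative assumptions.

```octave
% Toy data (made up): m = 2 examples, intercept column plus n = 3 features.
X = [1 1 2 3;
     1 2 1 0];
y = [1; 2];
lambda = 1;   % hypothetical regularization strength

% L is the identity matrix with its top-left entry set to 0 (theta_0 is not penalized).
L = eye(size(X, 2));
L(1, 1) = 0;

% Solve (X' * X + lambda * L) * theta = X' * y; the backslash operator avoids an explicit inverse.
theta = (X' * X + lambda * L) \ (X' * y);
```

Using `\` rather than `inv(...)` is a common numerical choice in Octave and gives the same $\theta$ as the formula.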

Regularized Logistic Regression

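The approach is the same as for Regularized Linear Regression: add the penalty term to the Logistic Regression cost function. The sum in the penalty starts at $j = 1$, so $\theta_0$ is again not penalized:

$J(\theta) = - \frac{1}{m} \displaystyle \sum_{i=1}^m [y^{(i)}\log (h_\theta (x^{(i)})) + (1 - y^{(i)})\log (1 - h_\theta(x^{(i)}))] + \frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2$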
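
A sketch of the regularized Logistic Regression cost and gradient in Octave, assuming the penalty $\frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2$ with $\theta_0$ left unpenalized. The data, `theta`, and `lambda` are made-up values.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Toy data (made up): intercept column plus one feature.
X = [1 1; 1 2; 1 3; 1 4];
y = [0; 0; 1; 1];
m = size(X, 1);

theta = [0.1; 0.2];  % hypothetical current parameters
lambda = 1;          % hypothetical regularization strength

h = sigmoid(X * theta);
penalized = [0; theta(2:end)];   % copy of theta with theta_0 zeroed out

% Unregularized cost and gradient.
J0 = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));
grad0 = (1 / m) * X' * (h - y);

% Regularized versions: add the penalty and its derivative.
J = J0 + (lambda / (2 * m)) * sum(penalized .^ 2);
grad = grad0 + (lambda / m) * penalized;
```

Both values could be returned from a cost function and passed to `fminunc` in the same way as the unregularized case.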

Monday, September 18, 2017
