Model Learning

Fri, Feb 9, 2024
2-minute read

Elaborating on Model Learning.

This article introduces ….

How Inputs and Weights are computed in a Perceptron

Activation Function

Loss Function

How the Loss is used to correct learning

Perceptron

One Perceptron

For a single Perceptron, the formula from the previous article, which scores the features according to their weights, can be simplified as follows: $$ \sum _{i=1}^m w_i \times x_i + b = \sum _{i=1}^m w_i \times x_i + b \times 1 = \sum _{i=0}^m w_i \times x_i \quad \text{, where } w_0=b, x_0=1 $$

Simplify by Dot Product: $$ \sum _{i=0}^m w_i \times x_i = \begin{bmatrix} w_0 & w_1 & \cdots & w_m \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_m \end{bmatrix} = \mathbf{w} \cdot \mathbf{x} = z $$
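The bias trick above can be sketched in a few lines of plain Python. This is only an illustration: the function name `perceptron_score` and the example values are assumptions, not taken from the article. The bias is stored as an extra weight paired with a constant input of 1, so the whole score becomes one dot product.

```python
def perceptron_score(weights, bias, features):
    """Return z = w . x with the bias folded in as an extra weight."""
    w = [bias] + list(weights)    # extra weight that holds the bias
    x = [1.0] + list(features)    # constant input 1 paired with the bias
    return sum(w_i * x_i for w_i, x_i in zip(w, x))


# Hypothetical example values (not from the article)
z = perceptron_score(weights=[0.5, -1.0], bias=0.25, features=[2.0, 3.0])
print(z)  # 0.25 * 1 + 0.5 * 2 + (-1.0) * 3
```

Folding the bias into the weight vector means the training code only has to handle one list of parameters per Perceptron.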

Multiple Perceptrons

This is the formula for multiple Perceptrons extracting multiple features.
It also represents the forward propagation (feed-forward) of one Layer.

Simplify by Matrix:

$$ \begin{bmatrix} w_{10} & w_{11} & \cdots & w_{1m} \\ w_{20} & w_{21} & \cdots & w_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n0} & w_{n1} & \cdots & w_{nm} \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_m \end{bmatrix} = \mathbf{W} \mathbf{x} = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix} $$

With n Perceptrons and m features, there are usually n*m + n weights.
The trailing n is the number of biases (each Perceptron has one bias, stored as $w_{j0}$ and paired with $x_0 = 1$).
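As a rough sketch of one Layer's feed-forward step, the matrix product can be written as one dot product per Perceptron. The names `layer_forward`, `W`, `num_perceptrons`, and `num_features`, and all the values, are assumptions made for this example. Each row of `W` holds one Perceptron's bias followed by its weights.

```python
def layer_forward(weight_rows, features):
    """Compute one score per Perceptron (one row of the weight matrix)."""
    x_aug = [1.0] + list(features)    # constant 1 pairs with each row's bias
    return [sum(w * v for w, v in zip(row, x_aug)) for row in weight_rows]


# Hypothetical Layer: 2 Perceptrons, 3 features.
# Each row is [bias, weight_1, weight_2, weight_3].
W = [
    [0.5, 1.0, 0.0, -1.0],
    [-1.0, 0.5, 0.5, 0.5],
]
num_perceptrons = len(W)
num_features = len(W[0]) - 1

z = layer_forward(W, [2.0, 4.0, 6.0])

# One weight per (Perceptron, feature) pair, plus one bias per Perceptron.
num_parameters = num_perceptrons * num_features + num_perceptrons
print(z, num_parameters)
```

Counting the entries of `W` gives the same total as the weight count described above: one bias per Perceptron on top of the per-feature weights.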

Consideration

Why can input data such as photos, text, and audio be fed into a neural network?

As long as images, audio, and text can be converted into vectors, they can be used to train a neural network.
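To make the idea concrete, here is a minimal sketch of two common ways to turn raw data into vectors. The helper names `flatten_image` and `one_hot`, the tiny 2x2 image, and the three-word vocabulary are all made up for illustration. Real pipelines usually use richer encodings.

```python
def flatten_image(pixels):
    """Flatten a 2D grid of grayscale pixels (0-255) into a vector in [0, 1]."""
    return [value / 255 for row in pixels for value in row]


def one_hot(token, vocabulary):
    """Encode one token as a vector with a single 1 at its vocabulary index."""
    return [1.0 if word == token else 0.0 for word in vocabulary]


# Hypothetical inputs
image_vector = flatten_image([[0, 255], [51, 102]])
text_vector = one_hot("cat", ["dog", "cat", "bird"])
print(image_vector, text_vector)
```

Audio can be treated the same way, since a recording is already a sequence of sampled amplitude values.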

Activation Function

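The Step function is a non-linear function, but its derivative is not suitable for training: the derivative is 0 everywhere except at x = 0, where it is undefined, so the feedback passed back during training is also 0.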

Sigmoid, Tanh, and ReLU are also non-linear functions, and their derivatives are suitable for training.

Step function

$$ f(x) = \begin{cases} 0 & \text{if } x \leq 0 \\ 1 & \text{if } x > 0 \end{cases} $$

Sigmoid

$$ \begin{aligned} f(x) = \frac{1}{1 + e^{-x}} \end{aligned} $$

Tanh

$$ f(x) = \tanh(x) $$

ReLU

$$ f(x)=\max\left(0,x\right) $$
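The four functions above and their derivatives can be sketched with the standard `math` module. The `*_grad` helper names are assumptions for this example, and the values chosen at x = 0 for Step and ReLU are conventions, since those derivatives are undefined there.

```python
import math


def step(x):
    return 1.0 if x > 0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def step_grad(x):
    # Zero everywhere except x = 0, where it is undefined; treated as 0 here.
    return 0.0


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2


def relu_grad(x):
    # Undefined at x = 0; returning 0 there is a common convention.
    return 1.0 if x > 0 else 0.0


for x in (-2.0, 0.5, 3.0):
    print(x, step_grad(x), sigmoid_grad(x), tanh_grad(x), relu_grad(x))
```

The Step derivative is 0 for every input, so no signal flows back through it, while the other three return non-zero gradients over at least part of their domain.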

Others:

Leaky-ReLU
PReLU
ELU
Maxout

Softmax Function

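Used for Classification. The name Softmax can be read as Soft + Max: it is the soft counterpart of Hardmax, and the details will be explained later. Its purpose is to convert the scores produced by the last layer into probabilities, so that each value lies between 0 and 1 and all values sum to 1.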

Formula

$$ \sigma(\mathbf{z})_j = \tfrac{e^{z_j}}{\sum _{k=1}^K e^{z_k}} \quad \text{ , for } j = 1, 2, \cdots, K $$

K: the number of categories
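A minimal sketch of Softmax next to a Hardmax for comparison. Subtracting the largest score before exponentiating is a standard numerical-stability step that does not change the result. The `hardmax` helper and the example scores are assumptions added for illustration.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)    # avoids overflow in exp(); the output is unchanged
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hardmax(scores):
    """Put all probability on the largest score (the first one on ties)."""
    best = scores.index(max(scores))
    return [1.0 if i == best else 0.0 for i in range(len(scores))]


scores = [2.0, 1.0, 0.1]    # hypothetical last-layer scores
probs = softmax(scores)
print(probs, sum(probs))
print(hardmax(scores))
```

Softmax keeps the ranking of the scores but gives every category a non-zero probability, whereas Hardmax discards everything except the winner.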

Prediction

After the Activation Function, we obtain the predicted value and its error. The error is then back-propagated through the Neural Network.
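As a hedged sketch of this step for a single neuron, the example below assumes a Sigmoid activation and a squared error, neither of which the article fixes at this point. It computes the prediction, the error, and the gradient that would be passed back to each weight via the chain rule. The function name `forward_and_gradient` is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward_and_gradient(weights, inputs, target):
    """One neuron: prediction, squared error, and d(error)/d(weight)."""
    z = sum(w * x for w, x in zip(weights, inputs))
    prediction = sigmoid(z)
    error = (prediction - target) ** 2
    # Chain rule: dE/dw_i = dE/da * da/dz * dz/dw_i
    dE_da = 2.0 * (prediction - target)
    da_dz = prediction * (1.0 - prediction)
    gradient = [dE_da * da_dz * x for x in inputs]
    return prediction, error, gradient


# Hypothetical weights, inputs, and target
prediction, error, gradient = forward_and_gradient(
    weights=[0.0, 0.0], inputs=[1.0, 2.0], target=1.0
)
print(prediction, error, gradient)
```

A negative gradient entry means that increasing that weight would reduce the error for this sample.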

Loss Function

Before discussing Loss, we first need to get to know MSE and RMSE.

Mean Square Error: $$ MSE = \tfrac{1}{n} \sum _{i=1}^n (x_i - \hat{x}_i)^2 $$

We can use MSE to plot a Loss Curve and look for its minimum during training.

Root Mean Square Error: $$ RMSE = \sqrt{MSE} $$

Mean Absolute Error:
$$ MAE = \tfrac{1}{n} \sum _{i=1}^n \left| x_i - \hat{x}_i \right| $$

Cross-Entropy Loss:
This is a Loss function commonly used for classification problems.
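The article does not give a formula here. The standard definition for one sample with a one-hot target $y$ and predicted probabilities $p$ is $L = -\sum_{k=1}^K y_k \log p_k$, which reduces to the negative log of the probability assigned to the true category. A minimal sketch, with a hypothetical function name and made-up probabilities:

```python
import math


def cross_entropy(probabilities, target_index):
    """Loss for one sample: -log of the probability of the true category."""
    return -math.log(probabilities[target_index])


# Hypothetical predictions for a sample whose true category is index 0
confident_loss = cross_entropy([0.7, 0.2, 0.1], target_index=0)
wrong_loss = cross_entropy([0.1, 0.2, 0.7], target_index=0)
print(confident_loss, wrong_loss)
```

The loss grows quickly as the probability given to the true category approaches 0, which is why confident wrong predictions are penalized heavily.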

Selection

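If you need to pay closer attention to the overall distribution of prediction errors, or want to evaluate model performance more precisely, choose MSE.
If the dataset contains outliers and you do not want them to have a large impact on the evaluation, choose MAE.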

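MSE is more sensitive to large errors because it squares each prediction error before averaging. RMSE therefore also puts more emphasis on large errors.
MAE is more robust to outliers because it averages the absolute values of the prediction errors.
RMSE combines advantages of MSE and MAE: it keeps the emphasis of MSE on large errors while, like MAE, being expressed in the same units as the predicted quantity. In practice this can make it a fairly comprehensive metric, especially when both prediction accuracy and sensitivity to outliers need to be considered.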
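The difference in outlier sensitivity can be checked with a small experiment. The data below is invented for illustration: one set of predictions is off by 1 everywhere, the other is exact except for a single prediction that is off by 10.

```python
import math


def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [1.0, 2.0, 3.0, 4.0]
evenly_off = [2.0, 3.0, 4.0, 5.0]      # every prediction off by 1
one_outlier = [1.0, 2.0, 3.0, 14.0]    # a single prediction off by 10

for name, predicted in (("evenly_off", evenly_off), ("one_outlier", one_outlier)):
    print(name, mse(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
```

With evenly spread errors all three metrics agree. With the single outlier, MSE grows much more than MAE, and RMSE sits between the two.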

Terminology

loss/residual/error

Loss
Residual
Error

Gradient Descent

Learning Rate

The Learning Rate affects how fast the model learns. A well-chosen value avoids overshooting the minimum loss, but setting it too low makes learning slow.
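The effect of the Learning Rate can be seen on a toy problem. This sketch assumes a one-parameter loss, loss(w) = (w - 3)^2, chosen only because its minimum at w = 3 and its gradient 2 * (w - 3) are easy to verify. The function name and the three rates are hypothetical.

```python
def gradient_descent(learning_rate, steps, start=0.0):
    """Minimize loss(w) = (w - 3)^2, whose gradient is 2 * (w - 3)."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * (w - 3.0)
        w -= learning_rate * gradient    # step against the gradient
    return w


w_good = gradient_descent(learning_rate=0.1, steps=50)     # converges near 3
w_slow = gradient_descent(learning_rate=0.001, steps=50)   # barely moves
w_large = gradient_descent(learning_rate=1.1, steps=50)    # overshoots and diverges
print(w_good, w_slow, w_large)
```

With the same number of steps, the moderate rate lands close to the minimum, the very small rate is still far from it, and the too-large rate jumps past the minimum by a growing amount on every step.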

Basic Know How

Partial Derivative
Vector and Matrix Calculation
Chart (desmos)
Formula writing with KaTeX