introduction

classification in ML

input: a vector x; output: the probability of sampling x. The shape of the function is determined by the mean μ and the covariance matrix Σ.
We have the class-1 examples x_1, x_2, x_3, ..., x_n, and we assume they are generated from a Gaussian f_{μ,Σ}; μ and Σ are estimated by maximum likelihood.
$$
L(\mu,\Sigma)=\prod_{i=1}^{n} f_{\mu,\Sigma}(x_i)
$$
$$
\hat{\mu},\hat{\Sigma}=\arg\max_{\mu,\Sigma}L(\mu,\Sigma)
$$
result:
$$
\hat{\mu}=\frac{1}{n}\sum_{n} x^n
$$
$$
\hat{\Sigma}=\frac{1}{n}\sum_{n}(x^n-\hat{\mu})(x^n-\hat{\mu})^T
$$
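As a sanity check, here is a minimal NumPy sketch of these maximum-likelihood estimates; the data matrix `X` below is hypothetical:

```python
import numpy as np

# Hypothetical class-1 feature vectors, one row per example x^n
X = np.array([[1.0, 2.0],
              [2.0, 3.0],
              [3.0, 5.0],
              [4.0, 4.0]])

n = X.shape[0]
mu_hat = X.mean(axis=0)                # (1/n) * sum of x^n
centered = X - mu_hat
sigma_hat = centered.T @ centered / n  # (1/n) * sum of (x^n - mu)(x^n - mu)^T

print(mu_hat)
print(sigma_hat)  # maximum-likelihood covariance (divides by n, not n-1)
```

Note the maximum-likelihood estimate divides by n, unlike the unbiased sample covariance, which divides by n-1.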
simplify the Bayes formula:
$$
P(C_1|x)=\frac{P(x|C_1)P(C_1)}{P(x|C_1)P(C_1)+P(x|C_2)P(C_2)}=\frac{1}{1+\frac{P(x|C_2)P(C_2)}{P(x|C_1)P(C_1)}}
$$
$$
z=\ln\frac{P(x|C_1)P(C_1)}{P(x|C_2)P(C_2)}
$$
Assuming the two classes share the same covariance matrix, z turns out to be linear in x, so the final formula is as follows:
$$
z=wx+b
$$
$$
P(C_1|x)=\frac{1}{1+\exp(-z)}=\sigma(z)
$$
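A quick numeric check of this identity (the likelihood and prior values below are hypothetical):

```python
import math

# Hypothetical values for the class-conditional likelihoods and priors
p_x_c1, p_c1 = 0.30, 0.6   # P(x|C1), P(C1)
p_x_c2, p_c2 = 0.10, 0.4   # P(x|C2), P(C2)

# Direct Bayes computation of the posterior
posterior = (p_x_c1 * p_c1) / (p_x_c1 * p_c1 + p_x_c2 * p_c2)

# The same value via the log-odds z and the sigmoid
z = math.log((p_x_c1 * p_c1) / (p_x_c2 * p_c2))
sigmoid = 1.0 / (1.0 + math.exp(-z))

print(posterior, sigmoid)  # the two values agree
```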
the Logistic Regression formula:

$$
f_{w,b}(x)=\sigma\left(\sum_i w_ix_i+b\right)
$$
how to do gradient descent for Logistic Regression?

Given training data where each example is labeled with its class:

| x^1 | x^2 | x^3 | ... | x^N |
| --- | --- | --- | --- | --- |
| C_1 | C_1 | C_2 | ... | C_1 |

Assume the data is generated from $$f_{w,b}(x)=P(C_1|x)$$; the likelihood of the training set is:
$$
L(w,b)=f(x^1)f(x^2)\left(1-f(x^3)\right)\cdots f(x^N)
$$
$$
\hat{w},\hat{b}=\arg\max_{w,b}L(w,b)=\arg\min_{w,b}-\ln L(w,b)
$$
Encoding the label as 1 for class C_1 and 0 for class C_2, this becomes:
$$
-\ln L(w,b)=\sum_n-\left[\hat{y}^n\ln f(x^n)+(1-\hat{y}^n)\ln\left(1-f(x^n)\right)\right]
$$

Minimizing this is minimizing the cross entropy between two Bernoulli distributions: the label distribution and the model's output.
$$
\frac{\partial\left(-\ln L(w,b)\right)}{\partial w_i}=\sum_n-\left(\hat{y}^n-f(x^n)\right)x_i^n
$$
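Putting the sigmoid, this gradient, and the update rule together, here is a minimal gradient-descent sketch on a hypothetical toy dataset (the learning rate and iteration count are arbitrary choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy dataset: label y_hat = 1 for class C1, 0 for class C2
X = np.array([[0.5, 1.0],
              [1.0, 1.5],
              [3.0, 3.5],
              [3.5, 3.0]])
y = np.array([1.0, 1.0, 0.0, 0.0])

w = np.zeros(X.shape[1])
b = 0.0
lr = 0.1  # arbitrary learning rate

for _ in range(10000):
    f = sigmoid(X @ w + b)   # f_{w,b}(x^n) for every example
    grad_w = -(y - f) @ X    # sum_n -(y_hat^n - f(x^n)) * x_i^n
    grad_b = -(y - f).sum()  # same gradient with x_i^n = 1 for the bias
    w -= lr * grad_w
    b -= lr * grad_b

print(sigmoid(X @ w + b))  # predictions approach [1, 1, 0, 0]
```

The update has the same form as linear regression's; only the output function σ and the loss differ.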

multi-class classification

Multi-class classification uses the softmax function:
$$
y_i=\frac{e^{z_i}}{\sum_j e^{z_j}}
$$
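A minimal NumPy sketch of the softmax (subtracting the max before exponentiating is a standard numerical-stability trick and does not change the result):

```python
import numpy as np

def softmax(z):
    # Shift by the max so np.exp never overflows; the ratio is unchanged
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([3.0, 1.0, -2.0])
y = softmax(z)
print(y)        # each y_i is in (0, 1), larger z_i gives larger y_i
print(y.sum())  # the outputs sum to 1
```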
Cascading logistic regression models yields a neural network.

The function of a layer of logistic regression units is to transform the data into new features for the next layer.
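As a sketch of this idea with hand-picked, hypothetical weights: a first layer of logistic units turns the two inputs into features (roughly OR and AND of the inputs), and a second logistic unit on those features computes XOR, which a single logistic regression cannot represent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# First layer: two logistic-regression units acting as feature extractors
# (hand-picked weights; h1 ~ OR(x1, x2), h2 ~ AND(x1, x2))
W1 = np.array([[20.0, 20.0],
               [20.0, 20.0]])
b1 = np.array([-10.0, -30.0])

# Second layer: one logistic-regression unit on the new features h
w2 = np.array([20.0, -20.0])
b2 = -10.0

def network(x):
    h = sigmoid(W1 @ x + b1)     # transformed features
    return sigmoid(w2 @ h + b2)  # final classification

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, round(float(network(np.array(x, dtype=float))), 3))
```

In practice the weights of all layers are learned jointly by gradient descent rather than set by hand; the point here is only that the stacked structure can represent functions a single unit cannot.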