Preceptron – A simple binary classifier

What is a preceptron?

Suppose we have a linearly separable two-category dataset with an input space $\mathcal{X}\subseteq\mathbb{R}^n$ and a output space $\mathcal{Y}=\{+1,-1\}$ , preceptron tries to find a hyper-plane that can completely seperate these two class (-1 and 1). Here is a intutive plot of it:

The hyper-plane cuts the dataset into two parts with the values of -1 and 1, that white vacancy is the imagined hyper-plane.

The Model

As we give the input and the output space above, the function for that reflection is:

 $f(x)=\mathrm{sign}(w\cdot x+b)$

Here the $\cdot$ stand for inner product of two vector, $w$ for weight vector and $b$ for bias.
Plus, the $\mathrm{sign}$ here is the sign function:

 $\begin{cases} &amp; +1,x\ge 0 \\ &amp; -1,x&lt; 0 \end{cases}$

For a given dataset T where $T=\{(x_1,y_1),(x_2,y_2),...,(x_N,y_N)\}$ , preceptron model is to get the $w,b$ that satisfy the classfication requires. Then using the model to forecast any given data is avaliable.

The Strategy

Linear Separability

For a given dataset

 $T=\{(x_1,y_1),...(x_N,y_N)\}$

If it exists a hyper-plane that can completely divide dataset into positive and negative subsets, in which for all
$i\space\text{that}\space y_i=+1\space\text{s.t.}\space w\cdot x_i+b>0$ , we consider this a linearly separable data set, and vice versa.

The Loss Function

Distance

For a random point $x_0$ in the input space $\mathbb{R}^n$ , we have the distance to the given hyper-plane $S$ is

 $\frac{1}{||w||}|w\cdot x_0+b|$

For misclassified data, $-y_i(w\cdot x_i+b)>0$ , so the distance to the hyper-plane is

 $\frac{1}{||w||}y_i(w\cdot x_i+b)$

Let M be the set contains all the misclassified data, the total distance would be

 $\displaystyle -\frac{1}{||w||}\sum_{x_i\in M}y_i(w\cdot x_i+b)$

Dismiss $\frac{1}{||w||}$ , we have the loss function

 $\displaystyle L(w,b)=-\sum_{x_i\in M}y_i(w\cdot x_i+b)$

The Algoritm

The algoritm of preceptron is a optimization-algoritm, which is, calculating the parameters $w,b$ by minimizing the loss function

 $\displaystyle \min_{w,b}L(w,b)=-\sum_{x_i\in M}y_i(w\cdot x_i+b)$

Here we use stochastic gradient descent to find the minimum of the loss function

 $\displaystyle \nabla_wL(w,b)=-\sum_{x_i\in M}y_ix_i\\ \nabla_wL(w,b)=-\sum_{x_i\in M}y_i$

In this case the update of w and b would be

 $w\leftarrow w+\eta y_ix_i\\ b\leftarrow b+\eta y_i$

Where $\eta(0<\eta\leq1)$ is the stepsize (some would like to say learning rate).

The Dual Problem

In the primary problem, when the procedure stops, the final $w,b$ will be

 $\displaystyle w=\sum^{N}_{i=1}\alpha_i y_i x_i\\ \displaystyle b= \sum^{N}_{i=1}\alpha_i y_i$

So we can conclude a model like

 $\displaystyle f(x)=\mathrm{sign}(\sum_{j=1}^N \alpha_iy_jx_j\cdot x+b)\\ \mathrm{where}\quad \alpha = (\alpha_1,\alpha_2,...,\alpha_N)^T$

And the condition would be like

 $\displaystyle y_i(\sum^N_{j=1}\alpha_jy_jx_j\cdot x_i+b)\leq 0$

We only need the inner product of each data, and store them in a matrix call the Gram.

 $G=[x_i\cdot x_j]_{N\times N}$

浏览量: 24

Preceptron – A simple binary classifier

What is a preceptron?

The Model

The Strategy

Linear Separability

The Loss Function

Distance

The Algoritm

The Dual Problem

Bindow

发表回复取消回复

What is a preceptron?

The Model

The Strategy

Linear Separability

The Loss Function

Distance

The Algoritm

The Dual Problem

Bindow

发表回复 取消回复

发表回复取消回复