Kyle I S Harrington / kyle@eecs.tufts.edu
Hebb's learning rule: fire together, wire together
$\Delta w_{kj} = \eta x_j y_k$
change weight $w_{kj}$ proportionally to the product of the input $x_j$ and the output $y_k$
$\Delta w_{kj} = \eta x_j y_k$
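A minimal NumPy sketch of this update (variable names and sizes are illustrative, not from the notes): each weight $w_{kj}$ grows by $\eta x_j y_k$, i.e. an outer product of output and input activities.

```python
import numpy as np

# Hebbian step: delta_w[k, j] = eta * x[j] * y[k]  (outer product of y and x)
def hebb_update(w, x, y, eta=0.1):
    return w + eta * np.outer(y, x)

# toy example: 2 output units, 3 input units
w = np.zeros((2, 3))
x = np.array([1.0, 0.5, 0.0])   # input activities x_j
y = np.array([1.0, 0.0])        # output activities y_k
w = hebb_update(w, x, y)
print(w)  # only weights into the active output unit grow; nothing ever decays
```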
Problems?
How do we find weights that can produce a particular output?
$\Delta w_{kj} = \eta ( t_k - o_k ) i_j$
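A sketch of this error-correction (delta) rule, assuming a linear output unit so that $o_k = \sum_j w_{kj} i_j$ (the helper name is mine):

```python
import numpy as np

# Delta-rule step: delta_w[k, j] = eta * (t[k] - o[k]) * i[j]
def delta_update(w, i, t, eta=0.1):
    o = w @ i                            # linear outputs o_k
    return w + eta * np.outer(t - o, i)  # move weights to reduce the error
```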
start with some $w_{(0)}$
do:
$w^{(t+1)} = w^{(t)} - \eta_t \nabla F(w^{(t)})$
until $||w^{(t+1)}-w^{(t)}|| \leq \epsilon$
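The loop above transcribed directly into Python; $F$ here is a made-up quadratic, chosen only so the sketch runs and visibly converges.

```python
import numpy as np

def gradient_descent(grad_F, w0, eta=0.1, eps=1e-6, max_iters=10_000):
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iters):
        w_next = w - eta * grad_F(w)           # w^(t+1) = w^(t) - eta * grad F(w^(t))
        if np.linalg.norm(w_next - w) <= eps:  # stop when the step is tiny
            return w_next
        w = w_next
    return w

# example: F(w) = ||w - w*||^2, so grad F(w) = 2 (w - w*)
w_star = np.array([2.0, -1.0])
print(gradient_descent(lambda w: 2 * (w - w_star), w0=[0.0, 0.0]))  # ~ [2, -1]
```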
Given a stream of input/output samples
Update weights after each sample (online learning)
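An online version of the same idea, assuming a single linear output unit and the delta rule from above (a sketch only; `samples` is any iterable of `(x, t)` pairs):

```python
import numpy as np

def online_train(samples, w, eta=0.05):
    for x, t in samples:           # one (input, target) pair at a time
        o = w @ x                  # current prediction
        w = w + eta * (t - o) * x  # immediate delta-rule update on this sample
    return w
```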
Notebook!
Predict a binary class, $y$, given a feature value, $x$
Predict $y=1$ when $P(y=1|x) > P(y=0|x)$
Rewrite $P(y=1|x) > P(y=0|x)$
As log-odds: $\log \frac{P(y=1|\textbf{x})}{P(y=0|\textbf{x})} > 0$
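A tiny numeric check of the rule (the probability is made up): predict $y=1$ exactly when the log-odds are positive.

```python
import math

p1 = 0.7                            # hypothetical P(y=1 | x)
log_odds = math.log(p1 / (1 - p1))  # log(0.7 / 0.3) ~ 0.847
print(1 if log_odds > 0 else 0)     # -> 1, same decision as P(y=1|x) > P(y=0|x)
```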
$P(y|\textbf{x})$ is generally either unknown or estimated from samples
Approximate the log-odds $\log \frac{P(y=1|\textbf{x})}{P(y=0|\textbf{x})}$ linearly as
$f(x;w) = w_0 + x \cdot w_1$
$\log \frac{P(y=1|\textbf{x})}{P(y=0|\textbf{x})} = w_0 + x \cdot w_1$
Equivalently, solving for the probability:
$P(y=1|x_i, w_0 , w_1 ) = g( w_0 + x_i \cdot w_1 )$
$g(z) = \frac{1}{(1+e^{-z})}$
"Squishing function"
$f(x) = \frac{1}{1+e^{-x}}$
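The sigmoid in code: it maps any real-valued score into $(0,1)$, so $g(w_0 + x \cdot w_1)$ can be read directly as a probability (printed values are approximate).

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))  # logistic sigmoid

print(g(0.0))    # 0.5   -- zero log-odds: both classes equally likely
print(g(5.0))    # ~0.993
print(g(-5.0))   # ~0.007
```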
Learn the weights $w$ with a penalty:
$\text{argmax}_{w} \displaystyle \sum^m_{i=1} \log P(y=y_i|x_i, w ) - \alpha R(w) $
Regularization term, $R(w)$, forces parameters to be small when $\alpha>0$
L1: $R(w) = ||w||_1 = \displaystyle \sum^n_{i=1} |w_i|$
L2: $R(w) = ||w||_2^2 = \displaystyle \sum^n_{i=1} w_i^2$
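A sketch of fitting both penalties with scikit-learn on toy data; note that sklearn's `C` is the inverse of the $\alpha$ above (larger `C` means weaker regularization).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # toy labels

l2 = LogisticRegression(penalty="l2", C=1.0).fit(X, y)
l1 = LogisticRegression(penalty="l1", C=1.0, solver="liblinear").fit(X, y)

print(l2.coef_)   # L2: all weights shrunk smoothly toward zero
print(l1.coef_)   # L1: some weights driven exactly to zero (sparse)
```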
L2 tends to be better at shrinking weights; L1 tends to drive some weights exactly to zero (sparse solutions)
Due March 7 (Monday)
Clustering