Kyle I S Harrington / kyle@eecs.tufts.edu
Some slides adapted from Roni Khardon
Machine learning on unlabeled data
Techniques include: clustering, expectation-maximization, dimensionality reduction (e.g., PCA), etc.
Grouping instances/objects into sets/clusters based on similarity
Partition data into clusters $C_1,...,C_k$
Centroid of cluster $j$: $\mu_j = \frac{1}{|C_j|} \displaystyle \sum_{x \in C_j} x$
Centroid of dataset: $\mu = \frac{1}{N} \displaystyle \sum_j \displaystyle \sum_{x \in C_j} x$
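These definitions translate directly to code. A minimal sketch in Python (numpy assumed; the array `X` and label vector `labels` are illustrative names, not from the slides):

```python
import numpy as np

# Toy data: N instances in 2-D, with an assumed cluster assignment per row.
X = np.array([[0.0, 0.1], [0.2, 0.0], [4.0, 4.1], [4.2, 3.9]])
labels = np.array([0, 0, 1, 1])  # cluster index of each instance

# Centroid of cluster j: mean of the instances assigned to C_j.
centroids = {j: X[labels == j].mean(axis=0) for j in np.unique(labels)}

# Centroid of the whole dataset: mean over all N instances.
mu = X.mean(axis=0)
```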
What makes a good cluster?
For a given clustering, how tightly do instances concentrate around their cluster centroids? The within-cluster scatter sums squared distances from each instance to its centroid:
$CS_W = \displaystyle \sum_j \displaystyle \sum_{x \in C_j} || x - \mu_j ||^2$
[Figure: example clusterings with low (left) vs. high (right) within-cluster scatter]
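A sketch of this measure under the same assumptions (numpy, data matrix `X`, label vector `labels`):

```python
import numpy as np

def within_cluster_scatter(X, labels):
    """Sum of squared distances from each instance to its cluster centroid."""
    total = 0.0
    for j in np.unique(labels):
        members = X[labels == j]
        mu_j = members.mean(axis=0)
        total += ((members - mu_j) ** 2).sum()
    return total
```

Lower values indicate tighter clusters; note the measure always decreases as the number of clusters grows, so it cannot pick $k$ on its own.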
How distinct are the cluster centroids from the centroid of the entire dataset? The between-cluster scatter weights each centroid's squared distance by cluster size:
$CS_B = \displaystyle \sum_j | C_j | \cdot || \mu_j - \mu ||^2$
[Figure: example clusterings with low (left) vs. high (right) between-cluster scatter]
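The same sketch style for the between-cluster measure; here larger values indicate better-separated centroids:

```python
import numpy as np

def between_cluster_scatter(X, labels):
    """Size-weighted squared distances of cluster centroids to the dataset centroid."""
    mu = X.mean(axis=0)
    total = 0.0
    for j in np.unique(labels):
        members = X[labels == j]
        total += len(members) * ((members.mean(axis=0) - mu) ** 2).sum()
    return total
```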
What is the shortest distance between instances of two clusters?
$Spacing = \displaystyle \min_{i \neq j} \left[ \min_{x \in C_i, y \in C_j} || x - y ||^2 \right]$
[Figure: example clusterings with low (left) vs. high (right) spacing]
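A direct (quadratic-time) sketch of spacing, comparing every pair of clusters:

```python
import numpy as np

def spacing(X, labels):
    """Smallest squared distance between instances in two different clusters."""
    ids = np.unique(labels)
    best = np.inf
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            A, B = X[labels == ids[a]], X[labels == ids[b]]
            # All pairwise squared distances between the two clusters.
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
            best = min(best, d2.min())
    return best
```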
How do we determine "most similar"?
When are we done?
Distance functions for clusters:
$d_{min} (C_i, C_j) = \displaystyle \min_{x \in C_i, y \in C_j} || x - y ||^2$
$d_{max} (C_i, C_j) = \displaystyle \max_{x \in C_i, y \in C_j} || x - y ||^2$
$d_{avg} (C_i, C_j) = \frac{1}{|C_i| \cdot |C_j|} \displaystyle \sum_{x \in C_i} \displaystyle \sum_{y \in C_j} || x - y ||^2$
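These are the single-, complete-, and average-linkage criteria, respectively. All three fall out of the pairwise squared-distance matrix between two clusters; a sketch:

```python
import numpy as np

def linkage_distances(Ci, Cj):
    """Return (d_min, d_max, d_avg) between clusters Ci and Cj (squared Euclidean)."""
    d2 = ((Ci[:, None, :] - Cj[None, :, :]) ** 2).sum(axis=2)
    return d2.min(), d2.max(), d2.mean()
```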
Merging the closest pair of clusters at each step yields a sequence of clusterings.
The user decides which clustering in the sequence is preferred (number of clusters, cluster variance, etc.)
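A hedged sketch with SciPy's hierarchical clustering (note SciPy uses plain, not squared, Euclidean distances): `linkage` records the full merge sequence, and cutting it at different points recovers the different clusterings the user can choose among.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(20, 2)        # toy data
Z = linkage(X, method="single")  # "complete" / "average" correspond to d_max / d_avg

# Each cut of the merge sequence yields one clustering from the sequence.
for k in (2, 3, 4):
    labels = fcluster(Z, t=k, criterion="maxclust")
```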
How do we determine the "best split"?
Speed
Partition the data into $k$ clusters
Iteratively update cluster assignments using centroids
Run with different values of k, and measure quality
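A minimal sketch of this two-step iteration (Lloyd's algorithm), assuming numpy and initial centroids drawn from the data:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # initial guess
    for _ in range(iters):
        # Assignment step: attach each instance to its nearest centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each centroid to the mean of its cluster.
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):  # converged: assignments are stable
            break
        centroids = new
    return labels, centroids
```

Running this for several values of $k$ and plotting the within-cluster scatter gives the usual elbow heuristic for choosing $k$.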
Sensitivity to initial conditions
Sensitivity to outliers
Sensitivity to initial conditions: repeat with multiple initial conditions and keep the best clustering
Sensitivity to outliers: use the median instead of the mean (k-medians)
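A sketch of the first fix using scikit-learn, whose `KMeans` runs `n_init` random restarts and keeps the fit with the lowest within-cluster scatter (`inertia_`); the k-medians variant is not built into scikit-learn:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100, 2)  # toy data

# n_init restarts from different initial centroids; the best one is kept.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_, km.inertia_)
```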
Mutual information used directly to compare clusterings is sensitive to the number of clusters, so it is normalized by entropy, yielding the normalized mutual information (NMI).
From Vinh, Epps and Bailey, 2010
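One common normalization (Vinh et al. analyze several variants) is $NMI(U,V) = \frac{I(U;V)}{\sqrt{H(U) \cdot H(V)}}$. A sketch with scikit-learn's implementation, which makes the key property visible: NMI ignores the arbitrary numbering of clusters.

```python
from sklearn.metrics import normalized_mutual_info_score

true_labels = [0, 0, 1, 1, 2, 2]
pred_labels = [1, 1, 0, 0, 2, 2]  # same grouping, different cluster ids

# Identical partitions up to relabeling score 1.0.
print(normalized_mutual_info_score(true_labels, pred_labels))
```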
Posted here
Due: March 15 (hardcopy in class)
Machine learning in games
Go!
Suggested reading: Mastering the game of Go with deep neural networks and tree search (Silver et al., Nature, 2016)