Easy! Just use the existing data for prediction.
How?
Lazy classifier: only does work when classifying
Non-parametric: no model parameters are fit, but you must still choose the 'k' hyperparameter
In the limit of densely sampled data, can be fairly accurate (asymptotically, 1-NN error is at most twice the Bayes error)
Training: N/A
Classification:
For each new observation, find the k closest training points and predict the majority label among them
Note: distance is generally Euclidean
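The classification step above can be sketched as follows. This is a minimal exhaustive-search version; the name `knn_classify` and the representation of `train` as a list of `(point, label)` pairs are assumptions for illustration:

```python
from collections import Counter
import math

def knn_classify(train, query, k=3):
    """Exhaustive-search kNN: measure Euclidean distance from query
    to every training point, keep the k closest, majority-vote."""
    by_distance = sorted(train, key=lambda pl: math.dist(pl[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Toy two-class example
train = [((0, 0), 'a'), ((0, 1), 'a'), ((1, 0), 'a'),
         ((5, 5), 'b'), ((5, 6), 'b'), ((6, 5), 'b')]
print(knn_classify(train, (1, 1)))  # query sits near the 'a' cluster
```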
Speed of classification
Choosing K
Noisy data
Nonuniform samples
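One common way to address the "Choosing K" point above (an illustration, not from the slides) is to pick the k that minimizes leave-one-out error on the training set:

```python
from collections import Counter
import math

def knn_classify(train, query, k):
    by_distance = sorted(train, key=lambda pl: math.dist(pl[0], query))
    return Counter(l for _, l in by_distance[:k]).most_common(1)[0][0]

def choose_k(train, candidates=(1, 3, 5, 7)):
    """Pick k minimizing leave-one-out misclassification count:
    classify each training point using all the *other* points."""
    def loo_errors(k):
        return sum(
            knn_classify(train[:i] + train[i + 1:], pt, k) != label
            for i, (pt, label) in enumerate(train))
    return min(candidates, key=loo_errors)

train = [((0, 0), 'a'), ((0, 1), 'a'), ((1, 0), 'a'),
         ((5, 5), 'b'), ((5, 6), 'b'), ((6, 5), 'b')]
print(choose_k(train))
```

With only three points per class, large k drags in the other cluster, so leave-one-out favors a small k here.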
Submit a pull request on GitHub for kNN with exhaustive search
kNN isn't lazy enough: O(n) work is for chumps!
Let's organize our data into a tree to get O(log n) lookups
Constructing a KD-tree
numDimensions: comes from the dataset (e.g. len(points[0]))

def buildTree(points, depth=0):
    if not points:                # base case: empty subtree
        return None
    axis = depth % numDimensions  # cycle through the axes by depth
    points = sorted(points, key=lambda p: p[axis])
    median = len(points) // 2     # split at the median along this axis
    tree = {}
    tree['location'] = points[median]
    tree['left'] = buildTree(points[:median], depth + 1)
    tree['right'] = buildTree(points[median + 1:], depth + 1)
    return tree
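To actually get the O(log n) lookups, the tree needs a search routine. Below is a sketch of nearest-neighbor search (the function name `nearest` is an assumption) over nodes laid out as dicts with 'location'/'left'/'right' keys, matching the constructor on this slide; the demo tree is built by hand so the snippet is self-contained:

```python
import math

def nearest(tree, query, depth=0, num_dims=2):
    """Descend toward query's side of each split first, then check the
    far side only if the splitting plane is closer than the best so far."""
    if tree is None:
        return None
    axis = depth % num_dims
    if query[axis] < tree['location'][axis]:
        near, far = tree['left'], tree['right']
    else:
        near, far = tree['right'], tree['left']
    best = nearest(near, query, depth + 1, num_dims)
    if best is None or math.dist(query, tree['location']) < math.dist(query, best):
        best = tree['location']
    # The far subtree can only help if the plane is within the best distance
    if abs(query[axis] - tree['location'][axis]) < math.dist(query, best):
        cand = nearest(far, query, depth + 1, num_dims)
        if cand is not None and math.dist(query, cand) < math.dist(query, best):
            best = cand
    return best

# Hand-built KD-tree over five 2-D points (root splits on x, depth 1 on y)
leaf = lambda p: {'location': p, 'left': None, 'right': None}
tree = {'location': (5, 4),
        'left':  {'location': (2, 6), 'left': leaf((1, 1)), 'right': None},
        'right': {'location': (8, 1), 'left': None, 'right': leaf((9, 9))}}
print(nearest(tree, (7, 2)))
```

The pruning test is what makes the search sub-linear on average: whole subtrees are skipped whenever the splitting plane lies farther away than the current best candidate.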
Quick demo of real-time KD-trees (brevis.example.swarm.neighborhood-line-swarm)
What if our data isn't well-sampled?
What if we have to classify an alien (presumably we haven't seen those before)?
Posted in the assignments section
Quick Weka recap
Due: Jan 27
Posted in the assignments section
Due: Feb 03
Decision Trees