Data Science
K-Nearest Neighbors algorithm
“You are who you associate with. Look around at your five
closest friends and that’s who you are. If you don’t want to be
that person, you know what you gotta do.”
— Will Smith
“When I see a bird that walks like a duck and swims like a
duck and quacks like a duck, I call that bird a duck.”
— Indiana poet James Whitcomb Riley
k-NN classification
The output is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors.
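The plurality vote itself is only a few lines of code. The sketch below assumes the labels of the k nearest neighbors have already been collected; the function name `plurality_vote` and the sample labels are made up for illustration.

```python
from collections import Counter

def plurality_vote(neighbor_labels):
    """Return the label that occurs most often among the neighbors.

    Ties go to the label that appears first in the list, because
    Counter.most_common keeps equal counts in first-seen order.
    """
    return Counter(neighbor_labels).most_common(1)[0][0]

# Labels of the 5 nearest neighbors of some unknown flower (made-up values).
neighbors = ["versicolor", "virginica", "versicolor", "setosa", "versicolor"]
print(plurality_vote(neighbors))
```

How ties are broken is a design choice; picking an odd k avoids ties only when there are two classes.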
Iris flower data set
The Iris flower data set or Fisher's Iris data set is a multivariate data set introduced by the British statistician and biologist Ronald Fisher in his 1936 paper The use of multiple measurements in taxonomic problems as an example of linear discriminant analysis.
Here is the KnownData; make your predictions for the UnknownData.
Test your answer (SampleAns) with
/home/fac/clei/checker/KNN/irisChecker20 YourAns.txt
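- K
  - 1 (find the closest observation, i.e. the single nearest neighbor)
  - 5 (as Will Smith told us ^_^)
- Nearest
  - Euclidean distance
  - $d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} \left(a_r(x_i) - a_r(x_j)\right)^2}$, where $a_r(x)$ is the r-th attribute of observation $x$ and $n$ is the number of attributes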
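Putting the two choices together (a value of K and the Euclidean distance) gives a complete classifier. This is a minimal sketch, not the reference solution: the names `euclidean` and `knn_classify` are hypothetical, and the six rows are a small Iris-style sample used only for illustration, not the KnownData file.

```python
import math
from collections import Counter

def euclidean(x, y):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_classify(known, query, k):
    """Classify `query` by a plurality vote of its k nearest known points.

    `known` is a list of (attributes, label) pairs.
    """
    nearest = sorted(known, key=lambda item: euclidean(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Illustrative rows: (sepal length, sepal width, petal length, petal width).
known = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
]

query = (5.0, 3.4, 1.5, 0.2)
print(knn_classify(known, query, k=1))
print(knn_classify(known, query, k=3))
```

Sorting every known point is the simplest approach and is fine for data sets of this size; larger data sets would call for a partial sort or a spatial index.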
How do we determine the best "K" and the best "Nearest"?
Exhaust all the possibilities!
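One way to exhaust the possibilities is to score every (K, distance) pair on the known data and keep the best one. The sketch below scores each pair with leave-one-out validation, which is an assumption of this example rather than something the assignment prescribes. The Manhattan distance, the function names, and the toy points are likewise illustrative.

```python
import math
from collections import Counter

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def knn_classify(known, query, k, dist):
    nearest = sorted(known, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def leave_one_out_accuracy(known, k, dist):
    """Hide each known point in turn and predict it from all the others."""
    hits = 0
    for i, (attrs, label) in enumerate(known):
        rest = known[:i] + known[i + 1:]
        if knn_classify(rest, attrs, k, dist) == label:
            hits += 1
    return hits / len(known)

def best_settings(known, ks, dists):
    """Try every (k, distance) pair; return (accuracy, k, distance name).

    When several pairs tie, the first one tried is returned.
    """
    results = [
        (leave_one_out_accuracy(known, k, dist), k, name)
        for k in ks
        for name, dist in dists.items()
    ]
    return max(results, key=lambda r: r[0])

# Two well-separated toy clusters (made-up points).
known = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"), ((2, 2), "A"),
    ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B"), ((9, 9), "B"),
]
dists = {"euclidean": euclidean, "manhattan": manhattan}
print(best_settings(known, ks=[1, 3, 5], dists=dists))
```

Scoring on the known data avoids tuning against the checker's answers, so the chosen settings are less likely to overfit one particular UnknownData file.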
Here is the public dataset drawn from the U.S. Army Anthropometric Survey, from the University of Michigan.
Try your program on the following two datasets:
KnownData1
KnownData2
UnknownData1
UnknownData2
Test your answers (Sample Answer) with
/home/fac/clei/checker/KNN/armyChecker1 YourAnsForData1.txt
/home/fac/clei/checker/KNN/armyChecker2 YourAnsForData2.txt
k-NN regression
The output is the property value for the object. This value is the average of the values of k nearest neighbors.
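The only change from classification is the last step: the neighbors' values are averaged instead of voted on. A minimal sketch, assuming Euclidean distance and an unweighted mean; the name `knn_regress` and the sample points are made up for illustration.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_regress(known, query, k):
    """Predict a numeric value as the mean over the k nearest known points.

    `known` is a list of (attributes, value) pairs.
    """
    nearest = sorted(known, key=lambda item: euclidean(item[0], query))[:k]
    return sum(value for _, value in nearest) / len(nearest)

# Made-up objects with two numeric attributes and a numeric target value.
known = [
    ((2, 1), 100.0),
    ((2, 1), 120.0),
    ((4, 2), 200.0),
    ((6, 3), 300.0),
]

print(knn_regress(known, (2, 1), k=2))  # mean of 100.0 and 120.0
```

A common variation weights each neighbor by the inverse of its distance, so that closer neighbors count for more.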
Airbnb is an internet marketplace for short-term home and apartment rentals. It allows you to, for example, rent out your home for a week while you’re away, or rent out your spare bedroom to travelers.
Airbnb doesn’t release any data on the listings in its marketplace, but a separate group named Inside Airbnb has extracted data on a sample of the listings for many of the major cities on the website.
Here is an example using Amsterdam price data:
KnownData
UnknownData
Calculate the SSE (sum of squared errors) of your answer (SampleAns) with
/home/fac/clei/checker/KNN/airbnbChecker YourAns.txt
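To check predictions locally before running the checker, the SSE can be computed directly. The sketch below uses the standard definition, the sum of squared differences between predicted and actual values. Whether the checker computes exactly this is an assumption, and the prices are made up.

```python
def sse(predicted, actual):
    """Sum of squared errors between two equal-length lists of numbers."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))

# Made-up predicted and actual prices.
print(sse([100.0, 150.0], [110.0, 140.0]))  # (-10)^2 + 10^2
```

A lower SSE means the predictions are closer to the true prices, which makes it a convenient score for comparing different values of k.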
Let's use the KNN idea to solve the handwriting recognition problem.
In our test case, we will use 32-by-32-pixel PNG pictures (Example).
Generate your own handwritten digits (Here is mine)
Use this Python code to convert them into matrix format. (Here is mine)
Change the very first digit in the matrix to the label (example)
Now, try to use the KNN method to build a classifier that recognizes which digit the input picture (in matrix format) represents.
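A possible starting point is sketched below. It assumes the matrix format is rows of 0/1 characters whose very first character has been replaced by the label; the actual files may differ. The tiny 4-by-4 matrices stand in for the real 32-by-32 ones, and all function names are hypothetical. The distance used is the number of differing pixels (Hamming distance), which for 0/1 pixels equals the squared Euclidean distance and so ranks neighbors identically.

```python
from collections import Counter

def parse_matrix(text):
    """Split a labeled matrix into (label, flat list of pixels).

    Assumption: the first digit is the label, so that pixel is set to 0.
    """
    digits = [int(ch) for ch in text if ch.isdigit()]
    return digits[0], [0] + digits[1:]

def hamming(p, q):
    """Number of positions where two pixel lists differ."""
    return sum(a != b for a, b in zip(p, q))

def classify_digit(known, pixels, k):
    """Plurality vote among the k known images closest to `pixels`."""
    nearest = sorted(known, key=lambda item: hamming(item[1], pixels))[:k]
    return Counter(label for label, _ in nearest).most_common(1)[0][0]

# Toy 4x4 "images": a vertical stroke labeled 1 and a ring labeled 0.
known = [
    parse_matrix("1100\n0100\n0100\n0100"),
    parse_matrix("0110\n1001\n1001\n0110"),
]

# An unlabeled, slightly noisy vertical stroke; its parsed label is ignored.
_, unknown = parse_matrix("0100\n0100\n0100\n0110")
print(classify_digit(known, unknown, k=1))
```

Because every pixel is compared position by position, this approach is sensitive to how the digit is placed in the picture, so centering the handwriting before conversion should help.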