Data Science
K-means Clustering
Cellular Network
The currently deployed wireless networks such as GSM, CDMA, and LTE are known as cellular networks. In a cellular network, the entire service area is divided into smaller cells so that mobile subscribers can be connected over radio frequencies (RF) for voice and data services. Each of these cells houses one base station (i.e., a BTS, eNodeB, or eNB). (Link)
The base stations are interconnected in different topologies, such as star or mesh. In the backbone, they are interfaced with MSCs, the PSTN, and the PSDN.
What is Cluster Analysis?
Detailed explanation.
Examples of different types of clusters.
What ideas can be borrowed from the cellular network?
K-means!!!
How to evaluate K-means?
Does K-means work for all?
Try to implement the algorithm and test it with the following test cases.
The first two columns are the x and y coordinates, and the third column is the group label.
DataFile ClusterPlot
DataFile ClusterPlot
DataFile ClusterPlot
DataFile ClusterPlot
DataFile ClusterPlot
DataFile ClusterPlot
DataFile ClusterPlot
DataFile ClusterPlot
DataFile ClusterPlot
DataFile ClusterPlot
DataFile ClusterPlot
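As a starting point, the basic algorithm (Lloyd's iteration) might be sketched as follows. This is a minimal sketch under stated assumptions, not a reference solution: the names load_points and kmeans are hypothetical, the data files are assumed to be whitespace- or comma-separated with no header row, and the initial centroids are simply k points drawn at random.

```python
import random

def load_points(path):
    """Read rows of 'x y label' (layout assumed; no header row)."""
    points, labels = [], []
    with open(path) as f:
        for line in f:
            parts = line.replace(",", " ").split()
            if len(parts) < 3:
                continue  # skip blank or malformed lines
            points.append((float(parts[0]), float(parts[1])))
            labels.append(parts[2])
    return points, labels

def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100, seed=0):
    """Return (assignments, centroids) after Lloyd's iteration."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids (assumption)
    assign = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assign = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_assign == assign:
            break  # no point changed cluster, so the iteration has converged
        assign = new_assign
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assign, centroids
```

The cluster numbers returned in the assignments are arbitrary, so when comparing against the third column of a data file, compare the groupings rather than the raw label values.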
Iris flower data set
The Iris flower data set or Fisher's Iris data set is a multivariate data set introduced by the British statistician and biologist Ronald Fisher in his 1936 paper The use of multiple measurements in taxonomic problems as an example of linear discriminant analysis.
Here is the Data.
Try to modify your program and solve this clustering problem.
Test your answer by running:
/home/fac/clei/checker/Kmeans/irisCheckerClustering YourAns.txt
(SampleAns)
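Before running the checker, a clustering can also be sanity-checked locally. The sketch below shows two common measures: the within-cluster sum of squared errors (SSE), which needs no labels, and purity, which compares clusters against known species labels. The function names are hypothetical, and nothing here is meant to reproduce what the checker itself computes.

```python
from collections import Counter

def sse(points, cluster_ids, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    total = 0.0
    for p, c in zip(points, cluster_ids):
        total += sum((a - b) ** 2 for a, b in zip(p, centroids[c]))
    return total

def purity(cluster_ids, true_labels):
    """Fraction of points that carry the majority label of their cluster."""
    by_cluster = {}
    for c, y in zip(cluster_ids, true_labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(true_labels)
```

A lower SSE means tighter clusters for a fixed k, but SSE always decreases as k grows, so it is mainly useful for comparing runs with the same k. Purity ranges up to 1.0, which means every cluster contains a single species.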
- Zachary's karate club
A social network of a karate club was studied by Wayne W. Zachary for a period of three years from 1970 to 1972. The network captures 34 members of a karate club, documenting links between pairs of members who interacted outside the club. During the study a conflict arose between the administrator "John A" and instructor "Mr. Hi" (pseudonyms), which led to the split of the club into two. Half of the members formed a new club around Mr. Hi; members from the other part found a new instructor or gave up karate.
Data Download (Matrix format)
Try to figure out a way to modify your program to handle this classic clustering problem, and display your result.
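K-means needs coordinates, whereas this data set only records who interacted with whom. One possible adaptation, offered as an idea to try rather than the intended solution, is to give each member a feature vector of hop distances to every other member and then cluster those vectors with k = 2. The sketch below assumes the matrix file holds whitespace-separated 0/1 rows of an adjacency matrix; load_matrix and hop_distances are hypothetical names.

```python
from collections import deque

def load_matrix(path):
    """Read an adjacency matrix, one whitespace-separated row per line (assumed)."""
    with open(path) as f:
        return [[int(x) for x in line.split()] for line in f if line.strip()]

def hop_distances(adj):
    """All-pairs shortest path lengths (in hops) via breadth-first search.

    Row i of the result can serve as the feature vector of node i.
    Unreachable pairs get the value n, which exceeds any real distance.
    """
    n = len(adj)
    dist = [[n] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and dist[s][v] == n:  # neighbor not visited yet
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist
```

Members of the same faction tend to be few hops apart from the same people, so their rows look alike. Other embeddings (for example, using the adjacency rows directly) are equally reasonable to try.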
- Spotify Songs Dataset
Spotify is a Swedish audio streaming and media service provider founded on 23 April 2006 by Daniel Ek and Martin Lorentzon. It is one of the largest music streaming service providers, with over 602 million monthly active users, including 236 million paying subscribers, as of December 2023.
Spotify offers digital copyright-restricted recorded audio content, including more than 100 million songs and five million podcasts, from record labels and media companies.
Here is a dataset of Spotify tracks over a range of 125 different genres. Each track has some audio features associated with it. The data is in CSV format, with a total of 114,000 songs.
Can you arrange these 114,000 songs into groups of similar songs?
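Audio features are typically measured on very different scales, so a feature with a large numeric range can dominate Euclidean distance. One common preparation step is to standardize each feature to zero mean and unit variance before clustering. The sketch below assumes the CSV has a header row; the column names in FEATURES are an assumption and should be replaced with whatever the file actually contains.

```python
import csv
import math

# Assumed column names; check the header of the actual CSV file.
FEATURES = ["danceability", "energy", "loudness", "tempo"]

def load_features(path, columns=FEATURES):
    """Read the chosen numeric columns from a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return [[float(row[c]) for c in columns] for row in csv.DictReader(f)]

def standardize(rows):
    """Rescale every column to zero mean and unit (population) variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # A constant column has standard deviation 0; divide by 1.0 instead.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]
```

The standardized rows can then be passed to a K-means implementation that accepts points of any dimension.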
Modified K-means style Algorithms
Choosing better initial centroid estimates: K-means++, Intelligent K-Means, Genetic K-Means
Choosing different representative prototypes for the clusters: K-Medoids, K-Medians, K-Modes
Applying feature transformation techniques: Weighted K-Means, Kernel K-Means
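Of the variants listed above, K-means++ seeding is perhaps the easiest to sketch, since it changes only how the initial centroids are chosen: the first is picked uniformly at random, and each later one is picked with probability proportional to its squared distance from the nearest centroid chosen so far. The function name kmeans_pp_init is hypothetical, and this is a minimal illustration rather than a tuned implementation.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_pp_init(points, k, seed=0):
    """Choose k initial centroids with K-means++ style seeding."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]  # first centroid: uniform at random
    while len(centroids) < k:
        # Squared distance from each point to its nearest chosen centroid.
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a centroid; fall back to uniform.
            centroids.append(rng.choice(points))
        else:
            # Far-away points are proportionally more likely to be chosen.
            centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids
```

The returned centroids would replace the random initial centroids of a plain K-means run; the assignment and update steps stay the same.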