Chengwei LEI, Ph.D.    Associate Professor

Department of Computer and Electrical Engineering and Computer Science
California State University, Bakersfield


Data Mining



=====>!!!Do not use any "magic function" in this class!!!<=====



Introduction to Data Science


Interesting Examples


Day and Night is a television series directed by Wang Wei and written by Zhiwen. The series tells the story of Guan Hongfeng, the former captain of the Changfeng Criminal Investigation Detachment, who solves many cases to get his brother Guan Hongyu exonerated.

Guan Hongfeng, a former police captain suffering from nyctophobia, returns to solving mysteries alongside the hot-tempered Captain Zhou Xun and rookie officer Zhou Shutong. However, he has a hidden agenda, which is to clear his identical twin brother Guan Hongyu's name from the alleged murder of an entire family.

Assume you are Captain Zhou Xun, with a Ph.D. in Computer Science. :)
Can you figure out which brother is which based on their behavior patterns? How?


Hint: Psychology Professor style or Math/CS Professor style?
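One possible "Math/CS Professor style" reading of the question, as a minimal sketch: describe each sighting by a few numeric behavior measurements, average the sightings already known for each brother, and assign a new sighting to the closest average. The two features and every number below are made up for illustration; they do not come from the series or from any dataset.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_vector(rows):
    """Column-wise average of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def nearest_profile(observation, labeled):
    """Return the name whose average behavior vector is closest."""
    profiles = {name: mean_vector(rows) for name, rows in labeled.items()}
    return min(profiles, key=lambda name: euclidean(observation, profiles[name]))

# Hypothetical measurements: [feature_1, feature_2], both scaled to 0..1.
labeled = {
    "Guan Hongfeng": [[0.9, 0.1], [0.8, 0.2]],
    "Guan Hongyu": [[0.2, 0.9], [0.1, 0.8]],
}

guess = nearest_profile([0.85, 0.15], labeled)
print(guess)
```

The hard part in practice is choosing behavior features that really differ between the two people; the distance computation itself is the easy step.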


Spotify is a Swedish audio streaming and media service provider founded on 23 April 2006 by Daniel Ek and Martin Lorentzon. It is one of the largest music streaming service providers, with over 602 million monthly active users, including 236 million paying subscribers, as of December 2023.

Spotify offers digital copyright-restricted recorded audio content, including more than 100 million songs and five million podcasts, from record labels and media companies.


Here is a dataset of Spotify tracks over a range of 125 different genres.
Each track has some audio features associated with it. The data is in CSV format, with a total of 114,000 songs.

Can you arrange these 114,000 songs into groups of similar songs?
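One way to approach the grouping question is k-means clustering written by hand, with no library "magic function". The sketch below is an illustration under assumptions, not the course's reference solution: each song is reduced to two numeric features, the toy values are invented, and the starting centroids are chosen by a simple farthest-first rule so the run is deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first_centroids(points, k):
    """Start from the first point, then keep adding the point that is
    farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids

def kmeans(points, k, max_iterations=100):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its members, until stable."""
    centroids = farthest_first_centroids(points, k)
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a group is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Invented toy songs: [feature_1, feature_2], both scaled to 0..1.
songs = [
    [0.10, 0.20], [0.90, 0.80], [0.20, 0.10],
    [0.80, 0.90], [0.15, 0.25], [0.85, 0.75],
]
labels, centroids = kmeans(songs, k=2)
print(labels)
```

On the real file, the open questions are which audio features to use, how to scale them so that no single column dominates the distance, and how to pick k.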

Here is the public dataset drawn from the U.S. Army Anthropometric Survey, from the University of Michigan.

We have some data sheets in our office, but they were ruined by rats.

Rats Are Eating Files Along With Food Scraps In East Delhi Municipal Headquarter

Try to write a program to fix the following broken dataset


I split the data for your convenience
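The checker path below mentions KNN, which suggests nearest-neighbor imputation as one way to repair the missing entries. The following is a minimal sketch under assumptions: missing cells are represented as None after parsing (the real file's marker may differ), the two columns and all numbers are invented, and the distance is computed only over the columns that survived in the broken row.

```python
def knn_impute(rows, k=3):
    """Fill each missing cell (None) with the average of that column
    over the k complete rows closest to the broken row."""
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete row to impute from")
    repaired = []
    for row in rows:
        if None not in row:
            repaired.append(list(row))
            continue
        # Columns whose values are still readable in this row.
        known = [i for i, v in enumerate(row) if v is not None]

        def distance(other):
            return sum((row[i] - other[i]) ** 2 for i in known)

        neighbors = sorted(complete, key=distance)[:k]
        repaired.append([
            v if v is not None else sum(n[i] for n in neighbors) / len(neighbors)
            for i, v in enumerate(row)
        ])
    return repaired

# Invented toy rows: [column_a, column_b]; the last row lost column_b.
rows = [
    [170.0, 70.0],
    [172.0, 72.0],
    [185.0, 90.0],
    [187.0, 92.0],
    [171.0, None],
]
fixed = knn_impute(rows, k=2)
print(fixed[4])
```

When columns have very different ranges, scaling them before computing distances usually matters, and the choice of k is worth checking against the provided checker.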


Test your answer (sample answer) by running:
/home/fac/clei/checker/KNN/armyChecker1 YourAnsForData1.txt



Hint: "I can do it too" or "How can I do it"?