Data Science
Decision Tree Classification
The Pretender (YoutubeLink) is an American action drama television series created by Steven Long Mitchell and Craig W. Van Sickle, which aired on NBC from September 19, 1996 to May 13, 2000.
The series follows Jarod, a young man on the run who is a "Pretender": a genius impostor able to quickly master the complex skill sets necessary to impersonate a member of any profession.
In season 1, episode 1, Jarod pretends to be a doctor to avenge a boy crippled by faulty surgery.
YoutubeLink (0:32~0:38)
Suppose you are Jarod in S1E1 and want to pretend to be a doctor. What would you do?
One example from the Department of Health & Human Services
Another example from the most widely read and highly cited peer-reviewed neurology journal
How can we build something similar?
How can we build a better decision tree?
Information Theory (Entropy)!
Build the tree based on the Information Gain.
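As a rough illustration of the idea, the sketch below computes Shannon entropy and the information gain of one candidate split. The function names, the threshold of 2.0, and the six toy measurements are assumptions made up for this example; they are not taken from the course data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing gains nothing
    weighted = (len(left) * entropy(left)
                + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

# Hypothetical toy sample (not the real Iris files): petal length and species.
petal_length = [1.4, 1.3, 1.5, 4.7, 4.5, 5.1]
species = ["setosa", "setosa", "setosa",
           "versicolor", "versicolor", "virginica"]

print("entropy before split:", entropy(species))
print("gain at threshold 2.0:", information_gain(petal_length, species, 2.0))
```

A tree builder would evaluate many candidate thresholds on every feature and split on the one with the largest gain.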
Iris flower data set
The Iris flower data set or Fisher's Iris data set is a multivariate data set introduced by the British statistician and biologist Ronald Fisher in his 1936 paper The use of multiple measurements in taxonomic problems as an example of linear discriminant analysis.
Here are the TrainingData and TestingData.
Try to build different decision trees for the training data.
- Try to build a “perfect” decision tree based on the training data, and see whether it has an overfitting problem.
- Try to build an “over-pruned” decision tree and test its accuracy on the testing data.
- Find a good threshold for pruning the decision tree and test the performance on the testing data. Compare the result with the two methods above.
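One possible way to run the three experiments above is to stop splitting whenever the best information gain falls at or below a threshold. The sketch below is a minimal version of that idea, not the required solution: `min_gain`, the helper names, the three threshold values, and the small inline data set are all hypothetical stand-ins for the real TrainingData and TestingData files.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def best_split(rows, labels):
    """Return (gain, feature index, threshold) of the best binary split."""
    best = (0.0, None, None)
    parent = entropy(labels)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between neighbours
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            gain = parent - (len(left) * entropy(left)
                             + len(right) * entropy(right)) / len(labels)
            if gain > best[0]:
                best = (gain, f, t)
    return best

def build_tree(rows, labels, min_gain=0.0):
    """Grow a tree; a node becomes a leaf when the best gain <= min_gain."""
    gain, f, t = best_split(rows, labels)
    if f is None or gain <= min_gain:
        return Counter(labels).most_common(1)[0][0]  # majority-class leaf
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], min_gain),
            build_tree([r for r, _ in right], [y for _, y in right], min_gain))

def predict(tree, row):
    while isinstance(tree, tuple):  # internal nodes are tuples, leaves are labels
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def accuracy(tree, rows, labels):
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(labels)

# Hypothetical (petal length, petal width) samples, not the course files.
train_rows = [(1.4, 0.2), (1.3, 0.2), (1.5, 0.4), (4.7, 1.4), (4.5, 1.5),
              (4.0, 1.3), (5.2, 1.8), (5.1, 1.9), (6.0, 2.5), (4.9, 1.8),
              (5.6, 2.1)]
train_labels = ["setosa"] * 3 + ["versicolor"] * 4 + ["virginica"] * 4
test_rows = [(1.6, 0.3), (4.4, 1.4), (5.8, 2.2), (5.0, 1.7)]
test_labels = ["setosa", "versicolor", "virginica", "virginica"]

# 0.0 grows the tree fully, 10.0 prunes it to a single leaf, 0.3 is in between.
for min_gain in (0.0, 0.3, 10.0):
    tree = build_tree(train_rows, train_labels, min_gain)
    print(min_gain,
          "train:", accuracy(tree, train_rows, train_labels),
          "test:", accuracy(tree, test_rows, test_labels))
```

Comparing the training and testing accuracy across thresholds gives a first impression of where overfitting ends and over-pruning begins; on the real data, the threshold would be tuned the same way.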
Test your answer with /home/fac/clei/checker/DT/irisChecker50 YourAns.txt (SampleAns)