Homework 4 - Clustering
Due: Monday March 17, 2014 at 11:55pm
- (10.3 in book): Use an example to show why the k-means algorithm may not
find the global optimum, that is, optimizing the within-cluster variation.
- (10.6 in book): Both k-means and k-medoids algorithms can perform
  effective clustering.
  - Illustrate the strength and weakness of k-means in comparison with
    k-medoids.
- Illustrate the strength and weakness of these schemes in comparison
with a hierarchical clustering scheme.
- (10.12 in book): Present conditions under which density-based clustering
is more suitable than partitioning-based clustering and hierarchical
clustering. Give application examples to support your argument.
- (10.15 in book, modified): Data cubes and multidimensional databases
contain nominal, ordinal, and numeric data in hierarchical or aggregate
forms. Discuss how you could use one of the clustering methods in either
Chapter 11 or Chapter 12 to find clusters in large data cubes, containing
a variety of data types, effectively and efficiently.
- (11.1 in book): Traditional clustering methods are rigid in that they
require each object to belong exclusively to only one cluster. Explain
why this is a special case of fuzzy clustering. You may use k-means as
an example.
- (10 pts)
Using the dataset for the k-nearest-neighbor problem on the previous
assignment, create a program to generate a k-nearest-neighbor graph
(as used by Chameleon) and use one of the visualization tools to
visualize the graph. Upload both an image file containing a sample
visualization and your code as your answer to this problem.
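As a starting point, a minimal sketch of a k-nearest-neighbor graph builder is shown below. It assumes the data points have already been loaded as rows of numeric features (a small made-up sample stands in for the previous assignment's dataset) and emits the graph in Graphviz DOT format, which Graphviz or a similar visualization tool can render directly.

```python
# Sketch of a k-nearest-neighbor graph builder (as used by Chameleon).
# The sample points below are hypothetical; substitute the dataset from
# the previous assignment when producing your actual submission.
import math

def knn_graph(points, k):
    """Return directed edges (i, j) where j is one of the k nearest
    neighbors of point i under Euclidean distance."""
    edges = set()
    for i, p in enumerate(points):
        # Distance from point i to every other point.
        dists = sorted(
            (math.dist(p, q), j)
            for j, q in enumerate(points) if j != i
        )
        # Keep an edge to each of the k closest points.
        for _, j in dists[:k]:
            edges.add((i, j))
    return edges

def to_dot(edges):
    """Emit the graph in Graphviz DOT format for visualization."""
    lines = ["digraph knn {"]
    lines += [f"  {i} -> {j};" for i, j in sorted(edges)]
    lines.append("}")
    return "\n".join(lines)

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
edges = knn_graph(points, k=2)
print(to_dot(edges))
```

Piping the printed DOT text through `dot -Tpng` would produce an image file suitable for upload; note the O(n^2) distance computation is fine for a small dataset but would need a spatial index for a large one.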
Upload your answers to the short answer questions as either a file or by
answering in the Moodle Notes field. Upload your source code for the
coding question as a file, with an extension that clearly indicates the
  programming language you used (e.g., .cpp for C++, .pl for Perl, etc.).