Homework 1 - Chapters 1 and 2
Due: Tuesday, January 21, 2014 at 11:55pm
Questions are (mostly) from the book:
- 1.4: Present an example where data mining is crucial to the success
    of a business. What data mining functionalities does this 
    business need (e.g. think of the kinds of patterns that could be
    mined)? Can such patterns be generated alternatively by data query
    processing or simple statistical analysis?
- 1.6: Based on your observations, describe another possible kind of
    knowledge that needs to be discovered via data mining methods but
    has not been listed in this chapter. Does it require a mining
    methodology that is quite different from those outlined in this
    chapter?
- 1.9: What are the major challenges of mining a huge amount of data
    (e.g. billions of tuples) in comparison with mining a small amount
    of data (e.g. a data set of a few hundred tuples)?
- 1.10: Outline the major research challenges of data mining in one
    specific application domain, such as stream/sensor data analysis, 
    spatiotemporal data, or bioinformatics.
- 2.5: Briefly outline how to compute the dissimilarity between objects
    described by the following:

    - Nominal attributes
    - Asymmetric binary attributes
    - Numeric attributes
    - Term-frequency vectors

- 2.6: Given two objects represented by the tuples (22, 1, 42, 10)
    and (20, 0, 36, 8), compute the distance between the two objects
    using:

    - Euclidean distance
    - Manhattan distance
    - Minkowski distance using q=3
    - Supremum distance

- Ch 2: Give an example of a data set that cannot be visualized with a
    2D or 3D scatter plot. What method from this chapter would you use to
    visualize this data set?
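
Note on 2.6: the four distances asked for are all members of the Minkowski
family, so you may find it convenient to check your hand calculations with a
short script. The following is only a minimal sketch, not a required method:
the function names `minkowski` and `supremum` and the example points are
made up for illustration and are deliberately not the tuples from the
question.

```python
def minkowski(x, y, q):
    """Minkowski distance of order q between two equal-length numeric tuples.

    q=1 gives the Manhattan distance and q=2 gives the Euclidean distance.
    """
    if len(x) != len(y):
        raise ValueError("tuples must have the same length")
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)


def supremum(x, y):
    """Supremum (Chebyshev) distance: the largest per-attribute difference."""
    if len(x) != len(y):
        raise ValueError("tuples must have the same length")
    return max(abs(a - b) for a, b in zip(x, y))


# Hypothetical example points (not the ones from question 2.6).
p = (0, 0)
r = (3, 4)

print("Manhattan:", minkowski(p, r, 1))
print("Euclidean:", minkowski(p, r, 2))
print("Minkowski q=3:", minkowski(p, r, 3))
print("Supremum:", supremum(p, r))
```

The supremum distance is written as a separate function because it is the
limit of the Minkowski distance as q grows without bound, which cannot be
passed in as an ordinary exponent.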