Introduction to Data Mining

 1. Short-answer questions (10 points each) 
a. Briefly describe why clustering is a kind of unsupervised learning.
b. Briefly describe how K-means clustering works.
c. Briefly describe the main difference between the K-means and K-medoids methods.

d. Outlier analysis is one of the fields of data mining. Explain what an outlier is. Are outliers the same as noise?

e. A good clustering method will produce high-quality clusters. What criteria can we use to judge whether clusters are of high quality?
 
f. List at least two drawbacks of the K-means clustering approach.

g. In hierarchical clustering, there are different ways to measure the distance between clusters, e.g. single linkage, complete linkage, and average linkage. Briefly describe the differences among these three distance measures.

2. Given the following distance matrix of four data points 1, 2, 3, and 4: (Requirement: Report all the partial trees and matrices for the intermediate steps.)  

Perform hierarchical clustering using the single-linkage, complete-linkage, and average-linkage measures (30 points).
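
To check merges worked out by hand, a minimal sketch along the following lines may help. It assumes the distances are stored as a nested dictionary keyed by point label. The matrix `EXAMPLE` is made up for illustration and is not the matrix from this problem, and the function names `linkage_distance` and `agglomerate` are hypothetical.

```python
from itertools import combinations


def linkage_distance(a, b, dist, method):
    """Distance between clusters a and b (tuples of point labels)."""
    pairwise = [dist[i][j] for i in a for j in b]
    if method == "single":      # closest pair of points
        return min(pairwise)
    if method == "complete":    # farthest pair of points
        return max(pairwise)
    if method == "average":     # mean over all cross-cluster pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage: {method}")


def agglomerate(dist, method):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples."""
    clusters = [(label,) for label in sorted(dist)]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest linkage distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: linkage_distance(pair[0], pair[1], dist, method),
        )
        merges.append((a, b, linkage_distance(a, b, dist, method)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical symmetric distance matrix, NOT the one from the problem.
EXAMPLE = {
    1: {1: 0, 2: 2, 3: 6, 4: 10},
    2: {1: 2, 2: 0, 3: 5, 4: 9},
    3: {1: 6, 2: 5, 3: 0, 4: 4},
    4: {1: 10, 2: 9, 3: 4, 4: 0},
}

for method in ("single", "complete", "average"):
    print(method, agglomerate(EXAMPLE, method))
```

The sketch reports only the merge order and merge distances; the intermediate matrices and partial trees still have to be written out by hand. Ties between equally close pairs are broken by list order, which is an arbitrary choice.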
