Data Science
Performance Evaluation
Regression Problems / Classification Problems / Clustering Problems
Clustering Problems
Internal Evaluation / Without "Ground Truth Information" / Unsupervised
BetaCV:
The smaller the BetaCV ratio, the better the clustering.
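A minimal sketch of BetaCV, assuming it is taken as the mean intracluster distance divided by the mean intercluster distance, with Euclidean distance between points. The function name `beta_cv` and the toy data are illustrative, not from the notes.

```python
from itertools import combinations
from math import dist


def beta_cv(points, labels):
    """Mean intracluster distance divided by mean intercluster distance.

    Assumes at least one intracluster pair and one intercluster pair exist.
    """
    intra, inter = [], []
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        # A pair is intracluster when both points carry the same label.
        (intra if labels[i] == labels[j] else inter).append(d)
    return (sum(intra) / len(intra)) / (sum(inter) / len(inter))


# Two tight groups that lie far apart (toy data).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = beta_cv(points, [0, 0, 1, 1])  # groups nearby points together
bad = beta_cv(points, [0, 1, 0, 1])   # groups distant points together
print(good < bad)
```

The labeling that keeps nearby points together yields the smaller ratio, which matches the rule stated above.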
C-index:
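Let Wmin(Nin) be the sum of the smallest Nin distances in the proximity matrix W, where Nin is the total number of intracluster edges.
Let Wmax(Nin) be the sum of the largest Nin distances in W, and Win the sum of all intracluster distances.
C-index = (Win - Wmin(Nin)) / (Wmax(Nin) - Wmin(Nin))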
The smaller the C-index, the better the clustering.
The C-index lies in the range [0,1].
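A minimal sketch of the C-index, assuming the usual form (Win - Wmin(Nin)) / (Wmax(Nin) - Wmin(Nin)), where Win is the sum of intracluster distances and Wmax(Nin) is the sum of the largest Nin distances. Euclidean distance, the function name `c_index`, and the toy data are assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def c_index(points, labels):
    """C-index of a labeling; assumes the pairwise distances are not all equal."""
    pairs = list(combinations(range(len(points)), 2))
    all_dists = sorted(dist(points[i], points[j]) for i, j in pairs)
    intra = [dist(points[i], points[j]) for i, j in pairs if labels[i] == labels[j]]
    n_in = len(intra)                                # number of intracluster edges
    w_in = sum(intra)                                # sum of intracluster distances
    w_min = sum(all_dists[:n_in])                    # sum of the n_in smallest distances
    w_max = sum(all_dists[len(all_dists) - n_in:])   # sum of the n_in largest distances
    return (w_in - w_min) / (w_max - w_min)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = c_index(points, [0, 0, 1, 1])
bad = c_index(points, [0, 1, 0, 1])
print(good, bad)
```

In this toy case the compact labeling uses exactly the two smallest distances as intracluster edges, so its C-index sits at the lower end of the range, while the poor labeling lands near the upper end.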
Modularity:
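When the edge weights are distances, as in the proximity matrix W, the smaller the modularity measure, the better the clustering. (With similarity weights the direction reverses: higher modularity is better.)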
Normalized Cut:
The higher the normalized cut value, the better the clustering.
Dunn Index
Davies-Bouldin Index
External Evaluation / With "Ground Truth Information" / Cross Dataset / Supervised
Purity:
Maximum Matching:
Only one cluster can match with a given partition
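A small sketch contrasting purity with maximum matching, assuming purity lets every cluster pick its majority partition independently, while maximum matching requires a one-to-one pairing of clusters and partitions. The brute-force search over permutations is an illustrative shortcut that is only practical for a handful of clusters; the function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def purity(truth, pred):
    """Each cluster is credited with its largest overlap with any partition."""
    counts = Counter(zip(pred, truth))  # (cluster, partition) -> number of points
    total = 0
    for cluster in set(pred):
        total += max(n for (c, _), n in counts.items() if c == cluster)
    return total / len(truth)


def max_matching(truth, pred):
    """Best one-to-one pairing of clusters and partitions (brute force)."""
    counts = Counter(zip(pred, truth))
    clusters, parts = sorted(set(pred)), sorted(set(truth))
    if len(clusters) <= len(parts):
        candidates = (zip(clusters, perm) for perm in permutations(parts, len(clusters)))
    else:
        candidates = (zip(perm, parts) for perm in permutations(clusters, len(parts)))
    # Counter returns 0 for (cluster, partition) pairs that never co-occur.
    best = max(sum(counts[pair] for pair in matching) for matching in candidates)
    return best / len(truth)


truth = [0, 0, 0, 1, 1, 1]
pred = [0, 1, 2, 2, 2, 2]  # two singleton clusters plus one large cluster
p = purity(truth, pred)
m = max_matching(truth, pred)
print(p, m)
```

Here both singleton clusters can claim partition 0 under purity, but only one of them may do so under maximum matching, so the matching score is lower.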
F-measure:
Pairwise Based:
Jaccard Coefficient
Rand Statistic / Rand Index
The higher the Rand index, the better the clustering.
The Rand Index lies in the range [0,1].
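A minimal sketch of the pairwise measures, assuming the usual pair counts: TP (same partition, same cluster), FN (same partition, different cluster), FP (different partition, same cluster), and TN (different in both). Jaccard is then TP / (TP + FN + FP) and Rand is (TP + TN) / N, with N the number of point pairs. The function names are illustrative.

```python
from itertools import combinations


def pair_counts(truth, pred):
    """Return (TP, FN, FP, TN) counted over all pairs of points."""
    tp = fn = fp = tn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_truth and same_pred:
            tp += 1
        elif same_truth:
            fn += 1
        elif same_pred:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def jaccard(truth, pred):
    tp, fn, fp, _ = pair_counts(truth, pred)
    return tp / (tp + fn + fp)


def rand_index(truth, pred):
    tp, fn, fp, tn = pair_counts(truth, pred)
    return (tp + tn) / (tp + fn + fp + tn)


truth = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 1]  # one point placed in the wrong cluster
counts = pair_counts(truth, pred)
print(counts, jaccard(truth, pred), rand_index(truth, pred))
```

Unlike the Rand index, the Jaccard coefficient ignores true negatives, so it is not inflated by the many pairs that are trivially separated in both groupings.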
Conditional Entropy:
Normalized Mutual Information:
The NMI value lies in the range [0,1]. The higher the NMI value, the better the clustering.
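A minimal sketch of NMI, assuming the mutual information is normalized by the geometric mean of the two entropies, sqrt(H(C) * H(T)); other normalizations exist, and this choice is an assumption. Returning 0.0 when either entropy is zero is also a convention chosen here for illustration.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(truth, pred):
    """Mutual information between the clustering and the ground-truth partitioning."""
    n = len(truth)
    joint = Counter(zip(pred, truth))
    n_pred, n_truth = Counter(pred), Counter(truth)
    return sum(
        (n_ij / n) * log(n_ij * n / (n_pred[c] * n_truth[t]))
        for (c, t), n_ij in joint.items()
    )


def nmi(truth, pred):
    h_t, h_c = entropy(truth), entropy(pred)
    if h_t == 0 or h_c == 0:
        return 0.0  # assumed convention for a single-group labeling
    return mutual_information(truth, pred) / sqrt(h_t * h_c)


truth = [0, 0, 1, 1]
perfect = nmi(truth, [1, 1, 0, 0])      # same grouping, labels swapped
unrelated = nmi(truth, [0, 1, 0, 1])    # clusters cut across partitions
print(perfect, unrelated)
```

The score depends only on how points are grouped, not on the label names, so a relabeled copy of the ground truth still reaches the top of the range.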
Relative Evaluation / Comparing different parameters