Chengwei LEI, Ph.D.    Associate Professor

Department of Computer and Electrical Engineering and Computer Science
California State University, Bakersfield


Data Science


Performance Evaluation


Regression Problems  /  Classification Problems  /  Clustering Problems

Clustering Problems

Internal Evaluation / Without "Ground Truth Information" / Unsupervised


The smaller the BetaCV ratio, the better the clustering.
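
The slide's formula is not reproduced in the text, so the sketch below assumes the usual definition: BetaCV is the mean intracluster distance divided by the mean intercluster distance, (W_in / N_in) / (W_out / N_out). The function name, the Euclidean distance, and the toy points are illustrative assumptions.

```python
import math
from itertools import combinations

def beta_cv(points, labels):
    """Mean intracluster distance divided by mean intercluster distance."""
    w_in = w_out = 0.0   # sums of intra- and intercluster distances
    n_in = n_out = 0     # numbers of intra- and intercluster pairs (edges)
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])  # Euclidean distance (assumed)
        if labels[i] == labels[j]:
            w_in += d
            n_in += 1
        else:
            w_out += d
            n_out += 1
    if n_in == 0 or n_out == 0:
        raise ValueError("need at least one intracluster and one intercluster pair")
    return (w_in / n_in) / (w_out / n_out)

# Two tight groups that are far apart (toy data).
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(beta_cv(points, [0, 0, 1, 1]))  # groups recovered: small ratio
print(beta_cv(points, [0, 1, 0, 1]))  # groups mixed: larger ratio
```

On this toy data the labeling that recovers the two groups gives a ratio well below 1, while the mixed labeling gives a ratio above 1.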


Let W_min(N_in) be the sum of the smallest N_in distances in the proximity matrix W,
where N_in is the total number of intracluster edges.

The smaller the C-index, the better the clustering.

The C-index lies in the range [0,1].
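
A minimal sketch of the C-index, assuming the standard form C = (W_in - W_min(N_in)) / (W_max(N_in) - W_min(N_in)), where W_in is the sum of intracluster distances and W_max(N_in) is the sum of the largest N_in distances. The function name and the Euclidean distance are illustrative assumptions.

```python
import math
from itertools import combinations

def c_index(points, labels):
    """C-index: how close W_in is to the smallest possible sum of N_in distances."""
    pairs = list(combinations(range(len(points)), 2))
    dist = {(i, j): math.dist(points[i], points[j]) for i, j in pairs}
    intra = [dist[(i, j)] for i, j in pairs if labels[i] == labels[j]]
    n_in = len(intra)              # number of intracluster edges
    if n_in == 0:
        raise ValueError("no intracluster pairs")
    w_in = sum(intra)              # sum of intracluster distances
    ordered = sorted(dist.values())
    w_min = sum(ordered[:n_in])    # sum of the N_in smallest distances
    w_max = sum(ordered[-n_in:])   # sum of the N_in largest distances
    if w_max == w_min:
        raise ValueError("C-index is undefined when W_max equals W_min")
    return (w_in - w_min) / (w_max - w_min)

points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(c_index(points, [0, 0, 1, 1]))  # intracluster edges are the shortest ones
print(c_index(points, [0, 1, 0, 1]))  # intracluster edges are among the longest
```

Because W_min(N_in) <= W_in <= W_max(N_in), the result always falls in [0,1].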


With W holding distances, the smaller the modularity measure, the better the clustering.

Normalized Cut:

The higher the normalized cut value, the better the clustering.


Dunn Index

Davies-Bouldin Index


External Evaluation / With "Ground Truth Information" / Cross Dataset / Supervised



Maximum Matching:


Only one cluster can match with a given partition
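
The one-to-one constraint can be sketched as follows: build the contingency counts, try every one-to-one pairing of clusters with ground-truth partitions, and keep the pairing that covers the most points. The brute-force search and the function name are illustrative assumptions, suitable only for a handful of clusters. Labels are assumed to be sortable (all integers or all strings).

```python
from collections import Counter
from itertools import permutations

def maximum_matching(clusters, truth):
    """Fraction of points covered by the best one-to-one cluster/partition pairing."""
    counts = Counter(zip(clusters, truth))  # contingency table n_ij
    cluster_ids = sorted(set(clusters))
    class_ids = sorted(set(truth))
    # Permute the longer side so every id on the shorter side gets a distinct partner.
    if len(cluster_ids) <= len(class_ids):
        best = max(
            sum(counts[(c, t)] for c, t in zip(cluster_ids, perm))
            for perm in permutations(class_ids, len(cluster_ids))
        )
    else:
        best = max(
            sum(counts[(c, t)] for c, t in zip(perm, class_ids))
            for perm in permutations(cluster_ids, len(class_ids))
        )
    return best / len(clusters)

truth    = ["a", "a", "a", "a", "a", "b"]
clusters = [0, 0, 0, 1, 1, 1]
# Cluster 0 pairs with "a" (3 points); cluster 1 must then pair with "b" (1 point).
print(maximum_matching(clusters, truth))
```

In the example, cluster 1 contains more "a" points than "b" points, but "a" is already taken by cluster 0, so only its single "b" point counts toward the score.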




Pairwise Based:


Jaccard Coefficient

Rand Statistic / Rand Index


The higher the Rand index, the better the clustering.

The Rand Index lies in the range [0,1].
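
Both pairwise measures come from the same four pair counts. The sketch below assumes the usual definitions: a pair is a true positive (TP) if its points share both a cluster and a ground-truth partition, a false positive (FP) if they share only the cluster, a false negative (FN) if they share only the partition, and a true negative (TN) otherwise. Then Jaccard = TP / (TP + FN + FP) and Rand = (TP + TN) / N, with N the number of pairs. Function names are illustrative.

```python
from itertools import combinations

def pair_counts(clusters, truth):
    """Return (TP, FN, FP, TN) over all unordered pairs of points."""
    tp = fn = fp = tn = 0
    for i, j in combinations(range(len(clusters)), 2):
        same_cluster = clusters[i] == clusters[j]
        same_class = truth[i] == truth[j]
        if same_cluster and same_class:
            tp += 1
        elif same_class:
            fn += 1   # same partition, split across clusters
        elif same_cluster:
            fp += 1   # same cluster, different partitions
        else:
            tn += 1
    return tp, fn, fp, tn

def jaccard_coefficient(clusters, truth):
    tp, fn, fp, _ = pair_counts(clusters, truth)
    return tp / (tp + fn + fp)   # true negatives are ignored

def rand_index(clusters, truth):
    tp, fn, fp, tn = pair_counts(clusters, truth)
    return (tp + tn) / (tp + fn + fp + tn)

truth    = ["a", "a", "a", "a", "a", "b"]
clusters = [0, 0, 0, 1, 1, 1]
print(pair_counts(clusters, truth))          # (4, 6, 2, 3)
print(jaccard_coefficient(clusters, truth))  # 4 / 12
print(rand_index(clusters, truth))           # 7 / 15
```

The Jaccard coefficient ignores true negatives, so it is usually lower than the Rand index when there are many small clusters and most pairs are trivially separated.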


Conditional Entropy:


Normalized Mutual Information:

The NMI value lies in the range [0,1]. The higher the NMI value, the better the clustering.
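
Several normalizations of mutual information are in use. The sketch below assumes the geometric-mean form, NMI = I(C,T) / sqrt(H(C) * H(T)), with entropies in bits. Which variant the slide uses cannot be recovered from the text. It also computes the conditional entropy H(T|C) = H(C,T) - H(C). Function names, and the convention of returning 0 when either entropy is 0, are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(truth, clusters):
    """H(T|C) = H(C,T) - H(C): uncertainty about the partition given the cluster."""
    return entropy(list(zip(clusters, truth))) - entropy(clusters)

def mutual_information(clusters, truth):
    n = len(clusters)
    joint = Counter(zip(clusters, truth))
    n_c, n_t = Counter(clusters), Counter(truth)
    return sum(
        (nij / n) * math.log2(nij * n / (n_c[c] * n_t[t]))
        for (c, t), nij in joint.items()
    )

def nmi(clusters, truth):
    h_c, h_t = entropy(clusters), entropy(truth)
    if h_c == 0 or h_t == 0:
        return 0.0   # assumed convention when one labeling has a single group
    return mutual_information(clusters, truth) / math.sqrt(h_c * h_t)

print(nmi([0, 0, 1, 1], ["b", "b", "a", "a"]))  # same grouping, renamed labels
print(nmi([0, 1, 0, 1], ["a", "a", "b", "b"]))  # clusters independent of truth
```

A perfect clustering scores 1 regardless of how the clusters are named. A clustering that carries no information about the ground truth scores 0.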



Relative Evaluation / Comparing different parameters

Silhouette Coefficient
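
A minimal sketch of using the silhouette coefficient to compare candidate clusterings, assuming the usual definition: for each point, s = (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster and b is the smallest mean distance to the points of any other cluster. The coefficient is the average of s over all points. The function name, the Euclidean distance, and the score of 0 for singleton clusters are assumptions.

```python
import math
from collections import defaultdict

def silhouette_coefficient(points, labels):
    """Average silhouette value over all points; higher is better."""
    members = defaultdict(list)
    for idx, lab in enumerate(labels):
        members[lab].append(idx)
    if len(members) < 2:
        raise ValueError("need at least two clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in members[lab] if j != i]
        if not own:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other points in the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the closest other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in idxs) / len(idxs)
            for other, idxs in members.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

points = [(0, 0), (0, 1), (10, 0), (10, 1)]
candidates = {"by x": [0, 0, 1, 1], "by y": [0, 1, 0, 1]}
for name, labels in candidates.items():
    print(name, silhouette_coefficient(points, labels))
```

The same loop works for comparing runs of one algorithm with different parameters, for example different numbers of clusters: keep the labeling with the highest coefficient.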

Calinski–Harabasz Index