Chengwei LEI, Ph.D.    Associate Professor

Department of Computer and Electrical Engineering and Computer Science
California State University, Bakersfield

 

Measurements



Given two data points, we can calculate a value to represent the distance/similarity/dissimilarity
between those two data points.


 

Given a group of data,



There are n data points and each data point has p dimensions.

We can calculate an n by n matrix to represent the distance/similarity/dissimilarity between every pair of data points in the dataset.
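
As a minimal sketch of the n by n matrix described above, the following Python builds a pairwise distance matrix for a small dataset. Euclidean distance is assumed here only for illustration; the function name pairwise_distance_matrix and the sample data are hypothetical and not part of the original notes.

```python
import math


def pairwise_distance_matrix(data):
    """Return the n-by-n matrix of Euclidean distances between the rows of data.

    data is a list of n points, each a list of p numeric coordinates.
    """
    n = len(data)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Distance is symmetric, so compute each pair once and mirror it.
        for j in range(i + 1, n):
            d = math.dist(data[i], data[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


# Hypothetical dataset: n = 3 data points, p = 2 dimensions each.
data = [
    [0.0, 0.0],
    [3.0, 4.0],
    [6.0, 8.0],
]

D = pairwise_distance_matrix(data)
for row in D:
    print(row)
```

The diagonal is zero because each point is at distance zero from itself, and the matrix is symmetric because Euclidean distance is symmetric.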

 

 

Here are some sample questions.




 

Distance Calculation

 

Distance measures quantify the degree of separation or distance between two data points in a specific metric space.
The focus here is on determining how far apart two points are in terms of their coordinates, features, or representations.
Distance measures are always non-negative and are usually symmetric (i.e., the distance from point A to point B is the same as the distance from point B to point A).

Distance measures are typically associated with geometric or numerical representations,
and are most often used in vector spaces and other metric spaces.

 

 

Euclidean distance
Standardized Euclidean distance
Manhattan distance
Chebyshev distance
Minkowski distance
Hamming distance
Jaccard distance
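
Several of the distances listed above can be sketched in a few lines of Python. This is an illustrative sketch using the usual textbook definitions, not a reference implementation; the function names and sample inputs are assumptions made for the example. Standardized Euclidean distance is left out because it needs per-dimension variances estimated from a whole dataset.

```python
import math


def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def euclidean(x, y):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def chebyshev(x, y):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def minkowski(x, y, p):
    """Generalization: p = 1 gives Manhattan, p = 2 gives Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def hamming(x, y):
    """Number of positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(a != b for a, b in zip(x, y))


def jaccard_distance(s, t):
    """One minus the ratio of intersection size to union size of two sets."""
    s, t = set(s), set(t)
    union = s | t
    if not union:
        return 0.0  # assumption: two empty sets are treated as identical
    return 1.0 - len(s & t) / len(union)


# Hypothetical sample points.
x = (1, 2, 3)
y = (4, 6, 3)

print("Manhattan:", manhattan(x, y))
print("Euclidean:", euclidean(x, y))
print("Chebyshev:", chebyshev(x, y))
print("Minkowski (p=3):", minkowski(x, y, 3))
print("Hamming:", hamming("karolin", "kathrin"))
print("Jaccard distance:", jaccard_distance({1, 2, 3}, {2, 3, 4}))
```

Hamming and Jaccard distance operate on sequences and sets rather than numeric coordinates, which is why they take different kinds of input from the other four.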

 


 

(Dis)Similarity / Correlation

(Dis)Similarity calculation also quantifies how similar (or different) two data points are, but the term "(dis)similarity" often refers to a broader concept that includes both distance measures and other measures that capture different kinds of disparity between points. It is used more generally in clustering, classification, and other machine learning algorithms to describe how "(un)like" two points are.

While distance measures are always non-negative and usually symmetric, (dis)similarity measures are less constrained. Some allow asymmetric calculations
(i.e., the (dis)similarity between A and B is not necessarily the same as between B and A), and some, such as correlation coefficients, can take negative values.
In some cases, (dis)similarity can refer to any kind of calculation that reflects how similar (or different) two objects are.
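
To make the point about asymmetry concrete, here is a small sketch of one asymmetric similarity score. The measure (the fraction of one set's elements that also appear in another set, named containment here) is chosen only as an illustration and is not taken from the notes.

```python
def containment(a, b):
    """Fraction of the elements of a that also appear in b.

    The score depends on argument order, so it is not symmetric.
    """
    a, b = set(a), set(b)
    if not a:
        raise ValueError("containment is undefined for an empty first set")
    return len(a & b) / len(a)


# Hypothetical sets: small is entirely contained in large.
small = {1, 2}
large = {1, 2, 3, 4}

print(containment(small, large))  # every element of small is in large
print(containment(large, small))  # only half of large is in small
```

Swapping the arguments changes the result, which a distance measure in the usual sense would not allow.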

 

(Dis)Similarity calculation is a more general concept. It can involve distance measures, but it also includes non-geometric measures (such as those for categorical or set-based data), reflecting the difference between data points in a broader sense.

 

 

 

Cosine Similarity
Jaccard Similarity
Pearson Correlation Coefficient
Spearman’s Rank Correlation
Distance Correlation
Gower's Distance
Kendall Rank Correlation Coefficient
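
Three of the measures listed above can be sketched in plain Python as follows. This is a minimal illustration using the usual textbook definitions; the function names and sample vectors are assumptions made for the example, and the rank-based and distance-based correlations are omitted for brevity.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two vectors; ranges from -1 to 1."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


def jaccard_similarity(s, t):
    """Ratio of intersection size to union size of two sets."""
    s, t = set(s), set(t)
    union = s | t
    if not union:
        return 1.0  # assumption: two empty sets are treated as identical
    return len(s & t) / len(union)


def pearson_correlation(x, y):
    """Linear correlation between two equal-length samples; ranges from -1 to 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sample vectors.
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]   # same direction as u
w = [3.0, 2.0, 1.0]   # u reversed

print("cosine(u, v):", cosine_similarity(u, v))
print("pearson(u, v):", pearson_correlation(u, v))
print("pearson(u, w):", pearson_correlation(u, w))
print("jaccard({1,2,3}, {2,3,4}):", jaccard_similarity({1, 2, 3}, {2, 3, 4}))
```

Unlike a distance, the Pearson coefficient can be negative: u and w move in opposite directions, so their correlation is close to -1.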