
Calculates common clustering evaluation metrics including Silhouette Score, Davies-Bouldin Index, and Calinski-Harabasz Index.

Usage

calculate_clustering_metrics(
  clusters,
  data_matrix,
  dist_matrix = NULL,
  metrics = "all"
)

Arguments

clusters

Integer vector of cluster assignments, one per row of data_matrix.

data_matrix

Numeric matrix of data points (rows = observations, cols = features)

dist_matrix

Optional precomputed distance matrix. If NULL (the default), distances are computed from data_matrix.

metrics

Character vector naming the metrics to calculate: "silhouette", "davies_bouldin", "calinski_harabasz", or "all" (the default).
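The page does not say how clusters is usually produced, which distance is used when dist_matrix is NULL, or whether dist_matrix should be a "dist" object or a full matrix. The sketch below uses only base R (the stats package) to build integer labels from kmeans() and hclust()/cutree(), to convert character labels to integer codes, and to precompute a non-Euclidean distance matrix. The variable names are illustrative, and the assumption that a full n x n matrix is accepted is not confirmed by this page.

```r
# Sketch: preparing `clusters` and `dist_matrix` with base R (stats) only.
set.seed(123)
data <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 3), ncol = 2)
)

# k-means: $cluster is an integer vector with one label per row of `data`
km_clusters <- kmeans(data, centers = 2, nstart = 10)$cluster

# Hierarchical clustering: cut the dendrogram into 2 groups
hc_clusters <- cutree(hclust(dist(data), method = "average"), k = 2)

# Character or factor labels can be mapped to integer codes 1..k
chr_labels <- rep(c("a", "b"), each = 50)
int_labels <- as.integer(factor(chr_labels))

# Precomputed distances, e.g. to reuse them or to use a non-Euclidean metric.
# dist() returns a compact "dist" object; as.matrix() expands it to n x n.
d_manhattan <- dist(data, method = "manhattan")
D <- as.matrix(d_manhattan)
```

Precomputing distances mainly pays off when the same data are scored against several candidate clusterings. In their standard definitions, only the silhouette is computed from pairwise distances, while the Davies-Bouldin and Calinski-Harabasz indices are computed from cluster centroids of data_matrix.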

Value

A named list containing:

silhouette

Silhouette score (-1 to 1, higher is better)

davies_bouldin

Davies-Bouldin index (lower is better)

calinski_harabasz

Calinski-Harabasz index (higher is better)

n_clusters

Number of clusters

cluster_sizes

Table of cluster sizes
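The n_clusters and cluster_sizes elements are described only briefly. Assuming they are the number of distinct labels and a frequency table of the labels, they correspond to the following base-R expressions. This is a sketch of their presumed meaning, not the function's own code.

```r
# Presumed equivalents of n_clusters and cluster_sizes (assumption)
clusters <- rep(1:2, each = 50)
n_clusters <- length(unique(clusters))  # number of distinct labels
cluster_sizes <- table(clusters)        # count of observations per label
```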

Details

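  • Silhouette Score: For each observation i, compares a(i), the mean distance to the other members of its own cluster, with b(i), the mean distance to the members of the nearest other cluster: s(i) = (b(i) - a(i)) / max(a(i), b(i)). The overall score is the average of s(i) over all observations. Range: -1 to 1, higher is better.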

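  • Davies-Bouldin Index: For each cluster i, takes the largest ratio (S_i + S_j) / d(c_i, c_j) over all other clusters j, where S_i is the average distance of cluster i's members to its centroid c_i. The index is the mean of these maxima. Lower values indicate more compact, better-separated clusters.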

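  • Calinski-Harabasz Index: Ratio of between-cluster dispersion B to within-cluster dispersion W, each scaled by its degrees of freedom: [B / (k - 1)] / [W / (n - k)] for n observations and k clusters. Higher values indicate better-defined clusters.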
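To make the three definitions above concrete, here is a minimal base-R sketch of each metric. It is not the package's implementation: the helper names (sketch_silhouette, sketch_davies_bouldin, sketch_calinski_harabasz) are hypothetical, Euclidean distance is assumed, and singletons get s(i) = 0 in the silhouette. The package's results may differ if it uses another distance or a different edge-case convention.

```r
# Hypothetical helpers implementing the textbook definitions (Euclidean).
sketch_silhouette <- function(clusters, D) {
  # D: full n x n distance matrix; clusters: one label per row
  if (length(unique(clusters)) < 2) stop("at least two clusters are required")
  s <- numeric(length(clusters))
  for (i in seq_along(clusters)) {
    own <- clusters == clusters[i]
    own[i] <- FALSE                       # exclude the point itself
    if (!any(own)) next                   # singleton: s(i) stays 0
    a <- mean(D[i, own])                  # mean distance within own cluster
    others <- setdiff(unique(clusters), clusters[i])
    b <- min(vapply(others, function(k) mean(D[i, clusters == k]), numeric(1)))
    s[i] <- (b - a) / max(a, b)
  }
  mean(s)                                 # average silhouette width
}

sketch_davies_bouldin <- function(clusters, X) {
  ks <- sort(unique(clusters))
  if (length(ks) < 2) stop("at least two clusters are required")
  # Centroids, one row per cluster
  cent <- do.call(rbind, lapply(ks, function(k)
    colMeans(X[clusters == k, , drop = FALSE])))
  # S_i: mean distance of cluster i's members to its centroid
  S <- vapply(seq_along(ks), function(j) {
    Xk <- X[clusters == ks[j], , drop = FALSE]
    mean(sqrt(rowSums(sweep(Xk, 2, cent[j, ])^2)))
  }, numeric(1))
  M <- as.matrix(dist(cent))              # centroid-to-centroid distances
  R <- outer(S, S, "+") / M
  diag(R) <- -Inf                         # ignore self-pairs
  mean(apply(R, 1, max))                  # mean of each cluster's worst ratio
}

sketch_calinski_harabasz <- function(clusters, X) {
  n <- nrow(X)
  ks <- unique(clusters)
  k <- length(ks)
  if (k < 2 || k >= n) stop("need 2 <= k < n")
  overall <- colMeans(X)
  B <- 0
  W <- 0
  for (cl in ks) {
    Xk <- X[clusters == cl, , drop = FALSE]
    ck <- colMeans(Xk)
    B <- B + nrow(Xk) * sum((ck - overall)^2)  # between-cluster dispersion
    W <- W + sum(sweep(Xk, 2, ck)^2)           # within-cluster dispersion
  }
  (B / (k - 1)) / (W / (n - k))
}

# Same data as in the Examples section
set.seed(123)
X <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 3), ncol = 2))
cl <- rep(1:2, each = 50)
sil <- sketch_silhouette(cl, as.matrix(dist(X)))
db  <- sketch_davies_bouldin(cl, X)
ch  <- sketch_calinski_harabasz(cl, X)
```

For the silhouette, cluster::silhouette() in the recommended cluster package is an established reference implementation that can be used to cross-check results.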

Examples

# Generate sample data: two Gaussian blobs of 50 points each
set.seed(123)
data <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 3), ncol = 2)
)
clusters <- rep(1:2, each = 50)  # integer labels, one per row

metrics <- calculate_clustering_metrics(clusters, data)
print(metrics)