Calculates three common clustering evaluation metrics: the Silhouette score, the Davies-Bouldin index, and the Calinski-Harabasz index.
Arguments
- clusters
Integer vector of cluster assignments, one per row of data_matrix
- data_matrix
Numeric matrix of data points (rows = observations, cols = features)
- dist_matrix
Optional distance matrix. If NULL, it is computed from data_matrix.
- metrics
Character vector of metrics to calculate. Options: "silhouette", "davies_bouldin", "calinski_harabasz", or "all" (default)
Value
A named list containing:
- silhouette
Silhouette score (-1 to 1, higher is better)
- davies_bouldin
Davies-Bouldin index (lower is better)
- calinski_harabasz
Calinski-Harabasz index (higher is better)
- n_clusters
Number of clusters
- cluster_sizes
Table of cluster sizes
Details
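Silhouette Score: For each observation, compares the mean distance a to the other members of its own cluster with the mean distance b to the members of the nearest other cluster, giving s = (b - a) / max(a, b). The score is the average of s over all observations. Range: -1 to 1, higher is better.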
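The silhouette computation can be sketched in base R as follows. This is a minimal illustration, not this package's implementation: the helper name silhouette_score() is hypothetical, Euclidean distance is assumed when no distance matrix is supplied, and singleton clusters are given s = 0 (the convention used by the cluster package's silhouette(), which is a well-tested alternative).

```r
# Hypothetical helper (not part of this package): mean silhouette width.
# Assumes Euclidean distance when dist_matrix is NULL.
silhouette_score <- function(clusters, data_matrix, dist_matrix = NULL) {
  if (is.null(dist_matrix)) dist_matrix <- dist(data_matrix)
  d <- as.matrix(dist_matrix)
  labels <- unique(clusters)
  stopifnot(length(labels) >= 2)
  s <- numeric(length(clusters))
  for (i in seq_along(clusters)) {
    own <- clusters == clusters[i]
    own[i] <- FALSE                         # exclude the point itself
    if (!any(own)) next                     # singleton cluster: s stays 0
    a <- mean(d[i, own])                    # mean distance within own cluster
    b <- min(vapply(setdiff(labels, clusters[i]),
                    function(k) mean(d[i, clusters == k]),
                    numeric(1)))            # nearest other cluster
    s[i] <- (b - a) / max(a, b)
  }
  mean(s)
}

# Usage on two well-separated toy clusters
x <- rbind(c(0, 0), c(0, 1), c(1, 0), c(10, 10), c(10, 11), c(11, 10))
cl <- c(1, 1, 1, 2, 2, 2)
silhouette_score(cl, x)  # about 0.92
```

Mixing the same points into interleaved clusters, for example c(1, 2, 1, 2, 1, 2), lowers the score sharply because a grows toward b for most observations.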
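Davies-Bouldin Index: For each cluster, finds its most similar other cluster, where the similarity of two clusters is the sum of their within-cluster scatters divided by the distance between their centroids, then averages these maximum similarities over all clusters. Lower values indicate better clustering; the minimum is 0.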
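A minimal base-R sketch of the Davies-Bouldin calculation, assuming Euclidean distances and the common choice of scatter S_i as the mean distance of a cluster's members to its centroid. The helper name davies_bouldin() is hypothetical and is not this package's API; the package may define scatter or distance differently.

```r
# Hypothetical helper (not part of this package): Davies-Bouldin index
# with Euclidean distances and S_i = mean distance of members to centroid.
davies_bouldin <- function(clusters, data_matrix) {
  labels <- unique(clusters)
  stopifnot(length(labels) >= 2)
  # One centroid per cluster, as rows of a k x p matrix
  centroids <- do.call(rbind, lapply(labels, function(k)
    colMeans(data_matrix[clusters == k, , drop = FALSE])))
  # S_i: average distance from each member to its cluster centroid
  scatter <- vapply(seq_along(labels), function(i) {
    pts <- data_matrix[clusters == labels[i], , drop = FALSE]
    mean(sqrt(rowSums(sweep(pts, 2, centroids[i, ])^2)))
  }, numeric(1))
  m <- as.matrix(dist(centroids))          # M_ij: centroid separations
  ratios <- outer(scatter, scatter, "+") / m
  diag(ratios) <- -Inf                     # ignore a cluster vs. itself
  mean(apply(ratios, 1, max))              # most similar pair, averaged
}

# Usage on two well-separated toy clusters
x <- rbind(c(0, 0), c(0, 1), c(1, 0), c(10, 10), c(10, 11), c(11, 10))
cl <- c(1, 1, 1, 2, 2, 2)
davies_bouldin(cl, x)  # about 0.0925
```

Taking the maximum per row encodes the "most similar cluster" step; setting the diagonal to -Inf keeps a cluster from being compared with itself.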
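Calinski-Harabasz Index: Ratio of between-cluster dispersion to within-cluster dispersion, each divided by its degrees of freedom (k - 1 and n - k for k clusters and n observations). Higher values indicate better-defined clusters.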
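The Calinski-Harabasz ratio can be sketched in base R as tr(B) / (k - 1) divided by tr(W) / (n - k), where tr(B) sums each cluster's size times the squared distance from its centroid to the overall mean, and tr(W) sums squared distances from each point to its own centroid. The helper name calinski_harabasz() is hypothetical; this is an illustration, not the package's code.

```r
# Hypothetical helper (not part of this package): Calinski-Harabasz index.
calinski_harabasz <- function(clusters, data_matrix) {
  n <- nrow(data_matrix)
  labels <- unique(clusters)
  k <- length(labels)
  stopifnot(k >= 2, k < n)
  overall <- colMeans(data_matrix)
  between <- 0  # tr(B): size-weighted squared centroid offsets
  within <- 0   # tr(W): squared distances to own centroid
  for (lab in labels) {
    pts <- data_matrix[clusters == lab, , drop = FALSE]
    centroid <- colMeans(pts)
    between <- between + nrow(pts) * sum((centroid - overall)^2)
    within <- within + sum(sweep(pts, 2, centroid)^2)
  }
  (between / (k - 1)) / (within / (n - k))
}

# Usage on two well-separated toy clusters
x <- rbind(c(0, 0), c(0, 1), c(1, 0), c(10, 10), c(10, 11), c(11, 10))
cl <- c(1, 1, 1, 2, 2, 2)
calinski_harabasz(cl, x)  # 450
```

For this toy data tr(B) = 300 and tr(W) = 8/3, so the index is (300 / 1) / ((8/3) / 4) = 450.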
See also
Other semantic:
analyze_document_clustering(),
analyze_similarity_gaps(),
calculate_cross_similarity(),
calculate_document_similarity(),
calculate_similarity_robust(),
cluster_embeddings(),
cross_analysis_validation(),
export_document_clustering(),
extract_cross_category_similarities(),
fit_semantic_model(),
generate_cluster_labels(),
generate_cluster_labels_auto(),
generate_embeddings(),
reduce_dimensions(),
semantic_document_clustering(),
semantic_similarity_analysis(),
temporal_semantic_analysis(),
validate_cross_models()
