
Calculate Cross-Matrix Cosine Similarity
Source: R/semantic_analysis.R
Calculates cosine similarity between two different embedding matrices, useful for comparing documents/topics across different categories or groups.
Usage
calculate_cross_similarity(
  embeddings1,
  embeddings2,
  labels1 = NULL,
  labels2 = NULL,
  normalize = TRUE
)
Arguments
- embeddings1
A numeric matrix where rows are items and columns are embedding dimensions.
- embeddings2
A numeric matrix where rows are items and columns are embedding dimensions. Must have the same number of columns as embeddings1.
- labels1
Optional character vector of labels for items in embeddings1.
- labels2
Optional character vector of labels for items in embeddings2.
- normalize
Logical, whether to L2-normalize embeddings before computing similarity (default: TRUE).
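To make the arguments concrete, here is a minimal base-R sketch of what cross-matrix cosine similarity with L2 normalization amounts to. The helper cross_cosine_sketch() is hypothetical and is not part of TextAnalysisR; the package's actual implementation, including its behavior when normalize = FALSE or when a row is all zeros, may differ.

```r
# Hypothetical helper (NOT part of TextAnalysisR): a base-R sketch of
# cross-matrix cosine similarity between the rows of two matrices.
cross_cosine_sketch <- function(a, b) {
  # Both matrices must share the same embedding dimension (columns).
  stopifnot(is.matrix(a), is.matrix(b), ncol(a) == ncol(b))

  # L2-normalize each row; afterwards a dot product of two rows
  # equals their cosine similarity. An all-zero row would yield NaN.
  a_unit <- a / sqrt(rowSums(a^2))
  b_unit <- b / sqrt(rowSums(b^2))

  # Pairwise dot products: an nrow(a) x nrow(b) matrix.
  tcrossprod(a_unit, b_unit)
}

# Toy 2-dimensional "embeddings" chosen for easy checking by hand.
a <- rbind(c(1, 0), c(1, 1))
b <- rbind(c(2, 0), c(0, 3), c(1, 1))

sim <- cross_cosine_sketch(a, b)
print(dim(sim))       # 2 rows (items in a) by 3 columns (items in b)
print(round(sim, 3))
```

Row i, column j of the result compares item i of the first matrix with item j of the second, which is why the two matrices may have different numbers of rows but must have the same number of columns.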
Value
A list containing:
- similarity_matrix
Matrix of cosine similarities (nrow(embeddings1) x nrow(embeddings2))
- similarity_df
Long-format data frame with columns: row_idx, col_idx, similarity, and optionally label1, label2
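The relationship between the two returned elements can be illustrated with a small sketch that reshapes a similarity matrix into a long-format data frame using the column names listed above. The values below are made up for illustration, and the row order and exact construction used by the package are assumptions here.

```r
# Illustrative only: made-up similarity values for 2 x 3 items.
# The package's own conversion and row ordering may differ.
sim <- matrix(
  c(0.9, 0.1, 0.4, 0.8, 0.3, 0.5),
  nrow = 2,
  dimnames = list(c("A", "B"), c("C", "D", "E"))
)

# Every (row, column) index pair of the matrix.
idx <- expand.grid(
  row_idx = seq_len(nrow(sim)),
  col_idx = seq_len(ncol(sim))
)

# One row per pair, with labels looked up from the index.
long <- data.frame(
  row_idx    = idx$row_idx,
  col_idx    = idx$col_idx,
  similarity = sim[cbind(idx$row_idx, idx$col_idx)],
  label1     = rownames(sim)[idx$row_idx],
  label2     = colnames(sim)[idx$col_idx]
)

# Example query: the closest item in the second set for each item in the first.
best <- long[order(long$row_idx, -long$similarity), ]
best <- best[!duplicated(best$row_idx), ]
print(best)
```

The long format is convenient for filtering, ranking, and plotting, while the matrix form suits heatmaps and matrix algebra.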
See also
Other semantic:
analyze_document_clustering(),
analyze_similarity_gaps(),
calculate_clustering_metrics(),
calculate_document_similarity(),
calculate_similarity_robust(),
cluster_embeddings(),
cross_analysis_validation(),
export_document_clustering(),
extract_cross_category_similarities(),
fit_semantic_model(),
generate_cluster_labels(),
generate_cluster_labels_auto(),
generate_embeddings(),
reduce_dimensions(),
semantic_document_clustering(),
semantic_similarity_analysis(),
temporal_semantic_analysis(),
validate_cross_models()
Examples
if (FALSE) { # \dontrun{
  # Generate embeddings for two groups
  emb1 <- TextAnalysisR::generate_embeddings(c("text a", "text b"), verbose = FALSE)
  emb2 <- TextAnalysisR::generate_embeddings(c("text c", "text d", "text e"), verbose = FALSE)

  # Calculate cross-similarity
  result <- calculate_cross_similarity(
    emb1, emb2,
    labels1 = c("A", "B"),
    labels2 = c("C", "D", "E")
  )

  # Similarity matrix (2 x 3) and its long-format data frame
  print(result$similarity_matrix)
  print(result$similarity_df)
} # }