Calculate Cross-Matrix Cosine Similarity

Calculates pairwise cosine similarity between the rows of two embedding matrices. Useful for comparing documents or topics across categories or groups.

Usage

calculate_cross_similarity(
  embeddings1,
  embeddings2,
  labels1 = NULL,
  labels2 = NULL,
  normalize = TRUE
)

Arguments

embeddings1: A numeric matrix where rows are items and columns are embedding dimensions.
embeddings2: A numeric matrix where rows are items and columns are embedding dimensions. Must have the same number of columns as embeddings1.
labels1: Optional character vector of labels for items in embeddings1.
labels2: Optional character vector of labels for items in embeddings2.
normalize: Logical, whether to L2-normalize embeddings before computing similarity (default: TRUE).
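
To illustrate what L2-normalizing before computing similarity amounts to, here is a minimal base R sketch. It is not the package's implementation: `l2_normalize_rows` is a hypothetical helper and the two small matrices are made-up data. Once every row has unit length, the cosine similarity of each row pair is just a matrix product.

```r
# Hypothetical helper (not part of TextAnalysisR): scale each row to unit L2 norm.
l2_normalize_rows <- function(m) {
  norms <- sqrt(rowSums(m^2))
  norms[norms == 0] <- 1  # leave all-zero rows as they are instead of dividing by zero
  m / norms               # the vector recycles down columns, so row i is divided by norms[i]
}

# Toy embeddings: 2 items and 3 items, both with 2 dimensions.
a <- matrix(c(1, 0,
              1, 1), nrow = 2, byrow = TRUE)
b <- matrix(c(0, 2,
              3, 3,
              2, 0), nrow = 3, byrow = TRUE)

# Rows of `a` against rows of `b`: the result is nrow(a) x nrow(b).
sim <- l2_normalize_rows(a) %*% t(l2_normalize_rows(b))
dim(sim)
```

Both matrices must share the same number of columns for the product to be defined, which matches the requirement on embeddings2 above.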

Value

A list containing:

similarity_matrix: Matrix of cosine similarities (nrow(embeddings1) x nrow(embeddings2))
similarity_df: Long-format data frame with columns: row_idx, col_idx, similarity, and optionally label1, label2
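
As a rough picture of how the two return elements relate, the base R sketch below reshapes a small made-up similarity matrix into a long-format data frame with the column names listed above. The row order and the way labels are attached are assumptions for illustration; the function's actual output may be ordered differently.

```r
# Made-up 2 x 3 similarity matrix and labels, for illustration only.
sim <- matrix(c(0.1, 0.2, 0.3,
                0.4, 0.5, 0.6), nrow = 2, byrow = TRUE)
labels1 <- c("a1", "a2")
labels2 <- c("b1", "b2", "b3")

# One row per (row_idx, col_idx) pair; as.vector() walks the matrix column by column.
long <- data.frame(
  row_idx    = as.vector(row(sim)),
  col_idx    = as.vector(col(sim)),
  similarity = as.vector(sim)
)

# Attach labels by index (assumed behaviour when labels are supplied).
long$label1 <- labels1[long$row_idx]
long$label2 <- labels2[long$col_idx]
long
```

The long format is convenient for filtering or sorting pairs, for example `long[order(-long$similarity), ]` to list the most similar pairs first.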

Examples

if (FALSE) { # \dontrun{
data(SpecialEduTech)
# Generate embeddings for two groups
emb1 <- TextAnalysisR::generate_embeddings(SpecialEduTech$abstract[1:3], verbose = FALSE)
emb2 <- TextAnalysisR::generate_embeddings(SpecialEduTech$abstract[4:6], verbose = FALSE)

# Calculate cross-similarity
result <- calculate_cross_similarity(
  emb1, emb2,
  labels1 = SpecialEduTech$title[1:3],
  labels2 = SpecialEduTech$title[4:6]
)
print(result$similarity_matrix)
print(result$similarity_df)
} # }