Calculates pairwise cosine similarity between the rows of two embedding matrices. Useful for comparing documents or topics across categories or groups.

Usage

calculate_cross_similarity(
  embeddings1,
  embeddings2,
  labels1 = NULL,
  labels2 = NULL,
  normalize = TRUE
)

Arguments

embeddings1

A numeric matrix where rows are items and columns are embedding dimensions.

embeddings2

A numeric matrix where rows are items and columns are embedding dimensions. Must have the same number of columns as embeddings1.

labels1

Optional character vector of labels for items in embeddings1.

labels2

Optional character vector of labels for items in embeddings2.

normalize

Logical, whether to L2-normalize embeddings before computing similarity (default: TRUE).
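
To illustrate what L2-normalization followed by a similarity computation amounts to, here is a minimal base-R sketch. The helper `cross_cosine_sketch()` is hypothetical and is not part of TextAnalysisR; the package's internal implementation may differ (for example, in how it handles all-zero rows, which produce NaN here).

```r
# Hypothetical helper, not part of TextAnalysisR: cosine similarity between
# the rows of two matrices via L2-normalization and a matrix product.
cross_cosine_sketch <- function(x, y) {
  stopifnot(is.matrix(x), is.matrix(y), ncol(x) == ncol(y))
  # Divide each row by its Euclidean (L2) norm so every row has unit length.
  x_unit <- x / sqrt(rowSums(x^2))
  y_unit <- y / sqrt(rowSums(y^2))
  # Dot products between unit-length rows are cosine similarities.
  x_unit %*% t(y_unit)
}

# Toy inputs: 2 items and 3 items, both with 2 embedding dimensions.
x <- rbind(c(1, 0), c(1, 1))
y <- rbind(c(2, 0), c(0, 3), c(-1, 0))
sim <- cross_cosine_sketch(x, y)
print(dim(sim))
print(round(sim, 3))
```

The result has one row per row of `x` and one column per row of `y`, matching the shape described for `similarity_matrix`.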

Value

A list containing:

similarity_matrix

Matrix of cosine similarities with nrow(embeddings1) rows and nrow(embeddings2) columns.

similarity_df

Long-format data frame with columns: row_idx, col_idx, similarity, and optionally label1, label2
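
The relationship between the two return values can be sketched in base R as follows. This is an assumption about the layout, built from the column names listed above; the row ordering and column types of the actual `similarity_df` may differ. The similarity values are made up for illustration.

```r
# Made-up similarity matrix for illustration (2 items x 3 items).
sim <- matrix(
  c(0.9, 0.1, 0.4, 0.8, 0.2, 0.5),
  nrow = 2,
  dimnames = list(c("A", "B"), c("C", "D", "E"))
)

# expand.grid() varies its first argument fastest, which matches the
# column-major order in which as.vector() unrolls a matrix.
idx <- expand.grid(
  row_idx = seq_len(nrow(sim)),
  col_idx = seq_len(ncol(sim))
)

long <- data.frame(
  row_idx = idx$row_idx,
  col_idx = idx$col_idx,
  similarity = as.vector(sim),
  label1 = rownames(sim)[idx$row_idx],
  label2 = colnames(sim)[idx$col_idx]
)
print(long)
```

Each row of `long` corresponds to one cell of the matrix, so a 2 x 3 matrix yields 6 rows.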

Examples

if (FALSE) { # \dontrun{
# Generate embeddings for two groups
emb1 <- TextAnalysisR::generate_embeddings(c("text a", "text b"), verbose = FALSE)
emb2 <- TextAnalysisR::generate_embeddings(c("text c", "text d", "text e"), verbose = FALSE)

# Calculate cross-similarity
result <- calculate_cross_similarity(
  emb1, emb2,
  labels1 = c("A", "B"),
  labels2 = c("C", "D", "E")
)
print(result$similarity_matrix)
print(result$similarity_df)
} # }
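
A common follow-up is to find, for each item in the first group, its closest match in the second group. The sketch below works on a hand-written stand-in matrix with made-up values rather than on real output, so it runs without generating embeddings; with real results, `result$similarity_matrix` would take the place of `sim`, assuming it carries the supplied labels as dimnames (not confirmed by this page).

```r
# Stand-in for result$similarity_matrix, with made-up values.
sim <- matrix(
  c(0.9, 0.1, 0.4, 0.8, 0.2, 0.5),
  nrow = 2,
  dimnames = list(c("A", "B"), c("C", "D", "E"))
)

# Column index of the largest similarity in each row.
best_col <- max.col(sim, ties.method = "first")

best <- data.frame(
  item = rownames(sim),
  best_match = colnames(sim)[best_col],
  # Matrix indexing with a two-column (row, col) index picks one cell per row.
  similarity = sim[cbind(seq_len(nrow(sim)), best_col)]
)
print(best)
```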