
Extract Cross-Category Similarities from Full Similarity Matrix
Source: R/semantic_analysis.R
Given a full similarity matrix and category information, extracts pairwise similarities between a reference category and other categories into a long-format data frame suitable for visualization and analysis.
Usage
extract_cross_category_similarities(
  similarity_matrix,
  docs_data,
  reference_category,
  compare_categories = NULL,
  category_var = "category",
  id_var = "display_name",
  name_var = NULL
)
Arguments
- similarity_matrix
A square similarity matrix (n x n).
- docs_data
A data frame containing document metadata with at least two columns: the column named by category_var, indicating category membership, and the column named by id_var, holding unique document identifiers.
- reference_category
Character string specifying the reference category to compare against.
- compare_categories
Character vector of categories to compare with the reference. If NULL (the default), compares against every category except the reference.
- category_var
Name of the column containing category information (default: "category").
- id_var
Name of the column containing document IDs (default: "display_name").
- name_var
Optional name of the column containing display names. If NULL (the default), id_var is used instead.
Value
A data frame with columns:
- ref_id
Reference document ID
- ref_name
Reference document name (if name_var provided)
- other_id
Comparison document ID
- other_name
Comparison document name (if name_var provided)
- other_category
Category of comparison document
- similarity
Cosine similarity value
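
The reshaping described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper name sketch_cross_similarities() is hypothetical, the toy data are made up, and the sketch assumes that row and column i of the similarity matrix correspond to row i of the document data frame. It also omits the optional ref_name and other_name columns.

```r
# Hypothetical helper for illustration only; not part of TextAnalysisR.
sketch_cross_similarities <- function(sim, docs, reference_category,
                                      compare_categories = NULL,
                                      category_var = "category",
                                      id_var = "display_name") {
  # Assumption: row/column i of `sim` describes row i of `docs`.
  stopifnot(nrow(sim) == ncol(sim), nrow(sim) == nrow(docs))
  cats <- as.character(docs[[category_var]])
  if (is.null(compare_categories)) {
    # Default: every category except the reference
    compare_categories <- setdiff(unique(cats), reference_category)
  }
  ref_idx <- which(cats == reference_category)
  other_idx <- which(cats %in% compare_categories)
  # All (reference, other) index pairs, one row per pair
  pairs <- expand.grid(ref = ref_idx, other = other_idx)
  data.frame(
    ref_id = docs[[id_var]][pairs$ref],
    other_id = docs[[id_var]][pairs$other],
    other_category = cats[pairs$other],
    # Matrix indexing with a two-column matrix picks one cell per pair
    similarity = sim[cbind(pairs$ref, pairs$other)],
    stringsAsFactors = FALSE
  )
}

# Made-up toy inputs: two reference documents, two comparison documents
toy_docs <- data.frame(
  display_name = c("a1", "a2", "b1", "c1"),
  category = c("SLD", "SLD", "General", "Other Disability"),
  stringsAsFactors = FALSE
)
toy_sim <- matrix(c(1.0, 0.8, 0.3, 0.5,
                    0.8, 1.0, 0.4, 0.6,
                    0.3, 0.4, 1.0, 0.2,
                    0.5, 0.6, 0.2, 1.0), nrow = 4, byrow = TRUE)
toy <- sketch_cross_similarities(toy_sim, toy_docs, reference_category = "SLD")
```

Each reference document is paired with each comparison document, so the toy result has one row per pair and excludes within-reference pairs such as a1 with a2.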
See also
Other semantic:
analyze_document_clustering(),
analyze_similarity_gaps(),
calculate_clustering_metrics(),
calculate_cross_similarity(),
calculate_document_similarity(),
calculate_similarity_robust(),
cluster_embeddings(),
cross_analysis_validation(),
export_document_clustering(),
fit_semantic_model(),
generate_cluster_labels(),
generate_cluster_labels_auto(),
generate_embeddings(),
reduce_dimensions(),
semantic_document_clustering(),
semantic_similarity_analysis(),
temporal_semantic_analysis(),
validate_cross_models()
Examples
if (FALSE) { # \dontrun{
# `docs` is assumed to be a data frame with text, category,
# display_name, and doc_name columns.

# After calculating full similarity matrix
similarity_result <- TextAnalysisR::calculate_document_similarity(
  texts = docs$text,
  document_feature_type = "embeddings"
)

# Compare "SLD" documents against two other categories
cross_sims <- extract_cross_category_similarities(
  similarity_matrix = similarity_result$similarity_matrix,
  docs_data = docs,
  reference_category = "SLD",
  compare_categories = c("Other Disability", "General"),
  category_var = "category",
  id_var = "display_name",
  name_var = "doc_name"
)
} # }
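
Because the result is in long format, it can be summarized with ordinary data frame tools. The sketch below uses base R on a hand-built data frame that mimics the documented output columns; the object name cross_sims_toy and its values are made up for illustration and are not produced by the package.

```r
# Made-up data frame with the documented columns (name columns omitted)
cross_sims_toy <- data.frame(
  ref_id = c("a1", "a2", "a1", "a2"),
  other_id = c("b1", "b1", "c1", "c1"),
  other_category = c("General", "General",
                     "Other Disability", "Other Disability"),
  similarity = c(0.3, 0.4, 0.5, 0.6),
  stringsAsFactors = FALSE
)

# Mean similarity to the reference category, per comparison category
category_means <- aggregate(similarity ~ other_category,
                            data = cross_sims_toy, FUN = mean)

# Most similar comparison document for each reference document:
# sort by reference ID and descending similarity, then keep the first row
ordered <- cross_sims_toy[order(cross_sims_toy$ref_id,
                                -cross_sims_toy$similarity), ]
top_match <- ordered[!duplicated(ordered$ref_id), ]
```

The same pattern should apply to the real cross_sims object, provided it carries the columns listed under Value.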