Identifies unique items, missing content, and cross-category learning opportunities based on similarity thresholds. Useful for gap analysis in policy documents, topic comparisons, or any cross-category similarity study.
Usage
analyze_similarity_gaps(
  similarity_data,
  ref_var = "ref_id",
  other_var = "other_id",
  similarity_var = "similarity",
  category_var = "other_category",
  ref_label_var = NULL,
  other_label_var = NULL,
  unique_threshold = 0.6,
  cross_policy_min = 0.6,
  cross_policy_max = 0.8
)
Arguments
- similarity_data
A data frame of cross-category similarities containing the columns named by the following arguments:
- ref_var
Reference item identifier
- other_var
Comparison item identifier
- similarity_var
Similarity score
- category_var
Category of comparison item
- ref_var
Name of column with reference item IDs (default: "ref_id").
- other_var
Name of column with comparison item IDs (default: "other_id").
- similarity_var
Name of column with similarity values (default: "similarity").
- category_var
Name of column with category information (default: "other_category").
- ref_label_var
Optional column with reference item labels (for output).
- other_label_var
Optional column with comparison item labels (for output).
- unique_threshold
Similarity threshold below which reference items are considered unique (default: 0.6).
- cross_policy_min
Minimum similarity for cross-policy opportunities (default: 0.6).
- cross_policy_max
Maximum similarity for cross-policy opportunities (default: 0.8).
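The three thresholds can be read as a partition of the similarity scale. The base R sketch below is one plausible interpretation, not the package's implementation. It assumes that an item is judged by its best (maximum) similarity, that `unique_threshold` is also applied to comparison items, and that the cross-policy band is inclusive at both ends. The helper `sketch_gap_classification()` and the toy data are hypothetical.

```r
# Hypothetical helper, not part of the package. Assumes each item is judged
# by its maximum similarity and that the cross-policy band is inclusive.
sketch_gap_classification <- function(similarity_data,
                                      unique_threshold = 0.6,
                                      cross_policy_min = 0.6,
                                      cross_policy_max = 0.8) {
  # Best match for each reference item across all comparison items
  ref_best <- aggregate(similarity ~ ref_id, data = similarity_data, FUN = max)
  # Best match for each comparison item across all reference items
  other_best <- aggregate(similarity ~ other_id + other_category,
                          data = similarity_data, FUN = max)
  # Pairs whose similarity falls inside the moderate band
  in_band <- similarity_data$similarity >= cross_policy_min &
    similarity_data$similarity <= cross_policy_max
  list(
    unique_items = ref_best[ref_best$similarity < unique_threshold, , drop = FALSE],
    missing_items = other_best[other_best$similarity < unique_threshold, , drop = FALSE],
    cross_policy = similarity_data[in_band, , drop = FALSE]
  )
}

# Made-up similarities using the default column names
toy <- data.frame(
  ref_id = c("r1", "r1", "r2", "r2", "r3", "r3"),
  other_id = c("o1", "o2", "o1", "o2", "o1", "o2"),
  other_category = c("A", "B", "A", "B", "A", "B"),
  similarity = c(0.90, 0.30, 0.70, 0.20, 0.40, 0.10),
  stringsAsFactors = FALSE
)
sketch <- sketch_gap_classification(toy)
```

Under these assumptions, `r3` is the only reference item whose best match falls below `unique_threshold`, `o2` is the only comparison item without a strong match, and the `r2`/`o1` pair is the only one inside the moderate band.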
Value
A list containing:
- unique_items
Data frame of reference items with low similarity (unique content)
- missing_items
Data frame of comparison items with low similarity (content gaps)
- cross_policy
Data frame of items with moderate similarity (learning opportunities)
- summary_stats
Summary statistics by category
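The source does not say which statistics `summary_stats` holds. As a rough illustration only, a per-category summary could be built in base R as below. The choice of mean, maximum, and pair count, along with the column names `mean_similarity`, `max_similarity`, and `n_pairs`, is an assumption and may not match the actual output.

```r
# Made-up similarities; only the two columns needed for the summary
toy <- data.frame(
  other_category = c("A", "A", "B", "B"),
  similarity = c(0.9, 0.7, 0.3, 0.2),
  stringsAsFactors = FALSE
)

# Assumed statistics: mean, maximum, and number of pairs per category
category_summary <- data.frame(
  other_category = sort(unique(toy$other_category)),
  mean_similarity = as.numeric(tapply(toy$similarity, toy$other_category, mean)),
  max_similarity = as.numeric(tapply(toy$similarity, toy$other_category, max)),
  n_pairs = as.integer(tapply(toy$similarity, toy$other_category, length))
)
```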
See also
Other semantic:
analyze_document_clustering(),
calculate_clustering_metrics(),
calculate_cross_similarity(),
calculate_document_similarity(),
calculate_similarity_robust(),
cluster_embeddings(),
cross_analysis_validation(),
export_document_clustering(),
extract_cross_category_similarities(),
fit_semantic_model(),
generate_cluster_labels(),
generate_cluster_labels_auto(),
generate_embeddings(),
reduce_dimensions(),
semantic_document_clustering(),
semantic_similarity_analysis(),
temporal_semantic_analysis(),
validate_cross_models()
Examples
if (FALSE) { # \dontrun{
  # After extracting cross-category similarities
  gap_analysis <- analyze_similarity_gaps(
    similarity_data = cross_sims,
    ref_var = "ref_id",
    other_var = "other_id",
    similarity_var = "similarity",
    category_var = "other_category",
    unique_threshold = 0.6
  )
  print(gap_analysis$unique_items)
  print(gap_analysis$missing_items)
  print(gap_analysis$summary_stats)
} # }
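The example above relies on an existing `cross_sims` object. For experimentation, a stand-in with the default column names can be built by hand, as in this minimal sketch. The identifiers and similarity values are invented, and the call is guarded so the snippet also runs when the package is not attached.

```r
# Toy input with the default column names; all values are made up
cross_sims <- data.frame(
  ref_id = c("doc_a1", "doc_a1", "doc_a2", "doc_a2"),
  other_id = c("doc_b1", "doc_b2", "doc_b1", "doc_b2"),
  similarity = c(0.85, 0.40, 0.65, 0.25),
  other_category = c("B", "B", "B", "B"),
  stringsAsFactors = FALSE
)

# Check that the columns named by the default arguments are present
required_cols <- c("ref_id", "other_id", "similarity", "other_category")
stopifnot(all(required_cols %in% names(cross_sims)))

# Only call the function if the package providing it is attached
if (exists("analyze_similarity_gaps", mode = "function")) {
  gap_analysis <- analyze_similarity_gaps(cross_sims)
  str(gap_analysis, max.level = 1)
}
```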
