
Identifies unique items, missing content, and cross-category learning opportunities based on similarity thresholds. Useful for gap analysis in policy documents, topic comparisons, or any cross-category similarity study.

Usage

analyze_similarity_gaps(
  similarity_data,
  ref_var = "ref_id",
  other_var = "other_id",
  similarity_var = "similarity",
  category_var = "other_category",
  ref_label_var = NULL,
  other_label_var = NULL,
  unique_threshold = 0.6,
  cross_policy_min = 0.6,
  cross_policy_max = 0.8
)

Arguments

similarity_data

A data frame of cross-category similarities, containing the columns named by the following arguments:

ref_var

Reference item identifier

other_var

Comparison item identifier

similarity_var

Similarity score

category_var

Category of comparison item
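As a rough illustration of the expected input shape, the sketch below builds a small data frame that uses the default column names. The item IDs, category names, and similarity scores are invented for illustration and do not come from the package.

```r
# Toy input using the default column names.
# All values are invented for illustration only.
cross_sims <- data.frame(
  ref_id = c("A1", "A1", "A2", "A2"),
  other_id = c("B1", "C1", "B1", "C1"),
  similarity = c(0.42, 0.71, 0.88, 0.55),
  other_category = c("policy_b", "policy_c", "policy_b", "policy_c"),
  stringsAsFactors = FALSE
)

# One row per (reference item, comparison item) pair.
str(cross_sims)
```

Each row pairs one reference item with one comparison item, so a reference item appears once for every comparison item it was scored against.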

ref_var

Name of column with reference item IDs (default: "ref_id").

other_var

Name of column with comparison item IDs (default: "other_id").

similarity_var

Name of column with similarity values (default: "similarity").

category_var

Name of column with category information (default: "other_category").

ref_label_var

Optional column with reference item labels (for output).

other_label_var

Optional column with comparison item labels (for output).

unique_threshold

Similarity threshold below which reference items are considered unique (default: 0.6).

cross_policy_min

Minimum similarity for a pair to count as a cross-policy learning opportunity (default: 0.6).

cross_policy_max

Maximum similarity for a pair to count as a cross-policy learning opportunity (default: 0.8).
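To make the three thresholds concrete, here is a minimal base R sketch of one way such a partition could work. It is not the package's implementation. It assumes that a reference item counts as unique when its best match falls below unique_threshold, and that the cross-policy band includes both endpoints. The data are invented.

```r
# Toy data; values are invented for illustration only.
sims <- data.frame(
  ref_id = c("A1", "A1", "A2", "A2", "A3", "A3"),
  other_id = c("B1", "B2", "B1", "B2", "B1", "B2"),
  similarity = c(0.35, 0.50, 0.72, 0.40, 0.91, 0.65),
  other_category = "B",
  stringsAsFactors = FALSE
)

unique_threshold <- 0.6
cross_policy_min <- 0.6
cross_policy_max <- 0.8

# Assumption: an item is "unique" when even its best match is below the threshold.
best_by_ref <- tapply(sims$similarity, sims$ref_id, max)
unique_refs <- names(best_by_ref)[best_by_ref < unique_threshold]

# Assumption: the moderate band is inclusive at both ends.
in_band <- sims$similarity >= cross_policy_min &
  sims$similarity <= cross_policy_max
moderate_pairs <- sims[in_band, c("ref_id", "other_id", "similarity")]

print(unique_refs)
print(moderate_pairs)
```

With the defaults, unique_threshold equals cross_policy_min, so under these assumptions a score of exactly 0.6 falls in the moderate band rather than the unique set. The function's actual boundary handling may differ.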

Value

A list containing:

unique_items

Data frame of reference items with low similarity (unique content)

missing_items

Data frame of comparison items with low similarity (content gaps)

cross_policy

Data frame of items with moderate similarity (learning opportunities)

summary_stats

Summary statistics by category

Examples

if (FALSE) { # \dontrun{
# After extracting cross-category similarities
gap_analysis <- analyze_similarity_gaps(
  similarity_data = cross_sims,
  ref_var = "ref_id",
  other_var = "other_id",
  similarity_var = "similarity",
  category_var = "other_category",
  unique_threshold = 0.6
)

print(gap_analysis$unique_items)
print(gap_analysis$missing_items)
print(gap_analysis$summary_stats)
} # }
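When the input uses different column names, the name arguments map them onto the roles the function expects. The sketch below uses only the arguments documented above. The data frame, its column names, and its values are hypothetical, and it assumes the label columns live in the same data frame as the similarities. The call is guarded so the snippet still runs when the package is not loaded.

```r
# Hypothetical input whose columns do not use the default names.
my_sims <- data.frame(
  policy_a = c("A1", "A2"),
  policy_b = c("B1", "B2"),
  cosine = c(0.45, 0.75),
  sector = c("health", "health"),
  a_title = c("Data retention", "Access control"),
  b_title = c("Record keeping", "User permissions"),
  stringsAsFactors = FALSE
)

# Only call the function when the package that provides it is loaded.
if (exists("analyze_similarity_gaps")) {
  gaps <- analyze_similarity_gaps(
    similarity_data = my_sims,
    ref_var = "policy_a",
    other_var = "policy_b",
    similarity_var = "cosine",
    category_var = "sector",
    ref_label_var = "a_title",
    other_label_var = "b_title",
    unique_threshold = 0.5,
    cross_policy_min = 0.5,
    cross_policy_max = 0.8
  )
  # Show the top-level elements of the returned list.
  str(gaps, max.level = 1)
}
```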