Calculates document similarity with a fallback method and diagnostics. Embedding-based similarity is attempted first; if embeddings cannot be computed, the function falls back to Jaccard similarity.
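The "try embeddings, then fall back" behaviour can be pictured with a small base R sketch. This is not the package's implementation: `similarity_with_fallback()`, `failing_embedder()` and `identity_similarity()` are hypothetical names used only for illustration, and the shape of the returned list is an assumption.

```r
# Hypothetical sketch of a primary method with a recorded fallback.
# embed_fn and fallback_fn are stand-ins supplied by the caller.
similarity_with_fallback <- function(texts, embed_fn, fallback_fn) {
  tryCatch(
    list(method = "embeddings", similarity = embed_fn(texts)),
    error = function(e) {
      # Keep the failure message as a diagnostic, then use the fallback
      list(
        method = "jaccard",
        similarity = fallback_fn(texts),
        diagnostic = conditionMessage(e)
      )
    }
  )
}

# Stand-in embedder that always fails, to exercise the fallback path
failing_embedder <- function(texts) stop("embedding backend unavailable")

# Trivial stand-in fallback: each document is similar only to itself
identity_similarity <- function(texts) diag(length(texts))

res <- similarity_with_fallback(
  c("first document", "second document"),
  failing_embedder,
  identity_similarity
)
res$method
res$diagnostic
```

Returning the method actually used alongside the matrix lets the caller see when the fallback was taken, which is one plausible reading of "diagnostics" here.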
Usage
calculate_similarity_robust(
  texts,
  method = "embeddings",
  embedding_model = "all-MiniLM-L6-v2",
  cache_embeddings = TRUE,
  min_word_length = 3,
  doc_names = NULL
)
Arguments
- texts
Character vector of texts
- method
Similarity method ("embeddings" or "jaccard")
- embedding_model
Model name for embeddings (default: "all-MiniLM-L6-v2")
- cache_embeddings
Logical, cache embeddings (default: TRUE)
- min_word_length
Minimum word length for Jaccard (default: 3)
- doc_names
Optional document names
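To make the Jaccard option and the role of `min_word_length` concrete, here is a minimal base R sketch. It assumes simple lowercase tokenisation on non-alphanumeric characters, which may differ from what the package does; `tokenize_words()` and `jaccard_matrix()` are hypothetical helpers, not package functions.

```r
# Hypothetical helper: unique lowercase words of at least min_word_length chars
tokenize_words <- function(text, min_word_length = 3) {
  words <- unlist(strsplit(tolower(text), "[^a-z0-9]+"))
  unique(words[nchar(words) >= min_word_length])
}

# Hypothetical helper: pairwise Jaccard similarity, |A and B| / |A or B|
jaccard_matrix <- function(texts, min_word_length = 3, doc_names = NULL) {
  tokens <- lapply(texts, tokenize_words, min_word_length = min_word_length)
  n <- length(tokens)
  sim <- diag(1, n)  # each document has similarity 1 with itself
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next
      union_size <- length(union(tokens[[i]], tokens[[j]]))
      shared <- length(intersect(tokens[[i]], tokens[[j]]))
      # Two documents with no qualifying words get similarity 0
      sim[i, j] <- if (union_size == 0) 0 else shared / union_size
    }
  }
  if (!is.null(doc_names)) dimnames(sim) <- list(doc_names, doc_names)
  sim
}

texts <- c("The cat sat on the mat", "The cat lay on the rug")
sim <- jaccard_matrix(texts, min_word_length = 3, doc_names = c("doc_a", "doc_b"))
sim
```

With `min_word_length = 3` the word "on" is dropped, so the two example texts share 2 words ("the", "cat") out of 6 distinct words.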
See also
Other semantic:
analyze_document_clustering(),
analyze_similarity_gaps(),
calculate_clustering_metrics(),
calculate_cross_similarity(),
calculate_document_similarity(),
cluster_embeddings(),
cross_analysis_validation(),
export_document_clustering(),
extract_cross_category_similarities(),
fit_semantic_model(),
generate_cluster_labels(),
generate_cluster_labels_auto(),
generate_embeddings(),
reduce_dimensions(),
semantic_document_clustering(),
semantic_similarity_analysis(),
temporal_semantic_analysis(),
validate_cross_models(),
word_co_occurrence_network(),
word_correlation_network()
