Calculates document similarity with fallback methods and diagnostics. Attempts embeddings first and falls back to Jaccard similarity if needed.
Usage
calculate_similarity_robust(
  texts,
  method = "embeddings",
  embedding_model = "all-MiniLM-L6-v2",
  cache_embeddings = TRUE,
  min_word_length = 3,
  doc_names = NULL
)
Arguments
- texts
Character vector of texts
- method
Similarity method ("embeddings" or "jaccard")
- embedding_model
Model name for embeddings (default: "all-MiniLM-L6-v2")
- cache_embeddings
Logical, cache embeddings (default: TRUE)
- min_word_length
Minimum word length for Jaccard (default: 3)
- doc_names
Optional document names
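
To make the Jaccard fallback and the role of min_word_length concrete, here is a minimal sketch of word-level Jaccard similarity in base R. It is not the package's implementation: the helper name jaccard_sketch, the tokenization rule (lowercase, split on non-alphanumeric characters), and the matrix return shape are all assumptions made for illustration.

```r
# Hypothetical helper (not part of the package): pairwise word-level
# Jaccard similarity, |A intersect B| / |A union B|, over unique words.
jaccard_sketch <- function(texts, min_word_length = 3, doc_names = NULL) {
  # Assumed tokenization: lowercase, split on non-alphanumerics,
  # then keep unique words with at least min_word_length characters.
  tokens <- lapply(texts, function(x) {
    words <- unlist(strsplit(tolower(x), "[^a-z0-9]+"))
    unique(words[nchar(words) >= min_word_length])
  })

  n <- length(tokens)
  sim <- diag(1, n)  # each document is treated as identical to itself

  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i < j) {
        n_union <- length(union(tokens[[i]], tokens[[j]]))
        # Two documents with no qualifying words get similarity 0
        s <- if (n_union == 0) 0 else
          length(intersect(tokens[[i]], tokens[[j]])) / n_union
        sim[i, j] <- s
        sim[j, i] <- s
      }
    }
  }

  # Label rows and columns when document names are supplied
  if (!is.null(doc_names)) dimnames(sim) <- list(doc_names, doc_names)
  sim
}

docs <- c("the quick brown fox",
          "the quick red fox",
          "completely different words here")
sim <- jaccard_sketch(docs, min_word_length = 3, doc_names = c("a", "b", "c"))
```

In this example, documents "a" and "b" share three of five distinct words, so sim["a", "b"] is 0.6, while "a" and "c" share none and score 0. Raising min_word_length drops short words such as "the" and "fox" from the word sets before the comparison, which changes the scores.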
