
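Evaluates the stability of a hybrid topic model, which combines a structural topic model (STM) with document embeddings, using bootstrap resampling. Topics that reappear consistently across resampled fits are likely robust. Topics that appear in only some fits may be artifacts of the particular sample. The procedure follows recommendations from the topic-model validation literature.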

Usage

assess_hybrid_stability(
  texts,
  n_topics = 10,
  n_bootstrap = 5,
  sample_proportion = 0.8,
  embedding_model = "all-MiniLM-L6-v2",
  seed = 123,
  verbose = TRUE
)

Arguments

texts

A character vector of texts to analyze.

n_topics

Number of topics (default: 10).

n_bootstrap

Number of bootstrap iterations (default: 5).

sample_proportion

Proportion of documents drawn in each bootstrap iteration (default: 0.8).

embedding_model

Name of the sentence-embedding model used for the embedding component (default: "all-MiniLM-L6-v2").

seed

Random seed for reproducibility (default: 123).

verbose

Logical; if TRUE, prints progress messages (default: TRUE).
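This page does not state exactly how documents are resampled. The sketch below illustrates how `n_bootstrap`, `sample_proportion`, and `seed` might interact. It assumes each iteration draws `floor(sample_proportion * n)` documents without replacement, which is subsampling rather than classical with-replacement bootstrapping. The helper `draw_bootstrap_samples()` is hypothetical and is not part of the package.

```r
# Hypothetical helper, NOT part of the package. Illustrates one plausible way
# n_bootstrap, sample_proportion, and seed could drive document resampling.
draw_bootstrap_samples <- function(n_docs, n_bootstrap = 5,
                                   sample_proportion = 0.8, seed = 123) {
  set.seed(seed)  # fix the RNG so the same subsamples are drawn on every call
  size <- floor(n_docs * sample_proportion)
  # Each list element holds the document indices for one iteration,
  # drawn without replacement (assumed subsampling scheme).
  lapply(seq_len(n_bootstrap), function(i) sort(sample.int(n_docs, size)))
}

samples <- draw_bootstrap_samples(n_docs = 100)
length(samples)   # 5
lengths(samples)  # 80 80 80 80 80
```

Because `set.seed()` is called before any draws, calling the helper again with the same `seed` reproduces identical subsamples.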

Value

A list containing stability metrics:

  • topic_stability: Per-topic stability scores, from 0 to 1 (higher means more stable)

  • mean_stability: Overall stability score across all topics

  • keyword_stability: Stability of top keywords per topic

  • alignment_stability: Stability of STM-embedding alignment

  • bootstrap_results: Detailed results from each bootstrap run
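This page does not document how these scores are computed. One common way to score keyword stability matches each topic from the full-data fit to its most similar topic in every bootstrap fit, then averages the Jaccard overlap of their top keywords. The sketch below assumes that approach. `jaccard()` and `keyword_stability()` are hypothetical helpers rather than the package's implementation, and the keyword lists are toy data.

```r
# Hypothetical sketch, NOT the package's implementation.
# Jaccard index: size of the intersection divided by size of the union.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# reference: list of character vectors (top keywords per topic, full-data fit)
# runs: list of such lists, one per bootstrap iteration
keyword_stability <- function(reference, runs) {
  per_run <- vapply(runs, function(run) {
    # For each reference topic, keep its best-matching topic in this run
    vapply(reference, function(ref) {
      max(vapply(run, function(topic) jaccard(ref, topic), numeric(1)))
    }, numeric(1))
  }, numeric(length(reference)))
  # Rows = reference topics, columns = runs; average across runs
  rowMeans(matrix(per_run, nrow = length(reference)))
}

# Toy data: two reference topics and two bootstrap runs
reference <- list(
  c("climate", "carbon", "emissions"),
  c("election", "vote", "party")
)
runs <- list(
  list(c("climate", "carbon", "policy"), c("election", "vote", "party")),
  list(c("vote", "party", "election"), c("climate", "carbon", "emissions"))
)

keyword_stability(reference, runs)  # [1] 0.75 1.00
```

The greedy best match allows two reference topics to map to the same bootstrap topic. A one-to-one assignment, for example via the Hungarian algorithm, is a stricter alternative.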

Examples

if (FALSE) { # \dontrun{
  stability <- assess_hybrid_stability(
    texts = my_texts,
    n_topics = 10,
    n_bootstrap = 5,
    verbose = TRUE
  )

  # View topic stability scores
  stability$topic_stability
} # }