Auto-tune BERTopic Hyperparameters

Searches for well-performing hyperparameters for embedding-based topic modeling. The function evaluates multiple configurations of UMAP and HDBSCAN parameters and returns the model that scores best on the specified metric. Embeddings are generated once and reused across all configurations, so the embedding step is not repeated for each configuration.
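The caching idea can be sketched in base R. This is a minimal illustration under stated assumptions: `tune_with_cached_embeddings()`, `embed_fn`, and `fit_and_score` are hypothetical stand-ins, not part of this package, and the toy scorer simply prefers smaller values of a dummy parameter `k`.

```r
# Hypothetical sketch of the "embed once, evaluate many configurations" pattern.
# embed_fn and fit_and_score are stand-ins, not functions from this package.
tune_with_cached_embeddings <- function(texts, configs, embed_fn, fit_and_score) {
  embeddings <- embed_fn(texts)  # the expensive step runs exactly once
  scores <- vapply(seq_len(nrow(configs)), function(i) {
    # every configuration reuses the same embedding matrix
    fit_and_score(embeddings, configs[i, , drop = FALSE])
  }, numeric(1))
  best <- which.max(scores)
  list(best_config = configs[best, , drop = FALSE], scores = scores)
}

# Toy usage: count how often the embedding step is called
embed_calls <- 0
toy_embed <- function(texts) {
  embed_calls <<- embed_calls + 1
  matrix(nchar(texts), ncol = 1)  # placeholder "embeddings"
}
toy_score <- function(embeddings, config) -as.numeric(config$k)  # smaller k wins
result <- tune_with_cached_embeddings(
  texts = c("topic one", "topic two"),
  configs = data.frame(k = 1:3),
  embed_fn = toy_embed,
  fit_and_score = toy_score
)
embed_calls         # 1
result$best_config  # the row with k = 1
```

Passing the embedding and scoring steps in as functions keeps the sketch independent of any particular embedding backend.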

Usage

auto_tune_embedding_topics(
  texts,
  embeddings = NULL,
  embedding_model = "all-MiniLM-L6-v2",
  n_trials = 12,
  metric = "silhouette",
  seed = 123,
  verbose = TRUE
)

Arguments

texts: Character vector of documents to analyze.
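embeddings: Optional matrix of precomputed document embeddings, one row per document in texts. If NULL (the default), embeddings are generated with embedding_model.
embedding_model: Name of the sentence-transformers model used to generate embeddings when embeddings is NULL (default: "all-MiniLM-L6-v2").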
n_trials: Maximum number of configurations to try (default: 12).
metric: Optimization metric: "silhouette", "coherence", or "combined" (default: "silhouette").
seed: Random seed for reproducibility (default: 123).
verbose: Logical; if TRUE (the default), prints progress messages.
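For intuition about the "silhouette" option, the sketch below computes a mean silhouette width with the cluster package (a recommended package distributed with R). silhouette_score() is a hypothetical helper; this page does not say how auto_tune_embedding_topics() computes the metric (for example, on raw or UMAP-reduced embeddings, with which distance, or how outliers are handled), so treat this as an approximation.

```r
# Hypothetical scoring helper: mean silhouette width of document clusters.
# Assumptions (not confirmed by this page): HDBSCAN outliers carry label -1
# and are excluded, and Euclidean distance on the embeddings is used.
silhouette_score <- function(embeddings, labels) {
  keep <- labels != -1
  x <- embeddings[keep, , drop = FALSE]
  y <- as.integer(factor(labels[keep]))  # recode cluster labels as 1..k
  k <- length(unique(y))
  # The silhouette is undefined with fewer than two clusters or
  # when every point is its own cluster
  if (k < 2 || k >= nrow(x)) return(NA_real_)
  sil <- cluster::silhouette(y, stats::dist(x))
  mean(sil[, "sil_width"])
}

# Two well-separated groups of 2-D "embeddings" plus one outlier
emb <- rbind(c(0, 0), c(0, 0.1), c(0.1, 0),
             c(10, 10), c(10, 10.1), c(10.1, 10),
             c(5, 5))
labels <- c(0, 0, 0, 1, 1, 1, -1)
silhouette_score(emb, labels)  # close to 1 for well-separated clusters
```

Values near 1 indicate compact, well-separated clusters; values near 0 or below indicate overlapping or poorly assigned documents.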

Value

A list containing:

best_config: Data frame with the best-scoring hyperparameter configuration
best_model: The topic model fitted with the best-scoring configuration
all_results: List of all evaluated configurations with their metric scores
n_trials_completed: Number of configurations successfully evaluated

Details

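The function searches over the following parameter values (24 combinations in total, of which at most n_trials are evaluated):

n_neighbors: UMAP neighborhood size (5, 10, 15, 25); smaller values emphasize local structure, larger values more global structure
min_cluster_size: HDBSCAN minimum cluster size, i.e. the smallest group of documents that can form a topic (3, 5, 10)
cluster_selection_method: HDBSCAN cluster selection method, either "eom" (excess of mass; tends to produce fewer, broader topics) or "leaf" (tends to produce more, finer-grained topics)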
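The documented search space can be enumerated with base R's expand.grid(). In this sketch, build_search_grid() is a hypothetical helper, and drawing a random subset of n_trials configurations with the given seed is an assumption; the package may pick configurations in a different order or by a different rule.

```r
# Hypothetical helper: enumerate the documented search space and keep at
# most n_trials configurations. Random sampling without replacement is an
# assumption; the package may select configurations differently.
build_search_grid <- function(n_trials = 12, seed = 123) {
  grid <- expand.grid(
    n_neighbors = c(5, 10, 15, 25),
    min_cluster_size = c(3, 5, 10),
    cluster_selection_method = c("eom", "leaf"),
    stringsAsFactors = FALSE
  )
  if (n_trials < nrow(grid)) {
    set.seed(seed)  # reproducible subset
    grid <- grid[sample(nrow(grid), n_trials), , drop = FALSE]
  }
  rownames(grid) <- NULL
  grid
}

nrow(build_search_grid(n_trials = 100))  # 24: the full grid
nrow(build_search_grid(n_trials = 6))    # 6
```

With the default n_trials = 12, only half of the 24 combinations would be evaluated under this scheme, so raising n_trials to 24 covers the full grid.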

Examples

if (FALSE) { # \dontrun{
  # Toy corpus for illustration only; real tuning needs far more documents
  texts <- c("Machine learning for image recognition",
             "Deep learning neural networks",
             "Natural language processing models",
             "Computer vision applications")

  tuning_result <- auto_tune_embedding_topics(
    texts = texts,
    n_trials = 6,
    metric = "silhouette",
    verbose = TRUE
  )

  # View best configuration
  tuning_result$best_config

  # Use the best model
  best_model <- tuning_result$best_model
} # }
