Evaluate Optimal Number of Topics — evaluate_optimal_topic_number

Searches for the optimal number of topics (K) using stm::searchK and plots the resulting diagnostics: held-out likelihood, residuals, semantic coherence, and the lower bound.
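To make the diagnostics concrete, the sketch below calls stm::searchK directly on a small dfm and inspects the same four metrics. It is only an illustration under stated assumptions, not this function's internal implementation: the built-in quanteda corpus data_corpus_inaugural, the trimming threshold, the two candidate K values, and the reduced iteration count are all arbitrary choices made to keep the run short.

```r
library(quanteda)
library(stm)

# Build a small dfm from a corpus that ships with quanteda (illustrative data only)
toks <- tokens(data_corpus_inaugural, remove_punct = TRUE, remove_numbers = TRUE)
toks <- tokens_remove(toks, stopwords("en"))
dfm_small <- dfm_trim(dfm(toks), min_termfreq = 5)

# Convert the dfm into the documents/vocab structure that stm expects
stm_input <- convert(dfm_small, to = "stm")

# Fit one model per candidate K and collect the diagnostics
search_result <- searchK(
  documents = stm_input$documents,
  vocab = stm_input$vocab,
  K = c(3, 5),          # candidate topic counts, kept tiny for speed
  heldout.seed = 123,   # fixes the held-out split so runs are comparable
  max.em.its = 20,      # passed on to stm::stm
  verbose = FALSE       # passed on to stm::stm
)

# One row per K: held-out likelihood, residuals, semantic coherence, lower bound
search_result$results[, c("K", "heldout", "residual", "semcoh", "lbound")]
```

As a rough guide, higher held-out likelihood and semantic coherence and lower residuals are preferable, but the metrics often disagree, so the choice of K usually remains a judgment call.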

Usage

evaluate_optimal_topic_number(
  dfm_object,
  topic_range,
  max.em.its = 75,
  categorical_var = NULL,
  continuous_var = NULL,
  height = 600,
  width = 800,
  verbose = TRUE,
  ...
)

Arguments

dfm_object: A quanteda document-feature matrix (dfm).
topic_range: A numeric vector of candidate numbers of topics (K) to search over, for example 5:30.
max.em.its: Maximum number of EM iterations (default: 75).
categorical_var: An optional character string naming a categorical variable in the metadata (default: NULL).
continuous_var: An optional character string naming a continuous variable in the metadata (default: NULL).
height: The height of the resulting Plotly plot in pixels (default: 600).
width: The width of the resulting Plotly plot in pixels (default: 800).
verbose: Logical; if TRUE, prints progress information (default: TRUE).
...: Further arguments passed to stm::searchK.

Value

A plotly object showing the diagnostics for the number of topics (K).
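Because the return value is a plotly object, it can be handled like any other htmlwidget. The sketch below shows one way to save such a plot as an HTML file; it is a hypothetical usage example, and the plot built here is only a stand-in with placeholder values, not real diagnostics from this function.

```r
library(plotly)
library(htmlwidgets)

# Stand-in for the returned plotly object (placeholder values, not real results)
diagnostics_plot <- plot_ly(
  x = c(5, 10, 15),
  y = c(1, 2, 3),
  type = "scatter",
  mode = "lines+markers"
)

# Write the widget to an HTML file; selfcontained = FALSE avoids needing pandoc
out_file <- file.path(tempdir(), "topic_diagnostics.html")
saveWidget(diagnostics_plot, file = out_file, selfcontained = FALSE)

file.exists(out_file)
```

In practice, diagnostics_plot would be replaced by the object returned from evaluate_optimal_topic_number().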

Examples

if (interactive()) {
  df <- TextAnalysisR::SpecialEduTech

  united_tbl <- TextAnalysisR::unite_text_cols(df, listed_vars = c("title", "keyword", "abstract"))

  tokens <- TextAnalysisR::preprocess_texts(united_tbl, text_field = "united_texts")

  dfm_object <- quanteda::dfm(tokens)

  # The function returns a plotly object with the diagnostics for each K
  topic_diagnostics <- TextAnalysisR::evaluate_optimal_topic_number(
    dfm_object = dfm_object,
    topic_range = 5:30,
    max.em.its = 75,
    categorical_var = "reference_type",
    continuous_var = "year",
    height = 600,
    width = 800,
    verbose = TRUE
  )
  print(topic_diagnostics)
}