Evaluate Optimal Number of Topics
Source: R/text_mining_functions.R
Searches for the optimal number of topics (K) using stm::searchK and
visualizes the resulting diagnostics: held-out likelihood, residuals,
semantic coherence, and the lower bound.
Usage
evaluate_optimal_topic_number(
  dfm_object,
  topic_range,
  max.em.its = 75,
  categorical_var = NULL,
  continuous_var = NULL,
  height = 600,
  width = 800,
  verbose = TRUE,
  ...
)
Arguments
- dfm_object
A quanteda document-feature matrix (dfm).
- topic_range
A numeric vector specifying the range of topics (K) to search over.
- max.em.its
Maximum number of EM iterations (default: 75).
- categorical_var
An optional character string for a categorical variable in the metadata.
- continuous_var
An optional character string for a continuous variable in the metadata.
- height
The height of the resulting Plotly plot in pixels (default: 600).
- width
The width of the resulting Plotly plot in pixels (default: 800).
- verbose
Logical; if TRUE, prints progress information.
- ...
Further arguments passed to stm::searchK.
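The arguments above can be read as inputs to a direct stm::searchK call. The following is a minimal sketch of that mapping, not the package's actual implementation: the helper names build_prevalence_formula and search_k_sketch are hypothetical, and the sketch assumes the two metadata variables enter the prevalence formula as plain additive terms (the real function may treat them differently, for example with a spline on the continuous variable).

```r
# Hypothetical helper: build a one-sided prevalence formula from the
# optional metadata variable names. Returns NULL when neither is given.
build_prevalence_formula <- function(categorical_var = NULL,
                                     continuous_var = NULL) {
  terms <- c(categorical_var, continuous_var)  # NULL entries are dropped
  if (length(terms) == 0) {
    return(NULL)
  }
  stats::reformulate(terms)
}

# Hypothetical wrapper: convert a quanteda dfm to the stm input format and
# run stm::searchK over the candidate values of K. Requires the quanteda
# and stm packages when called.
search_k_sketch <- function(dfm_object,
                            topic_range,
                            max.em.its = 75,
                            categorical_var = NULL,
                            continuous_var = NULL,
                            verbose = TRUE,
                            ...) {
  # convert() returns a list with documents, vocab, and meta (the docvars)
  out <- quanteda::convert(dfm_object, to = "stm")

  stm::searchK(
    documents = out$documents,
    vocab = out$vocab,
    K = topic_range,
    prevalence = build_prevalence_formula(categorical_var, continuous_var),
    data = out$meta,
    max.em.its = max.em.its,
    verbose = verbose,
    ...
  )
}
```

Under these assumptions, anything supplied through ... (for example heldout.seed or cores) reaches stm::searchK unchanged.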
Examples
if (interactive()) {
  # Load the example dataset shipped with TextAnalysisR
  df <- TextAnalysisR::SpecialEduTech

  # Combine the text columns into a single field
  united_tbl <- TextAnalysisR::unite_text_cols(
    df,
    listed_vars = c("title", "keyword", "abstract")
  )

  # Preprocess the combined text into tokens and build the dfm
  tokens <- TextAnalysisR::preprocess_texts(united_tbl, text_field = "united_texts")
  dfm_object <- quanteda::dfm(tokens)

  # Search K = 5, 6, ..., 30 and plot the diagnostics
  TextAnalysisR::evaluate_optimal_topic_number(
    dfm_object = dfm_object,
    topic_range = 5:30,
    max.em.its = 75,
    categorical_var = "reference_type",
    continuous_var = "year",
    height = 600,
    width = 800,
    verbose = TRUE
  )
}
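To compare the diagnostics numerically rather than only visually, the results table of a searchK object can be flattened into a plain data frame. This is a sketch under two assumptions: that a searchK object is available (for example from calling stm::searchK directly; this page does not state what evaluate_optimal_topic_number returns), and that its results element holds the columns K, heldout, residual, semcoh, and lbound, possibly stored as list columns. The helper name tidy_searchk_results is hypothetical.

```r
# Hypothetical helper: flatten the diagnostics of a searchK object into a
# numeric data frame with one row per candidate K.
tidy_searchk_results <- function(search_result) {
  metrics <- c("K", "heldout", "residual", "semcoh", "lbound")
  res <- search_result$results

  # Some stm versions store each column as a list, so unlist before coercing
  flat <- lapply(res[metrics], function(col) as.numeric(unlist(col)))
  as.data.frame(flat)
}
```

As a rough guide, higher held-out likelihood, semantic coherence, and lower bound are preferable, as are lower residuals; the four metrics rarely agree on a single K, so the final choice remains a judgment call.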