Visualizes the top terms for each topic, ordered by their per-topic word probability (beta).

Usage

plot_word_probabilities(
  dfm_object,
  topic_n,
  max.em.its = 75,
  categorical_var = NULL,
  continuous_var = NULL,
  top_term_n = 10,
  ncol = 3,
  topic_names = NULL,
  height = 1200,
  width = 800,
  verbose = TRUE,
  ...
)

Arguments

dfm_object

A quanteda document-feature matrix (dfm).

topic_n

The number of topics to display.

max.em.its

Maximum number of EM iterations (default: 75).

categorical_var

An optional character string for a categorical variable in the metadata.

continuous_var

An optional character string for a continuous variable in the metadata.

top_term_n

The number of top terms to display for each topic (default: 10).

ncol

The number of columns in the facet plot (default: 3).

topic_names

An optional character vector for labeling topics. If provided, must be the same length as the number of topics.

height

The height of the resulting Plotly plot, in pixels. Defaults to 1200.

width

The width of the resulting Plotly plot, in pixels. Defaults to 800.

verbose

Logical; if TRUE, prints progress information (default: TRUE).

...

Further arguments passed to stm::searchK.

Value

A Plotly object showing a facet-wrapped chart of top terms for each topic, ordered by their per-topic probability (beta). Each facet represents a topic.

Details

If topic_names is provided, it replaces the default "Topic {n}" labels with custom names.
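
The selection and labeling described above can be illustrated with a minimal base R sketch. It is not the function's actual implementation: the `beta_tbl` data frame, its made-up beta values, and the `topic_names` labels are all hypothetical, chosen only to show how the `top_term_n` highest-beta terms are kept per topic and how custom names replace the default "Topic {n}" labels.

```r
# Toy long-format table of per-topic word probabilities (beta).
# All values are invented for illustration only.
beta_tbl <- data.frame(
  topic = rep(1:2, each = 4),
  term  = c("student", "learning", "device", "math",
            "reading", "software", "teacher", "video"),
  beta  = c(0.40, 0.30, 0.20, 0.10,
            0.35, 0.25, 0.22, 0.18),
  stringsAsFactors = FALSE
)

top_term_n  <- 2
topic_names <- c("Math tools", "Literacy tools")  # hypothetical labels, one per topic

# Keep the top_term_n highest-beta terms within each topic.
top_terms <- do.call(rbind, lapply(split(beta_tbl, beta_tbl$topic), function(d) {
  head(d[order(-d$beta), ], top_term_n)
}))

# Custom names must match the number of topics; otherwise fall back to "Topic {n}".
if (!is.null(topic_names)) {
  stopifnot(length(topic_names) == length(unique(beta_tbl$topic)))
  top_terms$label <- topic_names[top_terms$topic]
} else {
  top_terms$label <- paste("Topic", top_terms$topic)
}

print(top_terms[, c("label", "term", "beta")])
```

Setting `topic_names <- NULL` in this sketch yields the labels "Topic 1" and "Topic 2" instead.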

Examples

if (interactive()) {
  df <- TextAnalysisR::SpecialEduTech
  united_tbl <- TextAnalysisR::unite_text_cols(df, listed_vars = c("title", "keyword", "abstract"))
  tokens <- TextAnalysisR::preprocess_texts(united_tbl, text_field = "united_texts")
  dfm_object <- quanteda::dfm(tokens)
  TextAnalysisR::plot_word_probabilities(
    dfm_object = dfm_object,
    topic_n = 15,
    max.em.its = 75,
    categorical_var = "reference_type",
    continuous_var = "year",
    top_term_n = 10,
    ncol = 3,
    height = 1200,
    width = 800,
    verbose = TRUE
  )
}