This function calculates the mean topic prevalence across documents and plots the top topics.

Usage

plot_mean_topic_prevalence(
  dfm_object,
  topic_n,
  max.em.its = 75,
  categorical_var = NULL,
  continuous_var = NULL,
  top_term_n = 10,
  top_topic_n = 15,
  topic_names = NULL,
  height = 500,
  width = 1000,
  verbose = TRUE,
  ...
)

Arguments

dfm_object

A quanteda document-feature matrix (dfm).

topic_n

The number of topics to display.

max.em.its

Maximum number of EM iterations (default: 75).

categorical_var

An optional character string for a categorical variable in the metadata.

continuous_var

An optional character string for a continuous variable in the metadata.

top_term_n

The number of top terms to display for each topic (default: 10).

top_topic_n

The number of top topics to display (default: 15).

topic_names

An optional character vector for labeling topics. If provided, must be the same length as the number of topics.

height

The height of the resulting Plotly plot, in pixels. Defaults to 500.

width

The width of the resulting Plotly plot, in pixels. Defaults to 1000.

verbose

Logical; if TRUE, prints progress information (default: TRUE).

...

Further arguments passed to stm::searchK.

Value

A ggplot object showing a bar plot of topic prevalence. Topics are ordered by their mean gamma value (average prevalence across documents).
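The ordering described above can be illustrated with a minimal base R sketch. It is not the package's internal code: the `gamma` matrix below is made-up data standing in for document-topic proportions (a fitted stm model stores a comparable matrix in `theta`), and `top_topic_n` is reused here only as an illustrative variable name.

```r
# Hypothetical document-topic matrix: rows are documents, columns are topics.
# Each row sums to 1, as topic proportions for a single document would.
gamma <- matrix(
  c(0.7, 0.2, 0.1,
    0.5, 0.3, 0.2,
    0.1, 0.3, 0.6,
    0.6, 0.1, 0.3),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, paste("Topic", 1:3))
)

# Mean prevalence of each topic across all documents
mean_gamma <- colMeans(gamma)

# Order topics from most to least prevalent and keep the top n
top_topic_n <- 2
top_topics <- head(sort(mean_gamma, decreasing = TRUE), top_topic_n)
top_topics
```

With this toy data, Topic 1 (mean 0.475) ranks first and Topic 3 (mean 0.3) second, so only those two would appear in a plot limited to the top two topics.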

Details

If topic_names is provided, it replaces the default "Topic {n}" labels with custom names.
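As a rough sketch of the relabeling behavior, assuming nothing about how the function implements it internally, the base R snippet below builds the default labels and swaps in custom ones after checking their length. The custom names are hypothetical and chosen only for illustration.

```r
topic_n <- 3

# Default labels of the form "Topic 1", "Topic 2", ...
default_labels <- paste("Topic", seq_len(topic_n))

# Hypothetical custom labels; one name per topic is required
topic_names <- c("Assistive technology", "Math instruction", "Reading intervention")
stopifnot(length(topic_names) == topic_n)

# Use the custom names when supplied, otherwise fall back to the defaults
topic_labels <- if (is.null(topic_names)) default_labels else topic_names
topic_labels
```

Checking the length before fitting a model is a cheap way to catch a mismatched topic_names vector early.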

Examples

if (interactive()) {
  # Load the example dataset shipped with the package
  df <- TextAnalysisR::SpecialEduTech

  # Combine the text columns into one field, then tokenize
  united_tbl <- TextAnalysisR::unite_text_cols(df, listed_vars = c("title", "keyword", "abstract"))
  tokens <- TextAnalysisR::preprocess_texts(united_tbl, text_field = "united_texts")

  # Build the document-feature matrix expected by the plotting function
  dfm_object <- quanteda::dfm(tokens)

  TextAnalysisR::plot_mean_topic_prevalence(
    dfm_object = dfm_object,
    topic_n = 15,
    max.em.its = 75,
    categorical_var = "reference_type",
    continuous_var = "year",
    top_term_n = 10,
    top_topic_n = 15,
    height = 500,
    width = 1000,
    verbose = TRUE
  )
}