Plot Mean Topic Prevalence Across Documents
Source: R/text_mining_functions.R
plot_mean_topic_prevalence.Rd
This function calculates the mean topic prevalence across documents and plots the top topics.
Usage
plot_mean_topic_prevalence(
  dfm_object,
  topic_n,
  max.em.its = 75,
  categorical_var = NULL,
  continuous_var = NULL,
  top_term_n = 10,
  top_topic_n = 15,
  topic_names = NULL,
  height = 500,
  width = 1000,
  verbose = TRUE,
  ...
)
Arguments
- dfm_object
A quanteda document-feature matrix (dfm).
- topic_n
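The number of topics to estimate in the topic model. The number of topics shown in the plot is controlled separately by top_topic_n.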
- max.em.its
Maximum number of EM iterations (default: 75).
- categorical_var
An optional character string for a categorical variable in the metadata.
- continuous_var
An optional character string for a continuous variable in the metadata.
- top_term_n
The number of top terms to display for each topic (default: 10).
- top_topic_n
The number of top topics to display (default: 15).
- topic_names
An optional character vector for labeling topics. If provided, must be the same length as the number of topics.
- height
The height of the resulting Plotly plot, in pixels. Defaults to 500.
- width
The width of the resulting Plotly plot, in pixels. Defaults to 1000.
- verbose
Logical; if TRUE, prints progress information (default: TRUE).
- ...
Further arguments passed to stm::searchK.
Value
A ggplot object showing a bar plot of topic prevalence. Topics are ordered by their mean gamma value (average prevalence across documents).
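The ordering by mean gamma can be illustrated with a small base R sketch. This is not the package's internal code: the gamma matrix below is made up for illustration, and it assumes gamma holds one row per document and one column per topic, with each row summing to 1.

```r
# Hypothetical document-topic proportions (gamma): 4 documents x 3 topics.
# Values are invented for illustration; each row sums to 1.
gamma <- matrix(
  c(0.70, 0.20, 0.10,
    0.10, 0.60, 0.30,
    0.50, 0.30, 0.20,
    0.20, 0.60, 0.20),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), paste0("Topic ", 1:3))
)

# Mean prevalence of each topic across all documents.
mean_gamma <- colMeans(gamma)

# Sort topics from most to least prevalent and keep the top n,
# mirroring the role of the top_topic_n argument.
top_topic_n <- 2
top_topics <- head(sort(mean_gamma, decreasing = TRUE), top_topic_n)
print(top_topics)
```

Here "Topic 2" ranks first because its average share across the four documents is the largest, even though "Topic 1" dominates individual documents such as doc1.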
Examples
if (interactive()) {
  # Load the example dataset shipped with the package
  df <- TextAnalysisR::SpecialEduTech

  # Combine the selected text columns into a single text field
  united_tbl <- TextAnalysisR::unite_text_cols(df, listed_vars = c("title", "keyword", "abstract"))

  # Preprocess the united text and build a document-feature matrix
  tokens <- TextAnalysisR::preprocess_texts(united_tbl, text_field = "united_texts")
  dfm_object <- quanteda::dfm(tokens)

  # Plot mean topic prevalence for the top topics
  TextAnalysisR::plot_mean_topic_prevalence(
    dfm_object = dfm_object,
    topic_n = 15,
    max.em.its = 75,
    categorical_var = "reference_type",
    continuous_var = "year",
    top_term_n = 10,
    top_topic_n = 15,
    height = 500,
    width = 1000,
    verbose = TRUE
  )
}