Concatenates top terms for each topic into text strings suitable for embedding generation. Useful for creating topic representations for semantic similarity analysis.

Usage

get_topic_texts(
  top_terms_df,
  topic_var = "topic",
  term_var = "term",
  weight_var = NULL,
  sep = " ",
  top_n = NULL
)

Arguments

top_terms_df

A data frame containing top terms for topics, typically the output of get_topic_terms().

topic_var

Name of the column containing topic identifiers (default: "topic").

term_var

Name of the column containing terms (default: "term").

weight_var

Optional name of the column containing term weights (e.g., "beta"). If provided, terms are ordered by weight within each topic before concatenation.

sep

Separator between terms (default: " ").

top_n

Optional number of top terms to include per topic (default: NULL, which includes all terms).

Value

A character vector of topic text strings, one per topic, ordered by topic number.
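
To make the return shape concrete, the following base R sketch mimics the documented behavior on a toy data frame. It is not the package's implementation: `sketch_topic_texts` and the `toy` data are hypothetical, and the sketch assumes that higher weights come first when `weight_var` is supplied, which the documentation above does not state explicitly.

```r
# Hypothetical stand-in for illustration only; NOT the TextAnalysisR source.
sketch_topic_texts <- function(top_terms_df,
                               topic_var = "topic",
                               term_var = "term",
                               weight_var = NULL,
                               sep = " ",
                               top_n = NULL) {
  # split() groups rows by topic and returns the groups in sorted topic order
  groups <- split(top_terms_df, top_terms_df[[topic_var]])

  texts <- vapply(groups, function(g) {
    if (!is.null(weight_var)) {
      # Assumption: terms with larger weights are placed first
      g <- g[order(g[[weight_var]], decreasing = TRUE), , drop = FALSE]
    }
    terms <- as.character(g[[term_var]])
    if (!is.null(top_n)) {
      terms <- head(terms, top_n)  # keep only the first top_n terms
    }
    paste(terms, collapse = sep)   # one string per topic
  }, character(1))

  unname(texts)
}

# Toy input (made-up values) shaped like a top-terms table
toy <- data.frame(
  topic = c(2, 1, 1, 2, 1),
  term  = c("market", "school", "teacher", "price", "student"),
  beta  = c(0.30, 0.10, 0.40, 0.50, 0.20)
)

result <- sketch_topic_texts(toy, weight_var = "beta", top_n = 2)
print(result)
# [1] "teacher student" "price market"
```

Each element corresponds to one topic, so the vector can be passed directly to an embedding function that accepts a character vector. The actual function may differ from this sketch in details such as tie handling or the treatment of non-numeric topic identifiers.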

Examples

if (FALSE) { # \dontrun{
# Get topic terms from a previously fitted STM model (stm_model)
top_terms <- TextAnalysisR::get_topic_terms(stm_model, top_term_n = 10)

# Convert to text strings for embedding
topic_texts <- get_topic_texts(top_terms)

# Generate embeddings
topic_embeddings <- TextAnalysisR::generate_embeddings(topic_texts)
} # }