Concatenates top terms for each topic into text strings suitable for embedding generation. Useful for creating topic representations for semantic similarity analysis.
Usage
get_topic_texts(
top_terms_df,
topic_var = "topic",
term_var = "term",
weight_var = NULL,
sep = " ",
top_n = NULL
)Arguments
- top_terms_df
A data frame containing top terms for topics, typically output from
get_topic_terms.- topic_var
Name of the column containing topic identifiers (default: "topic").
- term_var
Name of the column containing terms (default: "term").
- weight_var
Optional name of column with term weights (e.g., "beta"). If provided, terms are ordered by weight before concatenation.
- sep
Separator between terms (default: " ").
- top_n
Optional number of top terms to include per topic (default: NULL, uses all).
Examples
# Topic-term frame as produced by get_topic_terms()
top_terms <- data.frame(
topic = c(1, 1, 1, 2, 2, 2),
term = c("calculator", "arithmetic", "elementary",
"computer", "instruction", "multiplication"),
prob = c(0.10, 0.08, 0.07, 0.12, 0.09, 0.06)
)
get_topic_texts(top_terms)
#> [1] "calculator arithmetic elementary" "computer instruction multiplication"
get_topic_texts(top_terms, weight_var = "prob", top_n = 2)
#> [1] "calculator arithmetic" "computer instruction"
