This function removes specified common words from a tokens object and applies two dictionaries to categorize the remaining tokens. It returns a document-feature matrix (dfm) based on the processed tokens. If no words are specified for removal, it returns an initial dfm using the provided initialization function.
Arguments
- tokens
  A tokens object from the quanteda package, typically processed using functions like tokens_select or tokens_remove.
- remove_vars
  A character vector of words to remove from the tokens. If NULL, the function returns the result of dfm_init_func().
- dfm_object
  A dfm object to process after removing the specified words.
Examples
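if (interactive()) {
  # Load the example dataset shipped with the package
  df <- TextAnalysisR::SpecialEduTech
  # Combine the selected text columns into a single column
  united_tbl <- TextAnalysisR::unite_text_cols(df, listed_vars = c("title", "keyword", "abstract"))
  # Tokenize the combined text and build the initial dfm
  tokens <- TextAnalysisR::preprocess_texts(united_tbl, text_field = "united_texts")
  dfm_object <- quanteda::dfm(tokens)
  # Remove the chosen common words and return the updated dfm
  TextAnalysisR::remove_common_words(tokens = tokens,
                                     remove_vars = c("level", "testing"),
                                     dfm_object = dfm_object)
}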
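The word-removal step can also be approximated with quanteda alone, which may help when checking what remove_vars does to the feature set. This is a minimal sketch, assuming the removal behaves roughly like quanteda::tokens_remove() followed by quanteda::dfm(). The toy texts are made up for illustration, and the dictionary step and the dfm_init_func() fallback are not reproduced here.

```r
library(quanteda)

# Toy documents (illustrative only, not taken from SpecialEduTech)
txt <- c(d1 = "testing the reading level of students",
         d2 = "students use technology for reading")

# Tokenize the raw text
toks <- tokens(txt)

# Drop the words that would be passed as remove_vars
# (assumption: removal is equivalent to tokens_remove with default matching)
toks_clean <- tokens_remove(toks, pattern = c("level", "testing"))

# Rebuild the document-feature matrix from the cleaned tokens
dfm_clean <- dfm(toks_clean)

# Inspect the remaining features
featnames(dfm_clean)
```

Comparing featnames() before and after removal is a quick way to confirm that the listed words no longer appear as features.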