Calculate Log Odds Ratio Between Categories

Computes the log odds ratio of word frequencies between categories, identifying words that are used distinctively in one category versus another. Laplace smoothing is applied so that zero counts do not produce undefined ratios.
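As a rough illustration of the computation described above, the following base R sketch derives a smoothed log odds ratio from two count vectors. The helper name `smoothed_log_odds`, the `alpha` parameter, and the add-one scheme with the vocabulary size in the denominator are assumptions made for illustration; the package's internal formula may differ.

```r
# Hypothetical helper, not part of the package: Laplace-smoothed log odds
# ratio for each term, given raw counts of the same terms in two categories.
smoothed_log_odds <- function(count1, count2, alpha = 1) {
  stopifnot(length(count1) == length(count2))
  v <- length(count1)  # vocabulary size (assumed smoothing denominator)
  # Smoothed relative frequency of each term within its category
  p1 <- (count1 + alpha) / (sum(count1) + alpha * v)
  p2 <- (count2 + alpha) / (sum(count2) + alpha * v)
  odds1 <- p1 / (1 - p1)
  odds2 <- p2 / (1 - p2)
  data.frame(
    odds1 = odds1,
    odds2 = odds2,
    odds_ratio = odds1 / odds2,
    log_odds_ratio = log(odds1 / odds2)
  )
}

# Toy counts for three terms; "purr" never occurs in the second category.
counts_cat <- c(purr = 4, walk = 1, food = 5)
counts_dog <- c(purr = 0, walk = 6, food = 4)
res <- smoothed_log_odds(counts_cat, counts_dog)
res$term <- names(counts_cat)
res
```

Without smoothing, the zero count for "purr" would give odds of 0 in the second category and an infinite ratio; with the added pseudo-count every value stays finite.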

Usage

calculate_log_odds_ratio(
  dfm_object,
  group_var,
  comparison_mode = c("binary", "one_vs_rest", "pairwise"),
  reference_level = NULL,
  top_n = 10,
  min_count = 5
)

Arguments

dfm_object

A quanteda dfm object

group_var

Character, name of the grouping variable in docvars

comparison_mode

Character, one of "binary", "one_vs_rest", or "pairwise"

binary: Compare two categories directly
one_vs_rest: Compare each category against all others combined
pairwise: Compare all pairs of categories
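The three modes differ mainly in which comparisons they generate. The sketch below enumerates them for a set of category levels. `list_comparisons` is a hypothetical helper, and both the `"rest"` label and the choice to place the reference level in `category2` are assumptions, not documented behaviour.

```r
# Hypothetical helper: list the (category1, category2) comparisons that each
# comparison mode implies for a given set of category levels.
list_comparisons <- function(levels,
                             mode = c("binary", "one_vs_rest", "pairwise"),
                             reference_level = NULL) {
  mode <- match.arg(mode)
  if (mode == "binary") {
    stopifnot(length(levels) == 2)
    # Default reference is the first level, as in the argument description
    ref <- if (is.null(reference_level)) levels[1] else reference_level
    data.frame(category1 = setdiff(levels, ref), category2 = ref)
  } else if (mode == "one_vs_rest") {
    # One comparison per category, each against all others pooled together
    data.frame(category1 = levels, category2 = "rest")
  } else {
    # Every unordered pair of categories
    pairs <- utils::combn(levels, 2)
    data.frame(category1 = pairs[1, ], category2 = pairs[2, ])
  }
}

lv <- c("cat", "dog", "bird")
pairwise_cmp <- list_comparisons(lv, "pairwise")
rest_cmp <- list_comparisons(lv, "one_vs_rest")
binary_cmp <- list_comparisons(c("cat", "dog"), "binary")
```

With k categories, the pairwise mode yields k(k - 1)/2 comparisons while one_vs_rest yields k, which matters for output size because `top_n` applies per comparison.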

reference_level

Character, reference category for binary comparison (default: NULL, which uses the first level)

top_n

Number of top terms per comparison (default: 10)

min_count

Minimum word count to include (default: 5)

Value

Data frame with columns:

term: The word/feature
category1: First category in comparison
category2: Second category in comparison
count1: Count in category 1
count2: Count in category 2
odds1: Odds in category 1
odds2: Odds in category 2
odds_ratio: Ratio of odds
log_odds_ratio: Log of odds ratio (positive = more in compared category)
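A data frame of this shape can be ranked with base R. The sketch below uses a made-up data frame containing only a subset of the columns listed above; the values are illustrative and were not produced by the function.

```r
# Toy data frame mimicking some of the documented columns (values are made up).
result <- data.frame(
  term = c("purr", "walk", "food", "nap"),
  category1 = "cat",
  category2 = "dog",
  log_odds_ratio = c(2.01, -1.86, 0.22, 1.10)
)

# Rank by magnitude so that distinctive terms on either side surface first
ranked <- result[order(abs(result$log_odds_ratio), decreasing = TRUE), ]
top_terms <- head(ranked, 2)
top_terms$term
```

Sorting on the absolute value keeps strongly negative terms in view; sorting on the raw value instead would surface only the terms favoured by one side of the comparison.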

See also

Other lexical: calculate_dispersion_metrics(), calculate_lexical_dispersion(), calculate_text_readability(), clear_lexdiv_cache(), detect_multi_words(), extract_keywords_keyness(), extract_keywords_tfidf(), extract_morphology(), extract_named_entities(), extract_noun_chunks(), extract_pos_tags(), extract_subjects_objects(), find_similar_words(), get_sentences(), get_spacy_embeddings(), get_spacy_model_info(), get_word_similarity(), init_spacy_nlp(), lexical_analysis, lexical_diversity_analysis(), lexical_frequency_analysis(), parse_morphology_string(), plot_keyness_keywords(), plot_keyword_comparison(), plot_lexical_diversity_distribution(), plot_morphology_feature(), plot_readability_by_group(), plot_readability_distribution(), plot_tfidf_keywords(), plot_top_readability_documents(), render_displacy_dep(), render_displacy_ent(), spacy_extract_entities(), spacy_has_vectors(), spacy_initialized(), spacy_lemmatize(), spacy_parse_full(), summarize_morphology()

Examples

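if (FALSE) { # \dontrun{
library(quanteda)

# Toy corpus: four short documents labelled by animal
corp <- corpus(c("The cat runs fast", "Dogs are loyal pets",
                 "Cats sleep all day", "My dog loves walks"),
               docvars = data.frame(animal = c("cat", "dog", "cat", "dog")))

# Tokenize and build the document-feature matrix (no pipe package needed)
dfm_obj <- dfm(tokens(corp))

# Every term here occurs fewer than 5 times, so lower min_count from its default
log_odds <- calculate_log_odds_ratio(dfm_obj, "animal", min_count = 1)
} # }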
