
Calculate Log Odds Ratio Between Categories
Source: R/lexical_analysis.R
Computes the log odds ratio to compare word frequencies between categories and identifies words that are used distinctively in one category versus another. Laplace smoothing is applied to handle zero counts.
Usage
calculate_log_odds_ratio(
  dfm_object,
  group_var,
  comparison_mode = c("binary", "one_vs_rest", "pairwise"),
  reference_level = NULL,
  top_n = 10,
  min_count = 5
)
Arguments
- dfm_object
A quanteda dfm object
- group_var
Character, name of the grouping variable in docvars
- comparison_mode
Character, one of "binary", "one_vs_rest", or "pairwise":
  "binary": Compare two categories directly
  "one_vs_rest": Compare each category against all others combined
  "pairwise": Compare all pairs of categories
- reference_level
Character, reference category for binary comparison (default: first level)
- top_n
Number of top terms per comparison (default: 10)
- min_count
Minimum word count to include (default: 5)
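The three comparison modes differ mainly in which category pairs they produce. The following is a minimal base R sketch of that enumeration, not the package's implementation. The helper name enumerate_comparisons() is hypothetical. The "rest" label, the assignment of the reference level to category2, and the two-level requirement for "binary" are all assumptions made for illustration.

```r
# Hypothetical helper (not part of the package): lists the category pairs
# that each comparison mode would be expected to produce.
enumerate_comparisons <- function(levels,
                                  mode = c("binary", "one_vs_rest", "pairwise"),
                                  reference_level = NULL) {
  mode <- match.arg(mode)
  if (mode == "binary") {
    # Assumption: a binary comparison involves exactly two categories
    stopifnot(length(levels) == 2)
    # The reference defaults to the first level, as documented above
    ref <- if (is.null(reference_level)) levels[1] else reference_level
    stopifnot(ref %in% levels)
    other <- setdiff(levels, ref)
    # Assumption: the reference level is reported as category2
    data.frame(category1 = other, category2 = ref, stringsAsFactors = FALSE)
  } else if (mode == "one_vs_rest") {
    # Each category against all remaining categories combined
    data.frame(category1 = levels, category2 = "rest", stringsAsFactors = FALSE)
  } else {
    # Every unordered pair of categories
    pairs <- t(combn(levels, 2))
    data.frame(category1 = pairs[, 1], category2 = pairs[, 2],
               stringsAsFactors = FALSE)
  }
}

groups <- c("news", "blog", "forum")
pairwise_pairs <- enumerate_comparisons(groups, "pairwise")
rest_pairs     <- enumerate_comparisons(groups, "one_vs_rest")
binary_pairs   <- enumerate_comparisons(c("news", "blog"), "binary",
                                        reference_level = "blog")
```

With k categories, "pairwise" yields k * (k - 1) / 2 comparisons and "one_vs_rest" yields k, so top_n applies to each of those comparisons separately.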
Value
Data frame with columns:
term: The word/feature
category1: First category in comparison
category2: Second category in comparison
count1: Count in category 1
count2: Count in category 2
odds1: Odds in category 1
odds2: Odds in category 2
odds_ratio: Ratio of odds
log_odds_ratio: Log of odds ratio (positive = more in compared category)
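To make the relationship between the count, odds, and log odds columns concrete, here is a small base R sketch on made-up counts. It is an illustration under stated assumptions, not the function's actual code: the add-one smoothing form, applying min_count to the combined count, and ranking by absolute log odds ratio are all assumptions, and smoothed_log_odds() is a hypothetical name.

```r
# Toy counts for two categories (made-up numbers, for illustration only)
counts <- data.frame(
  term   = c("tax", "school", "road", "rare"),
  count1 = c(30, 5, 10, 1),
  count2 = c(10, 20, 10, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: smoothed odds and log odds ratio per term
smoothed_log_odds <- function(counts, min_count = 5, top_n = 10, alpha = 1) {
  # Category totals are taken over all terms, before filtering
  n1 <- sum(counts$count1)
  n2 <- sum(counts$count2)

  # Assumption: min_count is applied to the combined count across categories
  keep <- (counts$count1 + counts$count2) >= min_count
  out <- counts[keep, , drop = FALSE]

  # Laplace (add-alpha) smoothing keeps the odds finite when a count is zero
  out$odds1 <- (out$count1 + alpha) / (n1 - out$count1 + alpha)
  out$odds2 <- (out$count2 + alpha) / (n2 - out$count2 + alpha)

  out$odds_ratio <- out$odds1 / out$odds2
  out$log_odds_ratio <- log(out$odds_ratio)

  # Assumption: the most distinctive terms are those with the largest
  # absolute log odds ratio
  out <- out[order(abs(out$log_odds_ratio), decreasing = TRUE), , drop = FALSE]
  head(out, top_n)
}

result <- smoothed_log_odds(counts)
print(result)
```

In this sketch a positive log_odds_ratio means the term is relatively more frequent in category 1 and a negative value means it is more frequent in category 2. The term "rare" is dropped because its combined count falls below min_count.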
See also
Other lexical:
calculate_dispersion_metrics(),
calculate_lexical_dispersion(),
calculate_text_readability(),
clear_lexdiv_cache(),
detect_multi_words(),
extract_keywords_keyness(),
extract_keywords_tfidf(),
extract_morphology(),
extract_named_entities(),
extract_noun_chunks(),
extract_pos_tags(),
extract_subjects_objects(),
find_similar_words(),
get_sentences(),
get_spacy_embeddings(),
get_spacy_model_info(),
get_word_similarity(),
init_spacy_nlp(),
lexical_analysis,
lexical_diversity_analysis(),
lexical_frequency_analysis(),
parse_morphology_string(),
plot_keyness_keywords(),
plot_keyword_comparison(),
plot_lexical_diversity_distribution(),
plot_morphology_feature(),
plot_readability_by_group(),
plot_readability_distribution(),
plot_tfidf_keywords(),
plot_top_readability_documents(),
render_displacy_dep(),
render_displacy_ent(),
spacy_extract_entities(),
spacy_has_vectors(),
spacy_initialized(),
spacy_parse_full(),
summarize_morphology()