Extract Morphological Features — extract_morphology • TextAnalysisR

Uses spaCy to extract token-level morphological features from text and returns them as a data frame with columns for features such as Number, Tense, VerbForm, Person, Case, Mood, and Aspect.

Usage

extract_morphology(
  tokens,
  features = c("Number", "Tense", "VerbForm", "Person", "Case", "Mood", "Aspect"),
  include_pos = TRUE,
  include_lemma = TRUE,
  model = "en_core_web_sm"
)

Arguments

tokens: A quanteda tokens object or character vector of texts.
features: Character vector of morphological features to extract. The default includes seven common Universal Dependencies features.
include_pos: Logical; include POS tags (default: TRUE).
include_lemma: Logical; include lemmatized forms (default: TRUE).
model: Character; spaCy model to use (default: "en_core_web_sm").

Value

A data frame of token-level morphological annotations that includes one morph_* column for each requested feature.
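To make the morph_* layout concrete, here is a minimal base-R sketch. It assumes the input is the Universal Dependencies FEATS string that spaCy reports for each token (Feature=Value pairs separated by "|", for example "Mood=Ind|Tense=Pres|VerbForm=Fin"). The helper feats_to_columns() is hypothetical: it is not part of TextAnalysisR and is not necessarily how extract_morphology() or parse_morphology_string() work internally.

```r
# Hypothetical helper (not TextAnalysisR code): turn UD FEATS strings,
# one per token, into one morph_* column per requested feature.
feats_to_columns <- function(feats, features) {
  # Split "Feat1=Val1|Feat2=Val2" into a named character vector per token;
  # NA or empty strings (tokens with no features) become empty vectors.
  parsed <- lapply(feats, function(s) {
    if (is.na(s) || s == "") return(character(0))
    pairs <- strsplit(strsplit(s, "|", fixed = TRUE)[[1]], "=", fixed = TRUE)
    stats::setNames(vapply(pairs, `[`, "", 2), vapply(pairs, `[`, "", 1))
  })
  # One column per requested feature; NA where a token lacks that feature.
  cols <- lapply(features, function(f) {
    vapply(parsed, function(p) if (f %in% names(p)) p[[f]] else NA_character_, "")
  })
  as.data.frame(stats::setNames(cols, paste0("morph_", features)),
                stringsAsFactors = FALSE)
}

# Illustrative input strings, not actual model output
feats <- c("", "Number=Plur", "Mood=Ind|Tense=Pres|VerbForm=Fin")
feats_to_columns(feats, c("Number", "Tense"))
```

Because UD features apply only where they are relevant (a noun carries no Tense, for instance), a wide layout like this one naturally contains many missing values.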

Details

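Morphological features follow the Universal Dependencies (UD) annotation scheme. Value codes are specific to each feature: for example, Imp means imperative under Mood but imperfective under Aspect. Common features and their values include: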

Number: Sing (singular), Plur (plural)
Tense: Past, Pres (present), Fut (future)
VerbForm: Fin (finite), Inf (infinitive), Part (participle), Ger (gerund)
Person: 1, 2, 3 (first, second, third person)
Case: Nom (nominative), Acc (accusative), Gen (genitive), Dat (dative)
Mood: Ind (indicative), Imp (imperative), Sub (subjunctive)
Aspect: Perf (perfective), Imp (imperfective), Prog (progressive)
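The list above can be turned into a small lookup table for readable labels. The sketch below is an illustrative base-R helper that assumes input in the same UD FEATS format; ud_labels and describe_feats() are hypothetical names, and the table covers only the values listed above. Keys combine feature and value so that a code shared across features, such as Imp, resolves to the right meaning.

```r
# Hypothetical lookup built from the feature list above; keys are
# "Feature=Value" so shared codes (e.g. "Imp") stay unambiguous.
ud_labels <- c(
  "Number=Sing" = "singular", "Number=Plur" = "plural",
  "Tense=Past" = "past", "Tense=Pres" = "present", "Tense=Fut" = "future",
  "VerbForm=Fin" = "finite", "VerbForm=Inf" = "infinitive",
  "VerbForm=Part" = "participle", "VerbForm=Ger" = "gerund",
  "Person=1" = "first person", "Person=2" = "second person",
  "Person=3" = "third person",
  "Case=Nom" = "nominative", "Case=Acc" = "accusative",
  "Case=Gen" = "genitive", "Case=Dat" = "dative",
  "Mood=Ind" = "indicative", "Mood=Imp" = "imperative",
  "Mood=Sub" = "subjunctive",
  "Aspect=Perf" = "perfective", "Aspect=Imp" = "imperfective",
  "Aspect=Prog" = "progressive"
)

# Hypothetical helper: describe each Feature=Value pair in a FEATS string.
describe_feats <- function(feats) {
  if (is.na(feats) || feats == "") return(character(0))
  pairs <- strsplit(feats, "|", fixed = TRUE)[[1]]
  labels <- unname(ud_labels[pairs])
  # Fall back to the raw pair when it is not in the lookup table.
  ifelse(is.na(labels), pairs, paste0(sub("=.*$", "", pairs), ": ", labels))
}

describe_feats("Mood=Imp|VerbForm=Fin")  # "Mood: imperative" "VerbForm: finite"
describe_feats("Aspect=Imp")             # "Aspect: imperfective"
```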

See also

Other lexical: calculate_dispersion_metrics(), calculate_lexical_dispersion(), calculate_log_odds_ratio(), calculate_text_readability(), clear_lexdiv_cache(), detect_multi_words(), extract_keywords_keyness(), extract_keywords_tfidf(), extract_named_entities(), extract_noun_chunks(), extract_pos_tags(), extract_subjects_objects(), find_similar_words(), get_sentences(), get_spacy_embeddings(), get_spacy_model_info(), get_word_similarity(), init_spacy_nlp(), lexical_analysis, lexical_diversity_analysis(), lexical_frequency_analysis(), parse_morphology_string(), plot_keyness_keywords(), plot_keyword_comparison(), plot_lexical_diversity_distribution(), plot_morphology_feature(), plot_readability_by_group(), plot_readability_distribution(), plot_tfidf_keywords(), plot_top_readability_documents(), render_displacy_dep(), render_displacy_ent(), spacy_extract_entities(), spacy_has_vectors(), spacy_initialized(), spacy_lemmatize(), spacy_parse_full(), summarize_morphology()

Examples

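# Not run: requires Python with spaCy and the en_core_web_sm model installed
tokens <- quanteda::tokens("The cats are running quickly.")
morph_data <- extract_morphology(tokens)
print(morph_data)

# Restrict the output to a subset of features
morph_subset <- extract_morphology(tokens, features = c("Number", "Tense"))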