Uses spaCy to extract morphological features from text. Returns a data frame of token-level annotations covering features such as Number, Tense, VerbForm, Person, Case, Mood, and Aspect.
Usage
extract_morphology(
tokens,
features = c("Number", "Tense", "VerbForm", "Person", "Case", "Mood", "Aspect"),
include_pos = TRUE,
include_lemma = TRUE,
model = "en_core_web_sm"
)
Arguments
- tokens
A quanteda tokens object or character vector of texts.
- features
Character vector of morphological features to extract. Default includes common Universal Dependencies features.
- include_pos
Logical; include POS tags (default: TRUE).
- include_lemma
Logical; include lemmatized forms (default: TRUE).
- model
Character; spaCy model to use (default: "en_core_web_sm").
Value
A data frame with token-level morphological annotations including morph_* columns for each requested feature.
Details
Morphological features follow Universal Dependencies annotation. Common features include:
- Number: Sing (singular), Plur (plural)
- Tense: Past, Pres (present), Fut (future)
- VerbForm: Fin (finite), Inf (infinitive), Part (participle), Ger (gerund)
- Person: 1, 2, 3 (first, second, third person)
- Case: Nom (nominative), Acc (accusative), Gen (genitive), Dat (dative)
- Mood: Ind (indicative), Imp (imperative), Sub (subjunctive)
- Aspect: Perf (perfective), Imp (imperfective), Prog (progressive)
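To illustrate how feature values relate to the morph_* columns, here is a minimal base R sketch. It assumes the features arrive as Universal Dependencies style strings of the form "Name=Value|Name=Value". The helper parse_morph() is hypothetical and is not part of this package; the internals of extract_morphology() may differ.

```r
# Hypothetical helper (not part of the package): pull the requested
# features out of UD-style "Name=Value|Name=Value" strings and return
# one morph_* column per feature.
parse_morph <- function(feats, features = c("Number", "Tense", "Person")) {
  # Split each string into its "Name=Value" pairs
  pairs <- strsplit(feats, "|", fixed = TRUE)
  cols <- lapply(features, function(f) {
    vapply(pairs, function(p) {
      # Keep the pair whose name matches the requested feature
      hit <- p[startsWith(p, paste0(f, "="))]
      # Missing features become NA; otherwise strip the "Name=" prefix
      if (length(hit) == 0) NA_character_ else sub("^[^=]*=", "", hit[1])
    }, character(1))
  })
  names(cols) <- paste0("morph_", features)
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Example input: a finite verb, a plural noun, and a token with no features
feats <- c("Number=Sing|Person=3|Tense=Pres|VerbForm=Fin", "Number=Plur", "")
res <- parse_morph(feats)
print(res)
```

Tokens that lack a requested feature get NA in that column, so every requested feature yields a column of the same length as the input.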
See also
Other lexical:
calculate_text_readability(),
clear_lexdiv_cache(),
detect_multi_words(),
extract_keywords_keyness(),
extract_keywords_tfidf(),
extract_named_entities(),
extract_pos_tags(),
lexical_analysis,
lexical_diversity_analysis(),
lexical_frequency_analysis(),
plot_keyness_keywords(),
plot_keyword_comparison(),
plot_lexical_diversity_distribution(),
plot_morphology_feature(),
plot_readability_by_group(),
plot_readability_distribution(),
plot_tfidf_keywords(),
plot_top_readability_documents(),
render_displacy_dep(),
render_displacy_ent(),
summarize_morphology()
