Uses spaCy to extract part-of-speech (POS) tags from tokenized text. Returns a data frame with token-level POS annotations.

Usage

extract_pos_tags(
  tokens,
  include_lemma = TRUE,
  include_entity = FALSE,
  include_dependency = FALSE,
  model = "en_core_web_sm"
)

Arguments

tokens

A quanteda tokens object or character vector of texts.

include_lemma

Logical; include lemmatized forms (default: TRUE).

include_entity

Logical; include named entity recognition (default: FALSE).

include_dependency

Logical; include dependency parsing (default: FALSE).

model

Character; spaCy model to use (default: "en_core_web_sm").

Value

A data frame with columns:

  • doc_id: Document identifier

  • sentence_id: Sentence number within document

  • token_id: Token position within sentence

  • token: Original token

  • pos: Universal POS tag (e.g., NOUN, VERB, ADJ)

  • tag: Detailed POS tag (e.g., NN, VBD, JJ)

  • lemma: Lemmatized form (if include_lemma = TRUE)

  • entity: Named entity type (if include_entity = TRUE)

  • head_token_id: Head token in dependency tree (if include_dependency = TRUE)

  • dep_rel: Dependency relation type, e.g., nsubj, dobj (if include_dependency = TRUE)
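A minimal sketch of working with the returned data frame, assuming a working spaCy installation (the filtering shown uses only base R and the column names listed above):

```r
# Hypothetical downstream use: keep only content words from the output.
pos_data <- extract_pos_tags("The quick brown fox jumps over the lazy dog.")

# pos holds Universal POS tags, so content words can be selected directly.
content_words <- subset(pos_data, pos %in% c("NOUN", "VERB", "ADJ", "ADV"))
content_words[, c("doc_id", "token", "pos", "lemma")]
```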

Details

This function requires the spacyr package and a working Python environment with spaCy installed. If spaCy has not been initialized, the function will attempt to initialize it with the specified model; the model must already be downloaded in the Python environment.
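When processing many documents, it can be preferable to initialize spaCy once up front rather than relying on automatic initialization. A sketch using spacyr's initialization functions (the example text is illustrative):

```r
library(spacyr)

# Start the spaCy backend explicitly with the same model passed to
# extract_pos_tags(); this avoids repeated initialization checks.
spacy_initialize(model = "en_core_web_sm")

pos_data <- extract_pos_tags("Dr. Smith visited Paris in 2020.",
                             include_entity = TRUE)

# Release the Python process when finished.
spacy_finalize()
```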

Examples

if (FALSE) { # \dontrun{
tokens <- quanteda::tokens("The quick brown fox jumps over the lazy dog.")
pos_data <- extract_pos_tags(tokens)
print(pos_data)
} # }
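A further sketch showing the dependency-parsing output, assuming spaCy is available (the sentence and the `nsubj` filter are illustrative):

```r
if (FALSE) { # \dontrun{
# With include_dependency = TRUE, the result gains head_token_id and
# dep_rel columns describing the dependency tree.
pos_data <- extract_pos_tags(
  "She gave him the book.",
  include_dependency = TRUE
)

# For example, extract grammatical subjects.
subset(pos_data, dep_rel == "nsubj")
} # }
```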