Uses spaCy's named entity recognition (NER) to extract named entities from tokenized text. Returns a data frame with token-level entity annotations.
Usage
extract_named_entities(
tokens,
include_pos = TRUE,
include_lemma = TRUE,
model = "en_core_web_sm"
)
Value
A data frame with columns:
doc_id: Document identifier
token: Original token
entity: Named entity type (e.g., PERSON, ORG, GPE)
pos: Universal POS tag (if include_pos = TRUE)
lemma: Lemmatized form (if include_lemma = TRUE)
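To make the shape of the return value concrete, here is a minimal sketch of how such a data frame might be filtered down to tokens that carry an entity label. The data frame is built by hand with made-up values to match the documented columns; it is not real output of extract_named_entities(). How the function marks tokens without an entity is not stated here, so the sketch assumes either an empty string or NA and handles both.

```r
# Illustrative data only: hand-built to mirror the documented columns,
# NOT actual output from extract_named_entities().
ner <- data.frame(
  doc_id = c("doc1", "doc1", "doc1", "doc1", "doc2", "doc2"),
  token  = c("Ada", "joined", "Google", "yesterday", "Paris", "sleeps"),
  entity = c("PERSON", "", "ORG", "DATE", "GPE", NA),
  pos    = c("PROPN", "VERB", "PROPN", "NOUN", "PROPN", "VERB"),
  lemma  = c("Ada", "join", "Google", "yesterday", "Paris", "sleep"),
  stringsAsFactors = FALSE
)

# Assumption: tokens outside any entity have "" or NA in the entity column.
has_entity <- !is.na(ner$entity) & nzchar(ner$entity)
entities <- ner[has_entity, ]

# Count how often each entity type occurs across all documents.
entity_counts <- table(entities$entity)
print(entities[, c("doc_id", "token", "entity")])
print(entity_counts)
```

The same filter should carry over to real output as long as the entity column is a character vector; if it turns out to be a factor, wrapping it in as.character() first would be a reasonable adjustment.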
Details
This function requires the spacyr package and a working Python environment with spaCy installed. If spaCy is not initialized, this function will attempt to initialize it with the specified model.
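Because initialization depends on an external Python environment, it can be useful to check the setup before calling the function. The following is a rough sketch of such a check, not the package's own initialization logic. The helper name ensure_spacy() is hypothetical; it relies only on spacyr::spacy_initialize(), which takes the model name via its model argument.

```r
# Hypothetical helper, not part of this package: reports whether spaCy
# can be initialized through spacyr with the requested model.
ensure_spacy <- function(model = "en_core_web_sm") {
  # spacyr is an optional dependency, so check for it without attaching it.
  if (!requireNamespace("spacyr", quietly = TRUE)) {
    message("Package 'spacyr' is not installed.")
    return(invisible(FALSE))
  }
  ok <- tryCatch({
    # Fails if Python, spaCy, or the language model cannot be found.
    spacyr::spacy_initialize(model = model)
    TRUE
  }, error = function(e) {
    message("spaCy could not be initialized: ", conditionMessage(e))
    FALSE
  })
  invisible(ok)
}
```

Calling ensure_spacy() once at the start of a session, and proceeding only when it returns TRUE, keeps environment problems separate from errors raised during entity extraction.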
See also
Other lexical:
calculate_text_readability(),
clear_lexdiv_cache(),
detect_multi_words(),
extract_keywords_keyness(),
extract_keywords_tfidf(),
extract_morphology(),
extract_pos_tags(),
lexical_analysis,
lexical_diversity_analysis(),
lexical_frequency_analysis(),
plot_keyness_keywords(),
plot_keyword_comparison(),
plot_lexical_diversity_distribution(),
plot_morphology_feature(),
plot_readability_by_group(),
plot_readability_distribution(),
plot_tfidf_keywords(),
plot_top_readability_documents(),
render_displacy_dep(),
render_displacy_ent(),
summarize_morphology()
