Parse texts using spaCy and return token-level annotations. This is the main parsing function for NLP analysis. It accepts character vectors or quanteda tokens objects.
Usage
spacy_parse_full(
x,
pos = TRUE,
tag = TRUE,
lemma = TRUE,
entity = FALSE,
dependency = FALSE,
morph = FALSE,
model = "en_core_web_sm"
)
Arguments
- x
Character vector of texts OR a quanteda tokens object.
- pos
Logical; include coarse POS tags (default: TRUE).
- tag
Logical; include fine-grained tags (default: TRUE).
- lemma
Logical; include lemmatized forms (default: TRUE).
- entity
Logical; include named entity tags (default: FALSE).
- dependency
Logical; include dependency relations (default: FALSE).
- morph
Logical; include morphological features (default: FALSE).
- model
Character; spaCy model to use (default: "en_core_web_sm").
Value
A data frame with token-level annotations including:
- doc_id: Document identifier
- sentence_id: Sentence number within document
- token_id: Token position within sentence
- token: Original token text
- pos: Coarse POS tag (if pos = TRUE)
- tag: Fine-grained tag (if tag = TRUE)
- lemma: Lemmatized form (if lemma = TRUE)
- entity: Named entity tag (if entity = TRUE)
- head_token_id: Head token ID (if dependency = TRUE)
- dep_rel: Dependency relation (if dependency = TRUE)
- morph: Morphological features string (if morph = TRUE)
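Because the result is a plain data frame with one row per token, it can be filtered and summarized with base R. The sketch below uses a small hand-built data frame that mimics the documented columns; the values are illustrative and are not real spaCy output.

```r
# Hand-built stand-in for spacy_parse_full() output (illustrative values only).
parsed <- data.frame(
  doc_id      = c("text1", "text1", "text1", "text2", "text2"),
  sentence_id = c(1L, 1L, 1L, 1L, 1L),
  token_id    = c(1L, 2L, 3L, 1L, 2L),
  token       = c("Students", "use", "tablets", "Teachers", "plan"),
  pos         = c("NOUN", "VERB", "NOUN", "NOUN", "VERB"),
  lemma       = c("student", "use", "tablet", "teacher", "plan"),
  stringsAsFactors = FALSE
)

# Keep only the rows tagged as nouns, then count them per document.
nouns <- parsed[parsed$pos == "NOUN", ]
noun_counts <- table(nouns$doc_id)
```

The same pattern applies to the optional columns (tag, entity, dep_rel, morph) when the corresponding argument is set to TRUE.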
See also
Other lexical:
calculate_dispersion_metrics(),
calculate_lexical_dispersion(),
calculate_log_odds_ratio(),
calculate_text_readability(),
clear_lexdiv_cache(),
detect_multi_words(),
extract_keywords_keyness(),
extract_keywords_tfidf(),
extract_morphology(),
extract_named_entities(),
extract_noun_chunks(),
extract_pos_tags(),
extract_subjects_objects(),
find_similar_words(),
get_sentences(),
get_spacy_embeddings(),
get_spacy_model_info(),
get_word_similarity(),
init_spacy_nlp(),
lexical_analysis,
lexical_diversity_analysis(),
lexical_frequency_analysis(),
parse_morphology_string(),
plot_keyness_keywords(),
plot_keyword_comparison(),
plot_lexical_diversity_distribution(),
plot_morphology_feature(),
plot_readability_by_group(),
plot_readability_distribution(),
plot_tfidf_keywords(),
plot_top_readability_documents(),
render_displacy_dep(),
render_displacy_ent(),
spacy_extract_entities(),
spacy_has_vectors(),
spacy_initialized(),
summarize_morphology()
Examples
if (FALSE) { # \dontrun{
# From SpecialEduTech dataset
texts <- TextAnalysisR::SpecialEduTech$abstract[1:5]
parsed <- spacy_parse_full(texts, morph = TRUE)
# From quanteda tokens
united <- unite_cols(TextAnalysisR::SpecialEduTech, c("title", "abstract"))
tokens <- prep_texts(united, text_field = "united_texts")
parsed <- spacy_parse_full(tokens, morph = TRUE)
} # }
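Both examples request morph = TRUE, which adds a morphological features string per token. As a minimal sketch, assuming the string follows spaCy's usual Feature=Value pairs separated by "|" (the sample string below is made up for illustration), a single value can be split into named features with base R. The package also lists parse_morphology_string() under See also; its signature is not documented on this page, so it is not used here.

```r
# Illustrative morph string; assumes a "Feature=Value|Feature=Value" layout.
morph <- "Number=Plur|Person=3|Tense=Pres"

# Split into "Feature=Value" pairs, then separate names from values.
pairs <- strsplit(morph, "|", fixed = TRUE)[[1]]
parts <- strsplit(pairs, "=", fixed = TRUE)

# Named character vector: names are features, elements are their values.
features <- setNames(
  vapply(parts, `[`, character(1), 2),
  vapply(parts, `[`, character(1), 1)
)
```

A named vector keeps lookups simple, for example features[["Tense"]].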
