library(TextAnalysisR)

# Tokenize the first five abstracts from SpecialEduTech
tokens <- quanteda::tokens(SpecialEduTech$abstract[1:5])

# Locate each occurrence of the target terms within each document
dispersion <- calculate_lexical_dispersion(
  tokens,
  terms = c("learning", "instruction")
)
head(dispersion)
##   doc_id        term  position doc_length
## 1  text2    learning 0.5200000         25
## 2  text3    learning 0.3714286         35
## 3  text3 instruction 0.3142857         35
## 4  text3 instruction 0.6857143         35
## 5  text5    learning 0.7904762        105
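The position values in this output appear to be a token's index divided by the document length (for example, 0.52 × 25 = 13, the 13th of 25 tokens). The following is a minimal base-R sketch of that calculation, assuming 1-based token indices and exact term matching. relative_positions() is a hypothetical helper written for illustration; it is not part of TextAnalysisR and may differ from how calculate_lexical_dispersion() works internally.

```r
# Hypothetical helper (not part of TextAnalysisR): relative positions of a
# term within one tokenized document, assuming position = index / length.
relative_positions <- function(doc_tokens, term) {
  hits <- which(doc_tokens == term)  # 1-based indices of exact matches
  data.frame(
    term = rep(term, length(hits)),
    position = hits / length(doc_tokens),
    doc_length = rep(length(doc_tokens), length(hits))
  )
}

# Toy document of 5 tokens; "learning" is the 2nd and 5th token.
toy <- c("Online", "learning", "supports", "blended", "learning")
relative_positions(toy, "learning")
```

Values near 0 mark occurrences at the start of a document and values near 1 mark occurrences at the end, which makes documents of different lengths comparable.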
Python enables additional features: NLP with spaCy, embeddings, and neural sentiment analysis.
Quick Setup
setup_python_env() automatically:
- Creates the virtual environment textanalysisr-env
- Installs spacy and pdfplumber
- Downloads the spaCy English model (en_core_web_sm)
Uses virtualenv (or conda if available).
Check Status
Run check_python_env() to verify the environment.
Common Issues
spaCy Models
The default en_core_web_sm model is installed
automatically. For word vectors (similarity), install a larger model: the
medium model (en_core_web_md) is 91 MB and the large model
(en_core_web_lg) is 560 MB.
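One way to add a larger model is to run spaCy's own download command with the Python interpreter from the virtual environment. This is a sketch under stated assumptions, not the package's documented method: spacy_download_command() is a hypothetical helper, and the commented example assumes the environment is a virtualenv named textanalysisr-env. By default the helper only builds the command; pass run = TRUE to execute it.

```r
# Hypothetical helper (not part of TextAnalysisR): builds, and optionally runs,
# `python -m spacy download <model>` for a given Python binary.
spacy_download_command <- function(model = "en_core_web_md",
                                   python = "python",
                                   run = FALSE) {
  args <- c("-m", "spacy", "download", model)
  if (run) {
    system2(python, args)  # installs the model for that Python interpreter
  }
  invisible(list(command = python, args = args))
}

# Dry run: inspect the command without downloading anything.
cmd <- spacy_download_command("en_core_web_md")
paste(cmd$command, paste(cmd$args, collapse = " "))

# To install into the package's environment (assumed here to be a virtualenv
# named "textanalysisr-env"), point `python` at its interpreter:
# py <- reticulate::virtualenv_python("textanalysisr-env")
# spacy_download_command("en_core_web_md", python = py, run = TRUE)
```

Keeping the dry run as the default avoids starting a large download by accident.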
Diagnostics
Use reticulate::py_config() to inspect the active Python
installation and reticulate::virtualenv_list() to list the
available virtual environments.
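A short diagnostic sketch built on these two reticulate calls follows. has_virtualenv() is a hypothetical helper, and the environment name textanalysisr-env is taken from the Quick Setup section. If setup used conda instead, the environment would not be expected to appear in virtualenv_list().

```r
# Hypothetical helper: is `envname` among the known virtualenvs?
# `envs` defaults to reticulate's list but can be supplied directly.
has_virtualenv <- function(envname, envs = reticulate::virtualenv_list()) {
  envname %in% envs
}

# Live checks, run only in an interactive session with reticulate installed.
if (interactive() && requireNamespace("reticulate", quietly = TRUE)) {
  print(reticulate::py_config())              # active Python binary and version
  print(has_virtualenv("textanalysisr-env"))  # TRUE if the virtualenv exists
}
```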
