
TextAnalysisR is a text mining and natural language processing workflow for documents (PDF, DOCX, XLSX, CSV, TXT). It includes preprocessing via quanteda; lexical analysis (term frequency-inverse document frequency, log-odds ratios, lexical diversity) via tidytext; topic modeling via stm and BERTopic; semantic similarity and document clustering on transformer embeddings; an interactive Shiny interface with ggplot2 visualization; and optional spaCy lemmatization. For retrieval-augmented generation, it supports local (Ollama, sentence-transformers) or web-based (OpenAI, Gemini) model providers.
Installation
From R-universe (pre-built binaries for Windows, macOS, and Linux):
install.packages("TextAnalysisR",
  repos = c("https://mshin77.r-universe.dev", "https://cloud.r-project.org"))
Or the development version from GitHub:
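A minimal sketch of the GitHub install, assuming the repository is mshin77/TextAnalysisR (inferred from the documentation URL https://mshin77.github.io/TextAnalysisR/, not confirmed in this section) and using the remotes package as the installer:

```r
# Assumption: the GitHub repository is "mshin77/TextAnalysisR", inferred from
# the package documentation URL; adjust if the repository lives elsewhere.
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes", repos = "https://cloud.r-project.org")
}

# Install the development version from GitHub.
remotes::install_github("mshin77/TextAnalysisR")
```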
First-Time Python Setup
Several functions (lemmatize_tokens(), generate_embeddings(), cluster_embeddings(), PDF extraction, transformer-based analyses) require Python packages. Run this once after installing TextAnalysisR:
This creates a dedicated virtualenv (textanalysisr-env), installs the packages listed in inst/python/requirements.txt (spaCy, pandas, pdfplumber, sentence-transformers, torch, umap-learn, hdbscan, scikit-learn, numba), and downloads the en_core_web_sm spaCy model. Restart R afterward. Check status anytime with check_python_env().
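As a hedged example of the status check, the sketch below calls check_python_env() only when TextAnalysisR is installed. It assumes the function takes no required arguments; the structure of its return value is not described here, so the result is simply printed. The variable names are illustrative.

```r
# Call check_python_env() only if TextAnalysisR is available in this R library.
pkg_available <- requireNamespace("TextAnalysisR", quietly = TRUE)

if (pkg_available) {
  # Assumption: check_python_env() can be called without arguments.
  status <- TextAnalysisR::check_python_env()
  print(status)
} else {
  message("TextAnalysisR is not installed; install it before checking the Python environment.")
}
```

Running this after restarting R is a quick way to confirm that the setup step completed before calling functions that depend on Python.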
Alternatively, Launch and Browse the Shiny App
Access the web app at https://www.textanalysisr.org.
Launch and browse the app on the local computer:
Getting Started
See Quick Start for tutorials.
Citation
Shin, M. (2026). TextAnalysisR: A text mining workflow tool (R package version 0.1.4) [Computer software]. https://mshin77.github.io/TextAnalysisR/
Shin, M. (2026). TextAnalysisR: A text mining workflow tool [Web application]. https://www.textanalysisr.org