TextAnalysisR: A Text Mining Workflow Tool

Text mining and natural language processing workflow for documents (PDF, DOCX, XLSX, CSV, TXT). Includes preprocessing via quanteda, lexical analysis (term frequency-inverse document frequency, log-odds ratios, lexical diversity) via tidytext, topic modeling via stm and BERTopic, semantic similarity and document clustering on transformer embeddings, an interactive Shiny interface with ggplot2 visualization, optional spaCy lemmatization, and local sentence-transformers or web-based (OpenAI, Gemini) model providers for retrieval-augmented generation.

Installation

Release version from CRAN:

install.packages("TextAnalysisR")

Development version from R-universe:

install.packages("TextAnalysisR", repos = "https://mshin77.r-universe.dev")
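
Either command reinstalls the package even when it is already present. A minimal sketch of a guard that installs only when needed is shown below. The helper names `missing_packages()` and `install_textanalysisr()` are hypothetical and not part of TextAnalysisR; only base R functions are used.

```r
# Hypothetical helper: return the subset of `pkgs` that cannot be loaded
# from the current library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install TextAnalysisR only if it is missing.
# dev = TRUE uses the R-universe repository given above; otherwise CRAN.
install_textanalysisr <- function(dev = FALSE) {
  if (length(missing_packages("TextAnalysisR")) == 0) {
    message("TextAnalysisR is already installed.")
    return(invisible(FALSE))
  }
  if (dev) {
    install.packages("TextAnalysisR", repos = "https://mshin77.r-universe.dev")
  } else {
    install.packages("TextAnalysisR")
  }
  invisible(TRUE)
}
```

Calling `install_textanalysisr()` installs the release version if it is missing, and `install_textanalysisr(dev = TRUE)` does the same for the development version.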

Python Setup (Optional)

Core analyses run in plain R. Python is needed only for lemmatization, embeddings, clustering, PDF extraction, and transformer-based analyses. To enable these features, run the setup once after installing the package:

library(TextAnalysisR)
setup_python_env()

This creates a dedicated virtualenv containing the required Python packages. Restart R afterward, then run check_python_env() to check its status.
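
If the setup fails, it can help to see which Python interpreters R can find. The sketch below is a troubleshooting aid built on base R only. The helper name `find_python_on_path()` is hypothetical, and whether setup_python_env() relies on an existing interpreter or installs one itself is not stated here, so treat that dependency as an assumption.

```r
# Hypothetical helper: list Python interpreters visible on the PATH.
# Returns an empty character vector if none is found.
find_python_on_path <- function() {
  candidates <- Sys.which(c("python3", "python"))
  unname(candidates[nzchar(candidates)])
}

found <- find_python_on_path()
if (length(found) == 0) {
  message("No Python interpreter found on the PATH.")
} else {
  message("Python found at: ", paste(found, collapse = ", "))
}
```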

Load the TextAnalysisR Package

library(TextAnalysisR)

Alternatively, Launch and Browse the Shiny App

Access the web app at https://www.textanalysisr.org.

Or launch the app on your own computer:

run_app()

Getting Started

See Quick Start for tutorials.

Citation

Shin, M. (2026). TextAnalysisR: A text mining workflow tool (R package version 0.1.4) [Computer software]. https://mshin77.github.io/TextAnalysisR/
Shin, M. (2026). TextAnalysisR: A text mining workflow tool [Web application]. https://www.textanalysisr.org