Quick reference guide organized by workflow stage.

Quick Start Examples

Complete Workflow (5 steps)

library(TextAnalysisR)

# 1. Load data
data(SpecialEduTech)
texts <- SpecialEduTech$abstract

# 2. Preprocess
tokens <- prep_texts(texts, remove_punct = TRUE, remove_numbers = TRUE)
dfm <- quanteda::dfm(tokens)

# 3. Analyze keywords
keywords <- extract_keywords_tfidf(dfm, top_n = 20)
plot_tfidf_keywords(keywords)

# 4. Topic modeling
model <- fit_embedding_model(texts, n_topics = 5)
get_topic_terms(model, n_terms = 10)

# 5. Sentiment analysis
sentiment <- sentiment_lexicon_analysis(texts, lexicon = "bing")
plot_sentiment_distribution(sentiment)

Generate Embeddings

# Auto-detect best available provider
embeddings <- get_best_embeddings(texts)

# Reduce dimensions for visualization
reduced <- reduce_dimensions(embeddings, method = "umap", n_components = 2)
plot_semantic_viz(reduced)

Network Analysis

# Co-occurrence network
word_co_occurrence_network(dfm, top_node_n = 30, co_occur_n = 5)

# Correlation network
word_correlation_network(dfm, top_node_n = 30, corr_n = 0.3)

1. Data Import & Preprocessing

Function               Purpose
import_files()         Import CSV, XLSX, PDF, DOCX, TXT files
unite_cols()           Combine multiple text columns into one
prep_texts()           Tokenize with full preprocessing options
detect_multi_words()   Find collocations (n-grams)
get_available_dfm()    Get best available DFM with fallback
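
To make the preprocessing and collocation steps concrete, here is a minimal base-R sketch that lowercases text, strips punctuation and digits, tokenizes on whitespace, and counts adjacent word pairs. It only illustrates the general idea and is not how prep_texts() or detect_multi_words() are implemented; the helper names simple_tokens() and count_bigrams() are hypothetical.

```r
# Hypothetical helpers for illustration; not part of TextAnalysisR.
simple_tokens <- function(text) {
  text <- tolower(text)
  text <- gsub("[[:punct:][:digit:]]+", " ", text)  # drop punctuation and digits
  tokens <- strsplit(trimws(text), "\\s+")[[1]]
  tokens[nzchar(tokens)]
}

count_bigrams <- function(texts) {
  bigrams <- unlist(lapply(texts, function(txt) {
    tok <- simple_tokens(txt)
    if (length(tok) < 2) return(character(0))
    paste(head(tok, -1), tail(tok, -1))  # pair each token with its successor
  }))
  sort(table(bigrams), decreasing = TRUE)
}

docs <- c(
  "Assistive technology supports students.",
  "Students use assistive technology in 2 classrooms.",
  "Technology helps."
)
bigram_counts <- count_bigrams(docs)
head(bigram_counts, 3)
```

Pairs that recur across documents, such as "assistive technology" here, are the candidates a collocation detector would surface.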

2. Lexical Analysis

Function                       Purpose
calculate_word_frequency()     Count word frequencies
extract_keywords_tfidf()       TF-IDF keyword extraction
extract_keywords_keyness()     Keyness-based keywords
lexical_diversity_analysis()   TTR, MATTR, MTLD metrics
calculate_text_readability()   Flesch, SMOG, ARI scores

Visualization Functions

Function                                Purpose
plot_word_frequency()                   Bar chart of word frequencies
plot_tfidf_keywords()                   TF-IDF keyword visualization
plot_keyness_keywords()                 Keyness comparison plot
plot_ngram_frequency()                  N-gram frequency plot
plot_readability_distribution()         Readability score distribution
plot_lexical_diversity_distribution()   Diversity metrics plot
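
Two of the diversity metrics named above are easy to compute by hand. TTR (type-token ratio) is the number of unique words divided by the total number of words. MATTR (moving-average TTR) averages the TTR over fixed-length sliding windows, which makes it less sensitive to text length. The sketch below is a base-R illustration under those definitions; the function names ttr() and mattr() are hypothetical, and the window of 4 is chosen only to suit the toy input (real analyses typically use much larger windows).

```r
# Hypothetical helpers for illustration; not TextAnalysisR functions.
ttr <- function(tokens) {
  length(unique(tokens)) / length(tokens)
}

mattr <- function(tokens, window = 5) {
  n <- length(tokens)
  if (n <= window) return(ttr(tokens))  # text no longer than one window
  starts <- seq_len(n - window + 1)
  mean(vapply(starts, function(i) ttr(tokens[i:(i + window - 1)]), numeric(1)))
}

tokens <- c("the", "cat", "sat", "on", "the", "mat", "the", "end")
ttr(tokens)                # 6 unique / 8 total = 0.75
mattr(tokens, window = 4)  # mean TTR over 5 windows = 0.9
```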

3. Sentiment Analysis

Function                         Purpose
analyze_sentiment()              Quick sentiment scoring
sentiment_lexicon_analysis()     Dictionary-based (no Python)
sentiment_embedding_analysis()   Neural sentiment (Python)
analyze_sentiment_llm()          LLM-based with explanations (Ollama/OpenAI/Gemini)

Visualization Functions

Function                        Purpose
plot_sentiment_distribution()   Sentiment score histogram
plot_sentiment_by_category()    Sentiment by group
plot_sentiment_boxplot()        Box plot comparison
plot_emotion_radar()            Emotion radar chart
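
Dictionary-based sentiment analysis boils down to looking up each word in a lexicon and summing the scores. The base-R sketch below shows that idea with a six-word toy lexicon. Both the lexicon and the score_sentiment() helper are made up for illustration; this is not the implementation of sentiment_lexicon_analysis(), and real lexicons such as Bing contain thousands of entries.

```r
# Toy lexicon for illustration only; real lexicons are far larger.
toy_lexicon <- c(good = 1, helpful = 1, effective = 1,
                 bad = -1, difficult = -1, poor = -1)

# Hypothetical helper: sum lexicon scores of matched words and label the sign.
score_sentiment <- function(text, lexicon = toy_lexicon) {
  tokens <- strsplit(tolower(gsub("[[:punct:]]+", " ", text)), "\\s+")[[1]]
  hits <- lexicon[tokens[tokens %in% names(lexicon)]]
  score <- sum(hits)
  label <- if (score > 0) "positive" else if (score < 0) "negative" else "neutral"
  data.frame(score = score, label = label, stringsAsFactors = FALSE)
}

texts <- c(
  "The tool was helpful and effective.",
  "Setup was difficult and the documentation was poor.",
  "Students used tablets."
)
result <- do.call(rbind, lapply(texts, score_sentiment))
result
```

A text with no lexicon matches scores zero and is labeled neutral, which is one reason lexicon coverage matters for short documents.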

4. Semantic Analysis

Function                          Purpose
get_best_embeddings()             Auto-detect and use best embedding provider
generate_embeddings()             Create document embeddings (local)
reduce_dimensions()               PCA, t-SNE, UMAP reduction
calculate_document_similarity()   Compute similarity matrix
semantic_similarity_analysis()    Full similarity workflow
semantic_document_clustering()    Cluster similar documents
generate_cluster_labels()         AI-generated cluster names

Visualization Functions

Function                        Purpose
plot_semantic_viz()             2D/3D semantic visualization
plot_similarity_heatmap()       Similarity matrix heatmap
plot_cross_category_heatmap()   Cross-category similarity comparison
plot_cluster_terms()            Cluster term visualization
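
For readers new to embeddings, the sketch below shows in base R what a similarity matrix and a dimension reduction look like on a toy 4 x 3 matrix. It assumes cosine similarity as the measure and PCA (via stats::prcomp) as the reduction method; the package functions above may use other measures or methods, and cosine_similarity() is a hypothetical helper.

```r
# Toy 4 x 3 "embedding" matrix; real embeddings have hundreds of dimensions.
emb <- rbind(
  doc1 = c(1, 0, 0),
  doc2 = c(2, 0, 0),
  doc3 = c(0, 1, 0),
  doc4 = c(1, 1, 0)
)

# Hypothetical helper: cosine similarity between all pairs of rows.
cosine_similarity <- function(m) {
  unit <- m / sqrt(rowSums(m^2))  # scale each row to unit length
  unit %*% t(unit)
}

sim <- cosine_similarity(emb)
round(sim, 3)

# PCA as one possible reduction method; keep the first two components.
coords <- prcomp(emb)$x[, 1:2]
coords
```

Here doc1 and doc2 point in the same direction, so their cosine similarity is 1 despite their different lengths, while doc1 and doc3 are orthogonal and score 0.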

5. Network Analysis

Function                       Purpose
word_co_occurrence_network()   Word co-occurrence graph
word_correlation_network()     Word correlation graph

Network Parameters

Parameter          Default    Description
node_label_size    22         Font size for node labels (12-40)
community_method   "leiden"   Algorithm: "leiden", "louvain"
top_node_n         30         Number of top nodes to display
co_occur_n         10         Minimum co-occurrence count (co-occurrence only)
corr_n             0.4        Minimum correlation threshold (correlation only)

Network Statistics (9 Metrics)

Metric                 Description
Nodes                  Total unique words
Edges                  Total connections
Density                Edge density (0-1)
Diameter               Longest shortest path
Global Clustering      Network clustering tendency
Avg Local Clustering   Average local clustering
Modularity             Community structure quality
Assortativity          Similar node connection tendency
Avg Path Length        Average node distance
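
The thresholds and the simplest statistics above can be reproduced by hand on a tiny example. The base-R sketch below builds a co-occurrence matrix from a toy binary document-term matrix, applies a minimum count in the spirit of co_occur_n and a correlation cutoff in the spirit of corr_n, and computes nodes, edges, and density. It is an illustration under those assumptions, not the package's implementation; metrics such as diameter or modularity are usually computed with a graph library such as igraph.

```r
# Toy binary document-term matrix: rows are documents, columns are words.
dtm <- rbind(
  c(1, 1, 1, 0),
  c(1, 1, 0, 0),
  c(1, 1, 0, 1),
  c(0, 0, 1, 1)
)
colnames(dtm) <- c("assistive", "technology", "student", "teacher")

# Co-occurrence: number of documents in which two words appear together.
co_occur <- crossprod(dtm)
diag(co_occur) <- 0

# Keep edges that meet a minimum co-occurrence count (cf. co_occur_n).
min_count <- 2
adj <- co_occur >= min_count

n_nodes <- ncol(adj)
n_edges <- sum(adj[upper.tri(adj)])                 # count each pair once
density <- n_edges / (n_nodes * (n_nodes - 1) / 2)  # share of possible edges

# Correlation network: keep word pairs at or above a cutoff (cf. corr_n).
word_cor <- cor(dtm)
strong_pairs <- which(word_cor >= 0.4 & upper.tri(word_cor), arr.ind = TRUE)
```

With these four documents only "assistive" and "technology" co-occur at least twice, so the thresholded graph has one edge out of six possible, a density of about 0.17.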

6. Topic Modeling

Function                   Purpose
find_optimal_k()           Search for optimal topic count
fit_semantic_model()       STM (Structural Topic Model)
fit_embedding_model()      Embedding-based topics (BERTopic)
fit_hybrid_model()         STM + embeddings hybrid
get_topic_terms()          Extract top words per topic
get_topic_prevalence()     Calculate topic prevalence
generate_topic_labels()    AI-generated topic names

Visualization Functions

Function                           Purpose
plot_topic_probability()           Topic probability distribution
plot_topic_effects_categorical()   Topic effects by category
plot_topic_effects_continuous()    Topic effects over a continuous variable
plot_word_probability()            Word probability per topic
plot_quality_metrics()             Model quality metrics
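
Top terms and topic prevalence are both simple summaries of two matrices that topic models typically produce. The sketch below assumes an STM-style output with a document-by-topic matrix (called theta here) and a topic-by-word probability matrix (called beta here); these names, the toy values, and the top_terms() helper are illustrative assumptions, and the objects returned by the package functions may be structured differently.

```r
# Toy model output. theta: documents x topics; beta: topics x vocabulary.
theta <- rbind(
  c(0.8, 0.2),
  c(0.7, 0.3),
  c(0.1, 0.9)
)
colnames(theta) <- c("topic1", "topic2")

beta <- rbind(
  topic1 = c(0.50, 0.30, 0.15, 0.05),
  topic2 = c(0.05, 0.10, 0.40, 0.45)
)
colnames(beta) <- c("reading", "math", "tablet", "software")

# Prevalence: average share of each topic across documents.
prevalence <- colMeans(theta)

# Hypothetical helper: the n highest-probability words in each topic.
top_terms <- function(beta, n = 2) {
  apply(beta, 1, function(p) names(sort(p, decreasing = TRUE))[seq_len(n)])
}
terms_by_topic <- top_terms(beta, n = 2)
terms_by_topic
```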

7. PDF Processing

Function                     Purpose
process_pdf_unified()        Auto-fallback PDF extraction
extract_text_from_pdf()      Extract text (R)
extract_pdf_multimodal()     Vision AI for images in PDFs
detect_pdf_content_type()    Detect PDF content type

8. AI Integration

TextAnalysisR takes a human-in-the-loop approach: the AI makes suggestions, and you review, edit, and approve each one before it is used. Content generation is topic-grounded: drafts are built from validated topic terms and their beta scores rather than from the model's parametric knowledge.

The package supports local (Ollama) and web-based (OpenAI, Gemini) providers.

Function                    Purpose
call_llm_api()              Unified LLM API (all providers)
call_ollama()               Local Ollama API
call_gemini_chat()          Gemini API
generate_topic_labels()     AI-suggested topic labels
generate_topic_content()    Topic-grounded content drafts
generate_cluster_labels()   AI-suggested cluster names
analyze_sentiment_llm()     LLM-based sentiment analysis
run_rag_search()            RAG search over documents
get_api_embeddings()        Web-based embeddings (OpenAI, Gemini)
get_spacy_embeddings()      Local spaCy word embeddings

Ollama Utilities

Function                         Purpose
check_ollama()                   Verify Ollama availability
list_ollama_models()             List installed models
get_recommended_ollama_model()   Auto-select best model
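
One way to picture topic-grounded generation is as a prompt that contains only the topic's terms and weights, so the model has nothing else to draw on. The base-R sketch below builds such a prompt and keeps the result in a pending state until a person approves it. The helper build_label_prompt(), the prompt wording, and the status field are all hypothetical; the sketch makes no API call and does not reflect how generate_topic_labels() builds its requests.

```r
# Hypothetical helper; the prompt wording and function name are illustrative.
build_label_prompt <- function(terms, beta) {
  stopifnot(length(terms) == length(beta))
  ranked <- order(beta, decreasing = TRUE)  # strongest terms first
  term_lines <- sprintf("- %s (beta = %.3f)", terms[ranked], beta[ranked])
  paste(
    "Suggest a short label for a topic defined by the terms below.",
    "Use only these terms and their weights as evidence.",
    paste(term_lines, collapse = "\n"),
    sep = "\n"
  )
}

prompt <- build_label_prompt(
  terms = c("tablet", "software", "reading"),
  beta  = c(0.12, 0.30, 0.08)
)
cat(prompt, "\n")

# Human-in-the-loop step: a suggestion stays a draft until someone approves it.
suggestion <- list(label = NA_character_, status = "pending_review")
```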

9. Linguistic Analysis

Function                   Purpose
extract_pos_tags()         Identify word types (nouns, verbs, adjectives)
extract_named_entities()   Find people, places, organizations in text
extract_morphology()       Analyze verb tenses, plural forms

These functions require Python. Run setup_python_env() first.

10. Python Environment

Function             Purpose
setup_python_env()   Set up Python environment
check_python_env()   Check Python configuration

11. Validation & Quality

Validation Functions

Function                         Purpose
cross_analysis_validation()      Cross-validate analysis
validate_semantic_coherence()    Check semantic coherence
calculate_clustering_metrics()   Clustering quality metrics

Launch App

The Shiny app provides an interactive interface for all functions: