Function Reference Cheatsheet • TextAnalysisR

Quick reference guide organized by workflow stage.

Quick Start Examples

Complete Workflow (5 steps)

library(TextAnalysisR)

# 1. Load data
data(SpecialEduTech)
texts <- SpecialEduTech$abstract

# 2. Preprocess
tokens <- prep_texts(texts, remove_punct = TRUE, remove_numbers = TRUE)
dfm <- quanteda::dfm(tokens)

# 3. Analyze keywords
keywords <- extract_keywords_tfidf(dfm, top_n = 20)
plot_tfidf_keywords(keywords)

# 4. Topic modeling
model <- fit_embedding_model(texts, n_topics = 5)
get_topic_terms(model, top_term_n = 10)

# 5. Sentiment analysis
sentiment <- sentiment_lexicon_analysis(texts, lexicon = "bing")
plot_sentiment_distribution(sentiment)

Generate Embeddings

# Auto-detect best available provider
embeddings <- get_best_embeddings(texts)

# Reduce dimensions for visualization
reduced <- reduce_dimensions(embeddings, method = "umap", n_components = 2)
plot_semantic_viz(reduced)
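
For readers who want to see what a dimension-reduction step does to an embedding matrix, here is a minimal sketch using base R's `prcomp()` (PCA). It is not how `reduce_dimensions()` is implemented; the `fake_embeddings` matrix is random stand-in data, and all variable names are hypothetical.

```r
# Illustrative only: random numbers stand in for real document embeddings.
set.seed(42)
fake_embeddings <- matrix(rnorm(20 * 8), nrow = 20, ncol = 8)  # 20 docs x 8 dims

# PCA via base R; keep the first two principal components for a 2D plot.
pca <- prcomp(fake_embeddings, center = TRUE, scale. = FALSE)
coords_2d <- pca$x[, 1:2]

# Share of total variance captured by the two retained components.
var_explained <- sum(pca$sdev[1:2]^2) / sum(pca$sdev^2)

dim(coords_2d)  # one row per document, one column per component
```

PCA is linear and deterministic, which makes it a convenient baseline; UMAP and t-SNE are nonlinear and usually separate clusters more visibly in 2D.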

Network Analysis

# Co-occurrence network
word_co_occurrence_network(dfm, top_node_n = 30, co_occur_n = 5)

# Correlation network
word_correlation_network(dfm, top_node_n = 30, corr_n = 0.3)

1. Data Import & Preprocessing

Function	Purpose
`import_files()`	Import CSV, XLSX, PDF, DOCX, TXT files
`unite_cols()`	Combine multiple text columns into one
`prep_texts()`	Tokenize with full preprocessing options
`detect_multi_words()`	Find collocations (n-grams)
`get_available_dfm()`	Get best available DFM with fallback

2. Lexical Analysis

Function	Purpose
`calculate_word_frequency()`	Count word frequencies
`extract_keywords_tfidf()`	TF-IDF keyword extraction
`extract_keywords_keyness()`	Keyness-based keywords
`lexical_diversity_analysis()`	TTR, MATTR, MTLD metrics
`calculate_text_readability()`	Flesch, SMOG, ARI scores
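
As a rough illustration of what TF-IDF keyword scoring measures, the sketch below computes one common variant (within-document proportion times log10 inverse document frequency) in base R. The toy corpus and variable names are hypothetical, and the weighting used by `extract_keywords_tfidf()` may differ.

```r
# Toy corpus (hypothetical); each element is one document.
docs <- c(
  d1 = "assistive technology supports students",
  d2 = "students use technology in class",
  d3 = "teachers support students"
)

# Tokenize on whitespace and build a document-term count matrix.
tokens_list <- strsplit(tolower(docs), "\\s+")
vocab <- sort(unique(unlist(tokens_list)))
counts <- t(sapply(tokens_list, function(tok) table(factor(tok, levels = vocab))))

# Term frequency (within-document proportion) times inverse document frequency.
tf <- counts / rowSums(counts)
idf <- log10(nrow(counts) / colSums(counts > 0))
tfidf <- sweep(tf, 2, idf, "*")

# Highest-scoring terms for the first document.
head(sort(tfidf["d1", ], decreasing = TRUE), 3)
```

A word that appears in every document (here, "students") gets an IDF of zero, so it never ranks as a keyword no matter how frequent it is.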

Visualization Functions

Function	Purpose
`plot_word_frequency()`	Bar chart of word frequencies
`plot_tfidf_keywords()`	TF-IDF keyword visualization
`plot_keyness_keywords()`	Keyness comparison plot
`plot_ngram_frequency()`	N-gram frequency plot
`plot_readability_distribution()`	Readability score distribution
`plot_lexical_diversity_distribution()`	Diversity metrics plot

3. Sentiment Analysis

Function	Purpose
`analyze_sentiment()`	Quick sentiment scoring
`sentiment_lexicon_analysis()`	Dictionary-based (no Python)
`sentiment_embedding_analysis()`	Neural sentiment (Python)
`analyze_sentiment_llm()`	LLM-based with explanations (Ollama/OpenAI/Gemini)
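
To make the dictionary-based approach concrete, here is a minimal base R sketch of lexicon scoring. The five-word lexicon, the example texts, and the `score_text()` helper are all hypothetical; `sentiment_lexicon_analysis()` works with full lexicons such as Bing and may score differently.

```r
# Tiny hypothetical lexicon; real lexicons contain thousands of words.
lexicon <- c(good = 1, helpful = 1, effective = 1, poor = -1, difficult = -1)

example_texts <- c(
  "The tool was helpful and effective",
  "Setup was difficult and support was poor",
  "Students used the tool in class"
)

score_text <- function(text, lexicon) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  # Sum the polarity of every word found in the lexicon; unmatched words are ignored.
  sum(lexicon[words], na.rm = TRUE)
}

scores <- vapply(example_texts, score_text, numeric(1),
                 lexicon = lexicon, USE.NAMES = FALSE)
labels <- ifelse(scores > 0, "positive", ifelse(scores < 0, "negative", "neutral"))

data.frame(score = scores, sentiment = labels)
```

Because this approach only looks up individual words, it misses negation and context ("not helpful"), which is the main reason to consider the embedding-based or LLM-based functions instead.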

Visualization Functions

Function	Purpose
`plot_sentiment_distribution()`	Sentiment score histogram
`plot_sentiment_by_category()`	Sentiment by group
`plot_sentiment_boxplot()`	Box plot comparison
`plot_emotion_radar()`	Emotion radar chart

4. Semantic Analysis

Function	Purpose
`get_best_embeddings()`	Auto-detect and use best embedding provider
`generate_embeddings()`	Create document embeddings (local)
`reduce_dimensions()`	PCA, t-SNE, UMAP reduction
`calculate_document_similarity()`	Compute similarity matrix
`semantic_similarity_analysis()`	Full similarity workflow
`semantic_document_clustering()`	Cluster similar documents
`generate_cluster_labels()`	AI-generated cluster names
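
A similarity matrix over embeddings is typically built from cosine similarity. The base R sketch below shows that calculation on three hand-made vectors; the data and names are hypothetical, and `calculate_document_similarity()` may offer other measures or defaults.

```r
# Hypothetical 3-dimensional "embeddings" for three documents (rows).
emb <- rbind(
  doc_a = c(1, 0, 1),
  doc_b = c(2, 0, 2),   # same direction as doc_a
  doc_c = c(0, 1, 0)    # orthogonal to doc_a
)

# Cosine similarity: normalize each row to unit length, then take dot products.
unit <- emb / sqrt(rowSums(emb^2))
sim <- unit %*% t(unit)

round(sim, 2)
```

Cosine similarity ignores vector length, so `doc_a` and `doc_b` score as identical even though one is twice as long as the other.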

Visualization Functions

Function	Purpose
`plot_semantic_viz()`	2D/3D semantic visualization
`plot_similarity_heatmap()`	Similarity matrix heatmap
`plot_cross_category_heatmap()`	Cross-category similarity comparison
`plot_cluster_terms()`	Cluster term visualization

5. Network Analysis

Function	Purpose
`word_co_occurrence_network()`	Word co-occurrence graph
`word_correlation_network()`	Word correlation graph

Network Parameters

Parameter	Default	Description
`node_label_size`	22	Font size for node labels (12-40)
`community_method`	"leiden"	Algorithm: "leiden", "louvain"
`top_node_n`	30	Number of top nodes to display
`co_occur_n`	10	Minimum co-occurrence count (co-occurrence only)
`corr_n`	0.4	Minimum correlation threshold (correlation only)
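
The effect of a minimum co-occurrence threshold can be sketched in base R. This example assumes document-level co-occurrence (two words co-occur when they appear in the same document), which may not match the package's exact definition; `min_co_occur` is a hypothetical stand-in for `co_occur_n`.

```r
# Toy documents (hypothetical), already tokenized.
docs <- list(
  c("student", "technology", "learning"),
  c("student", "technology"),
  c("student", "learning"),
  c("teacher", "technology")
)

vocab <- sort(unique(unlist(docs)))
# Binary document-term matrix: 1 if the word occurs in the document.
dtm <- t(sapply(docs, function(d) as.integer(vocab %in% d)))
colnames(dtm) <- vocab

# Word-by-word co-occurrence counts = number of documents containing both words.
co <- crossprod(dtm)
diag(co) <- 0

# Keep only pairs that co-occur at least `min_co_occur` times.
min_co_occur <- 2
idx <- which(co >= min_co_occur & upper.tri(co), arr.ind = TRUE)
edges <- data.frame(
  from = vocab[idx[, "row"]],
  to = vocab[idx[, "col"]],
  weight = co[idx]
)
edges
```

Raising the threshold removes weak edges and yields a sparser, easier-to-read graph; lowering it keeps rare pairings at the cost of visual clutter.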

Network Statistics (9 Metrics)

Metric	Description
Nodes	Total unique words
Edges	Total connections
Density	Edge density (0-1)
Diameter	Longest shortest path between any two nodes
Global Clustering	Share of connected triples that form triangles
Avg Local Clustering	Mean of per-node clustering coefficients
Modularity	Community structure quality
Assortativity	Tendency of similar nodes to connect
Avg Path Length	Mean shortest-path length between nodes
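
The definitions behind several of these metrics can be checked by hand on a small graph. The sketch below uses only base R and a hypothetical four-node graph; it covers six of the nine metrics (local clustering, modularity, and assortativity are omitted for brevity) and follows the textbook definitions, which may differ from the package's conventions for edge cases such as disconnected graphs.

```r
# Small undirected toy graph (hypothetical): a triangle a-b-c with a tail c-d.
nodes <- c("a", "b", "c", "d")
edge_list <- rbind(c("a", "b"), c("b", "c"), c("a", "c"), c("c", "d"))

# Symmetric adjacency matrix.
n <- length(nodes)
adj <- matrix(0, n, n, dimnames = list(nodes, nodes))
adj[edge_list] <- 1
adj[edge_list[, 2:1]] <- 1

n_edges <- sum(adj) / 2
density <- n_edges / (n * (n - 1) / 2)

# Global clustering: 3 * triangles / connected triples.
degree <- rowSums(adj)
triangles <- sum(diag(adj %*% adj %*% adj)) / 6
global_clustering <- 3 * triangles / sum(degree * (degree - 1) / 2)

# All-pairs shortest paths via Floyd-Warshall (fine for small graphs).
dist_mat <- ifelse(adj == 1, 1, Inf)
diag(dist_mat) <- 0
for (k in seq_len(n)) {
  dist_mat <- pmin(dist_mat, outer(dist_mat[, k], dist_mat[k, ], "+"))
}
finite_paths <- dist_mat[upper.tri(dist_mat) & is.finite(dist_mat)]
diameter <- max(finite_paths)
avg_path_length <- mean(finite_paths)

c(nodes = n, edges = n_edges, density = density,
  global_clustering = global_clustering,
  diameter = diameter, avg_path_length = avg_path_length)
```

For real word networks a dedicated graph library is the practical choice; the point here is only to show what each number summarizes.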

6. Topic Modeling

Function	Purpose
`find_optimal_k()`	Search for optimal topic count
`fit_semantic_model()`	STM (Structural Topic Model)
`fit_embedding_model()`	Embedding-based topics (Python or R backend)
`fit_hybrid_model()`	STM + embeddings hybrid
`get_topic_terms()`	Extract top words per topic
`get_topic_prevalence()`	Calculate topic prevalence
`generate_topic_labels()`	AI-generated topic names
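
Topic prevalence is often summarized as the average share of each topic across documents. The base R sketch below illustrates that idea on a hand-made document-topic matrix; the `theta` values are hypothetical, and this is an assumption about the general concept rather than a description of what `get_topic_prevalence()` returns.

```r
# Hypothetical document-topic matrix (rows = documents, columns = topics);
# each row sums to 1, as document-topic proportions do in a fitted topic model.
theta <- rbind(
  c(0.7, 0.2, 0.1),
  c(0.1, 0.8, 0.1),
  c(0.2, 0.2, 0.6),
  c(0.6, 0.3, 0.1)
)
colnames(theta) <- paste0("topic_", 1:3)

# Prevalence as the mean share of each topic across documents.
prevalence <- colMeans(theta)

# Dominant topic per document.
dominant <- colnames(theta)[max.col(theta, ties.method = "first")]

sort(prevalence, decreasing = TRUE)
```

Averaging proportions and counting dominant topics can rank topics differently, so it helps to state which summary a report uses.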

Visualization Functions

Function	Purpose
`plot_topic_probability()`	Topic probability distribution
`plot_topic_effects_categorical()`	Topic effects by category
`plot_topic_effects_continuous()`	Topic effects over a continuous variable
`plot_word_probability()`	Word probability per topic
`plot_quality_metrics()`	Model quality metrics

7. PDF Processing

Function	Purpose
`process_pdf_unified()`	Tries multimodal extraction (R + vision LLM) first, then falls back to text-only
`extract_text_from_pdf()`	Extract text (R)
`extract_pdf_multimodal()`	R-native vision AI for images in PDFs (Ollama/OpenAI/Gemini)
`describe_image()`	Describe an image using vision LLM
`detect_pdf_content_type()`	Detect PDF content type

8. AI Integration

TextAnalysisR uses a human-in-the-loop approach: AI provides suggestions that you review, edit, and approve before use. Content generation is topic-grounded, meaning drafts are based on validated topic terms and their beta scores (per-topic word probabilities) rather than on the model's parametric knowledge.

Supports local (Ollama) and web-based (OpenAI, Gemini) providers.

Function	Purpose
`call_llm_api()`	Unified LLM API (all providers)
`call_ollama()`	Local Ollama API
`call_gemini_chat()`	Gemini API
`describe_image()`	Vision LLM image description (Ollama/OpenAI/Gemini)
`generate_topic_labels()`	AI-suggested topic labels
`generate_topic_content()`	Topic-grounded content drafts
`generate_cluster_labels()`	AI-suggested cluster names
`analyze_sentiment_llm()`	LLM-based sentiment analysis
`run_rag_search()`	RAG search over documents
`get_api_embeddings()`	Web-based embeddings (OpenAI, Gemini)
`get_spacy_embeddings()`	Local spaCy word embeddings

Ollama Utilities

Function	Purpose
`check_ollama()`	Verify Ollama availability
`list_ollama_models()`	List installed models
`get_recommended_ollama_model()`	Auto-select best model

9. Linguistic Analysis

Function	Purpose
`extract_pos_tags()`	Identify word types (nouns, verbs, adjectives)
`extract_named_entities()`	Find people, places, organizations in text
`extract_morphology()`	Analyze verb tenses, plural forms

These functions require Python. Run `setup_python_env()` first.

10. Python Environment

Function	Purpose
`setup_python_env()`	Set up Python environment
`check_python_env()`	Check Python configuration

11. Validation & Quality

Validation Functions

Function	Purpose
`cross_analysis_validation()`	Cross-validate analysis
`validate_semantic_coherence()`	Check semantic coherence
`calculate_clustering_metrics()`	Clustering quality metrics
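
One widely used clustering quality metric is the silhouette width. The base R sketch below computes it from scratch on toy data with hand-assigned labels; it assumes every cluster has at least two points, all names are hypothetical, and the metrics reported by `calculate_clustering_metrics()` may be different ones.

```r
# Toy 2D points (hypothetical) with hand-assigned cluster labels.
pts <- rbind(c(0, 0), c(0, 1), c(1, 0), c(5, 5), c(5, 6), c(6, 5))
cluster <- c(1, 1, 1, 2, 2, 2)

d <- as.matrix(dist(pts))  # pairwise Euclidean distances

silhouette_width <- vapply(seq_len(nrow(pts)), function(i) {
  same <- cluster == cluster[i]
  same[i] <- FALSE                      # exclude the point itself
  a <- mean(d[i, same])                 # mean distance to own cluster
  other <- setdiff(unique(cluster), cluster[i])
  # Mean distance to the nearest other cluster.
  b <- min(vapply(other, function(k) mean(d[i, cluster == k]), numeric(1)))
  (b - a) / max(a, b)
}, numeric(1))

mean(silhouette_width)  # closer to 1 means tighter, better-separated clusters
```

Values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster.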

Launch App

The Shiny app provides an interactive interface for all functions:

run_app()