Quick reference guide organized by workflow stage.

1. Data Import & Preprocessing

Function              Purpose
import_files()        Import CSV, XLSX, PDF, DOCX, TXT files
unite_cols()          Combine multiple text columns into one
prep_texts()          Tokenize with full preprocessing options
detect_multi_words()  Find collocations (n-grams)
get_available_dfm()   Get best available DFM with fallback

2. Lexical Analysis

Function                      Purpose
calculate_word_frequency()    Count word frequencies
extract_keywords_tfidf()      TF-IDF keyword extraction
extract_keywords_keyness()    Keyness-based keywords
lexical_diversity_analysis()  TTR, MATTR, MTLD metrics
calculate_text_readability()  Flesch, SMOG, ARI scores

Visualization Functions

Function                               Purpose
plot_word_frequency()                  Bar chart of word frequencies
plot_tfidf_keywords()                  TF-IDF keyword visualization
plot_keyness_keywords()                Keyness comparison plot
plot_ngram_frequency()                 N-gram frequency plot
plot_readability_distribution()        Readability score distribution
plot_lexical_diversity_distribution()  Diversity metrics plot
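
To make two of the quantities above concrete, the following base R sketch counts word frequencies and computes a type-token ratio (TTR) on a made-up corpus. It does not call the package functions and is not their implementation; the texts and the crude regex tokenizer are assumptions chosen for illustration.

```r
# Toy corpus; the texts are made up for illustration.
texts <- c(
  doc1 = "The cat sat on the mat",
  doc2 = "The dog chased the cat"
)

# Lowercase and split on runs of non-letters (a deliberately crude tokenizer).
tokens <- strsplit(tolower(texts), "[^a-z]+")

# Corpus-wide word frequencies, most frequent first.
word_freq <- sort(table(unlist(tokens)), decreasing = TRUE)

# Type-token ratio per document: unique tokens divided by total tokens.
ttr <- vapply(tokens, function(x) length(unique(x)) / length(x), numeric(1))

print(head(word_freq, 3))
print(round(ttr, 2))
```

TTR is sensitive to document length, which is one reason length-robust variants such as MATTR and MTLD are listed alongside it.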

3. Sentiment Analysis

Function                        Purpose
analyze_sentiment()             Quick sentiment scoring
sentiment_lexicon_analysis()    Dictionary-based (no Python)
sentiment_embedding_analysis()  Neural sentiment (Python)

Visualization Functions

Function                       Purpose
plot_sentiment_distribution()  Sentiment score histogram
plot_sentiment_by_category()   Sentiment by group
plot_sentiment_boxplot()       Box plot comparison
plot_emotion_radar()           Emotion radar chart
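
As a rough illustration of what dictionary-based scoring means, the base R sketch below sums word polarities from a tiny lexicon. The lexicon, the texts, and the helper `score_text()` are all hypothetical; the package's own lexicon functions may score and normalize differently.

```r
# Hypothetical mini-lexicon mapping words to polarity scores.
lexicon <- c(good = 1, great = 2, bad = -1, terrible = -2)

# Made-up example texts.
texts <- c(
  review1 = "Great service and good food",
  review2 = "Terrible wait and bad coffee",
  review3 = "The table was by the window"
)

# Sum the polarity of every lexicon word found in one text.
score_text <- function(text, lexicon) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  hits <- lexicon[words[words %in% names(lexicon)]]
  sum(hits)  # 0 when no lexicon word appears
}

scores <- vapply(texts, score_text, numeric(1), lexicon = lexicon)

# Map the numeric score to a coarse label.
labels <- ifelse(scores > 0, "positive",
                 ifelse(scores < 0, "negative", "neutral"))

print(data.frame(score = scores, label = labels))
```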

4. Semantic Analysis

Function                         Purpose
generate_embeddings()            Create document embeddings
reduce_dimensions()              PCA, t-SNE, UMAP reduction
calculate_document_similarity()  Compute similarity matrix
semantic_similarity_analysis()   Full similarity workflow
semantic_document_clustering()   Cluster similar documents
generate_cluster_labels()        AI-generated cluster names

Visualization Functions

Function                       Purpose
plot_semantic_viz()            2D/3D semantic visualization
plot_similarity_heatmap()      Similarity matrix heatmap
plot_cross_category_heatmap()  Cross-category similarity comparison
plot_cluster_terms()           Cluster term visualization
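
A document similarity matrix is commonly built from pairwise cosine similarities between embedding vectors. The base R sketch below shows that computation on made-up 3-dimensional vectors; cosine similarity is an assumption here, since this reference does not state which measure the package functions use.

```r
# Made-up 3-dimensional "embeddings", one row per document.
emb <- rbind(
  doc_a = c(1, 0, 1),
  doc_b = c(2, 0, 2),
  doc_c = c(0, 1, 0)
)

# Scale each row to unit length; dividing by a vector of row norms
# recycles down the columns, so row i is divided by norm i.
unit <- emb / sqrt(rowSums(emb^2))

# The product of unit-length rows gives all pairwise cosine similarities.
sim <- unit %*% t(unit)

print(round(sim, 2))
```

Here `doc_a` and `doc_b` point in the same direction, so their similarity is 1, while `doc_c` is orthogonal to both and scores 0.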

5. Network Analysis

Function                         Purpose
semantic_cooccurrence_network()  Word/document co-occurrence graph
semantic_correlation_network()   Word/document correlation graph

Network Parameters

Parameter                Default   Description
feature_type             "words"   Feature space: "words", "ngrams", "embeddings"
embedding_sim_threshold  0.5       Similarity threshold for embedding networks (0.3-0.9)
node_label_size          22        Font size for node labels (12-40)
community_method         "leiden"  Algorithm: "leiden", "louvain", "label_prop", "fast_greedy"
top_node_n               30        Number of top nodes to display
co_occur_n               10        Minimum co-occurrence count (co-occurrence only)
corr_n                   0.4       Minimum correlation threshold (correlation only)

Network Statistics (9 Metrics)

Metric                Description
Nodes                 Total unique terms/documents
Edges                 Total connections
Density               Edge density (0-1)
Diameter              Longest shortest path between any two nodes
Global Clustering     Network-wide clustering tendency
Avg Local Clustering  Average of per-node local clustering
Modularity            Community structure quality
Assortativity         Tendency of similar nodes to connect
Avg Path Length       Average shortest-path distance between nodes
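
The base R sketch below shows how a minimum co-occurrence count (the role `co_occur_n` plays above) turns term co-occurrences into edges, and how the Nodes, Edges, and Density figures follow. It is a sketch under stated assumptions, not the package's implementation: the documents are made up, the threshold is set to 2 rather than the default of 10 to suit the toy data, and nodes are counted as terms with at least one edge.

```r
# Toy tokenized documents (made up).
docs <- list(
  c("topic", "model"),
  c("topic", "model"),
  c("model", "text"),
  c("model", "text"),
  c("text", "network"),
  c("text", "network", "graph")
)

vocab <- sort(unique(unlist(docs)))

# Binary document-term matrix: 1 if the term occurs in the document.
dtm <- t(vapply(docs, function(d) as.numeric(vocab %in% d),
                numeric(length(vocab))))
colnames(dtm) <- vocab

# Term-by-term co-occurrence counts (number of documents shared).
cooc <- crossprod(dtm)
diag(cooc) <- 0

# Keep pairs that co-occur in at least `min_cooc` documents.
min_cooc <- 2
adj <- cooc >= min_cooc

n_edges <- sum(adj[upper.tri(adj)])       # count each undirected edge once
n_nodes <- sum(rowSums(adj) > 0)          # terms with at least one edge
density <- n_edges / choose(n_nodes, 2)   # undirected graph, no self-loops

cat("Nodes:", n_nodes, "Edges:", n_edges, "Density:", density, "\n")
```

Raising the threshold prunes weak pairs, which typically lowers the edge count and can drop terms from the graph entirely, as happens to "graph" here.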

6. Topic Modeling

Function                 Purpose
find_optimal_k()         Search for optimal topic count
fit_semantic_model()     STM (Structural Topic Model)
fit_embedding_topics()   Embedding-based topics (BERTopic)
fit_hybrid_model()       STM + embeddings hybrid
get_topic_terms()        Extract top words per topic
get_topic_prevalence()   Calculate topic prevalence
generate_topic_labels()  AI-generated topic names

Visualization Functions

Function                          Purpose
plot_topic_probability()          Topic probability distribution
plot_topic_effects_categorical()  Topic effects by category
plot_topic_effects_continuous()   Topic effects over a continuous variable
plot_word_probability()           Word probability per topic
plot_quality_metrics()            Model quality metrics

7. PDF Processing

Function                   Purpose
process_pdf_unified()      Auto-fallback PDF extraction
extract_text_from_pdf()    Extract text (R)
extract_pdf_multimodal()   Vision AI for images in PDFs
detect_pdf_content_type()  Detect PDF content type

8. AI Integration

Function                           Purpose
check_ollama()                     Verify Ollama availability
call_ollama()                      Direct Ollama API call
call_openai_chat()                 OpenAI API call
generate_topic_labels_langgraph()  Multi-agent topic labeling
generate_survey_items()            Generate survey items

9. NLP with spaCy

Function                  Purpose
extract_pos_tags()        Extract POS tags using spacyr
extract_named_entities()  Extract named entities using spacyr

Note: Uses the spacyr R package for spaCy integration.

10. Python Environment

Function            Purpose
setup_python_env()  Set up Python environment
check_python_env()  Check Python configuration

11. Validation & Quality

Validation Functions

Function                        Purpose
cross_analysis_validation()     Cross-validate analysis
validate_semantic_coherence()   Check semantic coherence
calculate_clustering_metrics()  Clustering quality metrics

Launch App

The Shiny app provides an interactive interface for all functions: