Getting Started

Launch the app and import data

run_app()
Launch the TextAnalysisR app
import_files()
Process Files

Preprocessing

Text preparation and feature extraction

unite_cols()
Unite Text Columns
prep_texts()
Preprocess Text Data
detect_multi_words()
Detect Multi-Word Expressions
get_available_dfm()
Get Available Document-Feature Matrix with Fallback
get_available_tokens()
Get Available Tokens with Fallback

Linguistic Analysis

POS tagging, NER, and morphology (requires Python)

extract_pos_tags()
Extract Part-of-Speech Tags from Tokens
extract_named_entities()
Extract Named Entities from Tokens
extract_morphology()
Extract Morphological Features
summarize_morphology()
Summarize Morphology Features
plot_pos_frequencies()
Plot Part-of-Speech Tag Frequencies
plot_entity_frequencies()
Plot Named Entity Frequencies
plot_morphology_feature()
Plot Morphology Feature Distribution
render_displacy_ent()
Render displaCy Entity Visualization

Lexical Analysis

Word frequency, keywords, readability, and dispersion

calculate_word_frequency()
Analyze and Visualize Word Frequencies Across a Continuous Variable
lexical_analysis
Lexical Analysis Functions
extract_keywords_keyness()
Extract Keywords Using Statistical Keyness
extract_keywords_tfidf()
Extract Keywords Using TF-IDF
lexical_frequency_analysis()
Lexical Frequency Analysis
lexical_diversity_analysis()
Lexical Diversity Analysis
clear_lexdiv_cache()
Clear Lexical Diversity Cache
calculate_text_readability()
Calculate Text Readability
calculate_log_odds_ratio()
Calculate Log Odds Ratio Between Categories
calculate_weighted_log_odds()
Calculate Weighted Log Odds Ratio
calculate_lexical_dispersion()
Calculate Lexical Dispersion
calculate_dispersion_metrics()
Calculate Dispersion Metrics
plot_word_frequency()
Plot Word Frequency
plot_tfidf_keywords()
Plot TF-IDF Keywords
plot_keyness_keywords()
Plot Statistical Keyness
plot_keyword_comparison()
Plot Keyword Comparison (TF-IDF vs Frequency)
plot_ngram_frequency()
Plot N-gram Frequency
plot_mwe_frequency()
Plot Multi-Word Expression Frequency
plot_readability_distribution()
Plot Readability Distribution
plot_readability_by_group()
Plot Readability by Group
plot_top_readability_documents()
Plot Top Documents by Readability
plot_lexical_diversity_distribution()
Plot Lexical Diversity Distribution
plot_log_odds_ratio()
Plot Log Odds Ratio
plot_weighted_log_odds()
Plot Weighted Log Odds
plot_lexical_dispersion()
Plot Lexical Dispersion

Sentiment Analysis

Lexicon and embedding-based sentiment

analyze_sentiment()
Analyze Text Sentiment
sentiment_lexicon_analysis()
Analyze Sentiment Using Tidytext Lexicons
sentiment_embedding_analysis()
Embedding-based Sentiment Analysis
plot_sentiment_boxplot()
Plot Sentiment Box Plot by Category
plot_sentiment_by_category()
Plot Sentiment by Category
plot_sentiment_distribution()
Plot Sentiment Distribution
plot_sentiment_violin()
Plot Sentiment Violin Plot by Category
plot_emotion_radar()
Plot Emotion Radar Chart
plot_document_sentiment_trajectory()
Plot Document Sentiment Trajectory
get_sentiment_color()
Generate Sentiment Color Gradient
get_sentiment_colors()
Get Sentiment Color Palette

Semantic Analysis

Similarity, clustering, and networks

generate_embeddings()
Generate Embeddings
reduce_dimensions()
Dimensionality Reduction Analysis
calculate_document_similarity()
Calculate Document Similarity
calculate_similarity_robust()
Calculate Similarity (Robust)
calculate_cosine_similarity()
Calculate Cosine Similarity Matrix
semantic_similarity_analysis()
Semantic Similarity Analysis
semantic_document_clustering()
Semantic Document Clustering
cluster_embeddings()
Embedding-based Document Clustering
analyze_document_clustering()
Analyze Document Clustering
export_document_clustering()
Export Document Clustering Analysis
generate_cluster_labels()
Generate Cluster Label Suggestions (Human-in-the-Loop)
generate_cluster_labels_auto()
Generate Cluster Labels
temporal_semantic_analysis()
Temporal Semantic Analysis
analyze_semantic_evolution()
Analyze Semantic Evolution
word_co_occurrence_network()
Analyze and Visualize Word Co-occurrence Networks
word_correlation_network()
Analyze and Visualize Word Correlation Networks
plot_semantic_viz()
Plot Semantic Analysis Visualization
plot_similarity_heatmap()
Plot Document Similarity Heatmap
plot_cross_category_heatmap()
Plot Cross-Category Similarity Comparison
plot_cluster_terms()
Plot Cluster Top Terms

Topic Modeling

STM, embedding-based, and hybrid models

find_optimal_k()
Find Optimal Number of Topics
fit_embedding_model()
Fit Embedding-based Topic Model
fit_hybrid_model()
Fit Hybrid Topic Model
fit_semantic_model()
Fit Semantic Model
fit_temporal_model()
Fit Temporal Topic Model
auto_tune_embedding_topics()
Auto-tune BERTopic Hyperparameters
assess_embedding_stability()
Assess Embedding Topic Model Stability
get_topic_terms()
Select Top Terms for Each Topic
get_topic_prevalence()
Get Topic Prevalence (Gamma) from STM Model
get_topic_texts()
Convert Topic Terms to Text Strings
calculate_topic_probability()
Calculate Topic Probabilities
calculate_topic_stability()
Calculate Topic Stability
identify_topic_trends()
Identify Topic Trends
generate_topic_labels()
Generate Topic Labels Using OpenAI's API
generate_topic_content()
Generate Content from Topic Terms
plot_topic_probability()
Plot Per-Document Per-Topic Probabilities
plot_topic_effects_categorical()
Plot Topic Effects for Categorical Variables
plot_topic_effects_continuous()
Plot Topic Effects for Continuous Variables
plot_word_probability()
Plot Word Probabilities by Topic
plot_model_comparison()
Plot Topic Model Comparison Scatter
plot_quality_metrics()
Plot Topic Model Quality Metrics
plot_term_trends_continuous()
Plot Term Frequency Trends by Continuous Variable

PDF & Multimodal

Text extraction from PDFs with optional vision AI

process_pdf_unified()
Process PDF File (Unified Entry Point)
process_pdf_file()
Process PDF File
process_pdf_file_py()
Process PDF File using Python
extract_text_from_pdf()
Extract Text from PDF
extract_text_from_pdf_py()
Extract Text from PDF using Python
extract_tables_from_pdf_py()
Extract Tables from PDF using Python
extract_pdf_multimodal()
Extract PDF with Multimodal Analysis
extract_pdf_smart()
Smart PDF Extraction with Auto-Detection
detect_pdf_content_type()
Detect PDF Content Type
detect_pdf_content_type_py()
Detect PDF Content Type using Python

AI Integration

Topic-grounded content generation via local and web-based APIs

check_ollama()
Check if Ollama is Available
list_ollama_models()
List Available Ollama Models
call_ollama()
Call Ollama for Text Generation
call_openai_chat()
Call OpenAI Chat Completion API
call_gemini_chat()
Call Gemini Chat API
call_llm_api()
Call LLM API (Unified Wrapper)
check_vision_models()
Check Vision Model Availability
get_recommended_ollama_model()
Get Recommended Ollama Model
get_best_embeddings()
Get Best Available Embeddings
get_api_embeddings()
Get Embeddings from API
get_spacy_embeddings()
Get spaCy Word Embeddings
run_rag_search()
RAG-Enhanced Semantic Search
analyze_sentiment_llm()
LLM-based Sentiment Analysis
get_content_type_prompt()
Get Default System Prompt for Content Type
get_content_type_user_template()
Get Default User Prompt Template for Content Type

Python Environment

Python environment setup

setup_python_env()
Set Up Python Environment
check_python_env()
Check Python Environment Status

Validation

Quality metrics and cross-validation

cross_analysis_validation()
Cross-Analysis Validation
validate_cross_models()
Cross-Analysis Validation
validate_semantic_coherence()
Validate Semantic Coherence
calculate_clustering_metrics()
Calculate Clustering Quality Metrics
calculate_cross_similarity()
Calculate Cross-Matrix Cosine Similarity
analyze_similarity_gaps()
Analyze Similarity Gaps Between Categories
extract_cross_category_similarities()
Extract Cross-Category Similarities from Full Similarity Matrix

Data

Example datasets

SpecialEduTech
Special education technology bibliographic data
acronym
Acronym List
stm_15
An example structure of a structural topic model
dictionary_list_1
Dictionary List 1
dictionary_list_2
Dictionary List 2
stopwords_list
Stopwords List