AI Integration • TextAnalysisR

TextAnalysisR provides AI and NLP features through both local and web-based providers.

Supported Providers

Provider	Type	API Key	Best For
Ollama	Local	None	Privacy, no cost, offline use
OpenAI	Web-based	OPENAI_API_KEY	Quality, speed
Gemini	Web-based	GEMINI_API_KEY	Quality, speed
spaCy	Local	None	Linguistic analysis
Transformers	Local	None	Embeddings, sentiment
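The API Key column above can be checked from R before choosing a provider. The following is a minimal sketch in base R; `available_web_providers()` is a hypothetical helper, not a TextAnalysisR function, and it only tests whether the environment variables named in the table are set.

```r
# Hypothetical helper (not part of TextAnalysisR): report which web-based
# providers have an API key set in the current environment.
available_web_providers <- function() {
  keys <- c(openai = "OPENAI_API_KEY", gemini = "GEMINI_API_KEY")
  is_set <- nzchar(Sys.getenv(keys))
  names(keys)[is_set]
}

available_web_providers()
```

An empty result means no web-based key is configured, in which case only the local providers in the table would be usable.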

Feature Categories

1. Topic-Grounded Content Generation

Generate content grounded in your validated topic terms (not generic AI knowledge):

Function	Purpose
`generate_topic_labels()`	AI-suggested labels from topic model terms
`generate_topic_content()`	Survey items, research questions, theme descriptions, policy recommendations, interview questions

Content types available:

survey_item: Likert-scale questionnaire items
research_question: Literature review questions
theme_description: Academic theme summaries
policy_recommendation: Actionable policy suggestions
interview_question: Open-ended interview prompts

# Generate topic labels
labels <- generate_topic_labels(
  top_topic_terms,
  provider = "ollama",
  model = "llama3.2"
)

# Generate survey items
items <- generate_topic_content(
  topic_terms_df,
  content_type = "survey_item",
  provider = "openai"
)
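Grounding means the prompt sent to the model is built from the topic's own top terms. A rough illustration of that idea in base R follows. The data frame columns (`topic`, `term`, `beta`), the helper `build_grounded_prompt()`, and the prompt wording are all assumptions made for this sketch; the package's actual input format and prompts may differ.

```r
# Hypothetical topic-term table; column names are assumed for illustration.
topic_terms <- data.frame(
  topic = c(1, 1, 1, 2, 2, 2),
  term  = c("reading", "fluency", "decoding", "math", "fractions", "algebra"),
  beta  = c(0.09, 0.07, 0.05, 0.10, 0.06, 0.04)
)

# Build one prompt per topic that lists that topic's terms, highest beta first.
build_grounded_prompt <- function(terms_df, content_type = "survey_item") {
  ordered <- terms_df[order(terms_df$topic, -terms_df$beta), ]
  by_topic <- split(ordered$term, ordered$topic)
  vapply(
    by_topic,
    function(terms) {
      paste0(
        "Write one ", content_type, " using only these topic terms: ",
        paste(terms, collapse = ", ")
      )
    },
    character(1)
  )
}

prompts <- build_grounded_prompt(topic_terms)
prompts[["1"]]
```

Because the prompt names only terms from the fitted topic, the generated item can be traced back to the data rather than to the model's general knowledge.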

2. Semantic Analysis & Clustering

Function	Purpose
`generate_cluster_labels()`	AI-suggested names for document clusters
`run_rag_search()`	Question-answering over your corpus (RAG)
`describe_image()`	Vision LLM image description (Ollama/OpenAI/Gemini)
`get_api_embeddings()`	Web-based document embeddings (OpenAI, Gemini)
`generate_embeddings()`	Local embeddings (sentence-transformers)

# Generate cluster labels
cluster_labels <- generate_cluster_labels(
  cluster_keywords,
  provider = "auto"  # Tries Ollama first, then web-based APIs
)

# RAG search over your documents
result <- run_rag_search(
  query = "What are the main findings?",
  documents = my_docs,
  provider = "openai"
)
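Retrieval-augmented generation (RAG) first retrieves the documents most relevant to the query, then passes them to the LLM as context. The retrieval step can be sketched in base R with bag-of-words cosine similarity. This is an illustration only: `tokenize()`, `cosine_sim()`, and `retrieve_top_docs()` are hypothetical helpers, and this sketch makes no claim about how `run_rag_search()` retrieves documents internally. An embedding-based retriever would replace the word counts with embedding vectors.

```r
# Lowercase a string and split it into word tokens.
tokenize <- function(x) {
  tokens <- strsplit(tolower(x), "[^a-z0-9]+")[[1]]
  tokens[nzchar(tokens)]
}

# Cosine similarity between two term-count vectors over a shared vocabulary.
cosine_sim <- function(a, b) {
  denom <- sqrt(sum(a^2)) * sqrt(sum(b^2))
  if (denom == 0) 0 else sum(a * b) / denom
}

# Hypothetical helper: indices of the k documents most similar to the query.
retrieve_top_docs <- function(query, documents, k = 2) {
  doc_tokens <- lapply(documents, tokenize)
  query_tokens <- tokenize(query)
  vocab <- unique(c(unlist(doc_tokens), query_tokens))
  count_terms <- function(tokens) {
    as.numeric(table(factor(tokens, levels = vocab)))
  }
  query_vec <- count_terms(query_tokens)
  scores <- vapply(
    doc_tokens,
    function(tokens) cosine_sim(count_terms(tokens), query_vec),
    numeric(1)
  )
  head(order(scores, decreasing = TRUE), k)
}

docs <- c(
  "Students improved reading fluency after the intervention.",
  "The budget report covers transportation costs.",
  "Reading comprehension gains were the main findings."
)
top <- retrieve_top_docs("What are the main findings on reading?", docs, k = 2)

# The retrieved passages would be supplied to the LLM as context.
context <- paste(docs[top], collapse = "\n")
```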

3. Sentiment Analysis

Function	Type	Features
`analyze_sentiment_llm()`	LLM-based	Context-aware, detects sarcasm, mixed emotions
`sentiment_embedding_analysis()`	Local transformers	No API required, fast batch processing
`sentiment_lexicon_analysis()`	Dictionary-based	Multiple lexicons (AFINN, Bing, NRC)

# LLM-based sentiment (nuanced)
sentiment_llm <- analyze_sentiment_llm(
  texts,
  provider = "gemini",
  include_explanation = TRUE
)

# Local transformer sentiment (no API needed)
sentiment_local <- sentiment_embedding_analysis(texts)

# Lexicon-based sentiment
sentiment_lexicon <- sentiment_lexicon_analysis(texts, lexicon = "nrc")
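Dictionary-based scoring is the simplest of the three approaches: each word is looked up in a lexicon and the matched scores are summed. A minimal base R sketch follows. The four-word lexicon and the `score_with_lexicon()` helper are made up for illustration; real lexicons such as AFINN contain far more entries, and `sentiment_lexicon_analysis()` may compute and return its scores differently.

```r
# Toy lexicon (illustrative values only, not taken from AFINN, Bing, or NRC).
toy_lexicon <- c(good = 2, great = 3, bad = -2, terrible = -3)

# Hypothetical helper: sum the lexicon scores of the words in each text.
score_with_lexicon <- function(texts, lexicon) {
  vapply(
    texts,
    function(text) {
      words <- strsplit(tolower(text), "[^a-z]+")[[1]]
      # Words missing from the lexicon index to NA and are dropped by na.rm.
      sum(lexicon[words], na.rm = TRUE)
    },
    numeric(1),
    USE.NAMES = FALSE
  )
}

scores <- score_with_lexicon(
  c("The service was great", "Bad food and terrible parking", "It opened in May"),
  toy_lexicon
)

# Map numeric scores to coarse labels.
labels <- ifelse(scores > 0, "positive", ifelse(scores < 0, "negative", "neutral"))
```

A word-by-word lookup like this cannot see context, which is why it misses sarcasm and mixed emotions.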

4. Linguistic Analysis (spaCy)

Deep linguistic processing via spaCy NLP models:

Function	Purpose
`spacy_parse_full()`	Full annotation: POS, lemma, NER, dependency, morphology
`extract_noun_chunks()`	Keyphrase extraction
`extract_subjects_objects()`	Subject-verb-object triples
`get_word_similarity()`	Word vector similarity
`find_similar_words()`	Find semantically similar words
`get_sentences()`	Sentence segmentation

# Initialize spaCy
init_spacy_nlp("en_core_web_sm")

# Full linguistic parsing
parsed <- spacy_parse_full(
  texts,
  pos = TRUE,
  lemma = TRUE,
  entity = TRUE,
  dependency = TRUE,
  morph = TRUE
)

# Extract noun chunks (keyphrases)
chunks <- extract_noun_chunks(texts)

# Extract subject-verb-object triples
svo <- extract_subjects_objects(texts)

5. LLM API Access

Unified interface for all providers:

# Provider-agnostic (recommended)
response <- call_llm_api(
  provider = "openai",
  system_prompt = "You are a helpful assistant.",
  user_prompt = "Summarize this text..."
)

# Provider-specific (system_prompt, user_prompt, and prompt are character strings you define)
call_openai_chat(system_prompt, user_prompt, model = "gpt-4.1-mini")
call_gemini_chat(system_prompt, user_prompt, model = "gemini-2.5-flash")
call_ollama(prompt, model = "llama3.2")
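Because every provider is reached through a function call, a script can fall back from one provider to the next when a call fails. The sketch below shows that pattern with base R's `tryCatch()`. `first_successful()` is a hypothetical helper, and the two stand-in functions only simulate an unreachable local server and a working web API; in practice each entry would wrap a call such as `call_llm_api()`.

```r
# Hypothetical helper: try each provider function in order and return the
# first result that does not raise an error.
first_successful <- function(providers, ...) {
  for (name in names(providers)) {
    result <- tryCatch(providers[[name]](...), error = function(e) NULL)
    if (!is.null(result)) {
      return(list(provider = name, response = result))
    }
  }
  stop("All providers failed.")
}

# Stand-ins that simulate provider behaviour without any network access.
providers <- list(
  ollama = function(prompt) stop("Ollama server not reachable"),
  openai = function(prompt) paste("Summary of:", prompt)
)

out <- first_successful(providers, "Summarize this text...")
out$provider
```

Returning the provider name alongside the response keeps a record of which model produced each output.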

6. Ollama Utilities

Function	Purpose
`check_ollama()`	Verify Ollama server is running
`list_ollama_models()`	List installed models
`get_recommended_ollama_model()`	Auto-select optimal model

# Check if Ollama is available
if (check_ollama()) {
  models <- list_ollama_models()
  best_model <- get_recommended_ollama_model()
}

Responsible AI Design

All AI features follow NIST AI Risk Management Framework principles:

Principle	Implementation
Human oversight	AI suggests, you review and approve
User control	Edit, regenerate, or override any output
Transparency	View prompts and parameters used
Privacy	Local options (Ollama, spaCy, Transformers) for sensitive data
Grounding	Content based on your data, not generic knowledge

Setup

Local AI (Ollama)

# 1. Install Ollama: https://ollama.com
# 2. Pull a model (in terminal):
#    ollama pull llama3.2
#    ollama pull mistral

# 3. Verify in R:
check_ollama()
list_ollama_models()

Web-based AI (OpenAI/Gemini)

# Set API keys (choose one or both)
Sys.setenv(OPENAI_API_KEY = "your-openai-key")
Sys.setenv(GEMINI_API_KEY = "your-gemini-key")

# Or use a .env file in the project root
# OPENAI_API_KEY=your-key
# GEMINI_API_KEY=your-key
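Base R can load `KEY=value` files with `readRenviron()`. The sketch below writes a throwaway file to a temporary directory so that it runs anywhere; in a project the path would be `.env`. It assumes the file is not loaded automatically and reads it explicitly.

```r
# Write a sample file in the same KEY=value format (placeholder values only).
env_file <- file.path(tempdir(), ".env")
writeLines(
  c("OPENAI_API_KEY=your-key", "GEMINI_API_KEY=your-key"),
  env_file
)

# Load the variables into the current R session.
readRenviron(env_file)

# Confirm the keys are set without printing their values.
keys_set <- nzchar(Sys.getenv(c("OPENAI_API_KEY", "GEMINI_API_KEY")))
```

Keeping keys in a file that is excluded from version control avoids hard-coding them in scripts.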

Linguistic Analysis (spaCy)

# Install Python dependencies
setup_python_env()

# Initialize spaCy with a model (choose one)
init_spacy_nlp("en_core_web_sm")  # Small model
init_spacy_nlp("en_core_web_md")  # Medium model (includes word vectors)

Default Models

Provider	Chat Model	Embedding Model
OpenAI	gpt-4.1-mini	text-embedding-3-small
Gemini	gemini-2.5-flash	gemini-embedding-001
Ollama	llama3.2	nomic-embed-text
Local	-	all-MiniLM-L6-v2