Extract text from PDFs with charts, diagrams, and images using vision AI.
Setup
Cloud (OpenAI)
Sys.setenv(OPENAI_API_KEY = "sk-...")
Usage
library(TextAnalysisR)
# Extract PDF with images
result <- extract_pdf_multimodal(
  "document.pdf",
  vision_provider = "ollama"  # or "openai"
)
# Use in analysis
tokens <- prep_texts(result$combined_text)
Smart Extraction
Auto-detects the document type when doc_type = "auto":
result <- extract_pdf_smart("paper.pdf", doc_type = "auto")
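Pre-flight check (sketch)
Before calling either extractor with the "openai" provider, it can help to confirm that the PDF exists and that the key from Setup is visible to the R session. The following is a minimal sketch in base R only. `has_api_key()` and `check_extraction_inputs()` are hypothetical helpers written for illustration, not part of TextAnalysisR, and the sketch only checks the "openai" provider; whatever a local provider needs is not covered here.

```r
# Hypothetical helper (not part of TextAnalysisR): TRUE when the named
# environment variable is set to a non-empty value.
has_api_key <- function(var = "OPENAI_API_KEY") {
  nzchar(Sys.getenv(var, unset = ""))
}

# Hypothetical helper: stop early with a readable message if the PDF is
# missing or if the "openai" provider is requested without a key.
check_extraction_inputs <- function(path, provider = "ollama") {
  if (!file.exists(path)) {
    stop("PDF not found: ", path)
  }
  if (identical(provider, "openai") && !has_api_key()) {
    stop("OPENAI_API_KEY is not set; see Setup.")
  }
  invisible(TRUE)
}

# Intended use, run before the extraction call shown under Usage:
# check_extraction_inputs("document.pdf", provider = "openai")
# result <- extract_pdf_multimodal("document.pdf", vision_provider = "openai")
```

Failing before the extraction call keeps a missing file or missing key from surfacing later as a less obvious error from the vision provider.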