Extract text from PDFs with charts, diagrams, and images using vision AI. R-native pipeline – no Python required.
## How It Works

- Extracts text from each page using `pdftools::pdf_text()` (R-native)
- Renders each page as a PNG image via `pdftools::pdf_render_page()`
- Identifies sparse-text pages (< 500 characters) that likely contain figures
- Sends only those pages to a vision LLM for description
- Merges extracted text and image descriptions into a single text corpus
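The sparse-page heuristic above can be sketched in a few lines of base R. This is an illustrative mock, not the package internals: the `pages` vector stands in for the per-page character vector that `pdftools::pdf_text()` returns, and the 500-character threshold is the one stated in the list above.

```r
# Mock of the per-page text that pdftools::pdf_text("document.pdf") would return
pages <- c(
  strrep("Lots of body text. ", 60),   # a normal, text-heavy page
  "Figure 2: Model architecture"       # a sparse page, likely a figure
)

# Flag pages with fewer than 500 extracted characters as figure candidates
char_counts  <- nchar(trimws(pages))
figure_pages <- which(char_counts < 500)
figure_pages  # → 2 (only the sparse page is sent to the vision LLM)
```

Only the flagged pages are rendered and sent to the vision model, which keeps API costs proportional to the number of figure pages rather than the total page count.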
## Setup

### Cloud (OpenAI / Gemini)

```r
Sys.setenv(OPENAI_API_KEY = "sk-...")
Sys.setenv(GEMINI_API_KEY = "your-gemini-key")
```

## Usage
```r
library(TextAnalysisR)

# Extract PDF with vision AI (default: Ollama)
result <- extract_pdf_multimodal(
  "document.pdf",
  vision_provider = "ollama"  # or "openai" or "gemini"
)

# Use in analysis
tokens <- prep_texts(result$combined_text)
```

### Gemini Example
```r
result <- extract_pdf_multimodal(
  "paper.pdf",
  vision_provider = "gemini",
  api_key = Sys.getenv("GEMINI_API_KEY")
)
```

## Describe Individual Images
```r
description <- describe_image(
  image_base64,
  provider = "openai",
  api_key = Sys.getenv("OPENAI_API_KEY")
)
```

## Unified PDF Pipeline
`process_pdf_unified()` provides automatic fallback:

- Multimodal (pdftools + vision LLM) – extracts text and describes visual content
- Text-only (pdftools) – fallback if no vision provider is available

```r
result <- process_pdf_unified("paper.pdf", vision_provider = "gemini")
```
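The fallback behavior can be pictured with the generic sketch below. This is an assumed structure, not the actual package internals: `run_with_fallback` and the stand-in extractor functions are hypothetical names used only to illustrate the try-multimodal-then-degrade pattern.

```r
# Hypothetical sketch of the fallback pattern: try the multimodal path,
# fall back to text-only extraction on error or when no provider is set.
run_with_fallback <- function(multimodal, text_only, vision_provider = NULL) {
  if (!is.null(vision_provider)) {
    out <- tryCatch(multimodal(vision_provider), error = function(e) NULL)
    if (!is.null(out)) return(out)
  }
  text_only()
}

# Stand-in extractors: the multimodal one fails (e.g. provider unreachable)
result <- run_with_fallback(
  multimodal = function(p) stop("vision provider unreachable"),
  text_only  = function() list(combined_text = "plain text"),
  vision_provider = "gemini"
)
result$combined_text  # the text-only fallback result
```

Either way, callers always receive a result with `combined_text`, so downstream steps such as `prep_texts()` do not need to know which path ran.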