Extract both text and visual content from PDFs, converting everything to text for downstream analysis in your existing workflow.
Usage
extract_pdf_multimodal(
file_path,
vision_provider = "ollama",
vision_model = NULL,
api_key = NULL,
describe_images = TRUE,
envname = "langgraph-env"
)Arguments
- file_path
Character string path to PDF file
- vision_provider
Character: "ollama" (local, default) or "openai" (cloud)
- vision_model
Character: Model name
For Ollama: "llava", "llava:13b", "bakllava"
For OpenAI: "gpt-4-vision-preview", "gpt-4o"
- api_key
Character: OpenAI API key (required if vision_provider="openai")
- describe_images
Logical: Convert images to text descriptions (default: TRUE)
- envname
Character: Python environment name (default: "langgraph-env")
Value
List with:
success: Logical
combined_text: Character string with all content for text analysis
text_content: List of text chunks
image_descriptions: List of image descriptions
num_images: Integer count of processed images
vision_provider: Character indicating provider used
message: Character status message
Details
Workflow Integration:
Extracts text using Marker (preserves equations, tables, structure)
Detects images/charts/diagrams in PDF
Uses vision LLM to describe visual content as text
Merges text + descriptions → single text corpus
Feed to existing text analysis pipeline
Vision Provider Options:
Ollama (Default - Local & Free):
Privacy: Everything runs locally
Cost: Free
Setup: Requires Ollama installed + vision model pulled
Models: llava, bakllava, llava-phi3
OpenAI (Optional - Cloud):
Privacy: Data sent to OpenAI
Cost: Paid (user's API key)
Setup: Just provide API key
Models: gpt-4-vision-preview, gpt-4o
Examples
if (FALSE) { # \dontrun{
# Local analysis with Ollama (free, private)
result <- extract_pdf_multimodal("research_paper.pdf")
# Access combined text for analysis
text_for_analysis <- result$combined_text
# Use in existing workflow
corpus <- prep_texts(text_for_analysis)
topics <- fit_semantic_model(corpus, k = 5)
# Optional: Use OpenAI for better accuracy
result <- extract_pdf_multimodal(
"paper.pdf",
vision_provider = "openai",
vision_model = "gpt-4o",
api_key = Sys.getenv("OPENAI_API_KEY")
)
} # }
