Unified PDF processing. Extraction methods are tried in this order:

  1. Multimodal (R-native pdftools + Vision LLM), if enabled

  2. R pdftools text extraction, as fallback
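
The order above can be illustrated with a minimal sketch. This is not the package's actual implementation: `extract_with_fallback()`, `multimodal_fn`, and `text_fn` are hypothetical names, and treating an error in the multimodal route as the trigger for the fallback is an assumption. Toy stand-in functions are used so the sketch runs without a PDF or a vision model.

```r
# Hypothetical sketch of the fallback order; not the package's actual code.
# `multimodal_fn` and `text_fn` are stand-ins for the two extraction routes.
extract_with_fallback <- function(file_path, use_multimodal = FALSE,
                                  multimodal_fn, text_fn) {
  if (use_multimodal) {
    # Try the multimodal route first; assume any error means "unavailable".
    result <- tryCatch(multimodal_fn(file_path), error = function(e) NULL)
    if (!is.null(result)) {
      return(list(success = TRUE, data = result, method = "multimodal"))
    }
  }
  # Fall back to plain text extraction.
  result <- tryCatch(text_fn(file_path), error = function(e) NULL)
  list(success = !is.null(result), data = result, method = "text")
}

# Toy stand-ins so the sketch runs without any PDF or model.
failing_vision <- function(path) stop("vision model unavailable")
plain_text <- function(path) paste("text from", path)

res <- extract_with_fallback("report.pdf", use_multimodal = TRUE,
                             multimodal_fn = failing_vision,
                             text_fn = plain_text)
res$method
```

In a real setting the text route could be something like `pdftools::pdf_text()`, which returns one character string per page.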

Usage

process_pdf_unified(
  file_path,
  use_multimodal = FALSE,
  vision_provider = "ollama",
  vision_model = NULL,
  api_key = NULL,
  describe_images = TRUE
)

Arguments

file_path

Character string, path to the PDF file

use_multimodal

Logical; if TRUE, enable multimodal extraction. Default FALSE.

vision_provider

Character, one of "ollama", "openai", or "gemini". Default "ollama".

vision_model

Character, name of the vision model. Default NULL.

api_key

Character, API key (if vision_provider is "openai" or "gemini"). Default NULL.

describe_images

Logical; if TRUE, generate image descriptions. Default TRUE.

Value

A list with elements success, data, type, method, and message.
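
A short sketch of how a caller might inspect the returned list. The helper `report_result()` is hypothetical, and the sketch assumes that `success` is a logical and that `type`, `method`, and `message` are character strings; the documentation above only names the elements. A mock list is used so the example runs without the package installed.

```r
# Hypothetical helper: summarise a result list in one line.
# Assumes `success` is logical and `type`, `method`, `message` are strings.
report_result <- function(res) {
  if (isTRUE(res$success)) {
    sprintf("OK via %s (%s)", res$method, res$type)
  } else {
    sprintf("Failed: %s", res$message)
  }
}

# Mock result with the documented element names; values are invented.
mock <- list(success = FALSE, data = NULL, type = NA_character_,
             method = NA_character_, message = "file not found")
report_result(mock)
```

With the package loaded, the same helper could be applied to the output of `process_pdf_unified(file_path)`.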