Extracts text content from a PDF file using pdftools package.
Value
Data frame with columns: page (integer), text (character) Returns NULL if extraction fails or PDF is empty
Details
Uses pdftools::pdf_text() to extract text from each page.
Preserves page structure and cleans whitespace.
Works best with text-based PDFs (not scanned images).
Examples
if (FALSE) { # \dontrun{
pdf_path <- "path/to/document.pdf"
text_data <- extract_text_from_pdf(pdf_path)
head(text_data)
} # }
