Detect PDF Content Type — detect_pdf_content_type • TextAnalysisR

Analyzes a PDF file to determine whether it contains extractable text.

Usage

detect_pdf_content_type(file_path)

Arguments

file_path: Character string giving the path to a PDF file

Value

Character string: "text" or "unknown"

Details

Attempts text extraction using the pdftools package. Returns "text" if extraction succeeds, or "unknown" if extraction fails or the PDF is empty.
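
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the package source: the function name sketch_detect_pdf_content_type and its extract argument are hypothetical, and treating whitespace-only pages as "empty" is an assumption. The only external call used is pdftools::pdf_text(), which returns one character string per page.

```r
# Hypothetical sketch of the documented behavior; not the TextAnalysisR source.
# `extract` defaults to pdftools::pdf_text and is only evaluated when called,
# so a stub extractor can be supplied where pdftools is not installed.
sketch_detect_pdf_content_type <- function(file_path,
                                           extract = pdftools::pdf_text) {
  # Any extraction error (missing file, corrupt PDF) is mapped to NULL.
  pages <- tryCatch(extract(file_path), error = function(e) NULL)

  # Assumption: a PDF counts as empty when it has no pages or when every
  # page is blank after trimming whitespace.
  if (is.null(pages) || length(pages) == 0 || !any(nzchar(trimws(pages)))) {
    return("unknown")
  }

  "text"
}
```

Wrapping the extraction in tryCatch() means callers always get one of the two documented strings rather than an error, so the result can be used directly in a conditional.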

For table extraction from PDFs, use extract_tables_from_pdf_py.

See also

Other pdf: check_vision_models(), detect_pdf_content_type_py(), extract_pdf_multimodal(), extract_pdf_smart(), extract_tables_from_pdf_py(), extract_text_from_pdf(), extract_text_from_pdf_py(), process_pdf_file(), process_pdf_file_py()

Examples

if (FALSE) { # \dontrun{
# Replace with the path to a real PDF file
pdf_path <- "path/to/document.pdf"
# Returns "text" or "unknown"
content_type <- detect_pdf_content_type(pdf_path)
print(content_type)
} # }