Detect PDF Content Type using Python — detect_pdf_content_type_py • TextAnalysisR

Analyzes a PDF to determine whether it contains primarily tabular data or text.

Usage

detect_pdf_content_type_py(file_path, envname = "textanalysisr-env")

Arguments

file_path: Character string, path to the PDF file
envname: Character string, name of the Python virtual environment (default: "textanalysisr-env")

Value

Character string: "tabular", "text", or "unknown"
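
Because the return value is one of three fixed labels, a caller can branch on it directly. The following is a minimal sketch, not part of TextAnalysisR: the helper name `next_step_for` and the "manual_review" fallback are hypothetical, and pairing each label with `extract_tables_from_pdf_py` or `extract_text_from_pdf_py` is an assumption based only on the function names listed under "See also". The helper returns a function name as a string rather than calling anything, since the signatures of those functions are not documented on this page.

```r
# Hypothetical helper (not part of TextAnalysisR): maps the label returned by
# detect_pdf_content_type_py() to the name of a plausible follow-up function.
next_step_for <- function(content_type) {
  # Expect exactly one non-missing character string
  stopifnot(
    is.character(content_type),
    length(content_type) == 1,
    !is.na(content_type)
  )
  switch(content_type,
    tabular = "extract_tables_from_pdf_py",  # assumed pairing
    text    = "extract_text_from_pdf_py",    # assumed pairing
    "manual_review"  # fallback for "unknown" and any unexpected label
  )
}

next_step_for("tabular")
next_step_for("unknown")
```

Handling "unknown" through the unnamed default of `switch()` means an unexpected label also falls through to the fallback instead of silently returning `NULL`.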

See also

Other pdf: check_vision_models(), detect_pdf_content_type(), extract_pdf_multimodal(), extract_pdf_smart(), extract_tables_from_pdf_py(), extract_text_from_pdf(), extract_text_from_pdf_py(), process_pdf_file(), process_pdf_file_py()

Examples

if (FALSE) { # \dontrun{
# Set up the Python environment before the first call
setup_python_env()

pdf_path <- "path/to/document.pdf"
content_type <- detect_pdf_content_type_py(pdf_path)
print(content_type)  # one of "tabular", "text", or "unknown"
} # }