OpenClaw Launch

PDF Reader

Verified by Community

Parse PDF files to extract text content, tables, metadata, and structured data. Works with contracts, reports, invoices, and research papers. Converts PDFs into usable text for analysis.

Tags: pdf, document, extract, text, tables, productivity

PDF Reader Skill

Extract and process content from PDF documents.

Basic Text Extraction

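python3 -c "
import subprocess, sys
# '-' as the output file makes pdftotext write the text to stdout
result = subprocess.run(['pdftotext', '{file}', '-'], capture_output=True, text=True)
# Surface pdftotext's own error message instead of printing nothing
if result.returncode:
    sys.exit(result.stderr)
# Print only the first 5000 characters to keep the output manageable
print(result.stdout[:5000])
"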
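The one-liner above can also be wrapped in a reusable function. The following is a minimal sketch, not part of the skill itself: the name `extract_text` and the `max_chars` parameter are hypothetical, and it assumes `pdftotext` is installed and on the PATH (for example from the poppler-utils package).

```python
import shutil
import subprocess


def extract_text(pdf_path: str, max_chars: int = 5000) -> str:
    """Return up to max_chars characters of text extracted by pdftotext.

    extract_text and max_chars are illustrative names, not part of the skill.
    """
    # Fail early with a clear message if the pdftotext binary is missing.
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    # '-' as the output file tells pdftotext to write to stdout.
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        text=True,
    )
    # A non-zero exit code means pdftotext could not process the file.
    if result.returncode != 0:
        raise RuntimeError(f"pdftotext failed: {result.stderr.strip()}")
    return result.stdout[:max_chars]
```

Raising an exception on failure, rather than printing an empty string, lets the caller distinguish an unreadable file from a PDF that simply contains no text.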

Page-by-Page Extraction

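python3 -c "
import subprocess, sys
# -f and -l select the first and last page to convert (1-based, inclusive)
result = subprocess.run(['pdftotext', '-f', '{first_page}', '-l', '{last_page}', '{file}', '-'], capture_output=True, text=True)
# Surface pdftotext's own error message instead of printing nothing
if result.returncode:
    sys.exit(result.stderr)
print(result.stdout)
"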
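When a page range is extracted, it can help to handle each page separately. pdftotext normally separates pages with a form feed character (`\f`) unless page breaks are disabled, so the output can be split on it. The sketch below assumes that default behavior; `page_range_command` and `split_pages` are hypothetical helper names.

```python
from typing import List


def page_range_command(pdf_path: str, first_page: int, last_page: int) -> List[str]:
    """Build the pdftotext argument list for an inclusive, 1-based page range."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    # Same flags as the command above: -f first page, -l last page, '-' for stdout.
    return ["pdftotext", "-f", str(first_page), "-l", str(last_page), pdf_path, "-"]


def split_pages(text: str) -> List[str]:
    """Split pdftotext output into per-page strings on form feed characters."""
    pages = text.split("\f")
    # A trailing form feed leaves an empty final element; drop it.
    if pages and pages[-1] == "":
        pages.pop()
    return pages
```

The list returned by `page_range_command` can be passed directly to `subprocess.run`, and the resulting `stdout` to `split_pages`.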

PDF Metadata

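python3 -c "
import subprocess, sys
# pdfinfo prints one 'Key: value' line per metadata field (title, author, page count, ...)
result = subprocess.run(['pdfinfo', '{file}'], capture_output=True, text=True)
# Surface pdfinfo's own error message instead of printing nothing
if result.returncode:
    sys.exit(result.stderr)
print(result.stdout)
"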
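The pdfinfo output is a list of `Key: value` lines, which is easy to turn into a dictionary for later lookups such as the page count. A minimal sketch, assuming that line format; `parse_pdfinfo` is a hypothetical helper name.

```python
from typing import Dict


def parse_pdfinfo(output: str) -> Dict[str, str]:
    """Parse 'Key: value' lines from pdfinfo output into a dictionary."""
    info: Dict[str, str] = {}
    for line in output.splitlines():
        # Split on the first colon only, since values such as dates contain colons.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info
```

For example, `int(parse_pdfinfo(result.stdout).get("Pages", "0"))` would give the page count when pdfinfo reports a `Pages` field. All values are kept as strings, so numeric fields need explicit conversion.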

Table Extraction

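python3 -c "
import subprocess, sys
# -layout keeps the physical layout, so table columns stay aligned with spaces
result = subprocess.run(['pdftotext', '-layout', '{file}', '-'], capture_output=True, text=True)
# Surface pdftotext's own error message instead of printing nothing
if result.returncode:
    sys.exit(result.stderr)
# Print only the first 5000 characters to keep the output manageable
print(result.stdout[:5000])
"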
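With `-layout`, table columns come out as text separated by runs of spaces. One rough way to recover rows and cells is to split each line on two or more consecutive whitespace characters. This is only a heuristic sketch: it assumes cells never contain double spaces and that no cell is empty, and the function names are hypothetical.

```python
import re
from typing import List


def split_layout_row(line: str) -> List[str]:
    """Split one -layout line into cells on runs of two or more whitespace characters."""
    return re.split(r"\s{2,}", line.strip())


def parse_layout_table(text: str) -> List[List[str]]:
    """Turn -layout text into a list of rows, skipping blank lines."""
    rows: List[List[str]] = []
    for line in text.splitlines():
        if line.strip():
            rows.append(split_layout_row(line))
    return rows
```

Single spaces inside a cell (such as a two-word item name) are preserved, but an empty cell shifts the remaining cells left, so rows with unexpected lengths should be checked by hand.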

Guidelines

Try pdftotext first: it is fast and handles most text-based PDFs well.
Use the -layout flag to preserve column alignment when extracting tables.
pdftotext does not perform OCR, so scanned PDFs yield little or no text. When that happens, tell the user that OCR is needed.
For large documents, extract specific page ranges instead of the whole file.
Summarize the extracted content rather than dumping entire documents.
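Two of these guidelines can be sketched in code: walking a large document in page ranges, and guessing whether a PDF is scanned from how little text comes back. Both helpers are hypothetical, and the threshold of 20 visible characters per page is an arbitrary assumption rather than a tested value.

```python
from typing import Iterator, Tuple


def page_chunks(total_pages: int, chunk_size: int = 10) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first_page, last_page) ranges covering 1..total_pages."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for first in range(1, total_pages + 1, chunk_size):
        yield first, min(first + chunk_size - 1, total_pages)


def looks_scanned(text: str, page_count: int, min_chars_per_page: int = 20) -> bool:
    """Guess that a PDF is scanned when extraction returns almost no visible text."""
    # Count non-whitespace characters so form feeds and blank lines do not count.
    visible = sum(1 for ch in text if not ch.isspace())
    return visible < page_count * min_chars_per_page
```

Each pair from `page_chunks` maps onto the `-f` and `-l` values used in the page-by-page command, and the page count can come from the `Pages` field that pdfinfo reports.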