PDF Reader

Verified by Community

Parse PDF files to extract text content, tables, metadata, and structured data. Works with contracts, reports, invoices, and research papers. Converts PDFs into usable text for analysis.

Tags: pdf, document, extract, text, tables, productivity

PDF Reader Skill

Extract and process content from PDF documents.

Basic Text Extraction

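python3 -c "
import subprocess
import sys
# The trailing '-' sends the text to stdout; only the first 5000 characters are shown
result = subprocess.run(['pdftotext', '{file}', '-'], capture_output=True, text=True)
if result.returncode:
    sys.exit(result.stderr)
print(result.stdout[:5000])
"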

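For repeated use, the one-liner above can be wrapped in a function. This is a minimal sketch, not part of the skill itself: extract_text and its max_chars and tool parameters are hypothetical names, and it assumes pdftotext is installed and on PATH (it usually comes with the poppler-utils package).

```python
import shutil
import subprocess


def extract_text(path: str, max_chars: int = 5000, tool: str = "pdftotext") -> str:
    """Return up to max_chars characters of text extracted from a PDF.

    Hypothetical helper: raises RuntimeError if the tool is missing or fails.
    """
    # Fail early with a clear message if the command-line tool is not installed.
    if shutil.which(tool) is None:
        raise RuntimeError(f"{tool} not found on PATH")
    # '-' as the output file tells pdftotext to write to stdout.
    result = subprocess.run([tool, path, "-"], capture_output=True, text=True)
    if result.returncode != 0:
        message = result.stderr.strip() or f"{tool} exited with {result.returncode}"
        raise RuntimeError(message)
    return result.stdout[:max_chars]
```

Calling extract_text('report.pdf') with a real file path would return the first 5000 characters. Raising an exception, rather than printing an empty string, makes a missing tool or unreadable file visible to the caller.
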
Page-by-Page Extraction

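python3 -c "
import subprocess
import sys
# -f and -l set the first and last page to convert (1-based, inclusive)
result = subprocess.run(['pdftotext', '-f', '{first_page}', '-l', '{last_page}', '{file}', '-'], capture_output=True, text=True)
if result.returncode:
    sys.exit(result.stderr)
print(result.stdout)
"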

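When a range covers several pages, pdftotext separates them with a form feed character by default. The sketch below splits the output back into individual pages. It assumes that default is in effect (the -nopgbrk flag was not passed), and split_pages is a hypothetical helper name.

```python
def split_pages(text: str) -> list[str]:
    """Split pdftotext output into one string per page.

    Assumes pages are separated by form feed characters, which is the
    default when -nopgbrk is not given.
    """
    pages = text.split("\f")
    # The output typically ends with a form feed as well, which leaves an
    # empty trailing piece; drop it if present.
    if pages and pages[-1].strip() == "":
        pages.pop()
    return pages


if __name__ == "__main__":
    # Hand-written sample standing in for real pdftotext output.
    sample = "Page one text\n\fPage two text\n\f"
    for number, page in enumerate(split_pages(sample), start=1):
        print(number, page.strip())
```

The list index plus the first page of the requested range gives the page number in the original document.
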
PDF Metadata

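python3 -c "
import subprocess
import sys
# pdfinfo prints one 'Key: value' line per field, such as Title, Author and Pages
result = subprocess.run(['pdfinfo', '{file}'], capture_output=True, text=True)
if result.returncode:
    sys.exit(result.stderr)
print(result.stdout)
"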

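The pdfinfo output is easier to work with as a dictionary, for example to read the page count before choosing a page range. This is a minimal sketch: parse_pdfinfo is a hypothetical helper, and the sample string is hand-written in the shape pdfinfo prints, not real output.

```python
def parse_pdfinfo(output: str) -> dict[str, str]:
    """Turn pdfinfo's 'Key: value' lines into a dictionary of strings."""
    info = {}
    for line in output.splitlines():
        # Split on the first colon only, because values such as dates
        # contain colons themselves.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


if __name__ == "__main__":
    sample = "Title:          Annual Report\nPages:          12\nEncrypted:      no\n"
    meta = parse_pdfinfo(sample)
    print(meta["Pages"])
```

All values stay strings, so a page count needs int(meta['Pages']) before it is used in arithmetic.
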
Table Extraction

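python3 -c "
import subprocess
import sys
# -layout keeps the physical page layout, so table columns stay aligned with spaces
result = subprocess.run(['pdftotext', '-layout', '{file}', '-'], capture_output=True, text=True)
if result.returncode:
    sys.exit(result.stderr)
print(result.stdout[:5000])
"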

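The -layout output is still plain text, with columns separated by runs of spaces. One rough way to turn it into rows of cells is to split each line on two or more consecutive spaces. This is a heuristic sketch, not a general table parser: split_layout_row and layout_to_rows are hypothetical helpers, and the sample text is made up.

```python
import re


def split_layout_row(line: str) -> list[str]:
    """Split one line of -layout output into cells.

    Heuristic: columns are separated by two or more spaces, while single
    spaces are treated as part of a cell.
    """
    return [cell for cell in re.split(r" {2,}", line.strip()) if cell]


def layout_to_rows(text: str) -> list[list[str]]:
    """Apply split_layout_row to every non-blank line."""
    return [split_layout_row(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    sample = (
        "Item            Qty    Unit price\n"
        "Widget A          2         9.50\n"
        "\n"
        "Service fee       1        40.00\n"
    )
    for row in layout_to_rows(sample):
        print(row)
```

Empty cells shift the remaining values to the left, and cells whose text contains double spaces are split, so check the result against the original layout before relying on it.
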
Guidelines

  • Try pdftotext first: it is fast and handles most text-based PDFs well
  • Use the -layout flag to preserve column alignment in tables
  • For scanned PDFs, note that OCR is needed; pdftotext does not perform OCR and returns little or no text for image-only pages
  • Extract specific page ranges for large documents
  • Summarize extracted content rather than dumping entire documents
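
Two of the guidelines above can be sketched in code: splitting a large document into page ranges, and guessing whether a PDF is scanned from how little text came back. Both helpers are hypothetical, and the default of 20 characters per page is an arbitrary assumption rather than a tested threshold.

```python
def page_ranges(total_pages: int, chunk_size: int = 10) -> list[tuple[int, int]]:
    """Return inclusive (first_page, last_page) pairs covering 1..total_pages."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (first, min(first + chunk_size - 1, total_pages))
        for first in range(1, total_pages + 1, chunk_size)
    ]


def probably_needs_ocr(text: str, page_count: int, min_chars_per_page: int = 20) -> bool:
    """Guess whether a PDF is scanned, based on how little text was extracted.

    The threshold is an assumption for illustration; tune it for real documents.
    """
    # Count only visible characters; form feeds and newlines do not count as text.
    visible = sum(1 for ch in text if not ch.isspace())
    return visible < min_chars_per_page * max(page_count, 1)


if __name__ == "__main__":
    print(page_ranges(25, chunk_size=10))
    print(probably_needs_ocr("\f\f\f", page_count=3))
```

Each pair from page_ranges maps directly onto the {first_page} and {last_page} placeholders, and the page count can be read from the Pages field of the pdfinfo output.
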