PDF Analyzer Logo
PDFMiner is a tool for extracting information from PDF documents. Unlike other PDF-related tools, it focuses entirely on getting and analyzing text data. PDFMiner allows one to obtain the exact location of text in a page, as well as other information such as fonts or lines. It includes a PDF converter that can transform PDF files into other text formats (such as HTML). It has an extensible PDF parser that can be used for other purposes than text analysis.


  • Written entirely in Python. (for version 2.4 or newer)
  • Parse, analyze, and convert PDF documents.
  • PDF-1.7 specification support. (well, almost)
  • CJK languages and vertical writing scripts support.
  • Various font types (Type1, TrueType, Type3, and CID) support.
  • Basic encryption (RC4) support.
  • PDF to HTML conversion (with a sample converter web app).
  • Outline (TOC) extraction.
  • Tagged contents extraction.
  • Reconstruct the original layout by grouping text chunks.
PDFMiner is about 20 times slower than other C/C++-based counterparts such as XPdf. 

Tutorials ::

How to Install

  1. Install Python 2.4 or newer. (Python 3 is not supported.)
  2. Download the PDFMiner source.
  3. Unpack it.
  4. Run setup.py to install:
    # python setup.py install
  5. Do the following test:
    $ pdf2txt.py samples/simple1.pdf
    H e l l o
    W o r l d
    H e l l o
    W o r l d
  6. Done!
For more tutorials visit official website.

Download ::


Post a Comment