Monday, 7 March 2016

Introduction to Apache PDFBox

Apache PDFBox is an open source Java library for working with PDF (Portable Document Format) documents. It is released under the Apache License v2.0.

What can I do with Apache PDFBox?
a. You can create new PDF documents.
b. You can extract the contents of existing documents.
c. You can manipulate a given document.
d. You can digitally sign a PDF file.
e. You can print a PDF file using the standard Java printing API.
f. You can save PDFs as image files, such as PNG or JPEG.
g. You can validate PDF files against the PDF/A-1b standard.
h. You can use a number of command-line tools to encrypt and decrypt PDF documents, extract all images or the text content from a PDF document, send a PDF document to a printer, create an image for every page of a PDF document, create a PDF from a text file, etc.
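As a rough illustration of items a and b, here is a minimal sketch that creates a one-page PDF in memory and then reads its text back. It assumes the PDFBox 2.0.x API (matching the 2.0.0-RC3 dependency used in this tutorial); in 1.8.x PDFTextStripper lives in org.apache.pdfbox.util, and 3.0 loads documents with Loader.loadPDF instead of PDDocument.load. The class name PdfRoundTrip and the sample message are hypothetical, and checked IOExceptions are wrapped in UncheckedIOException only to keep the helpers easy to call.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical helper class; assumes the PDFBox 2.0.x API.
public class PdfRoundTrip {

    /** Creates a one-page PDF that shows {@code message} and returns the file's bytes. */
    public static byte[] createPdf(String message) {
        try (PDDocument document = new PDDocument()) {
            PDPage page = new PDPage(); // US Letter by default
            document.addPage(page);
            try (PDPageContentStream content = new PDPageContentStream(document, page)) {
                content.beginText();
                content.setFont(PDType1Font.HELVETICA, 12); // a standard 14 font, so nothing is embedded
                content.newLineAtOffset(72, 700);           // 1 inch from the left edge, near the top
                content.showText(message);
                content.endText();
            } // closing the content stream finishes the page content
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            document.save(out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Parses PDF bytes and returns the text of all pages. */
    public static String extractText(byte[] pdf) {
        try (PDDocument document = PDDocument.load(pdf)) {
            return new PDFTextStripper().getText(document);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] pdf = createPdf("Hello, PDFBox!");
        System.out.println(extractText(pdf).trim());
    }
}
```

Working with byte arrays keeps the example free of file paths; document.save(File) and PDDocument.load(File) work the same way when you want real files.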

I am going to use the following Maven dependencies in this tutorial.


<dependency>
 <groupId>org.apache.pdfbox</groupId>
 <artifactId>pdfbox</artifactId>
 <version>2.0.0-RC3</version>
</dependency>

<dependency>
 <groupId>org.apache.pdfbox</groupId>
 <artifactId>pdfbox-lucene</artifactId>
 <version>1.8.11</version>
</dependency>

<dependency>
 <groupId>org.bouncycastle</groupId>
 <artifactId>bcprov-jdk16</artifactId>
 <version>1.46</version>
</dependency>

<dependency>
 <groupId>commons-io</groupId>
 <artifactId>commons-io</artifactId>
 <version>2.4</version>
</dependency>

<dependency>
 <groupId>org.apache.pdfbox</groupId>
 <artifactId>preflight</artifactId>
 <version>2.0.0-RC3</version>
</dependency>
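The preflight artifact above is the part of PDFBox that checks files against PDF/A-1b (item g). The sketch below follows the parse, validate, getResult pattern and assumes the preflight 2.0.x API; the class PdfAValidator and its helper createPlainPdf are hypothetical. A plain PDFBox-generated file has no XMP metadata or output intent, so it is expected to fail validation.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.preflight.PreflightDocument;
import org.apache.pdfbox.preflight.ValidationResult;
import org.apache.pdfbox.preflight.ValidationResult.ValidationError;
import org.apache.pdfbox.preflight.exception.SyntaxValidationException;
import org.apache.pdfbox.preflight.parser.PreflightParser;

// Hypothetical helper class; assumes the preflight 2.0.x API.
public class PdfAValidator {

    /** Validates a file against PDF/A-1b and returns the preflight result. */
    public static ValidationResult validate(File file) {
        try {
            PreflightParser parser = new PreflightParser(file);
            try {
                parser.parse();
            } catch (SyntaxValidationException e) {
                // A file that fails at the syntax level carries its errors in the exception
                return e.getResult();
            }
            try (PreflightDocument document = parser.getPreflightDocument()) {
                document.validate();
                return document.getResult();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a plain one-page PDF (no XMP metadata, no output intent) to a temp file. */
    public static File createPlainPdf() {
        try (PDDocument document = new PDDocument()) {
            document.addPage(new PDPage());
            File file = File.createTempFile("plain", ".pdf");
            file.deleteOnExit();
            document.save(file);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File file = args.length > 0 ? new File(args[0]) : createPlainPdf();
        ValidationResult result = validate(file);
        if (result.isValid()) {
            System.out.println(file + " is a valid PDF/A-1b file");
        } else {
            for (ValidationError error : result.getErrorsList()) {
                System.out.println(error.getErrorCode() + " : " + error.getDetails());
            }
        }
    }
}
```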

Apache PDFBox provides the following command-line utilities: Decrypt, Encrypt, ExtractImages, ExtractText, OverlayPDF, PrintPDF, PDFDebugger, PDFReader, PDFMerger, PDFSplit, PDFToImage, TextToPDF and WriteDecodedDoc.
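Several of these tools have library counterparts. As a sketch under stated assumptions, the following merges and splits documents in code, in the spirit of PDFMerger and PDFSplit, using PDFMergerUtility and Splitter from the org.apache.pdfbox.multipdf package of PDFBox 2.0.x (in 1.8.x these classes lived in org.apache.pdfbox.util). The class MergeSplitDemo is hypothetical and works on blank in-memory documents purely for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

// Hypothetical helper class; assumes the PDFBox 2.0.x API.
public class MergeSplitDemo {

    /** Builds an in-memory document with the given number of blank pages. */
    private static PDDocument blankDocument(int pages) {
        PDDocument document = new PDDocument();
        for (int i = 0; i < pages; i++) {
            document.addPage(new PDPage());
        }
        return document;
    }

    /** Appends a second document to a first one and returns the merged page count. */
    public static int mergedPageCount(int pagesA, int pagesB) {
        try (PDDocument a = blankDocument(pagesA); PDDocument b = blankDocument(pagesB)) {
            new PDFMergerUtility().appendDocument(a, b); // pages of b are copied to the end of a
            return a.getNumberOfPages();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Splits a document into one document per page and returns the number of parts. */
    public static int splitPartCount(int pages) {
        try (PDDocument source = blankDocument(pages)) {
            List<PDDocument> parts = new Splitter().split(source); // default: split at every page
            int count = parts.size();
            for (PDDocument part : parts) {
                part.close(); // each part is a separate document that must be closed
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("merged pages: " + mergedPageCount(1, 2));
        System.out.println("split parts: " + splitPartCount(3));
    }
}
```

In real use you would save the merged document (or each split part) before closing it; the page counts are returned here only to make the behaviour easy to check.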

The following document explains how to use the command-line tools:
https://pdfbox.apache.org/1.8/commandline.html
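PDFToImage can likewise be approximated in code with PDFRenderer (item f in the list above). This is a minimal sketch assuming the PDFBox 2.0.x rendering API; the class PageRenderer, the page-N.png file naming and the 150 DPI used in main are hypothetical choices. At 72 DPI one PDF point maps to one pixel, so a US Letter page (612 x 792 points) renders as a 612 x 792 image.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

// Hypothetical helper class; assumes the PDFBox 2.0.x rendering API.
public class PageRenderer {

    /** Renders every page of {@code pdf} to page-1.png, page-2.png, ... in {@code outDir}. */
    public static void renderAll(File pdf, File outDir, float dpi) throws IOException {
        try (PDDocument document = PDDocument.load(pdf)) {
            PDFRenderer renderer = new PDFRenderer(document);
            for (int i = 0; i < document.getNumberOfPages(); i++) {
                BufferedImage image = renderer.renderImageWithDPI(i, dpi, ImageType.RGB);
                ImageIO.write(image, "png", new File(outDir, "page-" + (i + 1) + ".png"));
            }
        }
    }

    /** Renders a blank US Letter page; useful for checking how DPI maps to pixels. */
    public static BufferedImage renderBlankPage(float dpi) {
        try (PDDocument document = new PDDocument()) {
            document.addPage(new PDPage());
            return new PDFRenderer(document).renderImageWithDPI(0, dpi, ImageType.RGB);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        renderAll(new File(args[0]), new File("."), 150);
    }
}
```

Passing "jpg" instead of "png" to ImageIO.write produces JPEG files; ImageType.RGB is used because JPEG has no alpha channel.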
