Programming for beginners: PDFBox: get all attachment names from PDF document

In the previous post, I explained how to attach documents to a PDF document. In this post, I am going to explain how to get the names of all attachments embedded in a PDF document.

Step 1: Load the PDF document.

PDDocument doc = PDDocument.load(new File(fileName));

Step 2: Attachments are stored as part of the "names" dictionary in the document catalog, so get a PDDocumentNameDictionary instance.

PDDocumentNameDictionary names = new PDDocumentNameDictionary(doc.getDocumentCatalog());

Step 3: Get all the embedded files.

PDEmbeddedFilesNameTreeNode efTree = names.getEmbeddedFiles();
Map<String, PDComplexFileSpecification> existedNames = efTree.getNames();
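
The embedded files are kept in a name tree, and a node of that tree can hold names directly, hold child nodes (kids), or both. The following is a minimal sketch of a null-safe recursive walk over such a tree. It deliberately uses a hypothetical stand-in class (Node) instead of the PDFBox classes, so it runs without PDFBox on the classpath; NameTreeSketch, Node, and collectNames are illustrative names, not part of PDFBox. In PDFBox, the corresponding calls on the tree node would be getNames() and getKids().

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class NameTreeSketch {

    /** Hypothetical stand-in for a name tree node; names and kids may each be null. */
    public static final class Node {
        final Map<String, String> names; // attachment name -> placeholder for the file specification
        final List<Node> kids;           // child nodes, or null for a leaf

        public Node(Map<String, String> names, List<Node> kids) {
            this.names = names;
            this.kids = kids;
        }
    }

    /** Collects the names stored in the node and in all of its descendants. */
    public static Set<String> collectNames(Node node) {
        Set<String> result = new TreeSet<>();
        if (node == null) {
            return result;
        }
        if (node.names != null) {
            result.addAll(node.names.keySet());
        }
        if (node.kids != null) {
            for (Node kid : node.kids) {
                result.addAll(collectNames(kid));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Node leafA = new Node(Map.of("a.txt", "spec-a"), null);
        Node leafB = new Node(Map.of("b.png", "spec-b", "c.csv", "spec-c"), null);
        // The root holds no names of its own, only kids.
        Node root = new Node(null, List.of(leafA, leafB));
        System.out.println(collectNames(root)); // prints [a.txt, b.png, c.csv]
    }
}
```

The point of the sketch is that reading only the names of the root node can miss attachments when the names sit in child nodes, so a complete listing has to visit the kids as well.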

The following is the complete function, which returns the names of all attachments.

public static Optional<Set<String>> getAttachements(String fileName) {

 if (Objects.isNull(fileName)) {
  throw new NullPointerException("fileName shouldn't be null");
 }

 try (PDDocument doc = PDDocument.load(new File(fileName))) {

  /*
   * Attachments are stored as part of the "names" dictionary in the
   * document catalog
   */
  PDDocumentNameDictionary names = new PDDocumentNameDictionary(doc.getDocumentCatalog());

  PDEmbeddedFilesNameTreeNode efTree = names.getEmbeddedFiles();
  if (Objects.isNull(efTree)) {
   return Optional.empty();
  }
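  Map<String, PDComplexFileSpecification> existedNames = efTree.getNames();
  if (Objects.isNull(existedNames)) {
   // The root node has no names of its own (they may be stored in child nodes)
   return Optional.empty();
  }

  return Optional.of(existedNames.keySet());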

 } catch (IOException e) {
  System.err.println("Could not read " + fileName + ": " + e.getMessage());
  return Optional.empty();
 }

}

import java.util.Optional;
import java.util.Set;

public class PDFTextStripperUtilTest {
 public static void main(String[] args) {
  String fileName = "/Users/harikrishna_gurram/Downloads/Saurabh.pdf";

  Optional<Set<String>> attachements = PDFTextStripperUtil
    .getAttachements(fileName);

  if (attachements.isPresent()) {
   attachements.get().forEach(System.out::println);
  }

 }

}

See the following post for the code of the complete utility class.

https://self-learning-java-tutorial.blogspot.com/2016/03/apache-pdfbox-utility-class-to-work.html

Monday, 7 March 2016
