In the previous post, I explained how to attach documents to a PDF document. In this
post, I am going to explain how to extract attachments from a PDF document.
Step 1: Load the PDF document.
PDDocument doc = PDDocument.load(new File(fileName));
Step 2: Attachments are stored as part of the "names" dictionary in the document catalog, so
get a PDDocumentNameDictionary instance.
PDDocumentNameDictionary names = new PDDocumentNameDictionary(doc.getDocumentCatalog());
Step 3: Get all the embedded files.
PDEmbeddedFilesNameTreeNode efTree = names.getEmbeddedFiles();
Map<String, PDComplexFileSpecification> existedNames = efTree.getNames();
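One thing to keep in mind about Step 3: a PDF name tree can keep its entries directly on a node (the "Names" entry) or spread them across child nodes (the "Kids" entry). In PDFBox, getNames() returns only the entries on that level and can return null, so a document whose root node has only kids needs a recursive walk. The following is a minimal sketch of that walk. It deliberately uses a hypothetical stand-in class (NameTreeSketch.Node) instead of the PDFBox types, so the class name, its fields and collectNames are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a PDF name tree; this is NOT a PDFBox class.
class NameTreeSketch {

    /** One tree node. Either field may be null, mirroring a missing "Names" or "Kids" entry. */
    static final class Node {
        final Map<String, String> names; // entries stored on this level, or null
        final List<Node> kids;           // child nodes, or null

        Node(Map<String, String> names, List<Node> kids) {
            this.names = names;
            this.kids = kids;
        }
    }

    /** Collects the keys of the given node and of all its descendants, depth first. */
    static Set<String> collectNames(Node node) {
        Set<String> result = new LinkedHashSet<>();
        if (node == null) {
            return result;
        }
        if (node.names != null) {
            result.addAll(node.names.keySet());
        }
        if (node.kids != null) {
            for (Node kid : node.kids) {
                result.addAll(collectNames(kid));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The root has no entries of its own; both names live in its kids.
        Node leafA = new Node(Collections.singletonMap("a.txt", "first attachment"), null);
        Node leafB = new Node(Collections.singletonMap("b.txt", "second attachment"), null);
        Node root = new Node(null, Arrays.asList(leafA, leafB));

        collectNames(root).forEach(System.out::println);
    }
}
```

With the real PDFBox types, the same shape would apply to getNames() and getKids() on the embedded files tree node, treating a null result from either call as "nothing on this level".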
The following is the complete function, which returns the names of all attachments.
public static Optional<Set<String>> getAttachements(String fileName) {
    if (Objects.isNull(fileName)) {
        throw new NullPointerException("fileName shouldn't be null");
    }

    try (PDDocument doc = PDDocument.load(new File(fileName))) {
        /*
         * Attachments are stored as part of the "names" dictionary in the
         * document catalog
         */
        PDDocumentNameDictionary names = new PDDocumentNameDictionary(doc.getDocumentCatalog());
        PDEmbeddedFilesNameTreeNode efTree = names.getEmbeddedFiles();

        if (Objects.isNull(efTree)) {
            return Optional.empty();
        }

        // getNames() returns null when this node has no "Names" entry
        Map<String, PDComplexFileSpecification> existedNames = efTree.getNames();
        if (Objects.isNull(existedNames)) {
            return Optional.empty();
        }

        return Optional.of(existedNames.keySet());
    } catch (IOException e) {
        System.out.println(e.getMessage());
        return Optional.empty();
    }
}
import java.io.IOException;
import java.util.Optional;
import java.util.Set;

public class PDFTextStripperUtilTest {

    public static void main(String args[]) throws IOException {
        String fileName = "/Users/harikrishna_gurram/Downloads/Saurabh.pdf";

        Optional<Set<String>> attachements = PDFTextStripperUtil.getAttachements(fileName);

        if (attachements.isPresent()) {
            attachements.get().forEach(System.out::println);
        }
    }
}
Go through the following post to get the code of the complete utility class.