Sunday, 3 August 2025

LangChain4j: How to Use the Document API

If you're building AI apps with LangChain4j, one of the first things you'll deal with is text, and lots of it. That text can come from many sources, such as PDFs, Word files, websites, or plain text files.

To make that text easier to handle, LangChain4j provides a class called Document. It stores the content together with useful details (called metadata), such as where the text came from, when it was created, or who owns it.

 

Key methods:

·      Document.text(): Get the raw content

·      Document.metadata(): Fetch associated metadata

·      Document.toTextSegment(): Convert the whole Document into a single TextSegment, ready for embedding and indexing

·      Document.from(String, Metadata), Document.document(String, Metadata): Create a Document from text and Metadata

·      Document.from(String), Document.document(String): Create a Document from text with empty Metadata
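
The less common methods in the list above, Document.document(String, Metadata) and toTextSegment(), can be sketched as follows. This is a minimal sketch, assuming a recent LangChain4j version in which Metadata offers put(String, String) and getString(String); the class name DocumentToSegmentSketch and the helper buildDocument are hypothetical names used only for illustration.

```java
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.Metadata;
import dev.langchain4j.data.segment.TextSegment;

public class DocumentToSegmentSketch {

    // Hypothetical helper (not part of LangChain4j): wraps text and a file name in a Document.
    static Document buildDocument(String text, String fileName) {
        Metadata metadata = new Metadata();
        metadata.put(Document.FILE_NAME, fileName);
        // Document.document(...) is an alias for Document.from(...)
        return Document.document(text, metadata);
    }

    public static void main(String[] args) {
        Document doc = buildDocument("Langchain4j is easy to learn", "langchain4j_tutorial.pdf");

        // Convert the whole document into one TextSegment
        TextSegment segment = doc.toTextSegment();

        System.out.println("Segment text : " + segment.text());
        // Assumption: the segment carries a copy of the document's metadata
        System.out.println("Segment file name : " + segment.metadata().getString(Document.FILE_NAME));
    }
}
```

The segment is what you would typically hand to an embedding model or store, so it helps that the source file name travels with it.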

 

The following application shows these methods in use.

 

DocumentHelloWorld.java

package com.sample.app.rag.apis.documents;

import java.util.HashMap;
import java.util.Map;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.Metadata;

public class DocumentHelloWorld {

    private static void printDocument(Document document) {
        System.out.println("Content : " + document.text());
        System.out.println("Metadata : " + document.metadata());
    }

    public static void main(String[] args) {
        Document doc1 = Document.from("Sample Document1");
        printDocument(doc1);

        Map<String, Object> doc2Metadata = new HashMap<>();
        doc2Metadata.put(Document.FILE_NAME, "langchain4j_tutorial.pdf");
        Document doc2 = Document.from("Langchain4j is easy to learn", new Metadata(doc2Metadata));
        printDocument(doc2);

    }

}

 

Output

Content : Sample Document1
Metadata : Metadata { metadata = {} }
Content : Langchain4j is easy to learn
Metadata : Metadata { metadata = {file_name=langchain4j_tutorial.pdf} }

You can also load every document in a directory with the FileSystemDocumentLoader.loadDocuments(FOLDER_PATH) method.

 

DocumentsFromFolder.java

package com.sample.app.rag.apis.documents;

import java.util.List;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;

public class DocumentsFromFolder {
	public static void main(String[] args) {
		String resourcesFolderPath = "/Users/Shared/llm_docs";
		List<Document> documents = FileSystemDocumentLoader.loadDocuments(resourcesFolderPath);

		for (Document doc : documents) {
			System.out.println("------------------------------------------");
			System.out.println("Content : " + doc.text());
			System.out.println("Metadata : " + doc.metadata());
			System.out.println("------------------------------------------\n");
		}
	}

}
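
If you want to try the loader without preparing a folder first, a variation like the one below may help. It is a sketch under a few assumptions: Java 11 or later (for Files.writeString), a LangChain4j version that provides the loadDocuments(Path, DocumentParser) overload and TextDocumentParser, and a loader that records each file's name under Document.FILE_NAME. The class name DocumentsFromTempFolder, the helper loadSampleDocuments, and the file names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.document.parser.TextDocumentParser;

public class DocumentsFromTempFolder {

    // Hypothetical helper: writes two small text files to a temp directory and loads them.
    static List<Document> loadSampleDocuments() {
        try {
            Path dir = Files.createTempDirectory("llm_docs_demo");
            Files.writeString(dir.resolve("a.txt"), "First sample file");
            Files.writeString(dir.resolve("b.txt"), "Second sample file");
            // Pass the parser explicitly so the files are read as plain text
            return FileSystemDocumentLoader.loadDocuments(dir, new TextDocumentParser());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The order in which files are returned is not guaranteed
        for (Document doc : loadSampleDocuments()) {
            System.out.println("File name : " + doc.metadata().getString(Document.FILE_NAME));
            System.out.println("Content : " + doc.text());
        }
    }
}
```

Passing the parser explicitly keeps the example independent of whichever default parser happens to be on the classpath.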


  
