Programming for beginners: Content Aggregation in LangChain4j: From Multi-Retriever Results to Ranked Relevance

When building Retrieval-Augmented Generation (RAG) applications, it's common to fetch relevant content using multiple queries or content retrievers. However, this introduces challenges: how do you combine overlapping, partially relevant, or differently ranked content into a unified list that’s both accurate and non-redundant for the LLM?

LangChain4j addresses this problem with a powerful component: the ContentAggregator. This post explores its purpose, core logic, and implementations, including the DefaultContentAggregator and the ReRankingContentAggregator.

1. What Is a Content Aggregator?

In LangChain4j, the ContentAggregator interface defines a strategy for combining multiple lists of ranked content retrieved via different queries and content retrievers into a single, consistent, and sorted list of results.

This is important in RAG-based applications where:

· A single user query may be transformed into multiple sub-queries.

· Each sub-query might hit different retrievers (e.g., vector DBs, web APIs, PDF documents).

· Each retriever returns its own ranked content.

Instead of bombarding the LLM with all of this data, the ContentAggregator ensures only the most relevant and non-redundant content is passed along.

2. Interface Overview

Here’s what the ContentAggregator interface looks like:

public interface ContentAggregator {
    List<Content> aggregate(Map<Query, Collection<List<Content>>> queryToContents);
}

Here:

· Input: A map from each Query to a collection of List<Content, one list per retriever hit.

· Output: A single, relevance-ranked List<Content>.

3. ContentAggregator Implementations

LangChain4j offers two built-in implementations of the ContentAggregator interface:

· DefaultContentAggregator

· ReRankingContentAggregator

3.1 DefaultContentAggregator (Uses Reciprocal Rank Fusion Algorithm)

The DefaultContentAggregator is the standard implementation provided by LangChain4j. It performs two-stage Reciprocal Rank Fusion (RRF), a widely used technique in information retrieval.

What is Reciprocal Rank Fusion?

RRF combines multiple ranked lists by assigning scores based on the inverse of their rank positions. The idea: content that ranks high in multiple lists gets a strong overall score.

3.2 ReRankingContentAggregator (Semantic Relevance Boosted by Scoring Models)

Sometimes, structural signals like position in a list (used in RRF) aren’t enough. That’s where the ReRankingContentAggregator shines. It uses a ScoringModel (e.g., from Cohere, OpenAI, or other providers) to semantically re-score content. The re-ranking is based on how well each content chunk semantically aligns with the query.

Find the below working application.

ContentAggregatorDemo.java

package com.sample.app.contentaggregator;

import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import dev.langchain4j.rag.content.Content;
import dev.langchain4j.rag.content.aggregator.ContentAggregator;
import dev.langchain4j.rag.content.aggregator.DefaultContentAggregator;
import dev.langchain4j.rag.query.Query;

public class ContentAggregatorDemo {

    public static void main(String[] args) {

        // Step 1: Define sample queries
        Query query1 = new Query("What are the investment options available?");
        Query query2 = new Query("Tell me about the risk assessment process.");

        // Step 2: Create content chunks related to each query
        List<Content> investmentContentList1 = List.of(
                Content.from("Our company offers SIPs, mutual funds, and fixed deposits."),
                Content.from("You can diversify your investments via index funds and ETFs."));

        List<Content> investmentContentList2 = List.of(
                Content.from("Fixed income options like bonds are also available."),
                Content.from("We give personalized investment advice based on each client's risk level."));

        List<Content> riskContentList1 = List.of(
                Content.from("Risk assessment includes credit score checks and portfolio volatility analysis."),
                Content.from("We follow RBI guidelines for financial risk mitigation."));

        List<Content> riskContentList2 = List.of(
                Content.from("We use internal scoring models to assess client eligibility and investment risk."),
                Content.from("Risk is periodically reviewed based on market trends and compliance audits."));

        // Step 3: Build the map of Query -> Collection<List<Content>>
        Map<Query, Collection<List<Content>>> queryToContents = new HashMap<>();
        queryToContents.put(query1, List.of(investmentContentList1, investmentContentList2));
        queryToContents.put(query2, List.of(riskContentList1, riskContentList2));

        // Use DefaultContentAggregator (uses Reciprocal Rank Fusion)
        ContentAggregator defaultAggregator = new DefaultContentAggregator();
        List<Content> defaultAggregated = defaultAggregator.aggregate(queryToContents);

        System.out.println("=== DefaultContentAggregator Output ===");
        for (Content c : defaultAggregated) {
            System.out.println("- " + c.textSegment().text());
        }

    }
}

Output

=== DefaultContentAggregator Output ===
- Risk assessment includes credit score checks and portfolio volatility analysis.
- Our company offers SIPs, mutual funds, and fixed deposits.
- We use internal scoring models to assess client eligibility and investment risk.
- Fixed income options like bonds are also available.
- We follow RBI guidelines for financial risk mitigation.
- You can diversify your investments via index funds and ETFs.
- Risk is periodically reviewed based on market trends and compliance audits.
- We give personalized investment advice based on each client's risk level.

Let’s build a complete RAG application using content aggregator

    // Create retrieval augmentor with dynamic contentRetriever via router
    RetrievalAugmentor retrievalAugmentor =
        DefaultRetrievalAugmentor.builder()
            .queryTransformer(expandingQueryTransformer)
            .queryRouter(router)
            .contentAggregator(new DefaultContentAggregator())
            .build();

    // Build the assistant
    ChatAssistant chatAssistant =
        AiServices.builder(ChatAssistant.class)
            .chatModel(chatModel)
            .retrievalAugmentor(retrievalAugmentor)
            .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
            .build();

This code snippet shows how to configure a RetrievalAugmentor in LangChain4j by setting a custom ContentAggregator.

· It uses a DefaultRetrievalAugmentor builder to configure:

o a queryTransformer (e.g., for expanding or rewriting queries),

o a queryRouter (to route queries to appropriate retrievers),

o and a DefaultContentAggregator (to merge retrieved content using Reciprocal Rank Fusion).

· This RetrievalAugmentor is then used to build a ChatAssistant, combining it with a chat model and chat memory.

· The resulting assistant can process user inputs, transform and route queries, retrieve and aggregate content, and respond accordingly.

Find the below working application.

ContentAggregatorDemoFullApp.java

package com.sample.app.contentaggregator;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.embedding.onnx.bgesmallenv15q.BgeSmallEnV15QuantizedEmbeddingModel;
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.rag.DefaultRetrievalAugmentor;
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.rag.content.aggregator.DefaultContentAggregator;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.rag.query.router.LanguageModelQueryRouter;
import dev.langchain4j.rag.query.transformer.ExpandingQueryTransformer;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ContentAggregatorDemoFullApp {
  interface ChatAssistant {
    String chat(String userMessage);
  }

  public static void main(String[] args) {

    // Initialize LLM model for routing and chatting
    OllamaChatModel chatModel =
        OllamaChatModel.builder().baseUrl("http://localhost:11434").modelName("llama3.2").build();

    // Shared embedding model
    EmbeddingModel embeddingModel = new BgeSmallEnV15QuantizedEmbeddingModel();

    // Create embedding stores
    EmbeddingStore<TextSegment> hrStore = new InMemoryEmbeddingStore<>();
    EmbeddingStore<TextSegment> itStore = new InMemoryEmbeddingStore<>();
    EmbeddingStore<TextSegment> financeStore = new InMemoryEmbeddingStore<>();

    // Sample data for ingestion

    List<String> hrDocs =
        List.of(
            "Our HR policies include flexible work hours, medical leave, and onboarding support.",
            "You can check your leave balance and benefits on the HR portal.",
            "Employee handbooks and code of conduct documents are available on the intranet.",
            "Annual performance reviews are conducted in Q1 of each year.",
            "We provide mental wellness programs and employee assistance plans (EAP).",
            "New employees must complete their joining formalities within the first week.",
            "HR holds town halls every quarter to discuss policy changes and updates.",
            "Exit interviews are mandatory and help improve employee retention practices.");

    List<String> itDocs =
        List.of(
            "If you're facing issues with VPN, restart your system and try again.",
            "Password resets can be done via the IT Helpdesk portal.",
            "For software installation requests, raise a ticket through the ServiceNow portal.",
            "Two-factor authentication is mandatory for accessing company email remotely.",
            "Laptop issues should be reported to the IT asset management team.",
            "New joiners will receive their device credentials within 24 hours of onboarding.",
            "We recommend using the Chrome browser for all internal web tools.",
            "The weekly IT newsletter includes patch updates and known issues.");

    List<String> financeDocs =
        List.of(
            "Payslips are generated on the 5th of every month and available on the finance dashboard.",
            "You can file business travel reimbursements through the expense portal.",
            "Employees must submit receipts within 15 days for expense claims.",
            "Annual tax declarations must be uploaded to the HRMS by January 31.",
            "Salary revisions are processed in March and reflected in April pay.",
            "All invoice-related queries should be addressed to finance@company.com.",
            "The company reimburses professional certification exam fees up to ₹10,000.",
            "You can track investment proofs under the ‘My Tax’ section on the intranet.");

    // Ingest documents into respective stores
    EmbeddingStoreIngestor.ingest(convertToDocuments(hrDocs), hrStore);
    EmbeddingStoreIngestor.ingest(convertToDocuments(itDocs), itStore);
    EmbeddingStoreIngestor.ingest(convertToDocuments(financeDocs), financeStore);

    // Create content retrievers
    ContentRetriever hrRetriever =
        EmbeddingStoreContentRetriever.builder()
            .embeddingStore(hrStore)
            .embeddingModel(embeddingModel)
            .maxResults(3)
            .minScore(0.7)
            .displayName("hrRetriever")
            .build();

    ContentRetriever itRetriever =
        EmbeddingStoreContentRetriever.builder()
            .embeddingStore(itStore)
            .embeddingModel(embeddingModel)
            .maxResults(3)
            .minScore(0.7)
            .displayName("itRetriever")
            .build();

    ContentRetriever financeRetriever =
        EmbeddingStoreContentRetriever.builder()
            .embeddingStore(financeStore)
            .embeddingModel(embeddingModel)
            .maxResults(3)
            .minScore(0.7)
            .displayName("financeRetriever")
            .build();

    Map<ContentRetriever, String> retrieverToDescription = new HashMap<>();
    retrieverToDescription.put(
        hrRetriever,
        "Provides information on leave policies, benefits, onboarding, and other HR-related topics.");
    retrieverToDescription.put(
        itRetriever,
        "Handles technical queries such as VPN access, email issues, password resets, and software installations.");
    retrieverToDescription.put(
        financeRetriever,
        "Answers questions related to reimbursements, payslips, taxation, invoices, and financial approvals.");

    // Create LLM-based router
    LanguageModelQueryRouter router =
        LanguageModelQueryRouter.builder()
            .chatModel(chatModel)
            .retrieverToDescription(retrieverToDescription)
            .build();

    // Expand queries (optional step for better retrieval)
    ExpandingQueryTransformer expandingQueryTransformer =
        new ExpandingQueryTransformer(chatModel, 3);

    // Create retrieval augmentor with dynamic contentRetriever via router
    RetrievalAugmentor retrievalAugmentor =
        DefaultRetrievalAugmentor.builder()
            .queryTransformer(expandingQueryTransformer)
            .queryRouter(router)
            .contentAggregator(new DefaultContentAggregator())
            .build();

    // Build the assistant
    ChatAssistant chatAssistant =
        AiServices.builder(ChatAssistant.class)
            .chatModel(chatModel)
            .retrievalAugmentor(retrievalAugmentor)
            .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
            .build();

    // Run a conversation
    List<String> userQueries =
        List.of(
            "How do I reset my email password?",
            "Where can I see my payslip for last month?",
            "What is the company policy on medical leave?",
            "How to claim business travel reimbursement?",
            "I'm facing issues connecting to the VPN.");

    for (String userQuery : userQueries) {
      System.out.println("User: " + userQuery);
      String answer = chatAssistant.chat(userQuery);
      System.out.println("Assistant: " + answer);
      System.out.println("-----------------------------------------------");
    }
  }

  private static List<Document> convertToDocuments(List<String> texts) {
    List<Document> documents = new ArrayList<>();
    for (String text : texts) {
      documents.add(Document.from(text));
    }
    return documents;
  }
}

Output

User: How do I reset my email password?
Assistant: To reset your email password, follow these steps:

1. Go to the IT Helpdesk portal.
2. Log in with your company credentials.
3. Click on the "Password Reset" or "Forgot Password" option.
4. Follow the instructions provided to complete the password reset process.

Please note that you may need to provide additional verification information due to two-factor authentication requirements for accessing company email remotely.
-----------------------------------------------
User: Where can I see my payslip for last month?
Assistant: To view your payslip for last month, follow these steps:

1. Log in to the finance dashboard.
2. Check if a new month's payslips have been generated (typically around the 5th of the current month).
3. If available, click on the "View" or "Last Month's Pay" option to access your payslip for the previous month.

Please note that you should be able to view this information online through the finance dashboard, without needing any additional assistance or credentials.
-----------------------------------------------
User: What is the company policy on medical leave?
Assistant: Unfortunately, this information doesn't directly state the company's policy on medical leave. However, it does mention that "Our HR policies include flexible work hours" and provides resources for managing leaves (the HR portal) and offers mental wellness programs and EAPs. It can be inferred that the company has some form of medical leave policy, but the specific details are not provided in this information.
-----------------------------------------------
User: How to claim business travel reimbursement?
Assistant: To claim business travel reimbursement, follow these steps:

1. Use the expense portal.
2. Log in with your company credentials.
3. Navigate to the "Business Travel" or "Expense Claims" section.
4. Submit your receipts and relevant documentation for reimbursement within 15 days of the trip.

Note that specific details about reimbursement procedures, such as what expenses are eligible or any additional requirements, are not mentioned in this information.
-----------------------------------------------
User: I'm facing issues connecting to the VPN.
Assistant: Try restarting your system to resolve the issue connecting to the VPN.

If the problem persists, you can consider reaching out to IT for further assistance or requesting help from the Helpdesk via the IT Helpdesk portal.

Note that two-factor authentication and software installation requests are not directly related to resolving VPN connection issues.
-----------------------------------------------

Previous Next Home

Programming for beginners

Monday, 22 December 2025

Content Aggregation in LangChain4j: From Multi-Retriever Results to Ranked Relevance

No comments:

Post a Comment