Friday, 12 September 2025

Customizing Content Retrieval in RAG with EmbeddingStoreContentRetriever for Smarter Responses

Retrieval-Augmented Generation (RAG) enhances a language model's responses by grounding them in external knowledge sources. However, the effectiveness of a RAG system depends heavily on how documents are retrieved. In this post, I demonstrate how to customize the retrieval process using EmbeddingStoreContentRetriever by tuning parameters like maxResults and minScore, and how to integrate it seamlessly with a chat assistant powered by a language model.

Why customize the Retriever?

When fetching documents for a query, we may want to:

·      Limit the number of returned results (maxResults)

·      Filter out irrelevant results using a minimum similarity score (minScore)

·      Use a custom embedding model

This allows the system to return only the most relevant knowledge for response generation, reducing noise and improving trustworthiness.
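Conceptually, maxResults and minScore amount to scoring every candidate segment against the query, dropping anything below the threshold, sorting by score, and truncating to the top k. The following is a minimal plain-Java sketch of that idea (not the library's internals; the Scored record and topMatches method are illustrative names, and the scores are made-up values):

public class RetrievalFilter {

  // A scored candidate: the segment text plus its similarity score for the query.
  record Scored(String text, double score) {}

  // Keep only candidates scoring at least minScore, sorted by score
  // descending, truncated to maxResults.
  static java.util.List<Scored> topMatches(
      java.util.List<Scored> candidates, double minScore, int maxResults) {
    return candidates.stream()
        .filter(s -> s.score() >= minScore)
        .sorted(java.util.Comparator.comparingDouble(Scored::score).reversed())
        .limit(maxResults)
        .collect(java.util.stream.Collectors.toList());
  }

  public static void main(String[] args) {
    java.util.List<Scored> candidates = java.util.List.of(
        new Scored("doc A", 0.91),
        new Scored("doc B", 0.60), // below the 0.75 threshold, dropped
        new Scored("doc C", 0.80),
        new Scored("doc D", 0.78));

    // minScore = 0.75, maxResults = 2 -> doc A, then doc C
    for (Scored s : topMatches(candidates, 0.75, 2)) {
      System.out.println(s.text());
    }
  }
}

With a low minScore and a high maxResults, marginally related segments reach the prompt and dilute it; tightening both keeps the context short and on-topic.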


Example

ContentRetriever contentRetriever =
    EmbeddingStoreContentRetriever.builder()
        .embeddingStore(embeddingStore) // Your store (e.g., in-memory or persistent)
        .embeddingModel(embeddingModel) // BGE, OpenAI, etc.
        .maxResults(5)                  // Only return top 5 similar docs
        .minScore(0.75)                 // Filter out documents below this similarity score
        .build();

ChatAssistant assistant =
    AiServices.builder(ChatAssistant.class)
        .chatModel(chatModel)          // Your LLM (OpenAI, HuggingFace, etc.)
        .contentRetriever(contentRetriever)  // Inject customized retriever
        .build();

The complete working application is given below.

ChatAssistant.java


package com.sample.app.assistants;

public interface ChatAssistant {
  String chat(String userMessage);
}
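Notice that we never write a class implementing ChatAssistant: AiServices supplies the implementation at runtime via a dynamic proxy. A minimal plain-Java sketch of that mechanism, with a canned handler standing in for the real "retrieve context, then call the LLM" logic (ProxySketch and assistant are illustrative names, not part of LangChain4j):

public class ProxySketch {

  interface ChatAssistant {
    String chat(String userMessage);
  }

  // Build a ChatAssistant whose chat(...) calls are routed to the given
  // function, without any hand-written implementing class.
  static ChatAssistant assistant(java.util.function.UnaryOperator<String> handler) {
    java.lang.reflect.InvocationHandler h = (proxy, method, args) -> {
      if (method.getName().equals("chat")) {
        return handler.apply((String) args[0]);
      }
      throw new UnsupportedOperationException(method.getName());
    };
    return (ChatAssistant) java.lang.reflect.Proxy.newProxyInstance(
        ChatAssistant.class.getClassLoader(),
        new Class<?>[] {ChatAssistant.class},
        h);
  }

  public static void main(String[] args) {
    // A canned reply stands in for the model call.
    ChatAssistant a = assistant(q -> "You asked: " + q);
    System.out.println(a.chat("Hello"));
  }
}

In the real application, the proxy that AiServices builds embeds the retrieved segments into the prompt before invoking the chat model.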

InMemoryRagWithContentRetriever.java

package com.sample.app;

import com.sample.app.assistants.ChatAssistant;
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.embedding.onnx.bgesmallenv15q.BgeSmallEnV15QuantizedEmbeddingModel;
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;
import java.util.ArrayList;
import java.util.List;

public class InMemoryRagWithContentRetriever {

  public static void main(String[] args) {

    // Initialize local embedding model
    EmbeddingModel embeddingModel = new BgeSmallEnV15QuantizedEmbeddingModel();

    // Load all the Documents
    String resourcesFolderPath = "/Users/Shared/llm_docs";
    System.out.println("Resources Folder Path is " + resourcesFolderPath);
    List<Document> documents = FileSystemDocumentLoader.loadDocuments(resourcesFolderPath);

    InMemoryEmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();
    EmbeddingStoreIngestor.ingest(documents, embeddingStore);

    OllamaChatModel chatModel =
        OllamaChatModel.builder().baseUrl("http://localhost:11434").modelName("llama3.2").build();

    ContentRetriever contentRetriever =
        EmbeddingStoreContentRetriever.builder()
            .embeddingStore(embeddingStore)
            .embeddingModel(embeddingModel)
            .maxResults(5)
            .minScore(0.75)
            .build();

    ChatAssistant assistant =
        AiServices.builder(ChatAssistant.class)
            .chatModel(chatModel)
            .contentRetriever(contentRetriever)
            .build();

    List<String> questionsToAsk = new ArrayList<>();
    questionsToAsk.add("What is the tag line of ChronoCore Industries?");

    long time1 = System.currentTimeMillis();
    for (String question : questionsToAsk) {
      String answer = assistant.chat(question);
      System.out.println("----------------------------------------------------");
      System.out.println("Q: " + question);
      System.out.println("A : " + answer);
      System.out.println("----------------------------------------------------\n");
    }
    long time2 = System.currentTimeMillis();

    System.out.println("Total time taken is " + (time2 - time1));
  }
}

Output

Resources Folder Path is /Users/Shared/llm_docs
----------------------------------------------------
Q: What is the tag line of ChronoCore Industries?
A : I couldn't find the tagline explicitly stated in the provided information. However, I can provide you with a possible tagline based on the company's mission and values:

"Preserving Yesterday. Shaping Tomorrow."

This tagline is consistent with the company's mission to "unlock the fabric of time itself — responsibly, ethically, and with profound respect for the continuum that binds reality."
----------------------------------------------------

Total time taken is 4012


