Retrieval-Augmented Generation (RAG) applications rely heavily on the ability to pull relevant information from external data sources. In LangChain4j, this responsibility lies with the ContentRetriever interface. Whether your data lives in a vector store, SQL database, graph database, or even on the open web, LangChain4j provides a unified way to retrieve relevant chunks of information.
In this post, we’ll walk through the role of ContentRetriever, explore various implementations provided by LangChain4j, and understand how they help bridge the gap between unstructured queries and structured (or semi-structured) data sources.
1. What is a ContentRetriever?
A ContentRetriever retrieves Content from an underlying data source using a given Query.
public interface ContentRetriever {

    List<Content> retrieve(Query query);
}
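To make the contract concrete, here is a minimal sketch of a custom retriever with the same shape as the interface above. It runs on the JDK alone: the Query and Content records are simplified stand-ins for the LangChain4j types (an assumption made so the example has no dependencies), and KeywordContentRetriever is a hypothetical name, not a class that ships with the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified stand-ins for LangChain4j's Query and Content types (assumption:
// the real classes carry more metadata; only the text matters here).
record Query(String text) {}
record Content(String text) {}

// Same shape as the interface shown above.
interface ContentRetriever {
    List<Content> retrieve(Query query);
}

// Hypothetical retriever: returns every document that shares a word with the query.
class KeywordContentRetriever implements ContentRetriever {
    private final List<String> documents;

    KeywordContentRetriever(List<String> documents) {
        this.documents = List.copyOf(documents);
    }

    @Override
    public List<Content> retrieve(Query query) {
        String[] terms = query.text().toLowerCase(Locale.ROOT).split("\\W+");
        List<Content> matches = new ArrayList<>();
        for (String document : documents) {
            String haystack = document.toLowerCase(Locale.ROOT);
            for (String term : terms) {
                // Skip very short terms such as "is" or "my" to reduce noise.
                if (term.length() > 3 && haystack.contains(term)) {
                    matches.add(new Content(document));
                    break;
                }
            }
        }
        return matches;
    }
}

class KeywordRetrieverDemo {
    public static void main(String[] args) {
        ContentRetriever retriever = new KeywordContentRetriever(List.of(
                "Password resets can be done via the IT Helpdesk portal.",
                "Payslips are generated on the 5th of every month."));
        for (Content content : retriever.retrieve(new Query("Where is my payslip?"))) {
            System.out.println(content.text());
        }
    }
}
```

Because callers depend only on retrieve(Query), a keyword retriever like this one and a vector-based one are interchangeable from the caller's point of view.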
At the time of writing, LangChain4j ships with several built-in content retrievers that fetch information from different types of data sources:
· The EmbeddingStoreContentRetriever finds content using vector similarity, which is great for working with embeddings and doing smart searches.
· The WebSearchContentRetriever pulls in up-to-date information from the web using a search engine.
· The AzureAiSearchContentRetriever works with Azure AI Search. It supports full-text search, vector search, and a hybrid of the two, and it can re-rank results so the most relevant ones come first, which makes it a good fit for enterprise use.
· The SqlDatabaseContentRetriever (still experimental) connects to relational databases and turns plain language questions into SQL queries to get the right data.
· Lastly, the Neo4jContentRetriever lets you search graph databases by converting natural language into Cypher queries and fetching related entities and connections.
These retrievers allow LangChain4j to link AI models with many different types of data sources.
Benefits of LangChain4j's Content Retriever Abstraction
· Pluggable architecture: Easily switch between retrievers without rewriting business logic.
· Unified Query API: Regardless of the backend (vector DB, SQL, graph), all retrievers accept the same Query object.
· Composable with RAG pipelines: Easily integrate with LangChain4j’s other RAG components (query transformers, aggregators, etc.).
Example
ContentRetriever contentRetriever = EmbeddingStoreContentRetriever.builder()
        .embeddingStore(embeddingStore)
        .embeddingModel(embeddingModel)
        .displayName("informaitonAboutIndia")
        .maxResults(3)
        .minScore(0.7)
        .build();
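The code above builds a ContentRetriever backed by an EmbeddingStoreContentRetriever. The builder settings control how content is fetched from the embedding store: maxResults(3) caps the number of results, minScore(0.7) drops matches whose relevance score is below 0.7, and displayName gives the retriever a human-readable label.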
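The effect of maxResults and minScore can be sketched in plain Java. This is an illustration of the filtering idea, not LangChain4j's actual code: ScoredText, SimilarityFilter, and topMatches are hypothetical names, and the mapping of cosine similarity into the range 0 to 1 via (cosine + 1) / 2 reflects my understanding of how the library derives its relevance score, so treat it as an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical scored match; not a LangChain4j type.
record ScoredText(String text, double score) {}

class SimilarityFilter {

    // Cosine similarity of two equal-length, non-zero vectors, in [-1, 1].
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Maps cosine similarity into [0, 1] so it can be compared with minScore
    // (assumed mapping, see the note above).
    static double relevance(double cosine) {
        return (cosine + 1) / 2;
    }

    // Keeps candidates scoring at least minScore, best first, capped at maxResults.
    static List<ScoredText> topMatches(double[] queryVector, List<String> texts,
                                       List<double[]> vectors, int maxResults, double minScore) {
        List<ScoredText> scored = new ArrayList<>();
        for (int i = 0; i < texts.size(); i++) {
            double score = relevance(cosine(queryVector, vectors.get(i)));
            if (score >= minScore) {
                scored.add(new ScoredText(texts.get(i), score));
            }
        }
        scored.sort(Comparator.comparingDouble(ScoredText::score).reversed());
        return List.copyOf(scored.subList(0, Math.min(maxResults, scored.size())));
    }
}
```

Raising minScore trades recall for precision: with a threshold of 0.7, loosely related chunks are dropped even when fewer than maxResults matches remain.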
Below is a complete working application.
ContentRetrieverDemo.java
package com.sample.app.contentretriever;

import java.util.ArrayList;
import java.util.List;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.embedding.onnx.bgesmallenv15q.BgeSmallEnV15QuantizedEmbeddingModel;
import dev.langchain4j.rag.content.Content;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.rag.query.Query;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

public class ContentRetrieverDemo {

    // Sample knowledge base: one text block per document
    private static List<String> indiaDocs = List.of(
            """
            India, officially the Republic of India, is a country in South Asia.
            It is the seventh-largest country by area, the most populous country as of 2023,
            and the most populous democracy in the world. Bounded by the Himalayas in the north
            and surrounded by the Indian Ocean, Arabian Sea, and Bay of Bengal, India has a rich
            cultural and historical heritage.
            """,
            """
            The Indian economy is the fifth-largest in the world by nominal GDP and the
            third-largest by purchasing power parity. It is classified as a newly industrialized
            country and one of the world's fastest-growing major economies. Major industries
            include IT services, textiles, pharmaceuticals, and agriculture.
            """,
            """
            India's political system is a federal parliamentary democratic republic.
            The President of India is the head of state, while the Prime Minister is the head of
            government. The Indian Parliament consists of the Lok Sabha (House of the People)
            and the Rajya Sabha (Council of States).
            """,
            """
            Indian culture is renowned for its diversity and depth. It includes a wide variety
            of languages, religions, music, dance forms, cuisine, and traditions. Major religions
            born in India include Hinduism, Buddhism, Jainism, and Sikhism.
            Festivals such as Diwali, Eid, Christmas, and Pongal are celebrated across the country.
            """,
            """
            India is home to 40 UNESCO World Heritage Sites, including the Taj Mahal, Qutub Minar,
            and the Western Ghats. The country offers a wide variety of landscapes ranging from
            deserts in Rajasthan, the Himalayan mountain range, tropical rainforests in the
            northeast, to coastal plains and fertile river valleys.
            """,
            """
            The Indian education system has seen significant progress, with institutes like the
            IITs, IIMs, and AIIMS gaining global recognition. India also has a growing startup
            ecosystem, particularly in tech hubs such as Bengaluru, Hyderabad, and Pune.
            """,
            """
            India has a vibrant film industry, notably Bollywood, which produces the largest
            number of films in the world. Indian music and dance forms such as Bharatanatyam,
            Kathak, Carnatic music, and Bollywood songs are famous worldwide.
            """,
            """
            Indian cuisine is known for its rich flavors and diverse ingredients. Each region has
            its own distinct dishes – from butter chicken and biryani in the north, to dosa and
            sambar in the south, to seafood delicacies in the coastal regions. Spices like
            turmeric, cumin, coriander, and cardamom are integral to Indian cooking.
            """,
            """
            The Indian space program, led by ISRO, has made remarkable achievements including the
            Chandrayaan lunar missions and the Mars Orbiter Mission (Mangalyaan), which made India
            the first Asian nation to reach Mars orbit and the first in the world to do so in its
            maiden attempt.
            """,
            """
            Indian sports are dominated by cricket, but other sports like hockey, badminton,
            wrestling, and kabaddi are also popular. India has produced world-class athletes
            including Sachin Tendulkar, P.V. Sindhu, Neeraj Chopra, and Mary Kom. The country has
            hosted major sporting events like the Commonwealth Games and the Cricket World Cup.
            """);

    private static List<Document> documents = new ArrayList<>();

    // Wrap each raw string in a LangChain4j Document
    private static void prepareDocuments() {
        for (String doc : indiaDocs) {
            documents.add(Document.from(doc));
        }
    }

    public static void main(String[] args) {
        prepareDocuments();

        // No embedding model is passed to the static ingest(...) helper: it expects a
        // document splitter and an embedding model to be discoverable via SPI
        // (for example from the langchain4j-easy-rag module).
        InMemoryEmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();
        EmbeddingStoreIngestor.ingest(documents, embeddingStore);

        // Model used to embed the query at retrieval time
        EmbeddingModel embeddingModel = new BgeSmallEnV15QuantizedEmbeddingModel();

        ContentRetriever contentRetriever = EmbeddingStoreContentRetriever.builder()
                .embeddingStore(embeddingStore)
                .embeddingModel(embeddingModel)
                .displayName("informaitonAboutIndia")
                .maxResults(3)
                .minScore(0.7)
                .build();

        String userQuery = """
                Hey, I was reading about how countries manage their political systems and got curious.
                Can you tell me how India runs its government, who’s in charge, and what kind of
                structure they follow politically?
                """;

        // Retrieve the best-matching segments for the query
        List<Content> matchedContent = contentRetriever.retrieve(Query.from(userQuery));
        System.out.println("Total Matched Documents : " + matchedContent.size() + "\n");

        int i = 1;
        for (Content content : matchedContent) {
            System.out.println(i + ". " + content.textSegment().text() + "\n");
            i++;
        }
    }
}
Output
Total Matched Documents : 3

1. India's political system is a federal parliamentary democratic republic. The President of India is the head of state, while the Prime Minister is the head of government. The Indian Parliament consists of the Lok Sabha (House of the People) and the Rajya Sabha (Council of States).

2. India, officially the Republic of India, is a country in South Asia. It is the seventh-largest country by area, the most populous country as of 2023, and the most populous democracy in the world. Bounded by the Himalayas in the north and surrounded by the Indian Ocean, Arabian Sea, and Bay of Bengal, India has a rich cultural and historical heritage.

3. The Indian economy is the fifth-largest in the world by nominal GDP and the third-largest by purchasing power parity. It is classified as a newly industrialized country and one of the world's fastest-growing major economies. Major industries include IT services, textiles, pharmaceuticals, and agriculture.
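In a RAG pipeline, a ContentRetriever receives its queries from the query router, which decides which retriever (or retrievers) should handle each query. The ContentRetrieverFullApp.java application below puts three retrievers (HR, IT, and finance) behind a LanguageModelQueryRouter.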
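The routing idea can be sketched without an LLM. The example below is a deliberately crude stand-in: where an LLM-based router would ask a model to choose among retriever descriptions, this hypothetical KeywordQueryRouter simply checks whether a word from the query appears in a description. NamedRetriever is likewise an illustrative placeholder, not a LangChain4j type.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in: a retriever is just a named source here.
record NamedRetriever(String name) {}

class KeywordQueryRouter {
    private final Map<NamedRetriever, String> retrieverToDescription;

    KeywordQueryRouter(Map<NamedRetriever, String> retrieverToDescription) {
        // Copy into a LinkedHashMap so iteration follows the caller's order.
        this.retrieverToDescription = new LinkedHashMap<>(retrieverToDescription);
    }

    // Returns every retriever whose description mentions a word from the query.
    List<NamedRetriever> route(String query) {
        String[] terms = query.toLowerCase(Locale.ROOT).split("\\W+");
        List<NamedRetriever> selected = new ArrayList<>();
        for (Map.Entry<NamedRetriever, String> entry : retrieverToDescription.entrySet()) {
            String description = entry.getValue().toLowerCase(Locale.ROOT);
            for (String term : terms) {
                // Ignore short words; substring matching is crude but enough for a sketch.
                if (term.length() > 3 && description.contains(term)) {
                    selected.add(entry.getKey());
                    break;
                }
            }
        }
        return selected;
    }
}
```

A query may match zero, one, or several retrievers. Returning a list rather than a single retriever keeps that possibility open, which is also why a pipeline needs a step that merges results from multiple sources.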
ContentRetrieverFullApp.java
package com.sample.app.contentretriever;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.embedding.onnx.bgesmallenv15q.BgeSmallEnV15QuantizedEmbeddingModel;
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.rag.DefaultRetrievalAugmentor;
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.rag.query.router.LanguageModelQueryRouter;
import dev.langchain4j.rag.query.transformer.ExpandingQueryTransformer;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ContentRetrieverFullApp {

    interface ChatAssistant {
        String chat(String userMessage);
    }

    public static void main(String[] args) {
        // Initialize LLM model for routing and chatting
        OllamaChatModel chatModel = OllamaChatModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("llama3.2")
                .build();

        // Shared embedding model
        EmbeddingModel embeddingModel = new BgeSmallEnV15QuantizedEmbeddingModel();

        // Create embedding stores
        EmbeddingStore<TextSegment> hrStore = new InMemoryEmbeddingStore<>();
        EmbeddingStore<TextSegment> itStore = new InMemoryEmbeddingStore<>();
        EmbeddingStore<TextSegment> financeStore = new InMemoryEmbeddingStore<>();

        // Sample data for ingestion
        List<String> hrDocs = List.of(
                "Our HR policies include flexible work hours, medical leave, and onboarding support.",
                "You can check your leave balance and benefits on the HR portal.",
                "Employee handbooks and code of conduct documents are available on the intranet.",
                "Annual performance reviews are conducted in Q1 of each year.",
                "We provide mental wellness programs and employee assistance plans (EAP).",
                "New employees must complete their joining formalities within the first week.",
                "HR holds town halls every quarter to discuss policy changes and updates.",
                "Exit interviews are mandatory and help improve employee retention practices.");

        List<String> itDocs = List.of(
                "If you're facing issues with VPN, restart your system and try again.",
                "Password resets can be done via the IT Helpdesk portal.",
                "For software installation requests, raise a ticket through the ServiceNow portal.",
                "Two-factor authentication is mandatory for accessing company email remotely.",
                "Laptop issues should be reported to the IT asset management team.",
                "New joiners will receive their device credentials within 24 hours of onboarding.",
                "We recommend using the Chrome browser for all internal web tools.",
                "The weekly IT newsletter includes patch updates and known issues.");

        List<String> financeDocs = List.of(
                "Payslips are generated on the 5th of every month and available on the finance dashboard.",
                "You can file business travel reimbursements through the expense portal.",
                "Employees must submit receipts within 15 days for expense claims.",
                "Annual tax declarations must be uploaded to the HRMS by January 31.",
                "Salary revisions are processed in March and reflected in April pay.",
                "All invoice-related queries should be addressed to finance@company.com.",
                "The company reimburses professional certification exam fees up to ₹10,000.",
                "You can track investment proofs under the ‘My Tax’ section on the intranet.");

        // Ingest documents into respective stores
        EmbeddingStoreIngestor.ingest(convertToDocuments(hrDocs), hrStore);
        EmbeddingStoreIngestor.ingest(convertToDocuments(itDocs), itStore);
        EmbeddingStoreIngestor.ingest(convertToDocuments(financeDocs), financeStore);

        // Create content retrievers
        ContentRetriever hrRetriever = EmbeddingStoreContentRetriever.builder()
                .embeddingStore(hrStore)
                .embeddingModel(embeddingModel)
                .maxResults(3)
                .minScore(0.7)
                .displayName("hrRetriever")
                .build();

        ContentRetriever itRetriever = EmbeddingStoreContentRetriever.builder()
                .embeddingStore(itStore)
                .embeddingModel(embeddingModel)
                .maxResults(3)
                .minScore(0.7)
                .displayName("itRetriever")
                .build();

        ContentRetriever financeRetriever = EmbeddingStoreContentRetriever.builder()
                .embeddingStore(financeStore)
                .embeddingModel(embeddingModel)
                .maxResults(3)
                .minScore(0.7)
                .displayName("financeRetriever")
                .build();

        Map<ContentRetriever, String> retrieverToDescription = new HashMap<>();
        retrieverToDescription.put(
                hrRetriever,
                "Provides information on leave policies, benefits, onboarding, and other HR-related topics.");
        retrieverToDescription.put(
                itRetriever,
                "Handles technical queries such as VPN access, email issues, password resets, and software installations.");
        retrieverToDescription.put(
                financeRetriever,
                "Answers questions related to reimbursements, payslips, taxation, invoices, and financial approvals.");

        // Create LLM-based router
        LanguageModelQueryRouter router = LanguageModelQueryRouter.builder()
                .chatModel(chatModel)
                .retrieverToDescription(retrieverToDescription)
                .build();

        // Expand queries (optional step for better retrieval)
        ExpandingQueryTransformer expandingQueryTransformer = new ExpandingQueryTransformer(chatModel, 3);

        // Create retrieval augmentor with dynamic contentRetriever via router
        RetrievalAugmentor retrievalAugmentor = DefaultRetrievalAugmentor.builder()
                .queryTransformer(expandingQueryTransformer)
                .queryRouter(router)
                .build();

        // Build the assistant
        ChatAssistant chatAssistant = AiServices.builder(ChatAssistant.class)
                .chatModel(chatModel)
                .retrievalAugmentor(retrievalAugmentor)
                .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
                .build();

        // Run a conversation
        List<String> userQueries = List.of(
                "How do I reset my email password?",
                "Where can I see my payslip for last month?",
                "What is the company policy on medical leave?",
                "How to claim business travel reimbursement?",
                "I'm facing issues connecting to the VPN.");

        for (String userQuery : userQueries) {
            System.out.println("User: " + userQuery);
            String answer = chatAssistant.chat(userQuery);
            System.out.println("Assistant: " + answer);
            System.out.println("-----------------------------------------------");
        }
    }

    private static List<Document> convertToDocuments(List<String> texts) {
        List<Document> documents = new ArrayList<>();
        for (String text : texts) {
            documents.add(Document.from(text));
        }
        return documents;
    }
}
Output
User: How do I reset my email password?
Assistant: Unfortunately, this information doesn't provide instructions on how to reset your email password specifically. However, I can infer that you can use the IT Helpdesk portal for assistance with resetting your email password. If you're having trouble accessing the portal or need help with the process, you may want to contact the IT department directly for support.
-----------------------------------------------
User: Where can I see my payslip for last month?
Assistant: According to the information provided, payslips for last month would be available on the finance dashboard, which is generated on the 5th of every month. This implies that if you're looking at your current month's payslip, it should be visible on the finance dashboard, but you'll need to wait until the 5th of the next month to see the previous month's payslip.
-----------------------------------------------
User: What is the company policy on medical leave?
Assistant: The company policy on medical leave is not explicitly stated, but it can be inferred that there is a provision for it as part of the "Our HR policies include flexible work hours, medical leave, and onboarding support" statement. However, the exact details of this policy are not provided in the given information.
-----------------------------------------------
User: How to claim business travel reimbursement?
Assistant: To claim business travel reimbursement, you need to use the expense portal. The exact steps or process for doing so are not provided in this information. However, it is mentioned that employees must submit receipts within 15 days for expense claims, suggesting that you will need to upload your receipts and follow any applicable procedures through the expense portal.
-----------------------------------------------
User: I'm facing issues connecting to the VPN.
Assistant: If you're facing issues connecting to the VPN, try restarting your system and then attempt to connect again. This may resolve the issue, but if not, further troubleshooting steps would be needed to identify and resolve the problem.
-----------------------------------------------
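One step the application leaves implicit is merging. ExpandingQueryTransformer turns each question into several variants, each variant is retrieved separately, and the resulting ranked lists have to be combined before they reach the prompt. As far as I know, LangChain4j's default aggregator uses a form of Reciprocal Rank Fusion for this. The sketch below shows only the basic single-stage formula, with k = 60 as an assumed (commonly used) constant; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ReciprocalRankFusion {

    // Fuses several ranked lists: each item earns 1 / (k + rank) per list,
    // with rank starting at 1, and the contributions are summed.
    static List<String> fuse(List<List<String>> rankedLists, int k) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> ranked : rankedLists) {
            for (int i = 0; i < ranked.size(); i++) {
                scores.merge(ranked.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        // Highest fused score first; List.sort is stable, so ties keep first-seen order.
        fused.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return fused;
    }
}
```

For example, fusing the lists [A, B, C], [B, A, D], and [B, C] with k = 60 ranks B first, because it appears near the top of all three lists, followed by A, C, and D.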