Tuesday, 16 September 2025

Enhancing RAG Retrievals in Langchain4j with Query Transformers: Techniques & Best Practices

When building Retrieval-Augmented Generation (RAG) systems, the quality of retrieved results heavily depends on how the input query is framed. Users may submit vague, ambiguous, or verbose queries, which can degrade retrieval performance. This is where Query Transformers in LangChain4j come into play: they transform and optimize queries to improve the relevance of retrieved documents before a response is generated.

 

In this post, we’ll explore what Query Transformers are in Langchain4j, why they matter, and dive into various strategies such as query compression, expansion, rewriting, step-back prompting, and hypothetical document embeddings (HyDE).

 

1. Introduction to RAG

RAG (Retrieval-Augmented Generation) is a method that helps large language models (LLMs) give better answers by first finding useful information in your data and adding it to the prompt before sending it to the model. This helps the model respond more accurately and reduces the chances of it making things up (called hallucinations).

 

There are different ways to find this useful information:

 

·      Full-text (keyword) search: Looks for documents that match the words in your question. It uses methods like TF-IDF or BM25 to rank how well each document matches.

·      Vector (semantic) search: Converts documents and questions into numbers (vectors) and compares them to find the most similar meanings, not just matching words.

·      Hybrid search: Combines both full-text and vector search to get better results.

 

For now, this guide focuses mainly on vector search.

 

2. RAG Stages

The Retrieval-Augmented Generation (RAG) process usually has two main steps: Indexing and Retrieval. LangChain4j provides helpful tools for both.

 

2.1 Indexing Stage

Indexing is the first step where documents are prepared so they can be quickly and accurately searched later. For vector-based search, this step includes:

 

·      Cleaning the content to remove unnecessary or irrelevant parts

·      Adding extra context or metadata if needed

·      Splitting large documents into smaller parts (called chunks)

·      Converting those chunks into numerical vectors using embeddings

·      Saving them in a special database called an embedding store or vector database

 

This step is usually done in the background, not while users are interacting with the app. For example, a scheduled job (like a weekly cron job) might update the data during off-peak hours, such as weekends. Sometimes, a separate service takes care of managing the vector database.

 

In other cases, like when users upload their own documents, the indexing needs to happen right away (in real time) and be built into the app.

 

For this proof of concept (POC), documents related to a fictional company called ChronoCore Industries are stored on my computer and loaded into memory. This simple setup works well for the demo.
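To make the flow concrete, here is a hedged sketch of the indexing stage in LangChain4j. The directory path, chunk size, and overlap are illustrative assumptions, not values from the actual POC.

package com.sample.app;

import java.util.List;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.embedding.onnx.bgesmallenv15q.BgeSmallEnV15QuantizedEmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

public class IndexingStageDemo {

    public static void main(String[] args) {

        // Load the ChronoCore documents from disk (path is a placeholder)
        List<Document> documents = FileSystemDocumentLoader.loadDocuments("/path/to/chronocore-docs");

        EmbeddingModel embeddingModel = new BgeSmallEnV15QuantizedEmbeddingModel();
        InMemoryEmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();

        // Split into ~300-character chunks with 30-character overlap,
        // embed each chunk, and save it in the embedding store
        EmbeddingStoreIngestor.builder()
                .documentSplitter(DocumentSplitters.recursive(300, 30))
                .embeddingModel(embeddingModel)
                .embeddingStore(embeddingStore)
                .build()
                .ingest(documents);
    }
}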

 

Here is a simplified diagram of the indexing stage.

 


2.2 Retrieval Stage

The retrieval stage begins when a user submits a question. The system then searches through pre-indexed documents to identify the most relevant information.

 

For vector search, the user's query is converted into an embedding, a numerical representation of the query. The system compares this embedding to others in the embedding store to find the most similar document segments. These relevant chunks are then included in the prompt provided to the language model to help generate an accurate response.
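Here is a hedged LangChain4j sketch of this lookup, reusing the embeddingStore and embeddingModel created during indexing (maxResults and minScore are illustrative):

ContentRetriever contentRetriever =
    EmbeddingStoreContentRetriever.builder()
        .embeddingStore(embeddingStore)
        .embeddingModel(embeddingModel) // must match the model used at indexing time
        .maxResults(3)
        .minScore(0.6)
        .build();

// The query is embedded and compared against the stored segment embeddings
List<Content> matches = contentRetriever.retrieve(Query.from("What does ChronoCore Industries build?"));
matches.forEach(content -> System.out.println(content.textSegment().text()));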


 

 

3. Why Transform User Queries?

In traditional RAG (Retrieval-Augmented Generation) systems, documents are typically split into chunks, embedded using a language model, and later retrieved based on their semantic similarity to a user’s query. While this naive RAG approach works in many cases, it has notable limitations that can degrade both the quality and relevance of retrieved information.

 

Key challenges include:

·      Irrelevant Chunk Content: The chunking process often slices documents without understanding semantic boundaries. As a result, a single chunk might contain useful information along with unrelated or distracting content, reducing retrieval accuracy.

·      Poorly Formulated User Queries: Users may express their questions in vague, incomplete, or overly specific ways that don’t align well with the format or content of stored embeddings, leading to suboptimal matches.

·      Lack of Structured Query Support: In more advanced use cases, retrieval isn't just about matching free-text queries. It may involve generating structured queries, such as filtering vector search results based on metadata (e.g., date, author) or even translating natural language into SQL for querying relational databases.

 

To overcome these limitations, a new class of techniques focuses on transforming user queries before they are sent to the retriever. These techniques improve retrieval quality by refining the intent, scope, or format of the question.

 

In this post, we’ll explore some of the most effective query transformation strategies, including query rewriting, expansion, compression, and more advanced techniques like step-back prompting and hypothetical document embeddings (HyDE).

 

3.1 Query Rewriting

In Retrieval-Augmented Generation (RAG), a system tries to answer your question by first retrieving information (from documents or a database) and then reading it with an AI model like ChatGPT. In the regular flow, the system uses your question as-is to search for information.

 

But here's the problem: Your original question might not be the best way to search for the right answer.

 

For example, imagine you asked, "What was the impact of the 2008 financial crisis on small businesses in rural areas?"

 

That's a complex question. If we use it directly to search, we might not get the best results. We might only get documents that talk about:

 

·      "effects on small businesses"

·      "rural economic impact"

·      or just "2008 financial crisis"

 

So, what do we do?

We rewrite the question into simpler or more useful versions before searching.

 

For example:

·      Impact of 2008 recession on rural small businesses

·      2008 financial crisis effect on local economies

·      How small businesses were affected by the 2008 downturn

 

These rewritten queries are more likely to match appropriate documents.

 

The "Rewrite-Retrieve-Read" Framework

Here’s how the new method works step-by-step:

 

·      Rewrite the question: make it better for searching

·      Retrieve documents using the rewritten version

·      Read the documents + original question to give an answer

 

In summary, query rewriting means improving the user’s question before searching, so the AI can find better, more accurate information. It’s like turning a confusing search into a clearer one, making it easier for the AI to help you.

 

Sample prompt template for this
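A minimal illustrative version (an assumption on my part; the exact wording in the paper may differ):

"""
Provide a better search query for a retrieval system to answer the given question. \
Return only the query, nothing else.

Question: {{question}}
Search query:
"""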

 

 

 

References: https://arxiv.org/pdf/2305.14283

 

3.2 Step-Back Prompting

It's a technique used to help Large Language Models (LLMs) like GPT-4 think bigger and better when solving difficult problems, especially ones that require reasoning or connecting multiple ideas.

 

It’s inspired by how humans solve tough problems. When we don’t know the answer right away, we step back, think about the bigger picture or the general rules, and then apply them to the specific question.

 

Core Idea: Two Steps

·      Step Back -> Think Big

·      Then Reason -> Apply to Solve

 

Step 1: Step Back

Instead of jumping into the answer right away, the model asks a bigger, higher-level question that helps it think better.

 

Example 1:

Original question: How does pressure change with temperature and volume?

Step-back question: What physics principle explains this?

Answer: Ideal Gas Law (PV = nRT)

 

Example 2:

Original question: Where did Estella Leopold study in the 1940s?

Step-back question: What is Estella Leopold’s education history?

This gives a broader timeline that can help answer the specific question.

 

The goal of this step is to help the model gather the right facts or rules before trying to answer.

 

Step 2: Then Reason

Now that you have the high-level documents or principles, you use them to answer the original question.

 

In this phase, you essentially combine the original question with all the documents or knowledge retrieved during the Step-Back (Abstraction) phase and then ask the model to reason and answer based on that combined context.
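Here is a hedged sketch of both steps in LangChain4j terms; the prompt wording, chatModel, and contentRetriever are illustrative assumptions:

// Step 1 (Abstraction): ask the LLM for a higher-level, step-back question
String original = "Where did Estella Leopold study in the 1940s?";
String stepBack = chatModel.chat(
        "Rephrase the following question as a more generic, higher-level question. "
                + "Return only the question.\nQuestion: " + original);

// Step 2 (Reasoning): retrieve context for both questions
List<Content> normalContext = contentRetriever.retrieve(Query.from(original));
List<Content> stepBackContext = contentRetriever.retrieve(Query.from(stepBack));
// Both contexts plus the original question are then combined in the final
// prompt, as in the template below.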

 

Sample prompt template for this

You are an expert of world knowledge. I am going to ask you a question. Your response should be comprehensive and not contradicted with the following context if they are relevant. Otherwise, ignore them if they are not relevant.

 

{normal_context}
{step_back_context}

Original Question: {question}
Answer:

Credit goes to https://smith.langchain.com/hub/langchain-ai/stepback-answer

 

 

References: https://arxiv.org/pdf/2310.06117

 

3.3 Follow-Up Questions

Let’s say you’re chatting with an assistant.

 

User: Where should I go in Japan in April?

 

Assistant: You can visit Kyoto for cherry blossoms, Tokyo for city life, and Hokkaido for snow festivals.

 

Follow-up Question: What’s the weather like?

 

Problem Without Query Transformation

If the system only takes “What’s the weather like?” as the search query, it has no idea you’re talking about Japan in April. So it might give you weather for another country or a random time.

 

Solution Using Query Transformation

Instead of embedding just the follow-up question, we use an LLM to rewrite it using the chat history.

 

Sample Prompt Template

 

Given the following conversation and a follow up question, rephrase the follow up \
question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone Question:

Credit goes to https://smith.langchain.com/hub/langchain-ai/weblangchain-search-query

 

In summary,

·      Original follow-up: “What’s the weather like?”

·      After transformation: “What’s the weather like in Japan in April?”

·      Benefit: More relevant results from search or retrieval
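In LangChain4j, this condensation is essentially what CompressingQueryTransformer (covered in detail later in this post) does. A hedged sketch, assuming chatModel and chatMemory already hold the conversation above:

// The transformer rewrites the follow-up into a standalone query,
// using the chat history attached via Metadata
CompressingQueryTransformer transformer = new CompressingQueryTransformer(chatModel);

UserMessage followUp = UserMessage.from("What's the weather like?");
Metadata metadata = Metadata.from(followUp, "chat-1", chatMemory.messages());

Collection<Query> standalone = transformer.transform(Query.from("What's the weather like?", metadata));
// Expected (roughly): "What's the weather like in Japan in April?"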

 

3.4 Multi-Query Retrieval Explained

Multi-Query Retrieval is a strategy where a Large Language Model (LLM) generates multiple related search queries from a single complex question. Instead of issuing just one search query, the system creates several focused queries that cover different aspects or sub-questions contained in the original question.

 

These multiple queries are then sent out in parallel to the search or retrieval system. The results from all these queries are collected together and used to provide a more comprehensive and accurate answer.

 

Why is this useful?

Some questions are complex and consist of multiple parts or layers. Searching with just one query might miss the important details or return incomplete results. By breaking the question down into multiple smaller queries, the system can cover more ground and bring back richer information.

 

For example, imagine a user asks: "What are the health benefits of green tea, and how does it compare to black tea in terms of caffeine content and antioxidant levels?"

 

This question has three distinct parts:

·      What are the health benefits of green tea?

·      How much caffeine does green tea have compared to black tea?

·      How do antioxidant levels compare between green tea and black tea?

 

Without Multi-Query Retrieval

If you searched with the entire question as a single query, the results might only partially cover these topics or emphasize one part more than the others.

 

With Multi-Query Retrieval

An LLM can be prompted to generate multiple search queries such as:

 

·      "Health benefits of green tea"

·      "Caffeine content in green tea vs black tea"

·      "Comparison of antioxidant levels in green tea and black tea"

 

These queries are then sent to the search system simultaneously. The results from all three are gathered, allowing the system to synthesize a complete, well-rounded answer that addresses each aspect thoroughly.
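A hedged sketch of the fan-out-and-merge step, assuming the sub-queries were already generated by an LLM and a contentRetriever is available:

List<String> subQueries = List.of(
        "Health benefits of green tea",
        "Caffeine content in green tea vs black tea",
        "Comparison of antioxidant levels in green tea and black tea");

// Collect results from all sub-queries; LinkedHashSet drops duplicate chunks
Set<String> merged = new LinkedHashSet<>();
for (String subQuery : subQueries) {
    contentRetriever.retrieve(Query.from(subQuery))
            .forEach(content -> merged.add(content.textSegment().text()));
}
merged.forEach(System.out::println);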

 

As you've seen, there are various techniques for performing query transformation, each with its own approach and benefits. While query transformation itself is not a new concept, what is new and powerful is the ability to use Large Language Models (LLMs) to carry it out.

 

The key difference between these methods often lies in the prompt engineering, that is, how we frame our instructions to the LLM. Writing these prompts is easier than you might think: if you can imagine the kind of question you want the model to ask or change, you can probably write the prompt for it too.

 

This gives you a lot of new and exciting opportunities: you now have the power to come up with your own prompt ideas. So the real question is:

·      What kinds of query transformations will you create?

·      How will you use this ability in your own projects?

 

4. QueryTransformer interface in LangChain4j

QueryTransformer is an interface that takes a single query and returns one or more transformed queries. These new queries are often better suited for fetching relevant results from your knowledge base or embedding store.

 

Here’s the official definition in simple terms: QueryTransformer transforms the original user query into one or more enhanced queries to improve the quality of document retrieval.
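The interface itself is tiny; in essence it looks like this (simplified):

public interface QueryTransformer {

    // Transforms the given query into one or more queries
    Collection<Query> transform(Query query);
}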

 


 

4.1 Built-in QueryTransformer Implementations in LangChain4j

LangChain4j offers three powerful implementations of the QueryTransformer interface, each addressing a different need in enhancing retrieval quality.

 

4.1.1 DefaultQueryTransformer: This is the most straightforward implementation. It simply returns the original query as-is, without making any changes or enhancements.

 

When to use it?

·      You're just getting started and want a baseline retrieval setup.

·      You don’t need any kind of transformation logic.

·      Your queries are already well-structured and specific.
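A quick sanity check of this pass-through behavior:

QueryTransformer passThrough = new DefaultQueryTransformer();

// Prints the single, unchanged query: "What is RAG?"
Collection<Query> queries = passThrough.transform(Query.from("What is RAG?"));
queries.forEach(query -> System.out.println(query.text()));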

 

4.1.2 CompressingQueryTransformer

This transformer uses a ChatModel (a language model) to compress or shorten the original query, often by removing irrelevant details or summarizing it. It also takes the chat memory (past interactions) into account to make the transformation context-aware.

 

When to use it:

·      The original query contains too much detail or redundant information.

·      You want to refine the query to focus on what really matters.

·      You’re working in a conversational context where prior interactions should influence the current query.

 

Prompt used by this transformer

"""
Generate {{n}} different versions of a provided user query. \
Each version should be worded differently, using synonyms or alternative sentence structures, \
but they should all retain the original meaning. \
These versions will be used to retrieve relevant documents. \
It is very important to provide each query version on a separate line, \
without enumerations, hyphens, or any additional formatting! \
User query: {{query}}
"""

CompressingQueryTransformerDemo.java

package com.sample.app;

import java.util.Collection;

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.memory.ChatMemory;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.rag.query.Metadata;
import dev.langchain4j.rag.query.Query;
import dev.langchain4j.rag.query.transformer.CompressingQueryTransformer;

/**
 * Demonstrates the use of {@link CompressingQueryTransformer} in Langchain4j
 * with an Ollama LLM to compress verbose user queries based on prior
 * conversation context.
 *
 * <p>
 * This example sets up:
 * <ul>
 * <li>A basic Ollama LLM chat model using llama3.2</li>
 * <li>A short chat history (ChatMemory)</li>
 * <li>A verbose user query with historical context</li>
 * <li>A query transformation step to compress the query</li>
 * </ul>
 *
 * <p>
 * Use case: Helps improve retrieval in RAG by compressing verbose or redundant
 * queries.
 */
public class CompressingQueryTransformerDemo {

    public static void main(String[] args) {

        // Step 1: Initialize the chat model (Ollama running locally)
        OllamaChatModel chatModel = OllamaChatModel.builder().baseUrl("http://localhost:11434").modelName("llama3.2")
                .build();

        // Step 2: Initialize chat memory with a windowed message history (up to 5
        // messages)
        ChatMemory chatMemory = MessageWindowChatMemory.builder().id("12345").maxMessages(5).build();

        // Add previous user and AI messages to simulate conversation context
        chatMemory.add(UserMessage.from("I'm researching electric vehicles."));
        chatMemory.add(AiMessage.from("Sure! What aspect of EVs are you interested in?"));

        // Step 3: Create the verbose user query
        String userQuery = "Can you help me find battery performance comparisons for EVs released after 2022?";

        // Attach metadata including user message and chat history
        Metadata metadata = Metadata.from(UserMessage.from(userQuery), 1, chatMemory.messages());
        Query originalQuery = Query.from(userQuery, metadata);

        // Step 4: Initialize the query transformer with the chat model
        CompressingQueryTransformer transformer = new CompressingQueryTransformer(chatModel);

        // Step 5: Transform the query using prior chat context
        Collection<Query> compressedQueries = transformer.transform(originalQuery);

        // Step 6: Output the compressed query (may be 1 or more if multi-query support
        // is enabled)
        compressedQueries.forEach(compressedQuery -> System.out.println("Compressed Query: " + compressedQuery.text()));
    }
}

 

Output

Compressed Query: Battery performance comparisons for electric vehicles (EVs) released after 2022.

4.1.3 ExpandingQueryTransformer

In contrast to compression, this transformer expands the query. It uses a ChatModel to generate multiple reworded versions of the original query, using synonyms, alternative phrasings, or clarifying details, making it more likely to retrieve relevant documents.

 

When to use it:

·      The original query is too vague or under-specified.

·      You want to improve recall by making the query broader and more inclusive.

·      You're dealing with open-ended or exploratory queries.

 

Prompt used by this transformer

 

"""
Read and understand the conversation between the User and the AI. \
Then, analyze the new query from the User. \
Identify all relevant details, terms, and context from both the conversation and the new query. \
Reformulate this query into a clear, concise, and self-contained format suitable for information retrieval.

Conversation:
{{chatMemory}}

User query: {{query}}

It is very important that you provide only reformulated query and nothing else! \
Do not prepend a query with anything!
"""

 

Find the working application below.

 

ExpandingQueryTransformerDemo.java

 

package com.sample.app;

import java.util.Collection;

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.memory.ChatMemory;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.rag.query.Metadata;
import dev.langchain4j.rag.query.Query;
import dev.langchain4j.rag.query.transformer.ExpandingQueryTransformer;

public class ExpandingQueryTransformerDemo {

    public static final int EXPECTED_QUERIES = 5;

    public static void main(String[] args) {

        // Step 1: Initialize the chat model (Ollama running locally)
        OllamaChatModel chatModel = OllamaChatModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("llama3.2")
                .build();

        // Step 2: Initialize chat memory with a windowed message history (up to 5 messages)
        ChatMemory chatMemory = MessageWindowChatMemory.builder()
                .id("12345")
                .maxMessages(5)
                .build();

        // Add previous user and AI messages to simulate conversation context
        chatMemory.add(UserMessage.from("I'm researching electric vehicles."));
        chatMemory.add(AiMessage.from("Sure! What aspect of EVs are you interested in?"));

        // Step 3: Create the verbose user query
        String userQuery = "Can you help me find battery performance comparisons for EVs released after 2022?";

        // Attach metadata including user message and chat history
        Metadata metadata = Metadata.from(UserMessage.from(userQuery), 1, chatMemory.messages());
        Query originalQuery = Query.from(userQuery, metadata);

        // Step 4: Create ExpandingQueryTransformer with expected number of queries
        ExpandingQueryTransformer transformer = new ExpandingQueryTransformer(chatModel, EXPECTED_QUERIES);

        // Step 5: Transform the original query
        Collection<Query> expandedQueries = transformer.transform(originalQuery);

        // Step 6: Output the results
        System.out.println("Total Queries generated: " + expandedQueries.size());
        for (Query query : expandedQueries) {
            System.out.println(query.text());
        }
    }
}

Output

Total Queries generated: 5
I can help you find information on battery performance comparisons for electric vehicles released in the last few years
Can you provide more details about the type of comparison you're looking for regarding electric vehicle batteries?
Is there a particular aspect of battery performance you'd like to know more about, such as charging speed or range?
Can you give me an idea of what kind of information would be helpful to you when it comes to comparing different electric vehicle battery types?
Would you prefer to see comparisons between specific brands or models of electric vehicles?

Here’s how you can use ExpandingQueryTransformer inside a RAG pipeline by attaching it to a RetrievalAugmentor, which is then plugged into the ChatAssistant:  

 

EmbeddingModel embeddingModel = new BgeSmallEnV15QuantizedEmbeddingModel();

ContentRetriever contentRetriever =
    EmbeddingStoreContentRetriever.builder()
        .embeddingStore(embeddingStore)
        .embeddingModel(embeddingModel)
        .maxResults(3)
        .minScore(0.7)
        .build();

ExpandingQueryTransformer expandingQueryTransformer =
    new ExpandingQueryTransformer(chatModel, 3);

RetrievalAugmentor retrievalAugmentor =
    DefaultRetrievalAugmentor.builder()
        .queryTransformer(expandingQueryTransformer)
        .contentRetriever(contentRetriever)
        .build();

ChatAssistant chatAssistant =
    AiServices.builder(ChatAssistant.class)
        .chatModel(chatModel)
        .retrievalAugmentor(retrievalAugmentor)
        .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
        .build();

 

How It Works

·      ExpandingQueryTransformer generates multiple reworded versions of the user’s query using the LLM.

·      These versions are passed to the ContentRetriever, improving document recall.

·      The RetrievalAugmentor ties this transformation + retrieval together.

·      ChatAssistant uses this full pipeline for better RAG-based answering.

 

 

Find the working application below.

 

QueryTransformerDemo.java

package com.sample.app.transformer;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.embedding.onnx.bgesmallenv15q.BgeSmallEnV15QuantizedEmbeddingModel;
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.rag.DefaultRetrievalAugmentor;
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.rag.query.transformer.ExpandingQueryTransformer;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;
import java.util.ArrayList;
import java.util.List;

public class QueryTransformerDemo {

  public static interface ChatAssistant {
    String chat(String userMessage);
  }

  private static List<String> indiaDocs =
      List.of(
          """
                    India, officially the Republic of India, is a country in South Asia. It is the seventh-largest country by area,
                    the most populous country as of 2023, and the most populous democracy in the world. Bounded by the Himalayas in
                    the north and surrounded by the Indian Ocean, Arabian Sea, and Bay of Bengal, India has a rich cultural and
                    historical heritage.
                    """,
          """
                    The Indian economy is the fifth-largest in the world by nominal GDP and the third-largest by purchasing power parity.
                    It is classified as a newly industrialized country and one of the world's fastest-growing major economies. Major
                    industries include IT services, textiles, pharmaceuticals, and agriculture.
                    """,
          """
                    India's political system is a federal parliamentary democratic republic. The President of India is the head of state,
                    while the Prime Minister is the head of government. The Indian Parliament consists of the Lok Sabha (House of the People)
                    and the Rajya Sabha (Council of States).
                    """,
          """
                    Indian culture is renowned for its diversity and depth. It includes a wide variety of languages, religions, music,
                    dance forms, cuisine, and traditions. Major religions born in India include Hinduism, Buddhism, Jainism, and Sikhism.
                    Festivals such as Diwali, Eid, Christmas, and Pongal are celebrated across the country.
                    """,
          """
                    India is home to 40 UNESCO World Heritage Sites, including the Taj Mahal, Qutub Minar, and the Western Ghats. The country
                    offers a wide variety of landscapes ranging from deserts in Rajasthan, the Himalayan mountain range, tropical rainforests
                    in the northeast, to coastal plains and fertile river valleys.
                    """,
          """
                    The Indian education system has seen significant progress, with institutes like the IITs, IIMs, and AIIMS gaining global
                    recognition. India also has a growing startup ecosystem, particularly in tech hubs such as Bengaluru, Hyderabad, and Pune.
                    """,
          """
                    India has a vibrant film industry, notably Bollywood, which produces the largest number of films in the world. Indian music
                    and dance forms such as Bharatanatyam, Kathak, Carnatic music, and Bollywood songs are famous worldwide.
                    """,
          """
                    Indian cuisine is known for its rich flavors and diverse ingredients. Each region has its own distinct dishes –
                    from butter chicken and biryani in the north, to dosa and sambar in the south, to seafood delicacies in the coastal regions.
                    Spices like turmeric, cumin, coriander, and cardamom are integral to Indian cooking.
                    """,
          """
                    The Indian space program, led by ISRO, has made remarkable achievements including the Chandrayaan lunar missions and the
                    Mars Orbiter Mission (Mangalyaan), which made India the first Asian nation to reach Mars orbit and the first in the world
                    to do so in its maiden attempt.
                    """,
          """
                    Indian sports are dominated by cricket, but other sports like hockey, badminton, wrestling, and kabaddi are also popular.
                    India has produced world-class athletes including Sachin Tendulkar, P.V. Sindhu, Neeraj Chopra, and Mary Kom. The country
                    has hosted major sporting events like the Commonwealth Games and the Cricket World Cup.
                    """);

  private static List<Document> documents = new ArrayList<>();

  private static void prepareDocuments() {
    for (String doc : indiaDocs) {
      documents.add(Document.from(doc));
    }
  }

  public static void main(String[] args) {
    prepareDocuments();

    OllamaChatModel chatModel =
        OllamaChatModel.builder().baseUrl("http://localhost:11434").modelName("llama3.2").build();

    InMemoryEmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();
    EmbeddingStoreIngestor.ingest(documents, embeddingStore);

    ExpandingQueryTransformer expandingQueryTransformer =
        new ExpandingQueryTransformer(chatModel, 3);
    EmbeddingModel embeddingModel = new BgeSmallEnV15QuantizedEmbeddingModel();

    ContentRetriever contentRetriever =
        EmbeddingStoreContentRetriever.builder()
            .embeddingStore(embeddingStore)
            .embeddingModel(embeddingModel) // embedding model used to embed queries at retrieval time
            .maxResults(3)
            .minScore(0.7)
            .build();

    RetrievalAugmentor retrievalAugmentor =
        DefaultRetrievalAugmentor.builder()
            .queryTransformer(expandingQueryTransformer)
            .contentRetriever(contentRetriever)
            .build();

    String prompt =
        """
                Hey, I was reading about how countries manage their political systems and got curious — can you tell me how India runs its government, who’s in charge, and what kind of structure they follow politically?

                """;

    ChatAssistant chatAssistant =
        AiServices.builder(ChatAssistant.class)
            .chatModel(chatModel)
            .retrievalAugmentor(retrievalAugmentor)
            .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
            .build();

    String response = chatAssistant.chat(prompt);
    System.out.println(response);
  }
}

 

Output

India's government structure is a federal parliamentary democratic republic. Here's an overview of who's in charge and the kind of structure they follow:

**Head of State:** The President of India serves as the head of state. They are elected by an electoral college consisting of the members of both houses of Parliament (Lok Sabha and Rajya Sabha) and the governors of all states.

**Head of Government:** The Prime Minister is the head of government, but they do not serve as a separate executive branch. Instead, they are responsible for advising the President on matters of national importance and forming a cabinet to implement policies.

**Government Structure:** India's federal system divides power between the central government and the states (called "units"). The central government has control over areas such as defense, foreign policy, and interstate relations, while the states have autonomy over local issues like healthcare, education, and law enforcement.

**Legislative Branch:** The Indian Parliament is bicameral, consisting of two houses:

1. **Lok Sabha (House of the People):** This is the lower house of Parliament, comprising 543 elected members who serve a term of five years.
2. **Rajya Sabha (Council of States):** This is the upper house of Parliament, with 245 members who are either directly or indirectly elected by the state legislatures.

The Lok Sabha has the power to pass laws and elect the President, while the Rajya Sabha provides a check on the Lok Sabha's decisions and can also initiate legislation.

References

https://blog.langchain.dev/query-transformations/

