When building conversational AI applications, maintaining conversation history is essential for producing intelligent, context-aware responses. However, managing chat messages manually, especially across turns, sessions, or tools, can quickly become error-prone and unmanageable. LangChain4j addresses this challenge through its ChatMemory abstraction.
This post explores how ChatMemory simplifies memory management, enhances control, and supports advanced features like eviction policies and persistence for scalable AI chat solutions.
1. What is ChatMemory?
ChatMemory is a core component in LangChain4j designed to store and manage ChatMessages. Under the hood, it typically uses a List to hold the messages, but adds significant enhancements to simplify developer experience and scalability.
You can use ChatMemory:
· As a standalone component for managing state in a low-level interaction pipeline.
· Integrated with high-level AI Services for seamless, end-to-end AI-driven applications (a minimal sketch follows this list).
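As a quick illustration of the second option, here is a minimal sketch of wiring a ChatMemory into a high-level AI Service. The Assistant interface and the file name are hypothetical, and the builder method names are assumptions based on recent LangChain4j versions (older versions use chatLanguageModel instead of chatModel), so treat this as a sketch rather than a drop-in implementation.

AiServiceMemorySketch.java

package com.sample.app.chatmodels;

import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.service.AiServices;

// Hypothetical assistant interface; AiServices generates the implementation at runtime.
interface Assistant {
    String chat(String userMessage);
}

public class AiServiceMemorySketch {

    public static void main(String[] args) {

        ChatModel chatModel = OllamaChatModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("llama3.2")
                .build();

        // The memory is attached once; the service records user and AI messages in it automatically.
        Assistant assistant = AiServices.builder(Assistant.class)
                .chatModel(chatModel)            // may be chatLanguageModel(...) in older versions
                .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
                .build();

        System.out.println(assistant.chat("Hi, I'm planning a trip to Lisbon."));
        System.out.println(assistant.chat("What did I just say I was planning?")); // answered from memory
    }
}

Because the memory is attached to the service, every call to assistant.chat(...) stores both the user message and the AI reply, so the second question can be answered from context without any manual bookkeeping.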
2. Key Features of ChatMemory
· Eviction Policy: Automatically remove older or less relevant messages when memory limits are reached. This helps to keep memory usage bounded, especially when working with large models or token-sensitive environments.
· Persistence: Store memory to disk, a database, or external storage, enabling long-running conversations that span sessions, devices, or users. This is crucial for applications that need state continuity, such as customer support bots or virtual assistants (a small persistence sketch follows this list).
· Special Treatment of System Messages: System messages (like initial instructions or role definitions) can be handled differently, preserved regardless of memory eviction policies, or injected at specific places in the prompt.
· Tool Message Handling: Messages related to tool invocations and their responses can be treated specially, enabling LangChain4j to keep track of reasoning steps or actions performed through tools like APIs or calculators.
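To make the Persistence feature concrete, here is a minimal sketch of a ChatMemoryStore backed by a plain in-memory map; in a real application you would replace the map with a database, file, or key-value store. The MapChatMemoryStore class and the "user-42" memory id are illustrative assumptions, not part of LangChain4j itself.

PersistentMemorySketch.java

package com.sample.app.chatmodels;

import java.util.HashMap;
import java.util.List;
import java.util.Map;

import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.memory.ChatMemory;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.store.memory.chat.ChatMemoryStore;

public class PersistentMemorySketch {

    // Sketch of a ChatMemoryStore; swap the map for a database or file storage for real persistence.
    static class MapChatMemoryStore implements ChatMemoryStore {

        private final Map<Object, List<ChatMessage>> storage = new HashMap<>();

        @Override
        public List<ChatMessage> getMessages(Object memoryId) {
            return storage.getOrDefault(memoryId, List.of());
        }

        @Override
        public void updateMessages(Object memoryId, List<ChatMessage> messages) {
            storage.put(memoryId, messages);
        }

        @Override
        public void deleteMessages(Object memoryId) {
            storage.remove(memoryId);
        }
    }

    public static void main(String[] args) {

        // Each conversation (user, session, device) gets its own memory id.
        ChatMemory memory = MessageWindowChatMemory.builder()
                .id("user-42")                              // illustrative memory id
                .maxMessages(20)
                .chatMemoryStore(new MapChatMemoryStore())
                .build();

        System.out.println("Messages restored for user-42: " + memory.messages().size());
    }
}

Because the store is keyed by a memory id, each user or session keeps an isolated conversation, and restarting the application (with a durable store behind the interface) restores the previous messages.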
3. Memory vs. History in LangChain4j
When building conversational AI applications, it’s tempting to think of "memory" and "history" as interchangeable, or even the same thing. However, in the world of Large Language Models (LLMs) and frameworks like LangChain4j, memory and history serve very different purposes.
History is the complete chat log: everything you and the AI have said, exactly as it happened, like reading a full transcript of your conversation.
Memory is more like the AI’s notes about the chat. Instead of keeping every word, the AI might:
· Shorten long messages.
· Forget less important parts.
· Summarize multiple messages into key points.
· Add extra info (like facts or reminders) to help it respond better.
So, History = the full conversation, while Memory = the AI’s smart, short version to help it "remember" what matters most.
LangChain4j currently offers only "memory", not "history". If you need to keep the entire history, handle it manually (for example, store the complete history in your database).
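If you do need the full transcript, one simple approach is to maintain your own complete list of messages alongside the ChatMemory, as in this small sketch (the fullHistory list is an illustrative assumption; in practice you would write those messages to your database):

// Illustrative fragment, using the same LangChain4j types as the demo later in this post.
List<ChatMessage> fullHistory = new ArrayList<>();                  // complete history, never evicted
ChatMemory memory = MessageWindowChatMemory.withMaxMessages(10);    // "memory" for the LLM, subject to eviction

UserMessage userMessage = UserMessage.from("Hi, I'm planning a trip to Lisbon.");
fullHistory.add(userMessage);   // history: store everything (e.g. persist to your database)
memory.add(userMessage);        // memory: only what the eviction policy keeps is sent to the model

This way the model only ever sees what the eviction policy keeps, while your own storage retains the complete history.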
4. Eviction Policy
An eviction policy determines which messages should be removed (evicted) from the memory when space is constrained, whether due to token limits, cost concerns, or latency constraints.
LangChain4j’s ChatMemory is not infinite. It needs to decide what to keep and what to discard intelligently to stay efficient and within operational bounds.
Why is an eviction policy important?
· Fit Within the LLM’s Context Window: Large Language Models (LLMs) can only process a limited number of tokens at once (e.g., 4,000 or 16,000 tokens depending on the model). If the accumulated memory exceeds this context limit, some messages must be removed before the request is sent to the model.
· Control Cost: LLM APIs usually charge per token (both input and output). Retaining and sending long conversation histories increases costs significantly.
· Control Latency: The more tokens you send, the longer the model takes to generate a response. High latency can degrade user experience, especially in real-time applications like customer support bots or assistants.
Eviction Strategies in LangChain4j
LangChain4j offers two built-in eviction strategies through memory implementations:
· MessageWindowChatMemory: Retains the last N messages, discarding older ones beyond that limit. It is best for quick experiments, prototypes, and predictable message structures.
This strategy is not suitable for fine-grained control over token usage (a one-line example appears just before the full application below).
· TokenWindowChatMemory: Keeps messages as long as their total token count stays within a specified limit. When a new message causes overflow, entire messages are evicted, oldest first. It provides better control over cost and latency.
For example, imagine you're building a travel assistant chatbot. A user starts a long session asking about flights, hotels, weather, etc. You can use TokenWindowChatMemory to:
· Ensure the assistant retains only the most contextually important and recent information.
· Stay within your model’s token budget (for example, 8,000 tokens).
· Avoid overflowing the context window and hitting token-limit errors.
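The rest of this post builds a complete TokenWindowChatMemory example. For comparison, creating the simpler message-count-based memory is a one-liner (10 is just an illustrative limit):

// Keeps only the 10 most recent messages; older ones are evicted first.
ChatMemory memory = MessageWindowChatMemory.withMaxMessages(10);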
Let’s build an application.
SimpleTokenCountEstimator.java
package com.sample.app.chatmodels;

import java.util.List;

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.Content;
import dev.langchain4j.data.message.SystemMessage;
import dev.langchain4j.data.message.TextContent;
import dev.langchain4j.data.message.ToolExecutionResultMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.TokenCountEstimator;

public class SimpleTokenCountEstimator implements TokenCountEstimator {

    // Rough heuristic: on average, one English word is about 1.33 tokens.
    private static final double TOKENS_PER_WORD = 1.33;

    @Override
    public int estimateTokenCountInText(String text) {
        if (text == null || text.isBlank()) {
            return 0;
        }
        int wordCount = text.trim().split("\\s+").length;
        return (int) Math.ceil(wordCount * TOKENS_PER_WORD);
    }

    @Override
    public int estimateTokenCountInMessage(ChatMessage message) {
        String text = extractText(message);
        return estimateTokenCountInText(text);
    }

    @Override
    public int estimateTokenCountInMessages(Iterable<ChatMessage> messages) {
        int total = 0;
        for (ChatMessage message : messages) {
            total += estimateTokenCountInMessage(message);
        }
        return total;
    }

    private String extractText(ChatMessage message) {
        if (message instanceof UserMessage) {
            UserMessage userMessage = (UserMessage) message;
            List<Content> contents = userMessage.contents();
            StringBuilder builder = new StringBuilder();
            for (Content content : contents) {
                if (content instanceof TextContent) { // skip non-text content (e.g. images)
                    builder.append(((TextContent) content).text());
                }
            }
            return builder.toString();
        } else if (message instanceof AiMessage) {
            return ((AiMessage) message).text();
        } else if (message instanceof SystemMessage) {
            return ((SystemMessage) message).text();
        } else if (message instanceof ToolExecutionResultMessage) {
            return ((ToolExecutionResultMessage) message).text();
        } else {
            return ""; // fallback for unknown message types
        }
    }
}
Let's define a ChatMemory object that uses SimpleTokenCountEstimator.

ChatMemory memory = TokenWindowChatMemory.builder()
        .maxTokens(1000, estimator)
        .build();
We can then use this ChatMemory object to manage the conversation messages, with SimpleTokenCountEstimator keeping the total token count within the limit.

UserMessage userMessage = UserMessage.from(input);
memory.add(userMessage);

AiMessage aiMessage = chatModel.chat(memory.messages()).aiMessage();
ChatMemoryDemo.java
package com.sample.app.chatmodels;

import java.util.Scanner;

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.memory.ChatMemory;
import dev.langchain4j.memory.chat.TokenWindowChatMemory;
import dev.langchain4j.model.TokenCountEstimator;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.ollama.OllamaChatModel;

public class ChatMemoryDemo {

    public static void main(String[] args) {

        // Build the Ollama language model
        ChatModel chatModel = OllamaChatModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("llama3.2")
                .build();

        // Use the custom word-count-based token estimator
        TokenCountEstimator estimator = new SimpleTokenCountEstimator();

        // Create chat memory with a token-based eviction policy
        ChatMemory memory = TokenWindowChatMemory.builder()
                .maxTokens(1000, estimator)
                .build();

        Scanner scanner = new Scanner(System.in);
        System.out.println("Chat with AI. Type 'exit' to quit.");

        while (true) {
            System.out.print("\nYou: ");
            String input = scanner.nextLine();

            if ("exit".equalsIgnoreCase(input.trim())) {
                System.out.println("Conversation ended.");
                break;
            }

            UserMessage userMessage = UserMessage.from(input);
            memory.add(userMessage);

            AiMessage aiMessage = chatModel.chat(memory.messages()).aiMessage();
            memory.add(aiMessage);

            System.out.println("AI: " + aiMessage.text());
        }

        scanner.close();
    }
}
Output
You: Hi,
AI: How can I assist you today?

You: Who are you
AI: I'm an artificial intelligence model known as Llama. Llama stands for "Large Language Model Meta AI." I was created by Meta AI to process and generate human-like text. My primary function is to understand and respond to questions, provide information, and engage in conversation to the best of my abilities.

I'm a large language model, which means I've been trained on a massive dataset of text from various sources, including but not limited to books, articles, research papers, and online conversations. This training allows me to generate human-like responses to a wide range of topics and questions.

I don't have a personal identity or emotions like humans do, but I'm designed to be helpful, informative, and engaging. My goal is to assist users like you with information, answers, and conversation!

You: