In AI-driven applications, visibility into what is happening inside your language model interactions is essential for debugging, monitoring, and performance optimization. LangChain4j provides observability hooks for exactly this purpose, most notably the ChatModelListener interface, which is supported by its ChatModel and StreamingChatModel implementations.
In this post, we'll explore how to use LangChain4j's observability features to gain insight into the requests sent to the LLM, the responses received, and any errors that occur along the way.
Why Observability Matters in LLM Applications
· Debugging unpredictable LLM behavior
· Monitoring model usage and performance
· Capturing token usage for billing insights
· Tracking errors systematically
Understanding the Events
LangChain4j captures events aligned with OpenTelemetry's Generative AI Semantic Conventions:
· Request Attributes: messages, model, temperature, top_p, max_tokens, tools, response_format, etc.
· Response Attributes: assistant_message, id, model, token_usage, finish_reason, etc. (token usage is put to work in the sketch after this list)
· Error Attributes: Exception details, stack trace, model info, etc.
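To make the response attributes concrete, below is a minimal sketch of a listener that captures token_usage to feed billing reports. The class name TokenUsageTrackingListener and the per-model aggregation are illustrative assumptions, not part of LangChain4j itself:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

import dev.langchain4j.model.chat.listener.ChatModelListener;
import dev.langchain4j.model.chat.listener.ChatModelResponseContext;
import dev.langchain4j.model.chat.response.ChatResponseMetadata;
import dev.langchain4j.model.output.TokenUsage;

// Hypothetical helper: accumulates total token counts per model name,
// e.g. as raw input for billing reports.
public class TokenUsageTrackingListener implements ChatModelListener {

    private final Map<String, AtomicLong> totalTokensByModel = new ConcurrentHashMap<>();

    @Override
    public void onResponse(ChatModelResponseContext responseContext) {
        ChatResponseMetadata metadata = responseContext.chatResponse().metadata();
        TokenUsage usage = metadata.tokenUsage();
        // Some providers may omit token usage, so guard against nulls
        if (usage == null || usage.totalTokenCount() == null) {
            return;
        }
        String model = metadata.modelName() != null ? metadata.modelName() : "unknown";
        totalTokensByModel
                .computeIfAbsent(model, m -> new AtomicLong())
                .addAndGet(usage.totalTokenCount());
    }

    public Map<String, AtomicLong> totalsPerModel() {
        return totalTokensByModel;
    }
}

Attach it via the listeners(...) builder method shown later in this post, and read totalsPerModel() wherever billing data is collected.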
Introduction to the ChatModelListener Interface
The ChatModelListener interface is part of the LangChain4j framework and provides a way to observe and react to events that occur during interactions with a ChatModel. This is particularly useful for logging, debugging, monitoring, or tracing interactions with large language models (LLMs).
ChatModelListener is a listener interface designed to hook into three key lifecycle events of a language model request:
· Before the request is sent to the LLM (onRequest)
· After the response is received from the LLM (onResponse)
· If an error occurs during processing (onError)
It provides access to rich context objects for each of these events, enabling detailed introspection and observability.
public interface ChatModelListener {

    default void onRequest(ChatModelRequestContext requestContext) {
    }

    default void onResponse(ChatModelResponseContext responseContext) {
    }

    default void onError(ChatModelErrorContext errorContext) {
    }
}
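One practical consequence of this design is that the attributes() map on these context objects is shared across the callbacks of a single call, so a listener can carry state from onRequest to onResponse. The sketch below uses that to measure per-call latency; LatencyTrackingListener and the "start-time" key are names invented for this example:

import dev.langchain4j.model.chat.listener.ChatModelListener;
import dev.langchain4j.model.chat.listener.ChatModelRequestContext;
import dev.langchain4j.model.chat.listener.ChatModelResponseContext;

// Hypothetical helper: measures wall-clock latency of each chat call by
// stashing a start timestamp in the shared attributes map during onRequest
// and reading it back in onResponse.
public class LatencyTrackingListener implements ChatModelListener {

    private static final String START_TIME_KEY = "start-time"; // arbitrary key name

    @Override
    public void onRequest(ChatModelRequestContext requestContext) {
        requestContext.attributes().put(START_TIME_KEY, System.nanoTime());
    }

    @Override
    public void onResponse(ChatModelResponseContext responseContext) {
        long startTime = (long) responseContext.attributes().get(START_TIME_KEY);
        long elapsedMs = (System.nanoTime() - startTime) / 1_000_000;
        System.out.println("LLM call took " + elapsedMs + " ms");
    }
}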
onRequest(ChatModelRequestContext requestContext)
Called before the request is sent to the model. It lets you inspect the ChatRequest, including the messages, the model name, and parameters such as temperature, top_p, top_k, and max output tokens.
onResponse(ChatModelResponseContext responseContext)
Called after a successful response is received from the LLM. It gives you access to the generated AiMessage, the response metadata (model used, finish reason, token usage), the original ChatRequest, and any attributes set earlier.
onError(ChatModelErrorContext errorContext)
Called when an exception occurs during model interaction. It provides access to the thrown Throwable, the original ChatRequest, and the attributes map populated in onRequest.
Below is a complete working application.
ChatModelListenerDemo.java
package com.sample.app.observability;

import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.listener.ChatModelErrorContext;
import dev.langchain4j.model.chat.listener.ChatModelListener;
import dev.langchain4j.model.chat.listener.ChatModelRequestContext;
import dev.langchain4j.model.chat.listener.ChatModelResponseContext;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.request.ChatRequestParameters;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.response.ChatResponseMetadata;
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.model.output.TokenUsage;

import java.util.List;

public class ChatModelListenerDemo {

    private static ChatModelListener listener = createListener();

    private static ChatModelListener createListener() {
        return new ChatModelListener() {

            @Override
            public void onRequest(ChatModelRequestContext requestContext) {
                System.out.println("\n--- LLM REQUEST ---");

                ChatRequest request = requestContext.chatRequest();
                ChatRequestParameters params = request.parameters();

                System.out.println("Messages:");
                request.messages().forEach(msg -> System.out.println(" " + msg));

                System.out.println("Request Parameters:");
                System.out.println(" Model: " + params.modelName());
                System.out.println(" Temperature: " + params.temperature());
                System.out.println(" Top P: " + params.topP());
                System.out.println(" Top K: " + params.topK());
                System.out.println(" Frequency Penalty: " + params.frequencyPenalty());
                System.out.println(" Presence Penalty: " + params.presencePenalty());
                System.out.println(" Max Output Tokens: " + params.maxOutputTokens());
                System.out.println(" Stop Sequences: " + params.stopSequences());
                System.out.println(" Tool Specifications: " + params.toolSpecifications());
                System.out.println(" Tool Choice: " + params.toolChoice());
                System.out.println(" Response Format: " + params.responseFormat());

                System.out.println("Model Provider: " + requestContext.modelProvider());

                requestContext.attributes().put("my-attribute", "my-value");
            }

            @Override
            public void onResponse(ChatModelResponseContext responseContext) {
                System.out.println("\n--- LLM RESPONSE ---");

                ChatResponse response = responseContext.chatResponse();
                ChatResponseMetadata metadata = response.metadata();
                TokenUsage tokenUsage = metadata.tokenUsage();

                System.out.println("Assistant Message: " + response.aiMessage());

                System.out.println("Response Metadata:");
                System.out.println(" ID: " + metadata.id());
                System.out.println(" Model: " + metadata.modelName());
                System.out.println(" Finish Reason: " + metadata.finishReason());

                System.out.println("Token Usage:");
                System.out.println(" Input Tokens: " + tokenUsage.inputTokenCount());
                System.out.println(" Output Tokens: " + tokenUsage.outputTokenCount());
                System.out.println(" Total Tokens: " + tokenUsage.totalTokenCount());

                System.out.println("Original Request: " + responseContext.chatRequest());
                System.out.println("Model Provider: " + responseContext.modelProvider());

                System.out.println("Custom Attributes:");
                System.out.println(" my-attribute: " + responseContext.attributes().get("my-attribute"));
            }

            @Override
            public void onError(ChatModelErrorContext errorContext) {
                System.out.println("\n--- LLM ERROR ---");

                System.out.println("Exception:");
                errorContext.error().printStackTrace(System.out);

                System.out.println("Related Request: " + errorContext.chatRequest());
                System.out.println("Model Provider: " + errorContext.modelProvider());

                System.out.println("Custom Attributes:");
                System.out.println(" my-attribute: " + errorContext.attributes().get("my-attribute"));
            }
        };
    }

    public static void main(String[] args) {

        // Create the Ollama model with listener
        ChatModel chatModel = OllamaChatModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("llama3.2")
                .listeners(List.of(listener))
                .build();

        // Build the request
        ChatRequest chatRequest = ChatRequest.builder()
                .messages(List.of(UserMessage.from(
                        "What are the benefits of using AI in education, just explain in 2 lines?")))
                .temperature(0.7)
                .maxOutputTokens(200)
                .build();

        // Get response
        ChatResponse response = chatModel.chat(chatRequest);

        // Print final AI output
        System.out.println("\nFinal AI Response: " + response.aiMessage());
    }
}
Output
SLF4J(W): No SLF4J providers were found.
SLF4J(W): Defaulting to no-operation (NOP) logger implementation
SLF4J(W): See https://www.slf4j.org/codes.html#noProviders for further details.
--- LLM REQUEST ---
Messages:
 UserMessage { name = null contents = [TextContent { text = "What are the benefits of using AI in education, just explain in 2 lines?" }] }
Request Parameters:
 Model: llama3.2
 Temperature: 0.7
 Top P: null
 Top K: null
 Frequency Penalty: null
 Presence Penalty: null
 Max Output Tokens: 200
 Stop Sequences: []
 Tool Specifications: []
 Tool Choice: null
 Response Format: null
Model Provider: OLLAMA
--- LLM RESPONSE ---
Assistant Message: AiMessage { text = "The use of AI in education offers several benefits, including personalized learning experiences, improved student engagement, and enhanced academic performance. Additionally, AI can automate administrative tasks, freeing up educators to focus on more hands-on teaching and mentoring roles." toolExecutionRequests = [] }
Response Metadata:
 ID: null
 Model: llama3.2
 Finish Reason: STOP
Token Usage:
 Input Tokens: 42
 Output Tokens: 47
 Total Tokens: 89
Original Request: ChatRequest { messages = [UserMessage { name = null contents = [TextContent { text = "What are the benefits of using AI in education, just explain in 2 lines?" }] }], parameters = OllamaChatRequestParameters{modelName="llama3.2", temperature=0.7, topP=null, topK=null, frequencyPenalty=null, presencePenalty=null, maxOutputTokens=200, stopSequences=[], toolSpecifications=[], toolChoice=null, responseFormat=null, mirostat=null, mirostatEta=null, mirostatTau=null, numCtx=null, repeatLastN=null, repeatPenalty=null, seed=null, minP=null, keepAlive=null} }
Model Provider: OLLAMA
Custom Attributes:
 my-attribute: my-value
Final AI Response: AiMessage { text = "The use of AI in education offers several benefits, including personalized learning experiences, improved student engagement, and enhanced academic performance. Additionally, AI can automate administrative tasks, freeing up educators to focus on more hands-on teaching and mentoring roles." toolExecutionRequests = [] }
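The run above never exercises onError. A simple way to see that callback fire (an assumption for experimentation, reusing the listener from the demo) is to point the model at a port where no Ollama server is listening; the listener is notified first and the exception then propagates to the caller:

// Assumes nothing is listening on localhost:9999, so the HTTP call fails fast
ChatModel failingModel = OllamaChatModel.builder()
        .baseUrl("http://localhost:9999")
        .modelName("llama3.2")
        .listeners(List.of(listener))
        .build();

try {
    failingModel.chat(ChatRequest.builder()
            .messages(List.of(UserMessage.from("Hello")))
            .build());
} catch (Exception e) {
    // By the time the exception reaches here, onError has already been invoked
    System.out.println("Caught: " + e.getMessage());
}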