Thursday, 26 June 2025

How to Send Multimodal Messages to LLMs Using LangChain4j?

Developers integrating AI into their Java applications often need to send more than just plain text to large language models. LangChain4j simplifies this with a UserMessage API that supports multimodal inputs—text, images, audio, video, and even PDFs. Here's how you can leverage this feature and what it means for your AI-powered Java applications.

 

LangChain4j now makes it easy to build multimodal prompts for large language models. This means you can send not just text, but also images, audio, video, and PDF files in a single message to supported LLMs.

 

The UserMessage class accepts a list of Content objects. Content is a common interface implemented by the following types (see the combined sketch after the list):

 

· TextContent
· ImageContent
· AudioContent
· VideoContent
· PdfFileContent
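For illustration, here is a sketch that attaches every content type to a single message. The URL and the base64 variables are placeholders, the exact from(...) overloads can vary between LangChain4j releases (check the Javadoc of each class), and few models accept all modalities at once:

UserMessage message = UserMessage.from(
	TextContent.from("Summarize the attached material"),
	ImageContent.from("https://example.com/chart.png"), // placeholder URL
	AudioContent.from(base64Audio, "audio/mp3"), // base64-encoded audio (placeholder)
	VideoContent.from(base64Video, "video/mp4"), // base64-encoded video (placeholder)
	PdfFileContent.from(base64Pdf, "application/pdf")); // base64-encoded PDF (placeholder)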

 

Here’s an example of sending both text and an image in one message:

UserMessage userMessage = UserMessage.from(
	TextContent.from("Describe the following image"),
	ImageContent.from("https://raw.githubusercontent.com/yavuzceliker/sample-images/main/images/image-1.jpg"));

ChatResponse response = chatModel.chat(userMessage);

This is particularly powerful when working with LLMs that support multimodal input. Support varies by provider (OpenAI, Google, Anthropic, and so on): a given model may accept some of these content types and not others. Consult the LangChain4j provider comparison table to check which modalities each model supports.
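For example, pointing the same prompt at OpenAI instead of a local model only changes how the ChatModel is built. A minimal sketch, assuming the langchain4j-open-ai module is on the classpath, an OPENAI_API_KEY environment variable is set, and "gpt-4o" (an image-capable model) is the target:

ChatModel chatModel = OpenAiChatModel.builder()
		.apiKey(System.getenv("OPENAI_API_KEY"))
		.modelName("gpt-4o")
		.build();

ChatResponse response = chatModel.chat(userMessage); // same multimodal UserMessage as above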

 

To demo this feature, I am using the "llama3.2-vision" model, served locally through Ollama. Pull and run it with:

ollama run llama3.2-vision
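The code below also assumes the LangChain4j core and Ollama modules are on the classpath, for example via Maven (the version shown is a placeholder; use the latest release):

<dependency>
	<groupId>dev.langchain4j</groupId>
	<artifactId>langchain4j</artifactId>
	<version>1.1.0</version> <!-- placeholder: use the latest release -->
</dependency>
<dependency>
	<groupId>dev.langchain4j</groupId>
	<artifactId>langchain4j-ollama</artifactId>
	<version>1.1.0</version> <!-- placeholder: use the latest release -->
</dependency>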

 

DescribeImage.java

 

package com.sample.app.chatmodels;

import dev.langchain4j.data.message.ImageContent;
import dev.langchain4j.data.message.TextContent;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.ollama.OllamaChatModel;

public class DescribeImage {

	public static void main(String[] args) {
		// Create the Ollama language model
		ChatModel chatModel = OllamaChatModel.builder()
				.baseUrl("http://localhost:11434")
				.modelName("llama3.2-vision")
				.build();

		UserMessage userMessage = UserMessage.from(TextContent.from("Describe the following image"),
				ImageContent.from("https://raw.githubusercontent.com/yavuzceliker/sample-images/main/images/image-1.jpg"));
		ChatResponse response = chatModel.chat(userMessage);

		// Print the AI response
		System.out.println("AI Response: " + response.aiMessage().text());
	}
}

 

Output

AI Response: The image depicts a woman leaping between two rocky outcroppings.  She is wearing light-colored, athletic shorts and a dark top; her hair is in a ponytail.  The rock on the left is taller than the one on the right, creating a chasm that she has just jumped over.  

In the background, there is water and hills, with a golden sky from sunset or sunrise.
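
If the image is on local disk rather than behind a URL, it can be sent as base64 data plus a MIME type instead. A minimal sketch, assuming a JPEG at the hypothetical path /tmp/photo.jpg and a method that declares IOException:

// Requires java.nio.file.Files, java.nio.file.Path and java.util.Base64 imports
byte[] imageBytes = Files.readAllBytes(Path.of("/tmp/photo.jpg")); // hypothetical path
String base64Image = Base64.getEncoder().encodeToString(imageBytes);

UserMessage localImageMessage = UserMessage.from(
	TextContent.from("Describe the following image"),
	ImageContent.from(base64Image, "image/jpeg")); // base64 data + MIME type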

 

 
