Monday, 22 December 2025

Extracting Structured Java Objects from Unstructured Text Using LLMs and JSON Schema

Integrating large language models (LLMs) into Java applications opens up powerful ways to extract insights from unstructured text. One major problem remains, though: how do we reliably convert the model's natural-language output into well-typed Java objects?

In this post, we’ll explore how to use JSON Schema to guide an LLM into returning structured JSON that maps directly to Java records or classes. The examples use LangChain4j with a model served locally by Ollama.

 

Why Structured Output from LLMs Matters

LLMs are great at generating human-like text, but applications often need structured, machine-parseable formats such as JSON. By using the structured output capabilities of LLM providers, we can:

 

·      Avoid fragile regex or prompt parsing.

·      Automatically map outputs into Java objects.

·      Enhance validation and error handling via schema enforcement.
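To make the first point concrete, here is a minimal sketch of why hand-written parsing is fragile. It uses only java.util.regex; the class name RegexFragilityDemo, the pattern, and the reworded second sentence are illustrative assumptions, not part of the application built in this post. The pattern handles the phrasing used later in this post but silently misses a sentence that states the same fact differently.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the application built in this post.
final class RegexFragilityDemo {

    // Matches only the literal phrasing "<number> years old".
    private static final Pattern AGE = Pattern.compile("(\\d+) years old");

    // Returns the age if the text uses the expected phrasing, otherwise empty.
    static Optional<Integer> extractAge(String text) {
        Matcher matcher = AGE.matcher(text);
        if (matcher.find()) {
            return Optional.of(Integer.parseInt(matcher.group(1)));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Expected phrasing: the age is found.
        System.out.println(extractAge("Alice is 30 years old and leads a vibrant lifestyle."));
        // Same fact, different wording: the pattern finds nothing.
        System.out.println(extractAge("Alice, aged 30, leads a vibrant lifestyle."));
    }
}
```

Every new phrasing would need another pattern, which is the maintenance burden that structured output is meant to remove.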

 

Defining Our Java Model

Let’s say we want to extract the following data from a paragraph of text:

 

public record Employee(String name, int age, double height, boolean married) {}

 

Now, here’s an example of the unstructured input text:

 

Alice is 30 years old and leads a vibrant lifestyle.

She is 1.65 meters tall and always brings energy into any room.

Though married, she balances her family life with an active professional career.

 

Our goal is to extract a valid Employee record from this.
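For reference, this is the value we are aiming for, written by hand rather than extracted. The record is repeated so the snippet compiles on its own; the class ExpectedResultDemo is a hypothetical name used only for this illustration.

```java
// Same record as defined above, repeated so this snippet is self-contained.
record Employee(String name, int age, double height, boolean married) {}

// Hypothetical demo class; not part of the application built in this post.
final class ExpectedResultDemo {

    public static void main(String[] args) {
        // The facts stated in the sample paragraph, filled in manually.
        Employee expected = new Employee("Alice", 30, 1.65, true);
        System.out.println(expected);
    }
}
```

The rest of the post is about getting the LLM to produce JSON from which exactly this record can be built.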

 

Step-by-Step Integration Using JSON Schema

 

Step 1: Define the JSON Schema for the Output

The schema clearly defines the required structure for the LLM output:

JsonObjectSchema employeeSchema = JsonObjectSchema.builder()
    .addStringProperty("name")
    .addIntegerProperty("age")
    .addNumberProperty("height")
    .addBooleanProperty("married")
    .required("name", "age", "height", "married")
    .build();
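To spell out what this schema asks for, the sketch below checks an already-parsed JSON object against the same contract: four properties, each present and of the stated type. This is not how LangChain4j or the model provider enforces the schema; EmployeeContractCheck and its methods are hypothetical names, and the integer check is deliberately simplified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; it only illustrates the contract, not the real enforcement.
final class EmployeeContractCheck {

    // Returns one message per violation; an empty list means the object conforms.
    static List<String> violations(Map<String, Object> json) {
        List<String> problems = new ArrayList<>();
        check(json, "name", String.class, problems);     // addStringProperty("name")
        check(json, "age", Integer.class, problems);     // addIntegerProperty("age"), simplified to Integer
        check(json, "height", Number.class, problems);   // addNumberProperty("height")
        check(json, "married", Boolean.class, problems); // addBooleanProperty("married")
        return problems;
    }

    // Every property is listed in required(...), so a missing key is a violation too.
    private static void check(Map<String, Object> json, String key, Class<?> type, List<String> problems) {
        Object value = json.get(key);
        if (value == null) {
            problems.add("missing required property: " + key);
        } else if (!type.isInstance(value)) {
            problems.add("wrong type for property: " + key);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> good = Map.of("name", "Alice", "age", 30, "height", 1.65, "married", true);
        Map<String, Object> bad = Map.of("name", "Alice", "age", "thirty", "married", true);
        System.out.println(violations(good)); // prints []
        System.out.println(violations(bad));  // prints [wrong type for property: age, missing required property: height]
    }
}
```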

 

Step 2: Build the Response Format

Using the schema, create a ResponseFormat that instructs the LLM to respond with JSON conforming to our Employee record:

ResponseFormat responseFormat = ResponseFormat.builder()
    .type(ResponseFormatType.JSON)
    .jsonSchema(JsonSchema.builder().name("Employee").rootElement(employeeSchema).build())
    .build();

 

Step 3: Provide the Input Message

UserMessage userMessage = UserMessage.from("""
    Alice is 30 years old and leads a vibrant lifestyle.
    She is 1.65 meters tall and always brings energy into any room.
    Though married, she balances her family life with an active professional career.
""");

 

Step 4: Construct and Send the Chat Request

ChatRequest chatRequest = ChatRequest.builder()
    .responseFormat(responseFormat)
    .messages(userMessage)
    .build();

ChatModel chatModel = OllamaChatModel.builder()
    .baseUrl("http://localhost:11434")
    .modelName("llama3.2")
    .build();

ChatResponse chatResponse = chatModel.chat(chatRequest);

The complete working application is shown below.

 

Employee.java

package com.sample.app.dto;

public record Employee(String name, int age, double height, boolean married) {
}

 

JsonSchemaDemo.java

package com.sample.app.structured.responses;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.sample.app.dto.Employee;

import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.request.ResponseFormat;
import dev.langchain4j.model.chat.request.ResponseFormatType;
import dev.langchain4j.model.chat.request.json.JsonObjectSchema;
import dev.langchain4j.model.chat.request.json.JsonSchema;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.ollama.OllamaChatModel;

public class JsonSchemaDemo {

    public static void main(String[] args) throws JsonProcessingException {

        // Schema for the JSON object the model must return: four typed, required properties
        JsonObjectSchema employeeSchema = JsonObjectSchema.builder()
                .addStringProperty("name")
                .addIntegerProperty("age")
                .addNumberProperty("height")
                .addBooleanProperty("married")
                .required("name", "age", "height", "married")
                .build();

        // Ask the model to answer with JSON that conforms to the schema
        ResponseFormat responseFormat = ResponseFormat.builder()
                .type(ResponseFormatType.JSON)
                .jsonSchema(JsonSchema.builder().name("Employee").rootElement(employeeSchema).build())
                .build();

        // The unstructured text to extract the employee from
        UserMessage userMessage = UserMessage.from("""
                Alice is 30 years old and leads a vibrant lifestyle.
                She is 1.65 meters tall and always brings energy into any room.
                Though married, she balances her family life with an active professional career.
                """);

        ChatRequest chatRequest = ChatRequest.builder()
                .responseFormat(responseFormat)
                .messages(userMessage)
                .build();

        // Local Ollama server running the llama3.2 model
        ChatModel chatModel = OllamaChatModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("llama3.2")
                .build();

        ChatResponse chatResponse = chatModel.chat(chatRequest);

        // Raw JSON text returned by the model
        String output = chatResponse.aiMessage().text();
        System.out.println(output);

        // Map the JSON onto the Employee record with Jackson
        Employee employee = new ObjectMapper().readValue(output, Employee.class);
        System.out.println(employee);
    }

}

Output

{
  "name": "Alice",
  "age": 30,
  "height": 1.65,
  "married": true
}

Employee[name=Alice, age=30, height=1.65, married=true]
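The schema constrains the type of each field, not whether its value is plausible: an age of -5 is still a valid integer. One way to catch such values is a compact constructor on the record, sketched below. ValidatedEmployee is a hypothetical variant of the Employee record, and the limits (age 0 to 150, height between 0 and 3 meters) are illustrative assumptions.

```java
// Hypothetical variant of the Employee record with plausibility checks added.
record ValidatedEmployee(String name, int age, double height, boolean married) {

    // Compact canonical constructor: runs every time an instance is created.
    ValidatedEmployee {
        if (name == null || name.isBlank()) {
            throw new IllegalArgumentException("name must not be blank");
        }
        if (age < 0 || age > 150) {
            throw new IllegalArgumentException("age out of range: " + age);
        }
        // Written as a negated range check so that NaN is rejected as well.
        if (!(height > 0.0 && height < 3.0)) {
            throw new IllegalArgumentException("height out of range: " + height);
        }
    }
}

// Hypothetical demo class; not part of the application built in this post.
final class ValidatedEmployeeDemo {

    public static void main(String[] args) {
        System.out.println(new ValidatedEmployee("Alice", 30, 1.65, true));
        try {
            new ValidatedEmployee("Alice", -5, 1.65, true);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Jackson normally builds records through the canonical constructor, so checks like these should also run during readValue, with the failure surfacing as a Jackson exception; verify this against the Jackson version you use.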

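If the LLM does not produce the expected output, you can give it more guidance by annotating classes and fields with @Description. These annotations clarify the intent and format of the data. Note that they take effect when LangChain4j derives the JSON schema from the class itself (for example, when an AI Service returns that type). A schema built by hand with JsonObjectSchema.builder(), as in the application above, does not read them; there you can pass the description as the second argument instead, e.g. addStringProperty("name", "Employee's full name"). For example: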

@Description("An Employee")
public record Employee(
    @Description("Employee's full name, e.g., 'Krishna'") String name,
    @Description("Employee's age, e.g., 42") int age,
    @Description("Employee's height in meters, e.g., 1.78") double height,
    @Description("Whether the employee is married, e.g., false") boolean married
) {}

 

This approach allows you to supply meaningful context and examples, improving the LLM's understanding and output accuracy.

 

