Tuesday, 8 July 2025

Extract Structured Data into POJOs Using LangChain4j with JSON Mode

This post shows how to use LangChain4j's AI Services (AiServices) with JSON mode to extract structured data from unstructured natural-language text directly into custom Java POJOs. It demonstrates how to annotate your data model with @Description so the LLM maps fields correctly, and explains why JSON mode matters for reliable parsing.

 

When working with unstructured text, you often want structured output such as Java objects. With LangChain4j, you can define POJOs (Plain Old Java Objects) and let the LLM extract information directly into them. This works especially well with JSON mode, which makes the model's responses far more likely to be valid, parseable JSON.

 

Define Your POJOs with Helpful Descriptions

public class Employee {
  @Description("First name of an Employee")
  private String firstName;

  @Description("Last name of an Employee")
  private String lastName;

  @Description("Employee Birth date, if unable to retrieve the valid birth date keep it as null")
  private LocalDate birthDate;

  @Description("Employee Address")
  private Address address;
}

@Description("Specify an Employee Address")
class Address {
  private String street;
  private Integer streetNumber;
  private String city;
}

These @Description annotations help the LLM understand how to populate each field.
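To make the idea concrete, here is a minimal sketch of how a framework could read such annotations through reflection and turn them into field hints for a prompt or schema. It is an illustration only and not LangChain4j's actual implementation: the Hint annotation, the Person class, and the describe method are hypothetical stand-ins so the example compiles without the library.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

public class DescriptionSketch {

  // Hypothetical stand-in for LangChain4j's @Description (assumption, not the real annotation).
  @Retention(RetentionPolicy.RUNTIME)
  @Target({ElementType.FIELD, ElementType.TYPE})
  @interface Hint {
    String value();
  }

  // Hypothetical data class used only for this illustration.
  static class Person {
    @Hint("First name of an Employee")
    String firstName;

    @Hint("Last name of an Employee")
    String lastName;

    String nickname; // no hint: the field name alone has to carry the meaning
  }

  // Collects field name -> description; fields without a hint map to an empty string.
  static Map<String, String> describe(Class<?> type) {
    Map<String, String> result = new LinkedHashMap<>();
    for (Field field : type.getDeclaredFields()) {
      if (field.isSynthetic()) {
        continue; // skip compiler-generated fields
      }
      Hint hint = field.getAnnotation(Hint.class);
      result.put(field.getName(), hint == null ? "" : hint.value());
    }
    return result;
  }

  public static void main(String[] args) {
    describe(Person.class).forEach((name, text) -> System.out.println(name + ": " + text));
  }
}
```

Fields without a description still appear in the result, which mirrors the Address class above: its fields carry no annotations, so only their names guide the model.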

 

Create the AI Service Interface

 

public interface EmployeeExtractor {
  @UserMessage("Extract information about an employee from the following information:\n{{description}}")
  Employee extractPersonFrom(@V("description") String description);
}

Create the extractor from a chat model that has JSON mode enabled. How you enable it depends on the LangChain4j version and the provider (for example, OpenAI); the provider-specific settings are covered later in this post.

EmployeeExtractor empExtractor = AiServices.create(EmployeeExtractor.class, chatModel);

 

Provide a Natural Language Description

String description = "In the crisp winter morning of January 2010, just as the city stirred to life after New Year celebrations, Rajesh Nair stepped into the next chapter of his professional journey. Carrying a humble backpack and a quiet determination, Rajesh joined as a software engineer at Sundaram Tech Solutions. His joining formalities were completed at Block B, 6th Floor, Sriram IT Park, Hinjewadi Phase 2, Pune — an address that would soon become the backdrop of countless hours of coding, coffee breaks, and career-defining milestones.";

Employee emp = empExtractor.extractPersonFrom(description);

 

Why Use JSON Mode?

When you enable JSON mode, the model is instructed to return only valid JSON, which LangChain4j can parse directly into your POJO. This improves reliability and consistency in extraction use cases such as:

 

·      Resume parsing

·      News summarization

·      Product review analysis

·      Entity extraction

 

How to Enable JSON Mode for Different Providers?

Below are the code snippets and explanations for enabling JSON mode with various popular model providers.

 

For newer OpenAI models (e.g., gpt-4o-mini, gpt-4o-2024-08-06) that support structured output:

OpenAiChatModel.builder()
    .apiKey("YOUR_API_KEY")
    .modelName("gpt-4o-2024-08-06")
    .supportedCapabilities(Capability.RESPONSE_FORMAT_JSON_SCHEMA) // dev.langchain4j.model.chat.Capability
    .strictJsonSchema(true) // Enforces strict validation
    .build();

These models natively support schema-guided generation, improving JSON consistency.

 

For older OpenAI models (e.g., gpt-3.5-turbo, gpt-4):

OpenAiChatModel.builder()
    .apiKey("YOUR_API_KEY")
    .modelName("gpt-3.5-turbo")
    .responseFormat("json_object") // Prompts the model to respond in JSON
    .build();

 

While not as strict as schema-guided outputs, this still helps steer the model toward structured results.

 

Azure OpenAI

AzureOpenAiChatModel.builder()
    .apiKey("YOUR_AZURE_KEY")
    .endpoint("YOUR_ENDPOINT")
    .deploymentName("gpt-deployment")
    .responseFormat(new ChatCompletionsJsonResponseFormat())
    .build();

This requests JSON-formatted responses from Azure-hosted OpenAI models.

 

Google Vertex AI Gemini

Basic JSON response configuration:

VertexAiGeminiChatModel.builder()
    .projectId("your-project")
    .location("us-central1")
    .responseMimeType("application/json")
    .build();

 

With schema generated from a Java class:

VertexAiGeminiChatModel.builder()
    .responseSchema(SchemaHelper.fromClass(Employee.class))
    .build();

 

Using a custom JSON schema:

VertexAiGeminiChatModel.builder()
    .responseSchema(Schema.builder()
        .title("Employee")
        .type(SchemaType.OBJECT)
        // add your properties here
        .build())
    .build();

 

Google AI Gemini

Basic JSON format setup:

 

GoogleAiGeminiChatModel.builder()
    .responseFormat(ResponseFormat.JSON)
    .build();

 

With schema from a Java class:

GoogleAiGeminiChatModel.builder()
    .responseFormat(ResponseFormat.builder()
        .type(ResponseFormat.Type.JSON)
        .jsonSchema(JsonSchemas.jsonSchemaFrom(Employee.class).get())
        .build())
    .build();

 

Or define your own JSON schema manually:

GoogleAiGeminiChatModel.builder()
    .responseFormat(ResponseFormat.builder()
        .type(ResponseFormat.Type.JSON)
        .jsonSchema(JsonSchema.builder()
            // define schema fields here
            .build())
        .build())
    .build();

 

Mistral AI

MistralAiChatModel.builder()
    .apiKey("YOUR_API_KEY")
    .responseFormat(MistralAiResponseFormatType.JSON_OBJECT)
    .build();

 

This makes Mistral models return parseable JSON. Note that JSON mode constrains the output format only; it does not make the extracted values deterministic.

 

Ollama

 

OllamaChatModel.builder()
    .responseFormat(ResponseFormat.JSON)
    .build();

This makes Ollama-hosted models return valid JSON responses for structured parsing.

 

If the model you’re using does not natively support JSON mode, consider the following:

 

·      Prompt Engineering: Manually instruct the model in your prompt to return only valid JSON.

·      Lower the Temperature: Reduces randomness, making responses more predictable and structured.

·      Post-Processing: Validate and clean the output using JSON parsers or fallback handling logic.
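As an illustration of the post-processing option, the sketch below pulls the first balanced JSON object out of a raw model reply that may be wrapped in Markdown fences or surrounded by prose. The class and method names are hypothetical, and it assumes the reply contains no stray opening brace before the actual JSON; the extracted string would still need to be parsed and validated by a real JSON parser.

```java
public class JsonReplyCleaner {

  /**
   * Returns the first balanced {...} block in a raw model reply, or null if none is found.
   * Text outside the braces (Markdown fences, explanations) is ignored.
   */
  static String extractFirstJsonObject(String reply) {
    if (reply == null) {
      return null;
    }
    int start = reply.indexOf('{');
    if (start < 0) {
      return null;
    }

    int depth = 0;
    boolean inString = false;
    boolean escaped = false;
    for (int i = start; i < reply.length(); i++) {
      char c = reply.charAt(i);
      if (inString) {
        // Inside a JSON string, braces do not count; only track escapes and the closing quote.
        if (escaped) {
          escaped = false;
        } else if (c == '\\') {
          escaped = true;
        } else if (c == '"') {
          inString = false;
        }
        continue;
      }
      if (c == '"') {
        inString = true;
      } else if (c == '{') {
        depth++;
      } else if (c == '}' && --depth == 0) {
        return reply.substring(start, i + 1);
      }
    }
    return null; // unbalanced braces: treat the reply as unusable
  }

  public static void main(String[] args) {
    String raw = "Sure! Here is the data:\n```json\n"
        + "{\"firstName\": \"Rajesh\", \"note\": \"uses } in text\"}\n```";
    System.out.println(extractFirstJsonObject(raw));
  }
}
```

Returning null for missing or unbalanced JSON lets the caller decide on the fallback, for example retrying the request or returning an error to the client.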

 

 

Follow the step-by-step procedure below to build a working application.

 

Step 1: Create a new Maven project named custom-pojo-as-return-type.

 

Step 2: Update pom.xml with the Maven dependencies.

 

pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.sample.app</groupId>
  <artifactId>custom-pojo-as-return-type</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <properties>
    <maven.compiler.source>21</maven.compiler.source>
    <maven.compiler.target>21</maven.compiler.target>
    <java.version>21</java.version>
  </properties>

  <parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>3.3.10</version>
  </parent>

  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-bom</artifactId>
        <version>1.0.1</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>

  <dependencies>
    <dependency>
      <groupId>dev.langchain4j</groupId>
      <artifactId>langchain4j-spring-boot-starter</artifactId>
    </dependency>


    <dependency>
      <groupId>dev.langchain4j</groupId>
      <artifactId>langchain4j-ollama-spring-boot-starter</artifactId>
    </dependency>

    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-web</artifactId>
    </dependency>

    <dependency>
      <groupId>org.springdoc</groupId>
      <artifactId>springdoc-openapi-starter-webmvc-ui</artifactId>
      <version>2.6.0</version>
    </dependency>

    <!--
    https://mvnrepository.com/artifact/jakarta.validation/jakarta.validation-api -->
    <dependency>
      <groupId>jakarta.validation</groupId>
      <artifactId>jakarta.validation-api</artifactId>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-maven-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>repackage</goal> <!-- Important -->
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

</project>

 

Step 3: Define Employee and Address POJOs.

 

Address.java

 

package com.sample.app.dto;

import dev.langchain4j.model.output.structured.Description;

// An optional description helps the LLM understand what this type represents.
@Description("Specify an Employee Address")
public class Address {
  private String street;
  private Integer streetNumber;
  private String city;

  public String getStreet() {
    return street;
  }

  public void setStreet(String street) {
    this.street = street;
  }

  public Integer getStreetNumber() {
    return streetNumber;
  }

  public void setStreetNumber(Integer streetNumber) {
    this.streetNumber = streetNumber;
  }

  public String getCity() {
    return city;
  }

  public void setCity(String city) {
    this.city = city;
  }

}

 

Employee.java

package com.sample.app.dto;

import java.time.LocalDate;

import dev.langchain4j.model.output.structured.Description;

public class Employee {
  // We can add additional details to help LLM have a better understanding.
  @Description("First name of an Employee")
  private String firstName;
  
  @Description("Last name of an Employee")
  private String lastName;
  
  @Description("Employee Birth date, if unable to retrieve the valid birth date keep it as null")
  private LocalDate birthDate;

  @Description("Employee Address")
  private Address address;

  public String getFirstName() {
    return firstName;
  }

  public void setFirstName(String firstName) {
    this.firstName = firstName;
  }

  public String getLastName() {
    return lastName;
  }

  public void setLastName(String lastName) {
    this.lastName = lastName;
  }

  public LocalDate getBirthDate() {
    return birthDate;
  }

  public void setBirthDate(LocalDate birthDate) {
    this.birthDate = birthDate;
  }

  public Address getAddress() {
    return address;
  }

  public void setAddress(Address address) {
    this.address = address;
  }

}

 

Step 4: Define EmployeeExtractor interface.

 

EmployeeExtractor.java

 

package com.sample.app.interfaces;

import com.sample.app.dto.Employee;

import dev.langchain4j.service.UserMessage;
import dev.langchain4j.service.V;

public interface EmployeeExtractor {
  @UserMessage("Extract information about an employee from following information.\n{{description}}")
  Employee extractPersonFrom(@V("description") String description);
}

Step 5: Define LangchainConfig and SwaggerConfig classes.

LangchainConfig.java

 

package com.sample.app.config;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import dev.langchain4j.http.client.spring.restclient.SpringRestClientBuilderFactory;
import dev.langchain4j.model.chat.request.ResponseFormat;
import dev.langchain4j.model.ollama.OllamaChatModel;

@Configuration
public class LangchainConfig {

  @Bean
  public OllamaChatModel ollamaLanguageModel() {
    return OllamaChatModel.builder()
        .baseUrl("http://localhost:11434")
        .modelName("llama3.2")
        .responseFormat(ResponseFormat.JSON) // enable JSON mode
        .temperature(0.2) // low temperature for more predictable output
        .logRequests(true)
        .httpClientBuilder(new SpringRestClientBuilderFactory().create()) // explicitly use Spring's HTTP client
        .build();
  }
}

SwaggerConfig.java

package com.sample.app.config;

import org.springframework.context.annotation.Configuration;

import io.swagger.v3.oas.annotations.OpenAPIDefinition;
import io.swagger.v3.oas.annotations.info.Info;

@Configuration
@OpenAPIDefinition(info = @Info(title = "Chat service Application", version = "v1"))
public class SwaggerConfig {

}

 

Step 6: Define EmployeeExtractionService class.

 

EmployeeExtractionService.java 

package com.sample.app.service;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import com.sample.app.dto.Employee;
import com.sample.app.interfaces.EmployeeExtractor;

import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.service.AiServices;

@Service
public class EmployeeExtractionService {

  @Autowired
  private OllamaChatModel chatModel;

  public Employee getEmployee(String description) {
    EmployeeExtractor empExtractor = AiServices.create(EmployeeExtractor.class, chatModel);
    return empExtractor.extractPersonFrom(description);
  }

}

 

Step 7: Define EmployeeController class.

 

EmployeeController.java

package com.sample.app.controller;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.CrossOrigin;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import com.sample.app.dto.Employee;
import com.sample.app.service.EmployeeExtractionService;

import io.swagger.v3.oas.annotations.tags.Tag;
import jakarta.validation.Valid;
import jakarta.validation.constraints.NotEmpty;

@RestController
@RequestMapping("/api/employee-extractor")
@CrossOrigin("*")
@Tag(name = "Employee Controller", description = "APIs for extracting employee details, powered by Ollama")
public class EmployeeController {

  @Autowired
  private EmployeeExtractionService employeeExtractionService;

  @PostMapping
  public Employee chat(@RequestBody @Valid RequestPayload requestPayload) {
    return employeeExtractionService.getEmployee(requestPayload.getDescription());
  }

  private static class RequestPayload {
    @NotEmpty
    private String description;

    public String getDescription() {
      return description;
    }

    public void setDescription(String description) {
      this.description = description;
    }

  }
}

Step 8: Define the main application class.

 

App.java

package com.sample.app;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class App {

  public static void main(String[] args) {
    SpringApplication.run(App.class, args);
  }

}

Build the project

Navigate to the project root directory, where pom.xml is located, and execute the following command to generate the artifact.

mvn clean install

 

After the command completes successfully, you will find the custom-pojo-as-return-type-0.0.1-SNAPSHOT.jar file in the target folder.

$ ls ./target/
classes             generated-test-sources
custom-pojo-as-return-type-0.0.1-SNAPSHOT.jar   maven-archiver
custom-pojo-as-return-type-0.0.1-SNAPSHOT.jar.original  maven-status
generated-sources         test-classes

 

Run the Application

Execute the following command to run the application.

 

java -jar ./target/custom-pojo-as-return-type-0.0.1-SNAPSHOT.jar --server.port=1235

Open the following URL in the browser.

http://localhost:1235/swagger-ui/index.html

 

Execute the POST API /api/employee-extractor with the payload below.

{
  "description": "In the crisp winter morning of January 2010, just as the city stirred to life after New Year celebrations, Rajesh Nair stepped into the next chapter of his professional journey. Carrying a humble backpack and a quiet determination, Rajesh joined as a software engineer at Sundaram Tech Solutions. His joining formalities were completed at Block B, 6th Floor, Sriram IT Park, Hinjewadi Phase 2, Pune — an address that would soon become the backdrop of countless hours of coding, coffee breaks, and career-defining milestones. As he walked past buzzing workstations and glass cabins filled with ideas, the hum of keyboards and soft chimes of emails felt like the rhythm of possibility. With every step, Rajesh began not just a job, but a journey marked by passion, perseverance, and purpose."
}

You will see a response similar to the following. The exact values can vary between runs and models.

{
  "firstName": "Rajesh",
  "lastName": "Nair",
  "birthDate": null,
  "address": {
    "street": "Sriram IT Park",
    "streetNumber": 6,
    "city": "Hinjewadi Phase 2, Pune"
  }
}
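If you prefer to call the endpoint from code instead of Swagger UI, the following sketch uses the JDK's built-in HttpClient. The class and helper names are hypothetical, the short description string is just sample input, and it assumes the application is running locally on port 1235 as started above. The JSON body is assembled by hand with a minimal escaping helper to keep the example dependency-free; in a real client, a JSON library would be the safer choice.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class EmployeeExtractorClient {

  // Minimal JSON string escaping (quotes, backslashes, control characters).
  static String escapeJson(String text) {
    StringBuilder sb = new StringBuilder();
    for (char c : text.toCharArray()) {
      switch (c) {
        case '"' -> sb.append("\\\"");
        case '\\' -> sb.append("\\\\");
        case '\n' -> sb.append("\\n");
        case '\r' -> sb.append("\\r");
        case '\t' -> sb.append("\\t");
        default -> {
          if (c < 0x20) {
            sb.append(String.format("\\u%04x", (int) c));
          } else {
            sb.append(c);
          }
        }
      }
    }
    return sb.toString();
  }

  // Builds the request body expected by the controller: {"description":"..."}
  static String buildPayload(String description) {
    return "{\"description\":\"" + escapeJson(description) + "\"}";
  }

  static HttpRequest buildRequest(String baseUrl, String description) {
    return HttpRequest.newBuilder(URI.create(baseUrl + "/api/employee-extractor"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(buildPayload(description)))
        .build();
  }

  public static void main(String[] args) throws InterruptedException {
    HttpRequest request = buildRequest("http://localhost:1235",
        "Rajesh Nair joined Sundaram Tech Solutions in Pune.");
    try {
      HttpResponse<String> response =
          HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
      System.out.println(response.statusCode());
      System.out.println(response.body());
    } catch (IOException e) {
      // Most likely the application is not running on the assumed port.
      System.out.println("Request failed: " + e);
    }
  }
}
```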

You can download the application from this link

 

 
