Saturday, 28 June 2025

Customizing Language Model Behavior with Model Parameters in LangChain4j

This post is intended for Java developers using LangChain4j who want to understand how to configure language models (for example, models served by Ollama) to control generation behavior (e.g., creativity, response length) and connection settings (e.g., timeouts, retries).

 

It showcases how to use model parameters effectively to fine-tune performance and reliability of AI-powered applications.

 

LangChain4j offers a clean and flexible API to integrate various language models into your Java applications. When configuring a model served by Ollama, it’s important to understand the available parameters that can help you:

 

- Control the creativity and determinism of the output
- Specify timeouts, retries, and logging behavior
- Point to the correct model and API endpoint

 

Below is an example of how to set up the OllamaLanguageModel with custom parameters: 

OllamaLanguageModel model = OllamaLanguageModel.builder()
    .baseUrl("http://localhost:11434")     // Connect to your local Ollama instance
    .modelName("llama3.2")                 // Choose the model to use (e.g., llama3.2)
    .temperature(0.3)                      // Balance between creativity and coherence
    .maxRetries(2)                         // Retry up to 2 times on failure
    .timeout(Duration.ofMinutes(1))        // Set a timeout to avoid hanging requests
    .build();

What These Parameters Do

- baseUrl: The URL where your model API is hosted (e.g., a local Ollama instance).
- modelName: The specific model to use, as defined in your model provider's documentation.
- temperature: Controls randomness in the output. Lower values make responses more deterministic; higher values make them more creative.
- maxRetries: The number of retry attempts in case of failures such as network timeouts.
- timeout: The maximum duration to wait for a model response before timing out.

 

For detailed parameter options, consult the official API documentation of the model provider. For Ollama, visit Ollama API Docs (https://github.com/ollama/ollama/blob/main/docs/api.md).

 

A complete working application is shown below.

 

ModelParamsDemo.java

 

package com.sample.app.chatmodels;

import java.time.Duration;

import dev.langchain4j.model.ollama.OllamaLanguageModel;
import dev.langchain4j.model.output.Response;

public class ModelParamsDemo {
    public static void main(String[] args) {
        OllamaLanguageModel model = OllamaLanguageModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("llama3.2")
                .temperature(0.3)
                .maxRetries(2)
                .timeout(Duration.ofMinutes(1))
                .build();

        String prompt = "Tell me some interesting fact about LLMs in maximum 30 words";
        Response<String> response = model.generate(prompt);

        System.out.println("Response: " + response.content());
    }
}

Output

Response: LLMs (Large Language Models) can generate human-like text, answer questions, and even create original content, leveraging vast amounts of training data to learn patterns and relationships in language.

The following tables summarize the various parameters of OllamaLanguageModel.

 

Connection & Configuration Parameters

 

Parameter | Type | Description
--------- | ---- | -----------
baseUrl | String | The base URL of the Ollama API endpoint (e.g., http://localhost:11434). Required to send requests to the model.
modelName | String | The name of the model to use, such as "llama3.2" or "llama4". This must match a model installed in your Ollama instance.
timeout | Duration | The maximum time to wait for a response from the Ollama server. Helps prevent hanging requests.
maxRetries | Integer | Number of retry attempts in case of failures like timeouts or transient errors.
logRequests | Boolean | If true, logs the request sent to the model. Useful for debugging or auditing.
logResponses | Boolean | If true, logs the full response received from the model. Helpful for analysis and debugging.
customHeaders | Map<String, String> | Additional HTTP headers to send with each request. Often used for custom auth tokens or tracing headers.
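As a sketch of how the logging and header options fit together, the builder below enables request/response logging and attaches a custom header. The class name, header name, and header value are placeholders of ours, and the exact builder methods available depend on your langchain4j-ollama version:

package com.sample.app.chatmodels;

import java.time.Duration;
import java.util.Map;

import dev.langchain4j.model.ollama.OllamaLanguageModel;

public class LoggingParamsDemo {
    public static void main(String[] args) {
        OllamaLanguageModel model = OllamaLanguageModel.builder()
                .baseUrl("http://localhost:11434")   // local Ollama instance
                .modelName("llama3.2")
                .timeout(Duration.ofMinutes(1))
                .maxRetries(2)
                .logRequests(true)                   // log each outgoing request
                .logResponses(true)                  // log each raw response
                // Hypothetical header, e.g., for request tracing:
                .customHeaders(Map.of("X-Trace-Id", "demo-123"))
                .build();

        System.out.println(model.generate("Reply with OK").content());
    }
}

Note that LangChain4j logs through SLF4J, so a logging backend (e.g., Logback) must be on the classpath for the request and response logs to actually appear.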

 

Model Behavior Parameters

Parameter | Type | Description
--------- | ---- | -----------
temperature | Double | Controls randomness in output (typically 0.0–1.0; some backends accept higher values). Lower = more deterministic, higher = more creative or diverse.
topK | Integer | Limits next-token selection to the K tokens with the highest probability. Encourages focused output.
topP | Double | Enables nucleus sampling: chooses from the smallest set of tokens whose cumulative probability exceeds topP. Values near 1.0 retain more randomness.
repeatPenalty | Double | Penalizes tokens that have already appeared, reducing repetition. Typical values: 1.0 (no penalty), >1.0 to discourage repetition.
seed | Integer | Sets the seed for deterministic output. With a fixed seed, the same input always produces the same output. Useful for debugging or consistent testing.
numPredict | Integer | Maximum number of tokens to generate in the response. Acts as a limit on output length.
numCtx | Integer | Context window size (number of tokens considered). Typically model-dependent. Affects how much previous conversation or text is remembered.
stop | List<String> | A list of stop sequences. Generation stops when any of these strings is encountered. Useful for truncating replies at logical boundaries.
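To illustrate, the sketch below combines several of these behavior parameters on one builder. The values are illustrative rather than recommended, and they assume your langchain4j-ollama version exposes these builder methods:

package com.sample.app.chatmodels;

import java.time.Duration;
import java.util.List;

import dev.langchain4j.model.ollama.OllamaLanguageModel;

public class BehaviorParamsDemo {
    public static void main(String[] args) {
        OllamaLanguageModel model = OllamaLanguageModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("llama3.2")
                .temperature(0.3)       // low randomness
                .topK(40)               // sample only from the 40 most likely tokens
                .topP(0.9)              // nucleus sampling cutoff
                .repeatPenalty(1.1)     // mildly discourage repetition
                .seed(42)               // fixed seed for reproducible runs
                .numPredict(60)         // cap the response at 60 tokens
                .stop(List.of("\n\n"))  // stop at the first blank line
                .timeout(Duration.ofMinutes(1))
                .build();

        System.out.println(model.generate("Define 'context window' in one sentence.").content());
    }
}

With a fixed seed and a low temperature, repeated runs of the same prompt should produce identical (or near-identical) output, which is handy for testing.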

 

Formatting Parameters

Parameter | Type | Description
--------- | ---- | -----------
format | String | Output format, typically null or "json", depending on the model's capability. Not always used.
responseFormat | ResponseFormat (enum) | Specifies the expected response format, e.g., TEXT or JSON. Used internally by LangChain4j to parse model output appropriately.
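As an example, requesting JSON output might look like the sketch below. This assumes your langchain4j-ollama version exposes the format(String) builder method (newer versions favor the ResponseFormat enum), and whether the model actually returns valid JSON depends on the model itself:

package com.sample.app.chatmodels;

import dev.langchain4j.model.ollama.OllamaLanguageModel;

public class JsonFormatDemo {
    public static void main(String[] args) {
        OllamaLanguageModel model = OllamaLanguageModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("llama3.2")
                .format("json")   // ask Ollama to emit JSON (model support varies)
                .build();

        String prompt = "Return a JSON object with fields 'model' and 'vendor' for llama3.2.";
        System.out.println(model.generate(prompt).content());
    }
}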

 

Why Fine-Tuning Model Parameters Matters

Setting these parameters appropriately can significantly affect:

 

- The quality of generated content
- The responsiveness and resilience of your app
- The cost (if using a hosted model with rate limits or billing)

 

In summary, LangChain4j makes it easy to plug in various models and customize them for your use case. Whether you're building chatbots, summarization tools, or code assistants, configuring the model correctly is key to delivering a smooth developer and user experience.

 

 
