This post is intended for Java developers using LangChain4j who want to understand how to configure language models like Ollama to control generation behavior (e.g., creativity, response length) and connection settings (e.g., timeouts, retries).
It showcases how to use model parameters effectively to fine-tune performance and reliability of AI-powered applications.
LangChain4j offers a clean and flexible API to integrate various language models into your Java applications. When configuring a model like Ollama, it’s important to understand the available parameters that can help you:
· Control the creativity and determinism of the output
· Specify timeouts, retries, and logging behavior
· Point to the correct model and API endpoint
Below is an example of how to set up the OllamaLanguageModel with custom parameters:
OllamaLanguageModel model = OllamaLanguageModel.builder()
        .baseUrl("http://localhost:11434")   // Connect to your local Ollama instance
        .modelName("llama3.2")               // Choose the model to use (e.g., llama3.2)
        .temperature(0.3)                    // Balance between creativity and coherence
        .maxRetries(2)                       // Retry up to 2 times on failure
        .timeout(Duration.ofMinutes(1))      // Set a timeout to avoid hanging requests
        .build();
What These Parameters Do
· baseUrl: The URL where your model API is hosted (e.g., a local Ollama instance).
· modelName: The specific model to use, as defined in your model provider's documentation.
· temperature: Controls randomness in output. Lower values make responses more deterministic, higher values make them more creative.
· maxRetries: Number of retry attempts in case of failures like network timeouts.
· timeout: Maximum duration to wait for a model response before timing out.
For detailed parameter options, consult the official API documentation of the model provider. For Ollama, visit Ollama API Docs (https://github.com/ollama/ollama/blob/main/docs/api.md).
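To see how temperature influences output, here is a minimal sketch (reusing the same local Ollama setup and imports as the full example below) that sends the same prompt with a low and a high temperature. The exact responses will vary from run to run, especially at the higher setting:

String prompt = "Suggest a name for a coffee shop";

OllamaLanguageModel focused = OllamaLanguageModel.builder()
        .baseUrl("http://localhost:11434")
        .modelName("llama3.2")
        .temperature(0.1)   // near-deterministic: repeated calls return very similar answers
        .build();

OllamaLanguageModel creative = OllamaLanguageModel.builder()
        .baseUrl("http://localhost:11434")
        .modelName("llama3.2")
        .temperature(0.9)   // more random: repeated calls vary noticeably
        .build();

System.out.println("Low temperature : " + focused.generate(prompt).content());
System.out.println("High temperature: " + creative.generate(prompt).content());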
Below is a complete working application.
ModelParamsDemo.java
package com.sample.app.chatmodels;

import java.time.Duration;

import dev.langchain4j.model.ollama.OllamaLanguageModel;
import dev.langchain4j.model.output.Response;

public class ModelParamsDemo {

    public static void main(String[] args) {
        // Build the model with custom connection and generation parameters
        OllamaLanguageModel model = OllamaLanguageModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("llama3.2")
                .temperature(0.3)
                .maxRetries(2)
                .timeout(Duration.ofMinutes(1))
                .build();

        String prompt = "Tell me some Interesting Fact About LLMs in maximum 30 words";

        Response<String> response = model.generate(prompt);
        System.out.println("Response: " + response.content());
    }
}
Output
Response: LLMs (Large Language Models) can generate human-like text, answer questions, and even create original content, leveraging vast amounts of training data to learn patterns and relationships in language.
The following tables summarize the various parameters of OllamaLanguageModel.
Connection & Configuration Parameters
| Parameter | Type | Description |
| --- | --- | --- |
| baseUrl | String | The base URL of the Ollama API endpoint (e.g., http://localhost:11434). Required to send requests to the model. |
| modelName | String | The name of the model to use, such as "llama3.2" or "llama4". This must match a model installed in your Ollama instance. |
| timeout | Duration | The maximum time to wait for a response from the Ollama server. Helps prevent hanging requests. |
| maxRetries | Integer | Number of retry attempts in case of failures like timeouts or transient errors. |
| logRequests | Boolean | If true, logs the request sent to the model. Useful for debugging or auditing. |
| logResponses | Boolean | If true, logs the full response received from the model. Helpful for analysis and debugging. |
| customHeaders | Map<String, String> | Additional HTTP headers to send with each request. Often used for custom auth tokens or tracing headers. |
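As a small sketch of the logging and header options listed above (assuming your LangChain4j version exposes these builder methods; the header name and value are purely illustrative, and the snippet additionally needs java.util.Map and java.util.HashMap imports), you could configure a model like this:

Map<String, String> headers = new HashMap<>();
headers.put("X-Trace-Id", "demo-123");        // hypothetical tracing header

OllamaLanguageModel debugModel = OllamaLanguageModel.builder()
        .baseUrl("http://localhost:11434")
        .modelName("llama3.2")
        .timeout(Duration.ofMinutes(1))
        .maxRetries(2)
        .logRequests(true)                    // log each outgoing request
        .logResponses(true)                   // log each raw response
        .customHeaders(headers)               // extra HTTP headers sent with every call
        .build();

The request/response logs are emitted through your application's logging framework (LangChain4j uses SLF4J), so make sure logging is configured before expecting any output.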
Model Behavior Parameters
| Parameter | Type | Description |
| --- | --- | --- |
| temperature | Double | Controls randomness in output. Typical range is 0–1 (some models accept higher values). Lower = more deterministic; higher = more creative or diverse outputs. |
| topK | Integer | Limits next-token selection to the K tokens with the highest probability. Encourages focused output. |
| topP | Double | Enables nucleus sampling: chooses from the smallest set of tokens whose cumulative probability exceeds topP. Values near 1.0 retain more randomness. |
| repeatPenalty | Double | Penalizes tokens that have already appeared, reducing repetition. Typical values: 1.0 (no penalty); >1.0 discourages repetition. |
| seed | Integer | Sets the seed for deterministic outputs: the same input will always generate the same output. Useful for debugging or consistent testing. |
| numPredict | Integer | Maximum number of tokens to generate in the response. Acts as a limit on output length. |
| numCtx | Integer | Context window size (number of tokens considered). Typically model-dependent. Affects how much previous conversation or text is remembered. |
| stop | List<String> | A list of stop sequences. Generation stops when any of these strings is encountered. Useful for ending conversations or truncating replies at logical boundaries. |
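Putting several of these behavior parameters together, here is a sketch of a more tightly constrained configuration (the specific values are illustrative starting points rather than recommendations, it assumes your LangChain4j version exposes all of these builder methods, and it needs a java.util.List import):

OllamaLanguageModel tunedModel = OllamaLanguageModel.builder()
        .baseUrl("http://localhost:11434")
        .modelName("llama3.2")
        .temperature(0.2)                     // mostly deterministic
        .topK(40)                             // sample only from the 40 most likely tokens
        .topP(0.9)                            // nucleus sampling threshold
        .repeatPenalty(1.1)                   // mildly discourage repetition
        .seed(42)                             // reproducible output for identical input
        .numPredict(100)                      // cap the response at roughly 100 tokens
        .numCtx(4096)                         // context window size (model-dependent)
        .stop(List.of("\n\n"))                // stop generating at the first blank line
        .build();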
Formatting Parameters
| Parameter | Type | Description |
| --- | --- | --- |
| format | String | Output format, typically null or "json" depending on the model's capability. Not always used. |
| responseFormat | ResponseFormat (enum) | Specifies the expected response format (e.g., raw text vs. structured JSON). Used internally by LangChain4j to parse model output appropriately. Example values: TEXT, JSON. |
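For example, if you want the model to return JSON, you can combine a JSON-oriented prompt with the format parameter from the table above. This is a sketch: the prompt and the downstream parsing are up to you, and whether format or responseFormat is the right option depends on your LangChain4j version.

OllamaLanguageModel jsonModel = OllamaLanguageModel.builder()
        .baseUrl("http://localhost:11434")
        .modelName("llama3.2")
        .format("json")                       // ask Ollama to return well-formed JSON
        .timeout(Duration.ofMinutes(1))
        .build();

String prompt = "List three programming languages as a JSON array of objects with a \"name\" field.";
Response<String> response = jsonModel.generate(prompt);
System.out.println(response.content());       // raw JSON string; parse it with your preferred JSON library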
Why Fine-Tuning Model Parameters Matters
Setting these parameters appropriately can significantly affect:
· The quality of generated content
· The responsiveness and resilience of your app
· The cost (if using a hosted model with rate limits or billing)
In summary, LangChain4j makes it easy to plug in various models and customize them per your use case. Whether you're building chatbots, summarization tools, or code assistants, configuring the model correctly is key to delivering a smooth developer and user experience.