Programming for beginners: Lucene: Store the original field value in index

Lucene lets you specify, per field, whether the original field value is stored in the index. A stored value can be read back from the documents returned by a search.

Example 1

Field field = new TextField("title", title, Field.Store.YES);

'Field.Store.YES' tells Lucene to store the original field value in the index. This is useful for short texts, such as a document's title, that should be displayed with the search results. The value is stored in its original form, i.e., the analyzer is not applied to the stored copy.

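Example 2

Field field = new TextField("description", description, Field.Store.NO);

'Field.Store.NO' tells Lucene not to store the field value in the index. A TextField is still indexed, so the field remains searchable. Only the original text cannot be read back.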

When you retrieve a document from a Lucene index, it contains only the fields whose values were stored. For example, the field 'title' is returned with its original value, whereas the field 'description' is not returned.
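To make the difference between "indexed" and "stored" concrete, here is a minimal sketch in plain Java with no Lucene dependency. It is a toy model, not how Lucene is implemented: the class ToyIndex and its methods newDocument, addField, search and get are hypothetical names invented for this illustration, and lower-casing plus splitting on non-word characters only stands in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: this is NOT Lucene's API or its internal data layout.
class ToyIndex {

    // Search structure: "field:term" -> ids of the documents containing that term.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Stored values: one map per document, field name -> original text.
    private final List<Map<String, String>> storedFields = new ArrayList<>();

    /** Starts a new, empty document and returns its id. */
    int newDocument() {
        storedFields.add(new HashMap<>());
        return storedFields.size() - 1;
    }

    /** Always indexes the terms; keeps the original text only when store is true. */
    void addField(int docId, String field, String value, boolean store) {
        // Stand-in for an analyzer: lower-case, then split on non-word characters.
        for (String term : value.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(field + ":" + term, k -> new TreeSet<>()).add(docId);
            }
        }
        if (store) {
            // The stored copy is the untouched original, not the analyzed terms.
            storedFields.get(docId).put(field, value);
        }
    }

    /** Returns the ids of the documents whose field contains the given term. */
    Set<Integer> search(String field, String term) {
        return postings.getOrDefault(field + ":" + term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    /** Returns the stored original value, or null if the field was not stored. */
    String get(int docId, String field) {
        return storedFields.get(docId).get(field);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        int id = index.newDocument();
        index.addField(id, "title", "A Java geek", true);                           // like Field.Store.YES
        index.addField(id, "description", "IT architect focusing on Java", false);  // like Field.Store.NO

        System.out.println(index.search("description", "architect")); // the unstored field is still searchable
        System.out.println(index.get(id, "title"));                   // original text is available
        System.out.println(index.get(id, "description"));             // nothing was stored for this field
    }
}
```

In this sketch the 'description' field can be matched by a search but its text cannot be read back, which mirrors the behaviour described above for 'Field.Store.NO'.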

Let’s confirm the same with an example.

DocumentUtil.java

package com.sample.app.util;

import java.util.Arrays;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;

public class DocumentUtil {

	private static Document getDocument(String id, String title, String description, String blog) {
		Document doc = new Document();
		doc.add(new TextField("id", id, Field.Store.YES));
		doc.add(new TextField("title", title, Field.Store.YES));
		doc.add(new TextField("description", description, Field.Store.NO));
		doc.add(new TextField("blog", blog, Field.Store.YES));
		
		return doc;

	}

	public static List<Document> getDocuments() {
		Document doc1 = getDocument("1", "JavaWorld",
				"The original independent resource for Java developers, architects, and managers.", " javaworld.com");
		Document doc2 = getDocument("2", "Oracle Blogs | The Java Source",
				" Java powers more than 4.5 billion devices including 800 million computers and 1.5 billion cell phones. If you love Java, this is the blog you must follow.",
				"blogs.oracle.com/java");
		Document doc3 = getDocument("3", "A Java geek",
				"Nicolas Fränkel's blog. IT architect focusing on Java, Java EE, and their surrounding ecosystems. He is a trainer, book writer, speaker & blogger.",
				"blog.frankel.ch");
		Document doc4 = getDocument("4", "Self Learning Java", "Learn Java fundamentals and other java libraries",
				"self-learning-java-tutorial.blogspot.com");

		return Arrays.asList(doc1, doc2, doc3, doc4);

	}
}

App.java

package com.sample.app;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.store.NoLockFactory;
import org.apache.lucene.util.QueryBuilder;

import com.sample.app.util.DocumentUtil;

public class App {

	public static void main(String args[]) throws IOException {

		Analyzer analyzer = new StandardAnalyzer();
		IndexWriterConfig config = new IndexWriterConfig(analyzer);

		Directory directory = new MMapDirectory(new File("/Users/Shared/lucene").toPath(), NoLockFactory.INSTANCE);

		try (IndexWriter indexWriter = new IndexWriter(directory, config)) {

			for (Document doc : DocumentUtil.getDocuments()) {
				indexWriter.addDocument(doc);
			}

		}

		QueryBuilder queryBuilder = new QueryBuilder(analyzer);
		Query query = queryBuilder.createMinShouldMatchQuery("title", "Java Developers", 0.2f);
		int maxHitsPerPage = 10;

		try (IndexReader indexReader = DirectoryReader.open(directory)) {
			IndexSearcher indexSearcher = new IndexSearcher(indexReader);

			TopDocs docs = indexSearcher.search(query, maxHitsPerPage);
			ScoreDoc[] hits = docs.scoreDocs;
			System.out.println("Total Hits: " + docs.totalHits);
			System.out.println("Results: ");
			for (int i = 0; i < hits.length; i++) {
				Document d = indexSearcher.doc(hits[i].doc);
				System.out.println("Title: " + d.get("title"));
				System.out.println("Description: " + d.get("description"));
			}
		}

	}

}

Output

Total Hits: 3 hits
Results: 
Title: A Java geek
Description: null
Title: Self Learning Java
Description: null
Title: Oracle Blogs | The Java Source
Description: null

As the output shows, the 'description' field is returned as null because its value was not stored in the index.

Thursday, 1 July 2021
