Programming for beginners: Lucene: Add Document to the index

The IndexWriter class provides the 'addDocument' method, which adds a document to an index.

Signature

public long addDocument(Iterable<? extends IndexableField> doc) throws IOException

Example

indexWriter.addDocument(document);

The 'addDocument' method periodically flushes pending documents to the Directory, and also periodically triggers segment merges in the index according to the MergePolicy in use.
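To make the "buffer, then flush" idea concrete, here is a minimal toy model in plain Java. It is a sketch only and is not Lucene code: the class BufferingWriter, the flushThreshold parameter, and the use of a document count as the flush trigger are all assumptions made for illustration. In Lucene itself, the flush behaviour is configured through IndexWriterConfig.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of buffer-then-flush behaviour; NOT Lucene's implementation. */
public class BufferingWriterSketch {

    /** Hypothetical writer that buffers documents and flushes them in batches. */
    static class BufferingWriter {
        // Assumed trigger: flush once this many documents are pending.
        private final int flushThreshold;
        private final List<String> pending = new ArrayList<>();
        // Each flushed batch stands in for one on-disk segment.
        private final List<List<String>> segments = new ArrayList<>();

        BufferingWriter(int flushThreshold) {
            this.flushThreshold = flushThreshold;
        }

        /** Buffers the document and flushes when the threshold is reached. */
        void addDocument(String doc) {
            pending.add(doc);
            if (pending.size() >= flushThreshold) {
                flush();
            }
        }

        /** Moves all pending documents into a new "segment". */
        void flush() {
            if (!pending.isEmpty()) {
                segments.add(new ArrayList<>(pending));
                pending.clear();
            }
        }

        int segmentCount() {
            return segments.size();
        }

        int pendingCount() {
            return pending.size();
        }
    }

    public static void main(String[] args) {
        BufferingWriter writer = new BufferingWriter(2);
        for (String doc : Arrays.asList("doc1", "doc2", "doc3")) {
            writer.addDocument(doc);
        }
        // Two documents were flushed together; the third is still buffered.
        System.out.println(writer.segmentCount() + " segment(s), " + writer.pendingCount() + " pending");

        // An explicit flush writes out whatever is left.
        writer.flush();
        System.out.println(writer.segmentCount() + " segment(s), " + writer.pendingCount() + " pending");
    }
}
```

The point of the model is that a call to addDocument does not necessarily write anything out immediately: documents accumulate until a trigger fires, and each flush produces a new segment that a merge policy may later combine with others.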

The following is a complete working application.

DocumentUtil.java

package com.sample.app.util;

import java.util.Arrays;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;

public class DocumentUtil {

	// Builds a Lucene Document; Field.Store.YES keeps the original value retrievable from search hits.
	private static Document getDocument(String id, String title, String description, String blog) {
		Document doc = new Document();
		doc.add(new TextField("id", id, Field.Store.YES));
		doc.add(new TextField("title", title, Field.Store.YES));
		// Indexed for search but not stored, so it cannot be read back from a hit.
		doc.add(new TextField("description", description, Field.Store.NO));
		doc.add(new TextField("blog", blog, Field.Store.YES));

		return doc;
	}

	public static List<Document> getDocuments() {
		Document doc1 = getDocument("1", "JavaWorld",
				"The original independent resource for Java developers, architects, and managers.", "javaworld.com");
		Document doc2 = getDocument("2", "Oracle Blogs | The Java Source",
				"Java powers more than 4.5 billion devices including 800 million computers and 1.5 billion cell phones. If you love Java, this is the blog you must follow.",
				"blogs.oracle.com/java");
		Document doc3 = getDocument("3", "A Java geek",
				"Nicolas Fränkel's blog. IT architect focusing on Java, Java EE, and their surrounding ecosystems. He is a trainer, book writer, speaker & blogger.",
				"blog.frankel.ch");
		Document doc4 = getDocument("4", "Self Learning Java", "Learn Java fundamentals and other java libraries",
				"self-learning-java-tutorial.blogspot.com");

		return Arrays.asList(doc1, doc2, doc3, doc4);

	}
}

App.java

package com.sample.app;

import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.store.NoLockFactory;
import org.apache.lucene.util.QueryBuilder;

import com.sample.app.util.DocumentUtil;

public class App {

	public static void main(String[] args) throws IOException {

		Analyzer analyzer = new StandardAnalyzer();
		IndexWriterConfig config = new IndexWriterConfig(analyzer);

		// NoLockFactory disables index locking; this is safe only while a single writer uses the directory.
		Directory directory = new MMapDirectory(new File("/Users/Shared/lucene").toPath(), NoLockFactory.INSTANCE);

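		// Closing the writer (via try-with-resources) commits the added documents.
		// The default open mode appends to an existing index, so rerunning the
		// application adds the same documents again.
		try (IndexWriter indexWriter = new IndexWriter(directory, config)) {

			List<Document> documents = DocumentUtil.getDocuments();

			for (Document document : documents) {
				indexWriter.addDocument(document);
			}

		}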

		// Search the "description" field; 0.7f is the fraction of query terms that should match.
		QueryBuilder queryBuilder = new QueryBuilder(analyzer);
		Query query = queryBuilder.createMinShouldMatchQuery("description", "Professional java developers", 0.7f);
		int maxHitsPerPage = 10;

		try (IndexReader indexReader = DirectoryReader.open(directory)) {
			IndexSearcher indexSearcher = new IndexSearcher(indexReader);

			TopDocs docs = indexSearcher.search(query, maxHitsPerPage);
			ScoreDoc[] hits = docs.scoreDocs;
			System.out.println("Total Hits: " + docs.totalHits);
			System.out.println("Results: ");
			for (int i = 0; i < hits.length; i++) {
				Document document = indexSearcher.doc(hits[i].doc);
				System.out.println("Title: " + document.get("title"));
			}
		}

	}

}

Output

Total Hits: 1 hits
Results: 
Title: JavaWorld

Monday, 5 July 2021
