Saturday, 19 June 2021

Lucene: Core Classes

There are 8 core classes in the Lucene library.

a. Document

b. Analyzer

c. IndexWriter

d. Directory

e. IndexReader

f. IndexSearcher

g. Query

h. TopDocs

 


 

Field

A Field is a section of a document. Each field has three parts: a name, a type, and a value. Values may be text (a String, a Reader, or a pre-analyzed TokenStream), binary (byte[]), or numeric (a Number). Fields are optionally stored in the index, so that they can be returned with the hits for the document.
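Lucene provides several concrete Field classes that fix the "type" part of a field. The sketch below assumes Lucene 8.x (lucene-core) on the classpath and shows the common ones: TextField is analyzed into tokens, StringField is indexed as a single exact token (a good fit for ids), IntPoint indexes a number for range queries but does not store it, and StoredField only stores a value. The class name FieldTypesDemo, the "year" field, and its value are hypothetical.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexableField;

public class FieldTypesDemo {

    // Builds a document that uses one field of each common kind.
    public static Document buildBook() {
        Document doc = new Document();
        // Indexed as one exact (non-analyzed) token and stored: suits identifiers.
        doc.add(new StringField("id", "1", Field.Store.YES));
        // Analyzed into tokens for full-text search, and stored for display.
        doc.add(new TextField("title", "Lucene in Action", Field.Store.YES));
        // Hypothetical numeric value, indexed as a point for range queries; points are not stored...
        doc.add(new IntPoint("year", 2010));
        // ...so the same value is stored separately to be returned with hits.
        doc.add(new StoredField("year", 2010));
        return doc;
    }

    public static void main(String[] args) {
        for (IndexableField field : buildBook().getFields()) {
            // Print each field name and whether its value is stored.
            System.out.println(field.name() + " stored=" + field.fieldType().stored());
        }
    }
}
```

Pairing IntPoint with StoredField under the same name is a common pattern when a numeric value must be both searchable by range and returned with results.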

 

Document

A Document represents the actual content that is going to be indexed; it is a collection of fields. An example document looks like this:

 

Example

"id": "1"
"title" : "Lucene in Action"
"description" : "Lucene is a platform where we can index our data to make it searchable."

 

In the above example, id, title and description are fields.
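The example document above can be built in code as sketched below, assuming Lucene 8.x. It also shows that a Document behaves like an ordered collection of fields: values are looked up by name, a name may repeat (a multi-valued field), and fields can be removed. The class name DocumentDemo and the extra title value "LIA" are hypothetical.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

public class DocumentDemo {

    // Builds the example document from the text, with every field stored.
    public static Document exampleDocument() {
        Document doc = new Document();
        doc.add(new StringField("id", "1", Field.Store.YES));
        doc.add(new TextField("title", "Lucene in Action", Field.Store.YES));
        doc.add(new TextField("description",
                "Lucene is a platform where we can index our data to make it searchable.",
                Field.Store.YES));
        return doc;
    }

    public static void main(String[] args) {
        Document doc = exampleDocument();
        // Fields are looked up by name; get() returns the first value added.
        System.out.println(doc.get("title"));               // prints "Lucene in Action"
        // Adding a second "title" makes it a multi-valued field ("LIA" is a hypothetical value).
        doc.add(new TextField("title", "LIA", Field.Store.YES));
        System.out.println(doc.getValues("title").length);  // prints 2
        // Remove every field named "description".
        doc.removeFields("description");
        System.out.println(doc.getFields().size());         // prints 3
    }
}
```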

 

Analyzer

An Analyzer processes text and extracts tokens (terms) from it. Depending on how it is configured, an analyzer may remove stop words, stem tokens, and lowercase them so that searches are case-insensitive.
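To see what an analyzer actually produces, you can consume its TokenStream directly. The sketch below assumes Lucene 8.x and follows the standard reset / incrementToken / end / close sequence; the class name AnalyzerDemo is hypothetical. StandardAnalyzer tokenizes and lowercases, but it does not stem, and whether short words like "in" survive depends on its configured stop-word set (in Lucene 8.x the default set is empty, as far as I know). Stemming requires a different analyzer, for example EnglishAnalyzer from the analysis-common module.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class AnalyzerDemo {

    // Runs text through an analyzer and collects the terms it emits.
    public static List<String> tokens(Analyzer analyzer, String field, String text) throws IOException {
        List<String> result = new ArrayList<>();
        try (TokenStream stream = analyzer.tokenStream(field, text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();                    // must be called before incrementToken()
            while (stream.incrementToken()) {
                result.add(term.toString());   // the current token's text
            }
            stream.end();                      // finishes the stream (final offset, etc.)
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer analyzer = new StandardAnalyzer()) {
            System.out.println(tokens(analyzer, "title", "Lucene in Action"));
        }
    }
}
```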

 

IndexWriter

This class is used to create a new index or open an existing one, and to add, update, and delete documents in the index.
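The add, update, and delete operations can be sketched as follows, assuming Lucene 8.x (ByteBuffersDirectory, an in-memory Directory, exists from 8.0). updateDocument(Term, doc) deletes every document containing the term and then adds the new one, so the id is indexed as a StringField to make the term match exactly. The class name IndexWriterDemo and the helper book() are hypothetical.

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class IndexWriterDemo {

    // Hypothetical helper: a document with an exact-match id and an analyzed title.
    static Document book(String id, String title) {
        Document doc = new Document();
        doc.add(new StringField("id", id, Field.Store.YES));
        doc.add(new TextField("title", title, Field.Store.YES));
        return doc;
    }

    // Adds two books, replaces one, deletes the other, and returns the live document count.
    public static int run() throws IOException {
        try (Directory dir = new ByteBuffersDirectory();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            writer.addDocument(book("1", "Lucene in Action"));
            writer.addDocument(book("2", "Java in Action"));
            // Delete all docs with id:1, then add the replacement.
            writer.updateDocument(new Term("id", "1"), book("1", "Lucene in Action, revised"));
            // Delete all docs with id:2.
            writer.deleteDocuments(new Term("id", "2"));
            writer.commit();   // make the changes visible to newly opened readers
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return reader.numDocs();   // counts live (non-deleted) documents only
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Live documents: " + run());   // prints "Live documents: 1"
    }
}
```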

 

Directory

This class represents the location where the index files are stored. Directory is an abstract class with many concrete implementations, such as MMapDirectory, FileSwitchDirectory, and Lucene50CompoundReader.
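Because IndexWriter and IndexReader only talk to the abstract Directory API, the storage backend can be swapped without touching the rest of the code. The sketch below assumes Lucene 8.x: FSDirectory.open(Path) lets Lucene pick a file-system implementation for the platform (typically MMapDirectory on 64-bit JVMs), and ByteBuffersDirectory keeps everything in memory. The class name DirectoryDemo and the file name "hello.bin" are hypothetical. Note that the sample application below passes NoLockFactory.INSTANCE, which disables index locking; the default lock factory is what prevents two IndexWriters from opening the same index at once.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexOutput;

public class DirectoryDemo {

    // Returns an on-disk Directory when a path is given, otherwise an in-memory one.
    public static Directory open(Path path) throws IOException {
        return path != null ? FSDirectory.open(path) : new ByteBuffersDirectory();
    }

    // Writes one small file through the Directory API and lists the files it now holds.
    public static String[] writeAndList(Directory dir) throws IOException {
        try (IndexOutput out = dir.createOutput("hello.bin", IOContext.DEFAULT)) {
            out.writeInt(42);
        }
        return dir.listAll();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("lucene-demo");
        try (Directory onDisk = open(tmp); Directory inMemory = open(null)) {
            System.out.println(onDisk.getClass().getSimpleName() + " " + Arrays.toString(writeAndList(onDisk)));
            System.out.println(inMemory.getClass().getSimpleName() + " " + Arrays.toString(writeAndList(inMemory)));
        }
    }
}
```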

 

IndexReader

IndexReader is an abstract class that provides an interface for accessing a point-in-time view of an index. Changes made to the index through an IndexWriter are not visible until a new IndexReader is opened.
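The point-in-time behaviour can be demonstrated with two readers opened before and after a commit. This is a sketch assuming Lucene 8.x; the class name PointInTimeDemo and the helper doc() are hypothetical. The old reader keeps seeing one document even after a second document is committed, while a reader opened afterwards sees both.

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class PointInTimeDemo {

    // Hypothetical helper: a document holding only an id.
    static Document doc(String id) {
        Document doc = new Document();
        doc.add(new StringField("id", id, Field.Store.YES));
        return doc;
    }

    // Returns {docs seen by the reader opened before the change, docs seen by one opened after}.
    public static int[] run() throws IOException {
        try (Directory dir = new ByteBuffersDirectory();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            writer.addDocument(doc("1"));
            writer.commit();
            try (DirectoryReader oldReader = DirectoryReader.open(dir)) {
                writer.addDocument(doc("2"));
                writer.commit();   // committed, but oldReader's snapshot does not change
                try (DirectoryReader newReader = DirectoryReader.open(dir)) {
                    return new int[] {oldReader.numDocs(), newReader.numDocs()};
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int[] counts = run();
        System.out.println("old reader: " + counts[0] + ", new reader: " + counts[1]);   // 1 and 2
    }
}
```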

 

IndexSearcher

IndexSearcher implements search over a single IndexReader. For performance reasons, if your index is unchanging, share a single IndexSearcher instance across multiple searches instead of creating a new one per search. If your index has changed and you want searches to reflect the changes, use DirectoryReader.openIfChanged(DirectoryReader) to obtain a new reader, and then create a new IndexSearcher from it.
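The reopen pattern described above can be sketched like this, assuming Lucene 8.x. DirectoryReader.openIfChanged returns null when nothing new has been committed, so the existing reader (and its searcher) can keep being used; otherwise the stale reader is closed and a new IndexSearcher is built over the new one. The class name RefreshDemo and the helpers doc() and refresh() are hypothetical.

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class RefreshDemo {

    // Hypothetical helper: a document holding only an id.
    static Document doc(String id) {
        Document doc = new Document();
        doc.add(new StringField("id", id, Field.Store.YES));
        return doc;
    }

    // Returns the same reader if the index is unchanged, otherwise a new reader (closing the old one).
    static DirectoryReader refresh(DirectoryReader current) throws IOException {
        DirectoryReader changed = DirectoryReader.openIfChanged(current);
        if (changed == null) {
            return current;
        }
        current.close();   // release the stale point-in-time view
        return changed;
    }

    // Returns {hits before the change, hits after the change, 1 if an unchanged refresh reused the reader}.
    public static int[] run() throws IOException {
        MatchAllDocsQuery all = new MatchAllDocsQuery();
        try (Directory dir = new ByteBuffersDirectory();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            writer.addDocument(doc("1"));
            writer.commit();

            DirectoryReader reader = DirectoryReader.open(dir);
            boolean reused = refresh(reader) == reader;   // nothing committed since opening
            IndexSearcher searcher = new IndexSearcher(reader);
            int before = searcher.count(all);

            writer.addDocument(doc("2"));
            writer.commit();

            reader = refresh(reader);
            searcher = new IndexSearcher(reader);   // a searcher is bound to one reader
            int after = searcher.count(all);
            reader.close();
            return new int[] {before, after, reused ? 1 : 0};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] r = run();
        System.out.println("before=" + r[0] + " after=" + r[1] + " reused=" + (r[2] == 1));
    }
}
```

In a long-running application, Lucene's SearcherManager class wraps this same reopen-and-swap pattern and also handles reference counting for searchers still in use.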

 

Query

Query is the abstract class that represents a query. The following concrete classes extend the Query class.

a. TermQuery

b. BooleanQuery

c. WildcardQuery

d. PhraseQuery

e. PrefixQuery

f. MultiPhraseQuery

g. FuzzyQuery

h. RegexpQuery

i. TermRangeQuery

j. PointRangeQuery

k. ConstantScoreQuery

l. DisjunctionMaxQuery

m. MatchAllDocsQuery

 

Example

QueryBuilder queryBuilder = new QueryBuilder(analyzer);
Query query = queryBuilder.createPhraseQuery("title", "Lucene", 0);
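Besides QueryBuilder, the concrete Query classes listed above can be constructed directly. The sketch below assumes Lucene 8.x and indexes the two titles used in the sample application, then counts the hits for several query types with IndexSearcher.count. Unlike QueryBuilder, these constructors do not run the analyzer, so the terms are written in the lowercased form that StandardAnalyzer indexes. The class name QueryTypesDemo is hypothetical.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class QueryTypesDemo {

    // Indexes the two sample titles and returns the hit count for each query type.
    public static Map<String, Integer> run() throws IOException {
        Map<String, Query> queries = new LinkedHashMap<>();
        queries.put("term", new TermQuery(new Term("title", "lucene")));
        queries.put("prefix", new PrefixQuery(new Term("title", "jav")));
        queries.put("wildcard", new WildcardQuery(new Term("title", "l*e")));
        queries.put("fuzzy", new FuzzyQuery(new Term("title", "lucine")));   // one edit away from "lucene"
        queries.put("phrase", new PhraseQuery("title", "in", "action"));     // adjacent terms, in order
        queries.put("boolean", new BooleanQuery.Builder()
                .add(new TermQuery(new Term("title", "action")), BooleanClause.Occur.MUST)
                .add(new TermQuery(new Term("title", "java")), BooleanClause.Occur.MUST_NOT)
                .build());
        queries.put("all", new MatchAllDocsQuery());

        Map<String, Integer> counts = new LinkedHashMap<>();
        try (Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                for (String title : new String[] {"Lucene in Action", "Java in Action"}) {
                    Document doc = new Document();
                    doc.add(new TextField("title", title, Field.Store.YES));
                    writer.addDocument(doc);
                }
            }   // closing the writer commits the documents
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                for (Map.Entry<String, Query> entry : queries.entrySet()) {
                    counts.put(entry.getKey(), searcher.count(entry.getValue()));
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        run().forEach((name, hits) -> System.out.println(name + ": " + hits));
    }
}
```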

TopDocs

TopDocs represents the top N ranked hits of a search.

 

Example

TopDocs docs = indexSearcher.search(query, maxHitsPerPage);
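A TopDocs holds an array of ScoreDoc hits (each with an internal doc id and a score) plus a totalHits count; in Lucene 8.x, totalHits may be only a lower bound when many documents match, so check its relation before relying on it. Since search(query, n) only returns the first n hits, the sketch below, assuming Lucene 8.x, pages through all results with IndexSearcher.searchAfter, resuming after the last hit of the previous page. The class name TopDocsDemo, the helper idsInPages(), and the page size of 2 are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class TopDocsDemo {

    // Collects the stored ids of all hits, fetching pageSize hits per search call.
    static List<String> idsInPages(IndexSearcher searcher, Query query, int pageSize) throws IOException {
        List<String> ids = new ArrayList<>();
        ScoreDoc last = null;
        while (true) {
            TopDocs page = (last == null)
                    ? searcher.search(query, pageSize)
                    : searcher.searchAfter(last, query, pageSize);
            if (page.scoreDocs.length == 0) {
                break;   // no more hits
            }
            for (ScoreDoc hit : page.scoreDocs) {
                // hit.doc is the internal doc id; load the stored fields to read "id".
                ids.add(searcher.doc(hit.doc).get("id"));
            }
            last = page.scoreDocs[page.scoreDocs.length - 1];
        }
        return ids;
    }

    // Indexes five matching documents and pages through them two at a time.
    public static List<String> run() throws IOException {
        try (Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                for (int i = 0; i < 5; i++) {
                    Document doc = new Document();
                    doc.add(new StringField("id", String.valueOf(i), Field.Store.YES));
                    doc.add(new TextField("title", "Lucene in Action", Field.Store.YES));
                    writer.addDocument(doc);
                }
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                return idsInPages(searcher, new TermQuery(new Term("title", "lucene")), 2);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run());
    }
}
```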


Below is a complete working application.

 

App.java

package com.sample.app;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.store.NoLockFactory;
import org.apache.lucene.util.QueryBuilder;

public class App {

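    public static void main(String[] args) throws IOException {

        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        // Recreate the index on every run. The default mode (CREATE_OR_APPEND) would append
        // the same two documents again on each run, so the hit count would keep growing.
        config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);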

        Directory directory = new MMapDirectory(new File("/Users/Shared/lucene").toPath(), NoLockFactory.INSTANCE);

        Document doc1 = new Document();
        doc1.add(new TextField("id", "1", Field.Store.YES));
        doc1.add(new TextField("title", "Lucene in Action", Field.Store.YES));
        doc1.add(new TextField("description", "Lucene is a platform where we can index our data to make it searchable.",
                Field.Store.YES));

        Document doc2 = new Document();
        doc2.add(new TextField("id", "2", Field.Store.YES));
        doc2.add(new TextField("title", "Java in Action", Field.Store.YES));
        doc2.add(new TextField("description",
                "Java is a platform and programming language to build Enterprise Applications", Field.Store.YES));

        try (IndexWriter indexWriter = new IndexWriter(directory, config)) {
            indexWriter.addDocument(doc1);
            indexWriter.addDocument(doc2);
        }

        QueryBuilder queryBuilder = new QueryBuilder(analyzer);
        Query query = queryBuilder.createPhraseQuery("title", "Lucene", 0);
        int maxHitsPerPage = 10;

        try (IndexReader indexReader = DirectoryReader.open(directory)) {
            IndexSearcher indexSearcher = new IndexSearcher(indexReader);

            TopDocs docs = indexSearcher.search(query, maxHitsPerPage);
            ScoreDoc[] hits = docs.scoreDocs;
            System.out.println("Total Hits: " + docs.totalHits);
            System.out.println("Results: ");
            for (int i = 0; i < hits.length; i++) {
                Document d = indexSearcher.doc(hits[i].doc);
                System.out.println("Content: " + d.get("title"));
            }
        }

    }

}


Output

Total Hits: 1 hits
Results: 
Content: Lucene in Action



