There are 8 core classes in the Lucene library.
a. Document
b. Analyzer
c. IndexWriter
d. Directory
e. IndexReader
f. IndexSearcher
g. Query
h. TopDocs
Field
A Field is a section of a document. Each field has three parts: name, type and value. Values may be text (String, Reader or pre-analyzed TokenStream), binary (byte[]), or numeric (a Number). Fields are optionally stored in the index, so that they may be returned with hits on the document.
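To make the name, type and value parts concrete, here is a minimal sketch that builds a document from a few common field types. It assumes Lucene 8.x or 9.x core on the classpath; the class name FieldExamples and the "pages" field with its value are hypothetical and exist only for illustration.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

public class FieldExamples {

    // Builds one document that mixes several field types.
    static Document buildDocument() {
        Document doc = new Document();

        // StringField: indexed verbatim as a single token (not analyzed), and stored here.
        doc.add(new StringField("id", "1", Field.Store.YES));

        // TextField: run through the analyzer and indexed for full-text search, and stored here.
        doc.add(new TextField("title", "Lucene in Action", Field.Store.YES));

        // IntPoint: indexed for exact and range queries, but never stored.
        // "pages" and 450 are hypothetical values for this sketch.
        doc.add(new IntPoint("pages", 450));

        // StoredField: stored only, so the numeric value can be returned with a hit.
        doc.add(new StoredField("pages", 450));

        return doc;
    }

    public static void main(String[] args) {
        Document doc = buildDocument();
        System.out.println(doc.get("id") + " / " + doc.get("title"));
    }
}
```

Whether a field is stored (Field.Store.YES) is independent of whether it is indexed: storing controls what can be returned with a hit, indexing controls what can be searched.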
Document
A Document represents the actual content that is going to be indexed; it is a collection of fields. An example document looks like this.
Example
"id": "1"
"title": "Lucene in Action"
"description": "Lucene is a platform where we can index our data to make it searchable."
In the above example, id, title and description are fields.
Analyzer
An Analyzer filters the data and extracts tokens from it. Typically, analyzers remove stop words, stem the tokens and lowercase them to support case-insensitive searches.
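The idea of analysis can be sketched in plain Java without any Lucene dependency. This is a toy illustration, not how Lucene's analyzers are implemented: the class name ToyAnalyzer and its stop-word list are made up for this example, and stemming is left out to keep it short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ToyAnalyzer {

    // Hypothetical stop-word list, chosen only for this illustration.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "in", "is", "the", "to"));

    // Splits on runs of non-letter, non-digit characters, lowercases each
    // token, and drops stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading separator produces one empty string
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Lucene in Action")); // prints [lucene, action]
    }
}
```

Because the same analysis is applied at index time and at query time, a search for "LUCENE" and a search for "lucene" end up looking for the same token.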
IndexWriter
This class is used to create a new index, open an existing index, and add, update and delete documents in the index.
Directory
This class represents the location of the index. Directory is an abstract class with many concrete implementations, such as MMapDirectory, FileSwitchDirectory and Lucene50CompoundReader.
IndexReader
IndexReader is an abstract class that provides an interface for accessing a point-in-time view of an index. Any changes made to the index via IndexWriter will not be visible until a new IndexReader is opened.
IndexSearcher
Implements search over a single IndexReader. For performance reasons, if your index is unchanging, you should share a single IndexSearcher instance across multiple searches instead of creating a new one per search. If your index has changed and you wish to see the changes reflected in searching, you should use DirectoryReader.openIfChanged(DirectoryReader) to obtain a new reader and then create a new IndexSearcher from that.
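The refresh flow described above can be sketched as follows. This is a minimal example under some assumptions: Lucene 8.x or 9.x core on the classpath, an in-memory ByteBuffersDirectory instead of a directory on disk, and hypothetical names (ReaderRefreshExample, titled, run) that exist only for this illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class ReaderRefreshExample {

    // Hypothetical helper: a document with a single stored title field.
    private static Document titled(String title) {
        Document doc = new Document();
        doc.add(new TextField("title", title, Field.Store.YES));
        return doc;
    }

    // Returns {docs visible to the old reader, docs visible after the refresh}.
    static int[] run() {
        try (Directory directory = new ByteBuffersDirectory();
             IndexWriter writer = new IndexWriter(directory,
                     new IndexWriterConfig(new StandardAnalyzer()))) {

            writer.addDocument(titled("Lucene in Action"));
            writer.commit();

            // Open a point-in-time reader and a searcher that can be reused.
            DirectoryReader reader = DirectoryReader.open(directory);
            IndexSearcher searcher = new IndexSearcher(reader);

            // Change the index after the reader was opened.
            writer.addDocument(titled("Java in Action"));
            writer.commit();
            int before = reader.numDocs(); // the old reader does not see the new document

            // openIfChanged returns null when nothing has changed.
            DirectoryReader newReader = DirectoryReader.openIfChanged(reader);
            if (newReader != null) {
                reader.close();
                reader = newReader;
                searcher = new IndexSearcher(reader);
            }
            int after = searcher.getIndexReader().numDocs();

            reader.close();
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = run();
        System.out.println("before=" + counts[0] + ", after=" + counts[1]);
    }
}
```

The old reader is closed only after the new one has been obtained, so a searcher built on the old reader stays usable until the swap.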
Query
Query is the abstract class that represents a query. The following classes extend the Query class.
a. TermQuery
b. BooleanQuery
c. WildcardQuery
d. PhraseQuery
e. PrefixQuery
f. MultiPhraseQuery
g. FuzzyQuery
h. RegexpQuery
i. TermRangeQuery
j. PointRangeQuery
k. ConstantScoreQuery
l. DisjunctionMaxQuery
m. MatchAllDocsQuery
Example
QueryBuilder queryBuilder = new QueryBuilder(analyzer);
Query query = queryBuilder.createPhraseQuery("title", "Lucene", 0);
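The QueryBuilder call above derives a query from analyzed text. The classes in the list can also be constructed directly and combined. Here is a small sketch, assuming Lucene 8.x or 9.x core on the classpath; the class name QueryExamples and the method titleQuery are hypothetical. Note that TermQuery and PrefixQuery are not analyzed, so the terms are written in lowercase to match tokens produced by an analyzer that lowercases at index time.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class QueryExamples {

    // Matches documents whose title contains the token "lucene"
    // AND a token starting with "act".
    static Query titleQuery() {
        Query exact = new TermQuery(new Term("title", "lucene"));
        Query prefix = new PrefixQuery(new Term("title", "act"));

        return new BooleanQuery.Builder()
                .add(exact, BooleanClause.Occur.MUST)
                .add(prefix, BooleanClause.Occur.MUST)
                .build();
    }

    public static void main(String[] args) {
        // Query.toString() gives a readable form that is handy when debugging.
        System.out.println(titleQuery());
    }
}
```

Replacing Occur.MUST with Occur.SHOULD turns the AND into an OR, and Occur.MUST_NOT excludes matching documents.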
TopDocs
It represents the top N ranked search results.
Example
TopDocs docs = indexSearcher.search(query, maxHitsPerPage);
Below is a complete working application.
App.java
package com.sample.app;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.store.NoLockFactory;
import org.apache.lucene.util.QueryBuilder;

public class App {

    public static void main(String[] args) throws IOException {
        // The same analyzer is used for indexing and for building the query.
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig config = new IndexWriterConfig(analyzer);

        // Index location on disk. NoLockFactory disables index locking,
        // which is acceptable only for a single-process demo like this one.
        Directory directory = new MMapDirectory(new File("/Users/Shared/lucene").toPath(), NoLockFactory.INSTANCE);

        Document doc1 = new Document();
        doc1.add(new TextField("id", "1", Field.Store.YES));
        doc1.add(new TextField("title", "Lucene in Action", Field.Store.YES));
        doc1.add(new TextField("description", "Lucene is a platform where we can index our data to make it searchable.",
                Field.Store.YES));

        Document doc2 = new Document();
        doc2.add(new TextField("id", "2", Field.Store.YES));
        doc2.add(new TextField("title", "Java in Action", Field.Store.YES));
        doc2.add(new TextField("description",
                "Java is a platform and programming language to build Enterprise Applications", Field.Store.YES));

        // Closing the writer commits the documents. With the default open mode,
        // running the program again appends the same documents to the index.
        try (IndexWriter indexWriter = new IndexWriter(directory, config)) {
            indexWriter.addDocument(doc1);
            indexWriter.addDocument(doc2);
        }

        QueryBuilder queryBuilder = new QueryBuilder(analyzer);
        Query query = queryBuilder.createPhraseQuery("title", "Lucene", 0);
        int maxHitsPerPage = 10;

        // Open a point-in-time reader and search the index.
        try (IndexReader indexReader = DirectoryReader.open(directory)) {
            IndexSearcher indexSearcher = new IndexSearcher(indexReader);
            TopDocs docs = indexSearcher.search(query, maxHitsPerPage);
            ScoreDoc[] hits = docs.scoreDocs;

            System.out.println("Total Hits: " + docs.totalHits);
            System.out.println("Results: ");
            for (int i = 0; i < hits.length; i++) {
                // Load the stored fields of each matching document.
                Document d = indexSearcher.doc(hits[i].doc);
                System.out.println("Content: " + d.get("title"));
            }
        }
    }
}
Output
Total Hits: 1 hits
Results: 
Content: Lucene in Action