Monday 28 June 2021

Lucene: How to get TokenStream?

A TokenStream is the intermediate data format passed between the components of Lucene's analysis process: it enumerates the tokens produced from a piece of text. An Analyzer takes a field name and a Reader as input and returns a TokenStream.

 

Example

Analyzer analyzer = new EnglishAnalyzer();
Reader reader = new StringReader("Text to be passed");
TokenStream tokenStream = analyzer.tokenStream("myField", reader);

 

App.java

package com.sample.app;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class App {

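	public static void main(String[] args) throws IOException {

		try (Analyzer analyzer = new EnglishAnalyzer()) {
			Reader reader = new StringReader("Java is a Programming Language");

			// TokenStream is Closeable, so try-with-resources releases it
			try (TokenStream tokenStream = analyzer.tokenStream("myField", reader)) {
				// The same attribute instance is updated for every token
				CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);

				// reset() must be called before the first incrementToken()
				tokenStream.reset();

				while (tokenStream.incrementToken()) {
					System.out.println(charTermAttribute.toString());
				}

				// end() signals that the stream has been fully consumed
				tokenStream.end();
			}
		}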

	}

}

 

Output

java
program
languag
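
The output shows only the stemmed terms, so it is not obvious which part of the input each token came from. A minimal sketch, assuming the same Lucene dependency as App.java, reads OffsetAttribute next to CharTermAttribute to report each token's character offsets in the original text. The class name TokenOffsets and the method collect are hypothetical names chosen for this illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

// Hypothetical helper class for illustration, not part of App.java
class TokenOffsets {

	// Returns each token as "term[start,end]"; offsets point into the original text
	static List<String> collect(Analyzer analyzer, String field, String text) throws IOException {
		List<String> tokens = new ArrayList<>();

		try (TokenStream tokenStream = analyzer.tokenStream(field, text)) {
			CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
			OffsetAttribute offset = tokenStream.addAttribute(OffsetAttribute.class);

			tokenStream.reset();
			while (tokenStream.incrementToken()) {
				tokens.add(term.toString() + "[" + offset.startOffset() + "," + offset.endOffset() + "]");
			}
			tokenStream.end();
		}
		return tokens;
	}

	public static void main(String[] args) throws IOException {
		try (Analyzer analyzer = new EnglishAnalyzer()) {
			for (String token : collect(analyzer, "myField", "Java is a Programming Language")) {
				System.out.println(token);
			}
		}
	}
}
```

For the input used in App.java this should print java[0,4], program[10,21] and languag[22,30]. The gap between offsets 4 and 10 is where the stop words "is" and "a" were removed.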

 


  
