A TokenStream is the intermediate data format passed between the components of Lucene's analysis process. It is an enumeration of tokens. An Analyzer takes a Reader as input and produces a TokenStream as output.
Example
Analyzer analyzer = new EnglishAnalyzer();
Reader reader = new StringReader("Text to be passed");
TokenStream tokenStream = analyzer.tokenStream("myField", reader);
App.java
package com.sample.app;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class App {

	public static void main(String[] args) throws IOException {
		try (Analyzer analyzer = new EnglishAnalyzer()) {
			Reader reader = new StringReader("Java is a Programming Language");

			// Close the stream when done so the analyzer can reuse its components
			try (TokenStream tokenStream = analyzer.tokenStream("myField", reader)) {
				// The attribute exposes the text of the current token
				CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);

				tokenStream.reset(); // must be called before the first incrementToken()
				while (tokenStream.incrementToken()) {
					System.out.println(charTermAttribute.toString());
				}
				tokenStream.end(); // signals that the end of the stream was reached
			}
		}
	}
}
Output
java
program
languag
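To see why a stream of tokens works as an intermediate format between analysis components, here is a minimal sketch that uses only the Java standard library. It is not Lucene's API: `SimpleTokenStream`, `WhitespaceSource`, `LowerCaseStep`, `StopWordStep` and the two-word stop list are hypothetical names invented for this illustration. Each step pulls tokens from the step before it, one at a time, through the same `incrementToken()` style of call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class TokenStreamSketch {

	// Hypothetical stand-in for a token stream (not a Lucene type)
	interface SimpleTokenStream {
		boolean incrementToken(); // advance; false when no tokens remain
		String term();            // text of the current token
	}

	// Source component: splits the input text on whitespace
	static final class WhitespaceSource implements SimpleTokenStream {
		private final String[] parts;
		private int index = -1;

		WhitespaceSource(String text) {
			String trimmed = text.trim();
			this.parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
		}

		public boolean incrementToken() {
			if (index + 1 >= parts.length) {
				return false;
			}
			index++;
			return true;
		}

		public String term() {
			return parts[index];
		}
	}

	// Filter component: lower-cases every token coming from upstream
	static final class LowerCaseStep implements SimpleTokenStream {
		private final SimpleTokenStream input;

		LowerCaseStep(SimpleTokenStream input) {
			this.input = input;
		}

		public boolean incrementToken() {
			return input.incrementToken();
		}

		public String term() {
			return input.term().toLowerCase(Locale.ROOT);
		}
	}

	// Filter component: skips tokens found in the stop-word set
	static final class StopWordStep implements SimpleTokenStream {
		private final SimpleTokenStream input;
		private final Set<String> stopWords;

		StopWordStep(SimpleTokenStream input, Set<String> stopWords) {
			this.input = input;
			this.stopWords = stopWords;
		}

		public boolean incrementToken() {
			while (input.incrementToken()) {
				if (!stopWords.contains(input.term())) {
					return true;
				}
			}
			return false;
		}

		public String term() {
			return input.term();
		}
	}

	// Chains source -> lower-case -> stop-word removal and collects the tokens
	static List<String> analyze(String text) {
		SimpleTokenStream stream = new StopWordStep(
				new LowerCaseStep(new WhitespaceSource(text)),
				Set.of("is", "a")); // assumed stop list, for illustration only
		List<String> tokens = new ArrayList<>();
		while (stream.incrementToken()) {
			tokens.add(stream.term());
		}
		return tokens;
	}

	public static void main(String[] args) {
		// prints [java, programming, language]
		System.out.println(analyze("Java is a Programming Language"));
	}
}
```

The sketch does no stemming, so it yields programming and language where EnglishAnalyzer produced program and languag. It also omits the attribute mechanism and the reset(), end() and close() steps that a real TokenStream consumer has to follow.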