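SimpleAnalyzer tokenizes text at non-letter characters (it uses the 'Character.isLetter()' method to decide whether a given character is a letter) and normalizes the tokens to lowercase. Digits, punctuation and whitespace therefore act as separators and never appear in the tokens.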
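To make the rule concrete, here is a minimal plain-Java sketch of the same idea. It is not Lucene's implementation; the class name SimpleTokenizeSketch and the method tokenize are hypothetical names used only for illustration, and the sketch assumes the behaviour described above (split at non-letters, then lowercase).

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: mimics the rule described above without using Lucene.
public class SimpleTokenizeSketch {

    // Splits the text at every non-letter code point and lowercases each token.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            if (Character.isLetter(codePoint)) {
                // Letters extend the current token, lowercased as they are added.
                current.appendCodePoint(Character.toLowerCase(codePoint));
            } else if (current.length() > 0) {
                // Any non-letter ends the current token and is itself discarded.
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(codePoint);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [it, s, lucene]
        System.out.println(tokenize("It's 2024, LUCENE!"));
    }
}
```

Note how the apostrophe and the digits both act as separators, so "It's" becomes the two tokens "it" and "s", and "2024" disappears entirely.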
How to create a SimpleAnalyzer?
Analyzer simpleAnalyzer = new SimpleAnalyzer();
Find the complete working application below.
App.java
package com.sample.app;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class App {

    // Runs the analyzer over the text and collects the terms it emits.
    public static List<String> getTokens(String text, String fieldName, Analyzer analyzer) throws IOException {
        List<String> result = new ArrayList<String>();
        // try-with-resources closes the stream so the analyzer can be reused.
        try (TokenStream tokenStream = analyzer.tokenStream(fieldName, text)) {
            CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
            tokenStream.reset();
            while (tokenStream.incrementToken()) {
                result.add(charTermAttribute.toString());
            }
            // Signal end-of-stream before the stream is closed.
            tokenStream.end();
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Analyzer simpleAnalyzer = new SimpleAnalyzer();
        List<String> tokens = getTokens("Hello, How Are you\t\tI am Fine \n Thank you", null, simpleAnalyzer);
        // Print one token per line.
        for (String token : tokens) {
            System.out.println(token);
        }
    }
}
Output
hello
how
are
you
i
am
fine
thank
you
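Because every token is lowercased, this analyzer helps us perform case-insensitive searches, provided the same analyzer is applied to both the indexed text and the query.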
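The sketch below illustrates why lowercasing leads to case-insensitive matching. It uses a toy in-memory index in plain Java rather than Lucene's indexing API; the class CaseInsensitiveSearchSketch and its methods analyze, add and search are hypothetical names, and analyze is only an assumed stand-in for what the analyzer does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: a toy inverted index, not Lucene's index.
public class CaseInsensitiveSearchSketch {

    // Maps each term to the ids of the documents that contain it.
    private final Map<String, Set<Integer>> index = new HashMap<>();

    // Assumed stand-in for the analyzer: split at non-letters, then lowercase.
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.split("[^\\p{L}]+")) {
            if (!part.isEmpty()) {
                terms.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Index time: store the analyzed terms, not the raw text.
    public void add(int docId, String text) {
        for (String term : analyze(text)) {
            index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Query time: analyze the query word the same way before the lookup.
    public Set<Integer> search(String word) {
        List<String> terms = analyze(word);
        if (terms.isEmpty()) {
            return Collections.emptySet();
        }
        return index.getOrDefault(terms.get(0), Collections.emptySet());
    }

    public static void main(String[] args) {
        CaseInsensitiveSearchSketch idx = new CaseInsensitiveSearchSketch();
        idx.add(1, "Hello, How Are you");
        idx.add(2, "I am Fine");
        System.out.println(idx.search("HELLO")); // prints [1]
        System.out.println(idx.search("fine"));  // prints [2]
    }
}
```

The query "HELLO" matches document 1 only because both sides were reduced to "hello" before comparison; skipping the analysis on either side would make the lookup case-sensitive again.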