StopAnalyzer performs the following operations.
a. Tokenizes the text at non-letters (a letter is identified using the 'Character.isLetter()' method).
b. Normalizes the token text to lower case.
c. Removes stop words from the token stream.
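To make the three steps concrete, here is a minimal plain-Java sketch that imitates them without Lucene. It is only an approximation for illustration, not how Lucene implements StopAnalyzer. The class name 'StopAnalyzerSketch' and the helper 'analyze' are hypothetical names made up for this example, and the sketch works on 'char' values, so it ignores supplementary Unicode characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopAnalyzerSketch {

    // Hypothetical helper: imitates steps a, b and c in plain Java (no Lucene involved).
    public static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();

        // Loop one position past the end so the last token is flushed too.
        for (int i = 0; i <= text.length(); i++) {
            if (i < text.length() && Character.isLetter(text.charAt(i))) {
                // Step b: lower-case each letter while collecting it.
                current.append(Character.toLowerCase(text.charAt(i)));
            } else if (current.length() > 0) {
                // Step a: a non-letter (or the end of the input) closes the current token.
                String token = current.toString();
                current.setLength(0);
                // Step c: keep the token only when it is not a stop word.
                if (!stopWords.contains(token)) {
                    tokens.add(token);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("a", "is", "the", "to"));
        // Prints [java, well, known, language]
        System.out.println(analyze("Java8 is a well-known language!", stopWords));
    }
}
```

Notice that the digit in "Java8" and the hyphen in "well-known" both act as token boundaries, because neither is a letter.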
How to get StopAnalyzer?
The StopAnalyzer class provides the following constructors to get an instance of StopAnalyzer.
public StopAnalyzer(CharArraySet stopWords)
public StopAnalyzer(Path stopwordsFile) throws IOException
public StopAnalyzer(Reader stopwords) throws IOException
Example
CharArraySet charArraySet = new CharArraySet(10, true);
charArraySet.add("a");
charArraySet.add("an");
charArraySet.add("are");
charArraySet.add("is");
charArraySet.add("the");
charArraySet.add("to");
charArraySet.add("you");
Analyzer stopAnalyzer = new StopAnalyzer(charArraySet);
Find the complete working application below.
App.java
package com.sample.app;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.StopAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class App {

    // Runs the analyzer over the text and collects the emitted terms.
    public static List<String> getTokens(String text, String fieldName, Analyzer analyzer) throws IOException {
        List<String> result = new ArrayList<String>();

        // try-with-resources closes the stream, so the analyzer can be reused afterwards.
        try (TokenStream tokenStream = analyzer.tokenStream(fieldName, text)) {
            CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
            tokenStream.reset();

            while (tokenStream.incrementToken()) {
                result.add(charTermAttribute.toString());
            }

            // Signals that the end of the stream has been reached.
            tokenStream.end();
        }

        return result;
    }

    public static void main(String[] args) throws IOException {
        // The second argument (true) makes stop word matching case-insensitive.
        CharArraySet charArraySet = new CharArraySet(10, true);
        charArraySet.add("a");
        charArraySet.add("an");
        charArraySet.add("are");
        charArraySet.add("is");
        charArraySet.add("the");
        charArraySet.add("to");
        charArraySet.add("you");

        Analyzer stopAnalyzer = new StopAnalyzer(charArraySet);

        List<String> tokens = getTokens("Java is a programming Language to build Enterprise Applications", null,
                stopAnalyzer);

        for (String token : tokens) {
            System.out.println(token);
        }
    }
}
Output
java
programming
language
build
enterprise
applications