Lucene provides several built-in language-specific analyzers.
At the time of writing, Lucene ships the following language-specific analyzers.
1. ArabicAnalyzer
2. ArmenianAnalyzer
3. BasqueAnalyzer
4. BengaliAnalyzer
5. BrazilianAnalyzer
6. BulgarianAnalyzer
7. CatalanAnalyzer
8. CJKAnalyzer
9. CzechAnalyzer
10. DanishAnalyzer
11. EnglishAnalyzer
12. EstonianAnalyzer
13. FinnishAnalyzer
14. FrenchAnalyzer
15. GalicianAnalyzer
16. GermanAnalyzer
17. GreekAnalyzer
18. HindiAnalyzer
19. HungarianAnalyzer
20. IndonesianAnalyzer
21. IrishAnalyzer
22. ItalianAnalyzer
23. LatvianAnalyzer
24. LithuanianAnalyzer
25. NorwegianAnalyzer
26. PersianAnalyzer
27. PortugueseAnalyzer
28. RomanianAnalyzer
29. RussianAnalyzer
30. SoraniAnalyzer
31. SpanishAnalyzer
32. SwedishAnalyzer
33. ThaiAnalyzer
34. TurkishAnalyzer
How to create an EnglishAnalyzer?
Analyzer englishAnalyzer = new EnglishAnalyzer();
The following is a complete working application.
App.java
package com.sample.app;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class App {

    // Runs the text through the analyzer and collects the terms it emits.
    public static List<String> getTokens(String text, String fieldName, Analyzer analyzer) throws IOException {
        List<String> result = new ArrayList<>();
        // try-with-resources closes the stream so the analyzer can be reused.
        try (TokenStream tokenStream = analyzer.tokenStream(fieldName, text)) {
            CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
            tokenStream.reset();
            while (tokenStream.incrementToken()) {
                result.add(charTermAttribute.toString());
            }
            tokenStream.end();
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer englishAnalyzer = new EnglishAnalyzer()) {
            List<String> tokens = getTokens(
                    "Java is a platform and programming language to build Enterprise Applications",
                    null, englishAnalyzer);
            for (String token : tokens) {
                System.out.println(token);
            }
        }
    }
}
Output
java
platform
program
languag
build
enterpris
applic
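The output is lowercased, has lost "is", "a", "and" and "to", and contains word stems such as "languag" because EnglishAnalyzer lowercases tokens, removes English stop words and applies Porter stemming. The sketch below imitates only the first two steps in plain Java so the effect can be seen without Lucene on the classpath. It is a simplified stand-in, not Lucene's implementation: the class name AnalysisChainSketch, the analyze method and the tiny stop-word list are all assumptions made for illustration, and stemming is left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalysisChainSketch {

    // Hypothetical, deliberately tiny stop-word list (an assumption for this
    // sketch); Lucene's default English stop set is larger.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "is", "of", "the", "to"));

    // Splits the text, lowercases each token and drops stop words.
    // No stemming is performed here.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Step 1: split on any run of characters that are not letters or digits.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading separator produces an empty first piece
            }
            // Step 2: lowercase the token.
            String token = raw.toLowerCase(Locale.ROOT);
            // Step 3: keep the token only if it is not a stop word.
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = analyze(
                "Java is a platform and programming language to build Enterprise Applications");
        // Prints: [java, platform, programming, language, build, enterprise, applications]
        System.out.println(tokens);
    }
}
```

Because the sketch skips stemming, it keeps "programming" and "applications" where EnglishAnalyzer emits "program" and "applic".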