Saturday 26 June 2021

Lucene: Language Specific Analyzers

Lucene provides several built-in language-specific analyzers. Each one applies tokenization and filtering rules suited to a particular language.

 

At the time of writing, Lucene ships the following language-specific analyzers:

1. ArabicAnalyzer

2. ArmenianAnalyzer

3. BasqueAnalyzer

4. BengaliAnalyzer

5. BrazilianAnalyzer

6. BulgarianAnalyzer

7. CatalanAnalyzer

8. CJKAnalyzer

9. CzechAnalyzer

10. DanishAnalyzer

11. EnglishAnalyzer

12. EstonianAnalyzer

13. FinnishAnalyzer

14. FrenchAnalyzer

15. GalicianAnalyzer

16. GermanAnalyzer

17. GreekAnalyzer

18. HindiAnalyzer

19. HungarianAnalyzer

20. IndonesianAnalyzer

21. IrishAnalyzer

22. ItalianAnalyzer

23. LatvianAnalyzer

24. LithuanianAnalyzer

25. NorwegianAnalyzer

26. PersianAnalyzer

27. PortugueseAnalyzer

28. RomanianAnalyzer

29. RussianAnalyzer

30. SoraniAnalyzer

31. SpanishAnalyzer

32. SwedishAnalyzer

33. ThaiAnalyzer

34. TurkishAnalyzer
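Each analyzer in this list lives in its own sub-package of org.apache.lucene.analysis, usually named after a language code. The sketch below is a hypothetical helper (AnalyzerPackages and classNameFor are not part of Lucene) that maps a few language names to the fully qualified class name to import. The package names are given as I understand them for the analyzers module, so check them against the Javadoc of your Lucene version.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

public class AnalyzerPackages {

	private static final String BASE = "org.apache.lucene.analysis.";

	// Small illustrative subset of the list above; extend as needed.
	private static final Map<String, String> BY_LANGUAGE = Map.of(
			"english", "en.EnglishAnalyzer",
			"french", "fr.FrenchAnalyzer",
			"german", "de.GermanAnalyzer",
			"spanish", "es.SpanishAnalyzer",
			"sorani", "ckb.SoraniAnalyzer");

	// Returns the fully qualified class name, or empty if the language is not in the map.
	public static Optional<String> classNameFor(String language) {
		String suffix = BY_LANGUAGE.get(language.toLowerCase(Locale.ROOT));
		return Optional.ofNullable(suffix).map(s -> BASE + s);
	}

	public static void main(String[] args) {
		System.out.println(classNameFor("English").orElse("not found"));
		System.out.println(classNameFor("Sorani").orElse("not found"));
	}
}
```

The helper only returns names. It does not load or instantiate anything, so it runs without Lucene on the classpath.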

 

How to create an EnglishAnalyzer?

Analyzer englishAnalyzer = new EnglishAnalyzer();

 

The complete working application follows.

 

App.java

package com.sample.app;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class App {

	// Runs the analyzer over the text and collects the emitted terms in order.
	public static List<String> getTokens(String text, String fieldName, Analyzer analyzer) throws IOException {

		List<String> result = new ArrayList<>();

		// try-with-resources closes the stream so the analyzer can be reused afterwards.
		try (TokenStream tokenStream = analyzer.tokenStream(fieldName, text)) {
			CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);

			tokenStream.reset();
			while (tokenStream.incrementToken()) {
				result.add(charTermAttribute.toString());
			}
			tokenStream.end();
		}
		return result;
	}

	public static void main(String[] args) throws IOException {

		// Analyzer is Closeable, so release it when done.
		try (Analyzer englishAnalyzer = new EnglishAnalyzer()) {
			// "content" is an arbitrary field name chosen for this example.
			List<String> tokens = getTokens(
					"Java is a platform and programming language to build Enterprise Applications", "content",
					englishAnalyzer);

			for (String token : tokens) {
				System.out.println(token);
			}
		}

	}

}

 

Output

java
platform
program
languag
build
enterpris
applic
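Two things stand out in this output: the words "is", "a", "and" and "to" are gone, and every token is lowercase. The plain-Java sketch below imitates just those two steps, lowercasing and stop-word removal, without using Lucene. It is not how EnglishAnalyzer is implemented. StopWordSketch, lowerCaseAndFilter and the small STOP_WORDS set are assumptions made for illustration, and the stemming step that turns "programming" into "program" is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordSketch {

	// Illustrative subset only; the real analyzer has its own, longer default stop set.
	private static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "is", "the", "to");

	public static List<String> lowerCaseAndFilter(String text) {
		List<String> result = new ArrayList<>();
		// Split on runs of characters that are neither letters nor digits
		// (a crude stand-in for a real tokenizer).
		for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
			String token = raw.toLowerCase(Locale.ROOT);
			if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
				result.add(token);
			}
		}
		return result;
	}

	public static void main(String[] args) {
		// prints [java, platform, programming, language, build, enterprise, applications]
		System.out.println(lowerCaseAndFilter(
				"Java is a platform and programming language to build Enterprise Applications"));
	}
}
```

Comparing this result with the real output shows what stemming contributes: "programming", "language", "enterprise" and "applications" are reduced to "program", "languag", "enterpris" and "applic".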

 

 

  
