Monday 21 June 2021

Lucene: WhitespaceAnalyzer

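WhitespaceAnalyzer tokenizes a given string on whitespace characters. A character counts as whitespace when the 'isWhitespace' method of the Character class returns true for it. No other processing is applied, so punctuation stays attached to the token and the original case is preserved.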
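To make the splitting rule concrete, here is a minimal sketch in plain Java that imitates it without using Lucene at all. The class name WhitespaceSplitSketch and the helper splitOnWhitespace are hypothetical names chosen for illustration; this is not Lucene's implementation, only an approximation of the rule described above.

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceSplitSketch {

    // Hypothetical helper (not part of Lucene): emits a token for every
    // maximal run of characters for which Character.isWhitespace is false.
    public static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();

        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (Character.isWhitespace(codePoint)) {
                // Whitespace ends the current token, if one is in progress.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.appendCodePoint(codePoint);
            }
            i += Character.charCount(codePoint);
        }

        // Flush the last token when the text does not end in whitespace.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Tabs, newlines and repeated spaces all act as separators.
        System.out.println(splitOnWhitespace("Hello, How\t\tAre \n you"));

        // The non-breaking space (U+00A0) is not whitespace for
        // Character.isWhitespace, so "non" and "breaking" stay in one token.
        System.out.println(splitOnWhitespace("non\u00A0breaking space"));
    }
}
```

One consequence worth noticing is that a non-breaking space does not separate tokens, because Character.isWhitespace returns false for it.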

 

How to get an instance of WhitespaceAnalyzer?

Analyzer whitespaceAnalyzer = new WhitespaceAnalyzer();

 

The following helper method collects the tokens that an analyzer produces for a given text.

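public static List<String> getTokens(String text, String fieldName, Analyzer analyzer) throws IOException {

	List<String> result = new ArrayList<>();

	// try-with-resources closes the stream, which lets the analyzer reuse it later
	try (TokenStream tokenStream = analyzer.tokenStream(fieldName, text)) {
		CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);

		// reset() must be called before the first incrementToken()
		tokenStream.reset();

		while (tokenStream.incrementToken()) {
			result.add(charTermAttribute.toString());
		}

		// end() performs end-of-stream operations after the last token
		tokenStream.end();
	}
	return result;
}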

 

The complete working application is given below.

 

App.java

 

package com.sample.app;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class App {

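	public static List<String> getTokens(String text, String fieldName, Analyzer analyzer) throws IOException {

		List<String> result = new ArrayList<>();

		// try-with-resources closes the stream, which lets the analyzer reuse it later
		try (TokenStream tokenStream = analyzer.tokenStream(fieldName, text)) {
			CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);

			// reset() must be called before the first incrementToken()
			tokenStream.reset();

			while (tokenStream.incrementToken()) {
				result.add(charTermAttribute.toString());
			}

			// end() performs end-of-stream operations after the last token
			tokenStream.end();
		}
		return result;
	}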

	public static void main(String[] args) throws IOException {
		// Analyzer is Closeable, so release it once tokenization is done
		try (Analyzer whitespaceAnalyzer = new WhitespaceAnalyzer()) {

			List<String> tokens = getTokens("Hello, How Are you\t\tI am Fine \n Thank you", null, whitespaceAnalyzer);

			for (String token : tokens) {
				System.out.println(token);
			}
		}
	}

}

 

Output

Hello,
How
Are
you
I
am
Fine
Thank
you

 

 
