Detokenizing is the reverse of tokenization: it reconstructs the original, non-tokenized string from a sequence of tokens.
The following rules are applied when forming a sentence from tokens.
Rule | Description
MOVE_BOTH | Attaches the token to the tokens on both its left and right sides.
MOVE_LEFT | Attaches the token to the token on its left side.
MOVE_RIGHT | Attaches the token to the token on its right side.
RIGHT_LEFT_MATCHING | Attaches the token to the token on its right side on the first occurrence, and to the token on its left side on the second occurrence.
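To make the four rules concrete, here is a minimal, self-contained sketch that applies them to a small token sequence. It is a simplified stand-in written for illustration, not OpenNLP's implementation: the class `DetokenizeRulesSketch`, its `Rule` enum, and the `detokenize` helper are hypothetical names, and the sketch assumes that tokens without a rule are simply separated by a single space.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative stand-in for the rule table above; NOT OpenNLP's implementation.
class DetokenizeRulesSketch {

    // Hypothetical enum mirroring the four rule names from the table.
    enum Rule { MOVE_LEFT, MOVE_RIGHT, MOVE_BOTH, RIGHT_LEFT_MATCHING }

    static String detokenize(String[] tokens, Map<String, Rule> rules) {
        StringBuilder sb = new StringBuilder();
        // Matching tokens (e.g. quotes) that have been opened but not yet closed.
        Set<String> open = new HashSet<>();
        // True when the previous token attaches to the token on its right.
        boolean glueNext = false;

        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i];
            Rule rule = rules.get(token); // null means "no rule": plain word
            boolean attachLeft = false;
            boolean attachRight = false;

            if (rule != null) {
                switch (rule) {
                    case MOVE_LEFT:
                        attachLeft = true;
                        break;
                    case MOVE_RIGHT:
                        attachRight = true;
                        break;
                    case MOVE_BOTH:
                        attachLeft = true;
                        attachRight = true;
                        break;
                    case RIGHT_LEFT_MATCHING:
                        if (open.remove(token)) {
                            attachLeft = true;   // second occurrence: attach left
                        } else {
                            open.add(token);
                            attachRight = true;  // first occurrence: attach right
                        }
                        break;
                }
            }

            // Insert a space unless this token attaches left
            // or the previous token attaches right.
            if (i > 0 && !attachLeft && !glueNext) {
                sb.append(' ');
            }
            sb.append(token);
            glueNext = attachRight;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Rule> rules = new HashMap<>();
        rules.put(",", Rule.MOVE_LEFT);
        rules.put(".", Rule.MOVE_LEFT);
        rules.put("(", Rule.MOVE_RIGHT);
        rules.put(")", Rule.MOVE_LEFT);
        rules.put("-", Rule.MOVE_BOTH);
        rules.put("\"", Rule.RIGHT_LEFT_MATCHING);

        String[] tokens = { "He", "said", ",", "\"", "run", "-", "time",
                "(", "JVM", ")", "\"", "." };
        // prints: He said, "run-time (JVM)".
        System.out.println(detokenize(tokens, rules));
    }
}
```

In this sketch only punctuation tokens carry a rule, so ordinary words keep their surrounding spaces while commas, brackets, hyphens and quotes are pulled toward their neighbours.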
import opennlp.tools.tokenize.DetokenizationDictionary;
import opennlp.tools.tokenize.DetokenizationDictionary.Operation;
import opennlp.tools.tokenize.DictionaryDetokenizer;

public class DeTokenizerUtil {

    // Detokenizes the given tokens, applying the same operation to every token.
    public static String deTokenize(String[] tokens, Operation operation) {
        Operation[] operations = new Operation[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            operations[i] = operation;
        }

        // The dictionary maps each token to the operation applied to it.
        DetokenizationDictionary dictionary = new DetokenizationDictionary(tokens, operations);
        DictionaryDetokenizer detokenizer = new DictionaryDetokenizer(dictionary);

        // The second argument is the split marker placed between tokens
        // that are attached to each other.
        return detokenizer.detokenize(tokens, " ");
    }
}
import opennlp.tools.tokenize.DetokenizationDictionary;

public class Main {

    public static void main(String[] args) {
        String[] tokens = {
            "We", "are", "living", "in", "an", "Environment", ",", "where",
            "multiple", "Hardware", "Architectures", "and", "Multiple",
            "platforms", "presents", ".", "So", "it", "is", "very",
            "difficult", "to", "write", ",", "compile", "and", "link", "the",
            "same", "Application", ",", "for", "each", "platform", "and",
            "each", "Architecture", "separately", ".", "The", "Java",
            "Programming", "Language", "solves", "all", "the", "above",
            "problems", ".", "The", "Java", "programming", "language",
            "platform", "provides", "a", "portable", ",", "interpreted", ",",
            "high-performance", ",", "simple", ",", "object-oriented",
            "programming", "language", "and", "supporting", "run-time",
            "environment", "."
        };

        // Apply MOVE_LEFT to every token and print the detokenized text.
        String data = DeTokenizerUtil.deTokenize(tokens,
                DetokenizationDictionary.Operation.MOVE_LEFT);
        System.out.println(data);
    }
}
Output
We are living in an Environment , where multiple Hardware Architectures and Multiple platforms presents . So it is very difficult to write , compile and link the same Application , for each platform and each Architecture separately . The Java Programming Language solves all the above problems . The Java programming language platform provides a portable , interpreted , high-performance , simple , object-oriented programming language and supporting run-time environment .