In this
post, I am going to explain, how to detect a sentence using OpenNLP.
Approach 1: Using opennlp command line interface.
OpenNLP
provides number of trained models for tokenization, sentence detection, name
finding etc.,
Download ‘en-sent.bin’
model from following link.
Following
command is used to extract all sentences from given document.
opennlp
SentenceDetector en-sent.bin < input.txt > output.txt
Lets say ‘input.txt’ contains following data.
We are living in an Environment, where multiple Hardware Architectures and Multiple platforms presents. So it is very difficult to write, compile and link the same Application, for each platform and each Architecture separately. The Java Programming Language solves all the above problems. The Java programming language platform provides a portable, interpreted, high-performance, simple, object-oriented programming language and supporting run-time environment.
Following
command parse input.txt and write all sentences to output.txt file.
‘opennlp SentenceDetector ./en-sent.bin <
./input.txt > output.txt’.
$ opennlp SentenceDetector ./en-sent.bin < ./input.txt > output.txt Loading Sentence Detector model ... done (0.037s) Average: 1333.3 sent/s Total: 4 sent Runtime: 0.003s $ cat output.txt We are living in an Environment, where multiple Hardware Architectures and Multiple platforms presents. So it is very difficult to write, compile and link the same Application, for each platform and each Architecture separately. The Java Programming Language solves all the above problems. The Java programming language platform provides a portable, interpreted, high-performance, simple, object-oriented programming language and supporting run-time environment.
Approach 2: Using Java API
a. Load
sentence model.
initSentenceModel(String file) { InputStream modelIn; try { modelIn = new FileInputStream(file); } catch (FileNotFoundException e) { System.out.println(e.getMessage()); return null; } try { model = new SentenceModel(modelIn); } catch (IOException e) { e.printStackTrace(); } finally { if (modelIn != null) { try { modelIn.close(); } catch (IOException e) { } } } return model; }
b. Initialize
SentenceDetectorME
SentenceDetectorME
sentenceDetector = new SentenceDetectorME(model);
c. Use ‘sentDetect’
method to get sentences.
String
sentences[] = sentenceDetector.sentDetect("string of information");
Following is
the complete working application.
import static java.nio.file.Files.readAllBytes; import static java.nio.file.Paths.get; import java.io.IOException; import java.util.Objects; public class FileUtils { /** * Get file data as string * * @param fileName * @return */ public static String getFileDataAsString(String fileName) { Objects.nonNull(fileName); try { String data = new String(readAllBytes(get(fileName))); return data; } catch (IOException e) { System.out.println(e.getMessage()); return null; } } }
import java.io.FileInputStream; import java.io.FileNotFoundException; import java.io.IOException; import java.io.InputStream; import java.util.Objects; import opennlp.tools.sentdetect.SentenceDetectorME; import opennlp.tools.sentdetect.SentenceModel; public class SentenceDetectorUtil { private SentenceModel model = null; SentenceDetectorME sentenceDetector = null; public SentenceDetectorUtil(String modelFile) { Objects.nonNull(modelFile); initSentenceModel(modelFile); initSentenceDetectorME(); } private void initSentenceDetectorME() { sentenceDetector = new SentenceDetectorME(model); } private SentenceModel initSentenceModel(String file) { InputStream modelIn; try { modelIn = new FileInputStream(file); } catch (FileNotFoundException e) { System.out.println(e.getMessage()); return null; } try { model = new SentenceModel(modelIn); } catch (IOException e) { e.printStackTrace(); } finally { if (modelIn != null) { try { modelIn.close(); } catch (IOException e) { } } } return model; } public String[] getSentencesFromFile(String inputFile) { String data = FileUtils.getFileDataAsString(inputFile); return sentenceDetector.sentDetect(data); } public String[] getSentences(String data) { return sentenceDetector.sentDetect(data); } }
public class Main { public static void main(String args[]) { SentenceDetectorUtil util = new SentenceDetectorUtil( "/Users/harikrishna_gurram/study1/OpenNLP/apache-opennlp-1.6.0/bin/models/en-sent.bin"); String data = "We are living in an Environment, where multiple Hardware Architectures and Multiple platforms presents. So it is very difficult to write, compile and link the same Application, for each platform and each Architecture separately. The Java Programming Language solves all the above problems. The Java programming language platform provides a portable, interpreted, high-performance, simple, object-oriented programming language and supporting run-time environment."; String[] sentences = util.getSentences(data); for (String s : sentences) System.out.println(s +"\n"); } }
Output
We are living in an Environment, where multiple Hardware Architectures and Multiple platforms presents. So it is very difficult to write, compile and link the same Application, for each platform and each Architecture separately. The Java Programming Language solves all the above problems. The Java programming language platform provides a portable, interpreted, high-performance, simple, object-oriented programming language and supporting run-time environment.
No comments:
Post a Comment