Thursday, 1 October 2015

Setting up OpenNLP command line interface

Prerequisite
OpenNLP requires Java. Install Java and make sure the JAVA_HOME environment variable is set.
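Before continuing, you can check that JAVA_HOME is usable. The snippet below is a minimal POSIX shell sketch. It assumes a standard Java layout with an executable `bin/java` under JAVA_HOME. The function name `check_java_home` is a hypothetical helper, not part of OpenNLP or Java.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenNLP): succeeds only if JAVA_HOME is set
# and points at a Java installation containing an executable bin/java.
check_java_home() {
  if [ -z "$JAVA_HOME" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "No executable java under $JAVA_HOME/bin" >&2
    return 1
  fi
  echo "Using Java from $JAVA_HOME"
}

# On macOS, /usr/libexec/java_home prints the path of the default JDK, so
# JAVA_HOME can be set with:  export JAVA_HOME=$(/usr/libexec/java_home)
check_java_home || echo "Set JAVA_HOME before running opennlp"
```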

Step 1: Download the OpenNLP binaries from the following link.

Step 2: Extract the tar file, then set the variable ‘OPENNLP_HOME’ to the root directory of the extracted OpenNLP distribution.

I set it as follows:
export OPENNLP_HOME=/Users/harikrishna_gurram/OpenNLP/apache-opennlp-1.6.0

Add ‘$OPENNLP_HOME/bin’ to your system PATH. On Windows, add ‘%OPENNLP_HOME%\bin’ to PATH instead.
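On macOS or Linux, the whole of Step 2 can be sketched in a few shell lines. This sketch makes two assumptions. First, the archive downloaded in Step 1 is named apache-opennlp-1.6.0-bin.tar.gz. Second, you run the commands from the directory that contains it. Adjust both to match your download.

```shell
#!/bin/sh
# Assumption: the archive from Step 1 is apache-opennlp-1.6.0-bin.tar.gz and
# sits in the current directory; rename to match your actual download.
ARCHIVE=apache-opennlp-1.6.0-bin.tar.gz
if [ -f "$ARCHIVE" ]; then
  tar -xzf "$ARCHIVE"   # extracts into ./apache-opennlp-1.6.0
fi

# Point OPENNLP_HOME at the extracted root directory and put its bin on PATH.
export OPENNLP_HOME="$PWD/apache-opennlp-1.6.0"
export PATH="$OPENNLP_HOME/bin:$PATH"

# Report whether the opennlp launcher can now be found on PATH.
if command -v opennlp >/dev/null 2>&1; then
  echo "opennlp found at $(command -v opennlp)"
else
  echo "opennlp not found; check OPENNLP_HOME ($OPENNLP_HOME)"
fi
```

These exports only last for the current terminal session. To make them permanent, add the two export lines to your shell startup file, for example ~/.bash_profile. On Windows, set the variables in the Environment Variables dialog instead.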

Step 3: Open a new terminal and type ‘opennlp’. You should see output like the following:

$ opennlp
OpenNLP 1.6.0. Usage: opennlp TOOL
where TOOL is one of:
  Doccat                            learnable document categorizer
  DoccatTrainer                     trainer for the learnable document categorizer
  DoccatEvaluator                   Measures the performance of the Doccat model with the reference data
  DoccatCrossValidator              K-fold cross validator for the learnable Document Categorizer
  DoccatConverter                   converts leipzig data format to native OpenNLP format
  DictionaryBuilder                 builds a new dictionary
  SimpleTokenizer                   character class tokenizer
  TokenizerME                       learnable tokenizer
  TokenizerTrainer                  trainer for the learnable tokenizer
  TokenizerMEEvaluator              evaluator for the learnable tokenizer
  TokenizerCrossValidator           K-fold cross validator for the learnable tokenizer
  TokenizerConverter                converts foreign data formats (ad,pos,conllx,namefinder,parse) to native OpenNLP format
  DictionaryDetokenizer             
  SentenceDetector                  learnable sentence detector
  SentenceDetectorTrainer           trainer for the learnable sentence detector
  SentenceDetectorEvaluator         evaluator for the learnable sentence detector
  SentenceDetectorCrossValidator    K-fold cross validator for the learnable sentence detector
  SentenceDetectorConverter         converts foreign data formats (ad,pos,conllx,namefinder,parse) to native OpenNLP format
  TokenNameFinder                   learnable name finder
  TokenNameFinderTrainer            trainer for the learnable name finder
  TokenNameFinderEvaluator          Measures the performance of the NameFinder model with the reference data
  TokenNameFinderCrossValidator     K-fold cross validator for the learnable Name Finder
  TokenNameFinderConverter          converts foreign data formats (evalita,ad,conll03,bionlp2004,conll02,muc6,ontonotes,brat) to native OpenNLP format
  CensusDictionaryCreator           Converts 1990 US Census names into a dictionary
  POSTagger                         learnable part of speech tagger
  POSTaggerTrainer                  trains a model for the part-of-speech tagger
  POSTaggerEvaluator                Measures the performance of the POS tagger model with the reference data
  POSTaggerCrossValidator           K-fold cross validator for the learnable POS tagger
  POSTaggerConverter                converts foreign data formats (ad,conllx,parse,ontonotes) to native OpenNLP format
  ChunkerME                         learnable chunker
  ChunkerTrainerME                  trainer for the learnable chunker
  ChunkerEvaluator                  Measures the performance of the Chunker model with the reference data
  ChunkerCrossValidator             K-fold cross validator for the chunker
  ChunkerConverter                  converts ad data format to native OpenNLP format
  Parser                            performs full syntactic parsing
  ParserTrainer                     trains the learnable parser
  ParserEvaluator                   Measures the performance of the Parser model with the reference data
  ParserConverter                   converts foreign data formats (ontonotes,frenchtreebank) to native OpenNLP format
  BuildModelUpdater                 trains and updates the build model in a parser model
  CheckModelUpdater                 trains and updates the check model in a parser model
  TaggerModelReplacer               replaces the tagger model in a parser model
  EntityLinker                      links an entity to an external data set
All tools print help when invoked with help parameter
Example: opennlp SimpleTokenizer help

To get help for any OpenNLP tool, type ‘opennlp ToolName help’. Tools that need arguments, such as Parser, also print their usage when you run them without any.
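Because every tool accepts the help parameter, you can print the help text for several tools in one go. The loop below is a minimal sketch. `show_help` is a hypothetical helper name, and the loop assumes `opennlp` is already on your PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenNLP): print the help text of each tool
# named on the command line by passing it the 'help' parameter.
show_help() {
  for tool in "$@"; do
    echo "=== $tool ==="
    opennlp "$tool" help
  done
}

# Usage, once opennlp is on PATH:
#   show_help SimpleTokenizer TokenizerME Parser
```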

To see the usage of the ‘Parser’ tool, type ‘opennlp Parser’.

$ opennlp Parser
Usage: opennlp Parser [-bs n -ap n -k n] model < sentences 
-bs n: Use a beam size of n.
-ap f: Advance outcomes in with at least f% of the probability mass.
-k n: Show the top n parses.  This will also display their log-probablities.
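As the usage line shows, Parser takes a model file as its argument and reads sentences from standard input. It expects one whitespace-tokenized sentence per line. The sketch below relies on two assumptions. First, `parse_sentences` is a hypothetical helper name. Second, the default model name en-parser-chunking.bin refers to the English chunking parser model, which you would download separately because the binary distribution ships no models.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenNLP): run the Parser tool on sentences
# read from standard input, one whitespace-tokenized sentence per line.
# $1 is the model file; en-parser-chunking.bin is an assumed default.
parse_sentences() {
  model=${1:-en-parser-chunking.bin}
  if ! command -v opennlp >/dev/null 2>&1; then
    echo "opennlp is not on PATH" >&2
    return 1
  fi
  if [ ! -f "$model" ]; then
    echo "Model file not found: $model" >&2
    return 1
  fi
  # Options go before the model, as in the usage line: -k 2 prints the two
  # best parses together with their log-probabilities.
  opennlp Parser -k 2 "$model"
}

# Usage:
#   echo "The quick brown fox jumps over the lazy dog ." | parse_sentences
```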


