Thursday 19 November 2015

Elasticsearch: Configuring custom Language Analyzers


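You can create a custom language analyzer by using an existing analyzer as a base and overriding its parameters. The example below builds on the built-in english analyzer, supplies its own stop word list, and excludes “age” and “issue” from stemming.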

PUT /blog
{
  "settings": {
    "analysis": {
      "analyzer": {
        "myCustomAnalyzer":{
          "type":"english",
          "stem_exclusion":["age", "issue"],
          "stopwords": ["a", "an", "and", "are", "as", "at", "be", "but", "by", "for","if", "in", "into", "is", "it", "of", "on", "or", "such", "that","the", "their", "then", "there", "these", "they", "this", "to","was", "will", "with"]
        }
      }
    }
  }
}


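The snippet above creates a custom analyzer named “myCustomAnalyzer” on the blog index. Next, assign it to the title field of the posts type and run a sample sentence through that field.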

PUT /blog/_mapping/posts
{
  "properties":{
    "title":{
      "type" : "string",
      "analyzer" : "myCustomAnalyzer"
    }
  }
}


POST /blog/_analyze?field=title
{
  "Age is an issue of mind over matter."
}


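You will get the following response. The stop words are removed, and “age” and “issue” are left unstemmed because they are listed in stem_exclusion.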

{
   "tokens": [
      {
         "token": "age",
         "start_offset": 5,
         "end_offset": 8,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "issue",
         "start_offset": 15,
         "end_offset": 20,
         "type": "<ALPHANUM>",
         "position": 4
      },
      {
         "token": "mind",
         "start_offset": 24,
         "end_offset": 28,
         "type": "<ALPHANUM>",
         "position": 6
      },
      {
         "token": "over",
         "start_offset": 29,
         "end_offset": 33,
         "type": "<ALPHANUM>",
         "position": 7
      },
      {
         "token": "matter",
         "start_offset": 34,
         "end_offset": 40,
         "type": "<ALPHANUM>",
         "position": 8
      }
   ]
}
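
To see what the stock english analyzer does with the same sentence without remapping title, the analyzer can be named directly in the request. This is only a sketch: it assumes an Elasticsearch version whose _analyze API accepts the analyzer as a query parameter and the raw text as the request body, in the same style as the ?field=title request above. Newer releases expect analyzer and text inside a JSON request body instead.

```
# Assumption: query-string form of _analyze, with the raw text as the body.
POST /blog/_analyze?analyzer=english
Age is an issue of mind over matter.
```

Because the text is sent without surrounding braces or quotes, the offsets in the response count from the first character of the sentence.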

If title used the built-in “english” analyzer instead, you would get the following response, in which “age” and “issue” are stemmed to “ag” and “issu”.
{
   "tokens": [
      {
         "token": "ag",
         "start_offset": 0,
         "end_offset": 3,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "issu",
         "start_offset": 10,
         "end_offset": 15,
         "type": "<ALPHANUM>",
         "position": 4
      },
      {
         "token": "mind",
         "start_offset": 19,
         "end_offset": 23,
         "type": "<ALPHANUM>",
         "position": 6
      },
      {
         "token": "over",
         "start_offset": 24,
         "end_offset": 28,
         "type": "<ALPHANUM>",
         "position": 7
      },
      {
         "token": "matter",
         "start_offset": 29,
         "end_offset": 35,
         "type": "<ALPHANUM>",
         "position": 8
      }
   ]
}
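
To make the role of stem_exclusion more concrete, the english analyzer can be approximated by a custom analyzer in which the excluded words pass through a keyword_marker filter just before the stemmer, so the stemmer leaves them alone. The sketch below follows the filter chain the Elasticsearch reference describes for reimplementing the english analyzer. The index name blog_expanded, the analyzer name myExpandedAnalyzer, and the filter names are hypothetical, and the predefined _english_ stop word list is used for brevity in place of the explicit list shown earlier.

```json
# Hypothetical index, analyzer, and filter names, for illustration only.
PUT /blog_expanded
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "my_keywords": {
          "type": "keyword_marker",
          "keywords": ["age", "issue"]
        },
        "my_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "my_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        }
      },
      "analyzer": {
        "myExpandedAnalyzer": {
          "tokenizer": "standard",
          "filter": [
            "my_possessive_stemmer",
            "lowercase",
            "my_stop",
            "my_keywords",
            "my_stemmer"
          ]
        }
      }
    }
  }
}
```

The order of the filters matters: my_keywords has to come before my_stemmer, otherwise “age” and “issue” would already have been reduced to “ag” and “issu” by the time they are marked as keywords.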




