Saturday 21 November 2015

Elasticsearch: Synonym handling in Elasticsearch


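Synonyms can be handled using the synonym token filter. The following request creates an index named "sample" with a custom analyzer, "my_synonyms", that applies the filter after lowercasing.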
PUT /sample
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym", 
          "synonyms": [ 
            "today, present, latest, now",
            "sleep, rest, coma"
          ]
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter" 
          ]
        }
      }
    }
  }
}
POST /sample/_analyze?analyzer=my_synonyms
{"feeling sleepy today"}

You will get the following response.
{
   "tokens": [
      {
         "token": "feeling",
         "start_offset": 2,
         "end_offset": 9,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "sleepy",
         "start_offset": 10,
         "end_offset": 16,
         "type": "<ALPHANUM>",
         "position": 2
      },
      {
         "token": "today",
         "start_offset": 17,
         "end_offset": 22,
         "type": "SYNONYM",
         "position": 3
      },
      {
         "token": "present",
         "start_offset": 17,
         "end_offset": 22,
         "type": "SYNONYM",
         "position": 3
      },
      {
         "token": "latest",
         "start_offset": 17,
         "end_offset": 22,
         "type": "SYNONYM",
         "position": 3
      },
      {
         "token": "now",
         "start_offset": 17,
         "end_offset": 22,
         "type": "SYNONYM",
         "position": 3
      }
   ]
}


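As the response shows, all the synonyms occupy the same position (3) as the original token "today". The token "sleepy" is not expanded, because the synonym list contains "sleep", not "sleepy". Next, map the "title" field to the analyzer and index a document.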
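To make the "same position" behaviour concrete, here is a minimal Python sketch of a synonym filter. It is a toy model, not Elasticsearch code: the names `SYNONYM_GROUPS`, `build_synonym_map` and `analyze` are made up for illustration, and it splits on whitespace instead of using the standard tokenizer.

```python
# Toy model of a synonym token filter (illustrative only, not Elasticsearch code).
SYNONYM_GROUPS = [
    ["today", "present", "latest", "now"],
    ["sleep", "rest", "coma"],
]


def build_synonym_map(groups):
    """Map every word in a group to the full list of words in that group."""
    mapping = {}
    for group in groups:
        for word in group:
            mapping[word] = list(group)
    return mapping


def analyze(text, groups=SYNONYM_GROUPS):
    """Return (token, position) pairs; all synonyms of a word share its position."""
    mapping = build_synonym_map(groups)
    tokens = []
    # Assumption: lowercase + whitespace split stands in for the real analyzer chain.
    for position, word in enumerate(text.lower().split(), start=1):
        for token in mapping.get(word, [word]):
            tokens.append((token, position))
    return tokens


print(analyze("feeling sleepy today"))
# [('feeling', 1), ('sleepy', 2), ('today', 3), ('present', 3), ('latest', 3), ('now', 3)]
```

As in the real response, "sleepy" passes through unchanged and the four synonyms of "today" all land on position 3.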
PUT /sample/_mapping/blog
{
  "properties" : {
   "title" :{
    "type" : "string",
    "analyzer" : "my_synonyms"
   } 
  }
}

POST /sample/blog/1
{
  "title" : "feeling sleep today"
}


Because the "title" field uses the "my_synonyms" analyzer, you can also query it by synonym.
POST /sample/blog/_search
{
 "query" :{
  "term" : {"title" : "rest"}
 }
}


The response looks like this.
{
       "took": 400,
       "timed_out": false,
       "_shards":
       {
           "total": 5,
           "successful": 5,
           "failed": 0
       },
       "hits":
       {
           "total": 1,
           "max_score": 0.15342641,
           "hits":
           [
               {
                   "_index": "sample",
                   "_type": "blog",
                   "_id": "1",
                   "_score": 0.15342641,
                   "_source":
                   {
                       "title": "feeling sleep today"
                   }
               }
           ]
       }
    }


The document does not actually contain the word "rest", but "rest" is a synonym for "sleep", so the search returns document 1.

You can apply the synonym filter at index time or at query time. Applying it in both places is redundant.

If you apply the synonym filter at query time, a query for "rest" is expanded to "sleep" OR "rest" OR "coma".
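The difference between the two strategies can be sketched with a tiny in-memory inverted index in Python. This is only an illustration of the idea, not how Elasticsearch is implemented; `GROUPS`, `expand`, `build_index` and `search` are hypothetical names, and tokenization is a plain whitespace split.

```python
# Toy comparison of index-time and query-time synonym expansion (illustrative only).
GROUPS = [["today", "present", "latest", "now"], ["sleep", "rest", "coma"]]
EXPAND = {word: group for group in GROUPS for word in group}


def expand(word):
    """Return the word together with its synonyms, or just the word itself."""
    return EXPAND.get(word, [word])


def build_index(docs, expand_at_index_time):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            terms = expand(word) if expand_at_index_time else [word]
            for term in terms:
                index.setdefault(term, set()).add(doc_id)
    return index


def search(index, word, expand_at_query_time):
    """Look up the word, OR-ing in its synonyms when expanding at query time."""
    terms = expand(word) if expand_at_query_time else [word]
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return hits


docs = {1: "feeling sleep today"}
index_time = build_index(docs, expand_at_index_time=True)
query_time = build_index(docs, expand_at_index_time=False)

print(search(index_time, "rest", expand_at_query_time=False))  # {1}
print(search(query_time, "rest", expand_at_query_time=True))   # {1}
print(search(query_time, "rest", expand_at_query_time=False))  # set()
```

Either strategy on its own finds document 1 for "rest"; with no expansion on either side the lookup comes back empty, and expanding on both sides adds nothing beyond what one side already provides.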



