Friday 20 November 2015

Elasticsearch: Stemming

Stemming is the process of reducing a word to its root form (its stem). For example, the words fox, foxes, and foxing are all stemmed to the root word “fox”.
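To make the idea concrete, here is a deliberately naive suffix-stripping stemmer in Python. It is a toy sketch for illustration only, not the algorithm Elasticsearch's english analyzer uses. The suffix list, the minimum stem length, and the `naive_stem` name are all assumptions made up for this example.

```python
# A deliberately naive suffix-stripping stemmer, for illustration only.
# It is NOT the algorithm used by Elasticsearch's english analyzer.
SUFFIXES = ("ing", "es", "ed", "s")  # hypothetical list, checked longest-first


def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


print([naive_stem(w) for w in ["fox", "foxes", "foxing"]])  # ['fox', 'fox', 'fox']
```

Blind stripping goes wrong quickly: `naive_stem("string")` returns "str", which no longer means anything. Real stemmers such as Porter's add conditions before removing a suffix (for example, Porter only strips "ing" when the remainder contains a vowel).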

The _analyze API shows this in action. Run the three words through the built-in english analyzer:

GET _analyze
{
   "analyzer": "english",
   "text": "fox foxes foxing"
}

You will get the following response. All three words are reduced to the same token, “fox”:

{
   "tokens": [
      {
         "token": "fox",
         "start_offset": 0,
         "end_offset": 3,
         "type": "<ALPHANUM>",
         "position": 0
      },
      {
         "token": "fox",
         "start_offset": 4,
         "end_offset": 9,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "fox",
         "start_offset": 10,
         "end_offset": 16,
         "type": "<ALPHANUM>",
         "position": 2
      }
   ]
}
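The analysis can also be scripted. Below is a minimal Python sketch using only the standard library: `analyze_request` builds a request using the JSON body form of `_analyze` (with `analyzer` and `text` fields) that recent Elasticsearch versions expect, and `tokens_from_response` pulls the stemmed terms out of a parsed reply. The function names and the `localhost:9200` address are assumptions for illustration. POST is used because `_analyze` accepts it as well as GET, and some HTTP clients drop GET bodies.

```python
import json
import urllib.request

# Assumption: an unsecured cluster listening on localhost:9200.
ES_URL = "http://localhost:9200"


def analyze_request(text, analyzer="english", base_url=ES_URL):
    """Build (but do not send) a POST /_analyze request with a JSON body."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokens_from_response(response):
    """Return the token strings from a parsed _analyze response, in position order."""
    ordered = sorted(response["tokens"], key=lambda t: t["position"])
    return [t["token"] for t in ordered]
```

With a cluster running, `urllib.request.urlopen(analyze_request("fox foxes foxing"))` sends the request, and passing `json.load(...)` of the reply to `tokens_from_response` should give the three "fox" tokens.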


Understemming vs overstemming errors
Stemming can produce two kinds of errors:
1.   Understemming errors
2.   Overstemming errors

Understemming errors
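Understemming happens when two words that belong to the same conceptual group are reduced to different stems. For example, the Porter stemmer reduces “alumnus” to “alumnu” but leaves “alumni” unchanged, so the two forms do not match each other.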
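If you know which words belong together, understemming can be detected mechanically. This Python sketch checks every pair of words inside each conceptual group and reports the pairs that end up with different stems. The stems are hardcoded values that the classic Porter algorithm produces for alumnus, alumni, and alumnae, not output from a live Elasticsearch call, and `understemmed_pairs` is a hypothetical helper name.

```python
from itertools import combinations


def understemmed_pairs(groups, stem):
    """Return pairs of words from the same conceptual group whose stems differ."""
    errors = []
    for group in groups:
        for a, b in combinations(group, 2):
            if stem(a) != stem(b):
                errors.append((a, b))
    return errors


# Hardcoded Porter stems (not computed here).
PORTER_STEMS = {"alumnus": "alumnu", "alumni": "alumni", "alumnae": "alumna"}
groups = [["alumnus", "alumni", "alumnae"]]

print(understemmed_pairs(groups, PORTER_STEMS.get))
# [('alumnus', 'alumni'), ('alumnus', 'alumnae'), ('alumni', 'alumnae')]
```

Passing the stemmer as a plain function keeps the check independent of where the stems come from, so the same helper could be fed tokens collected from the `_analyze` API.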

Overstemming errors
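Overstemming happens when two words that belong to different conceptual groups are reduced to the same stem. For example, the Porter stemmer reduces “universal”, “university”, and “universe” all to “univers”, so a search for “university” can also match documents about the universe.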
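Overstemming is the mirror image: group the words by stem and flag any stem shared by more than one conceptual group. In this Python sketch the stems are again hardcoded Porter outputs (universal, university, and universe all become univers), while the group labels and the `overstemmed_stems` name are made up for illustration.

```python
from collections import defaultdict


def overstemmed_stems(word_to_group, stem):
    """Return each stem shared by words from more than one conceptual group."""
    groups_by_stem = defaultdict(set)
    for word, group in word_to_group.items():
        groups_by_stem[stem(word)].add(group)
    return {s: sorted(gs) for s, gs in groups_by_stem.items() if len(gs) > 1}


# Hardcoded Porter stems (not computed here).
PORTER_STEMS = {"universal": "univers", "university": "univers", "universe": "univers"}
# Hypothetical conceptual-group labels, chosen for illustration.
word_to_group = {"universal": "general", "university": "institution", "universe": "cosmos"}

print(overstemmed_stems(word_to_group, PORTER_STEMS.get))
# {'univers': ['cosmos', 'general', 'institution']}
```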
