Monday, 2 November 2015

Elasticsearch: term filter


The “term” filter is used to query on exact values. Let us see how it works with an example. The snippet below puts five employee records into the index “organization” under the type “employee”.

PUT /_bulk
{"create" : {"_index" : "organization", "_type" : "employee", "_id" : 1}}
{"id":358,"firstName":"Hari Krishna","lastName":"Gurram","designation":"Senior Software Engineer","mailId":"abcdefabcdef@abcdef.com","hobbies":["watching movies","Writing blogs","Reading philosophy"],"address":{"street":"Panchayat Office","city":"Ongole","state":"Andhra Pradesh","country":"India","PIN":"523169"}}
{"create" : {"_index" : "organization", "_type" : "employee", "_id" : 2}}
{"id":12345,"firstName":"Joel Babu","lastName":"Chelli","designation":"Software Engineer","mailId":"wxyzwxyz@wxyz.com","hobbies":["Playing games","shopping","watch movies"],"address":{"street":"Marthalli","city":"Bangalore","state":"Karnataka","country":"India","PIN":"560047"}}
{"create" : {"_index" : "organization", "_type" : "employee", "_id" : 3}}
{"id":765,"firstName":"Gopi","lastName":"battu","designation":"Technology specialist","mailId":"abc12345@wxyz.com","hobbies":["seeing places","watching movies","chat with friends"],"address":{"street":"Marthalli","city":"Bangalore","state":"Karnataka","country":"India","PIN":"560047"}}
{"create" : {"_index" : "organization", "_type" : "employee", "_id" : 4}}
{"id":75,"firstName":"Rama Krishna","lastName":"Gurram","designation":"Tech lead","mailId":"asdfgh@wxyz.com","hobbies":["Reading editorial news","climbing hills","chat with friends"],"address":{"street":"Marthalli","city":"Bangalore","state":"Karnataka","country":"India","PIN":"560047"}}
{"create" : {"_index" : "organization", "_type" : "employee", "_id" : 5}}
{"id":12,"firstName":"Sailaja","lastName":"Navakotla","designation":"Software Engineer","mailId":"wxyasdf@wxyz.com","hobbies":["climbing hills","shopping","travelling"],"address":{"street":"TNagar","city":"Chennai","state":"Tamilnadu","country":"India","PIN":"5609126"}}
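
The bulk body above is newline-delimited JSON: one action line followed by one source line per document. As a rough sketch of how such a body could be generated from code, the Python below builds the same structure. The helper name build_bulk_body and the shortened employee records are assumptions made for illustration; the snippet only builds the string and does not send it to a cluster.

```python
import json


def build_bulk_body(index, doc_type, docs):
    """Build an NDJSON bulk body: a "create" action line, then the source line."""
    lines = []
    for doc_id, source in enumerate(docs, start=1):
        action = {"create": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Shortened, illustrative records (not the full documents from the post).
employees = [
    {"id": 358, "firstName": "Hari Krishna", "designation": "Senior Software Engineer"},
    {"id": 12345, "firstName": "Joel Babu", "designation": "Software Engineer"},
]

body = build_bulk_body("organization", "employee", employees)
print(body)
```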

1. Get the employee with id 12345
If you come from an RDBMS background, the equivalent SQL query looks like this.

SELECT * FROM employee WHERE id=12345;

The same query is written with a term filter as follows.

GET /organization/employee/_search
{
  "query" :{
    "filtered": {
      "filter": {"term": {
        "id": "12345"
      }}
    }
  }
}


The above query returns the following result.

{
   "took": 13,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "organization",
            "_type": "employee",
            "_id": "2",
            "_score": 1,
            "_source": {
               "id": 12345,
               "firstName": "Joel Babu",
               "lastName": "Chelli",
               "designation": "Software Engineer",
               "mailId": "wxyzwxyz@wxyz.com",
               "hobbies": [
                  "Playing games",
                  "shopping",
                  "watch movies"
               ],
               "address": {
                  "street": "Marthalli",
                  "city": "Bangalore",
                  "state": "Karnataka",
                  "country": "India",
                  "PIN": "560047"
               }
            }
         }
      ]
   }
}


The search API takes everything as a query, so we wrapped our “term” filter in “query”. A “filtered” query accepts both a query and a filter together. We will see how to use queries and filters together in the queries section.
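
The wrapping described here (a term filter inside “filtered”, inside “query”) can also be assembled programmatically. The following is a minimal sketch; the helper name term_filter_query is hypothetical, and the snippet only builds and prints the request body rather than calling Elasticsearch.

```python
import json


def term_filter_query(field, value):
    """Wrap a term filter in a filtered query, mirroring the request body above."""
    return {"query": {"filtered": {"filter": {"term": {field: value}}}}}


body = term_filter_query("id", 12345)
print(json.dumps(body, indent=2))
```

The resulting dictionary has the same shape as the body sent to /organization/employee/_search in the example.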

2. Get employees whose designation is “Software Engineer”.

GET /organization/employee/_search
{
  "query" :{
    "filtered": {
      "filter": {"term": {
        "designation": "Software Engineer"
      }}
    }
  }
}

The above query returns nothing. Surprised? The documents with ids 2 and 5 have the designation “Software Engineer”, but we didn’t get them back.

This is because, as I said, Elasticsearch analyzes text data by default using the standard analyzer. The analyze API shows what the analyzer does to a string.

POST /_analyze?analyzer=standard
{
  "software engineer"
}


The response looks like this.

{
   "tokens": [
      {
         "token": "software",
         "start_offset": 5,
         "end_offset": 13,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "engineer",
         "start_offset": 14,
         "end_offset": 22,
         "type": "<ALPHANUM>",
         "position": 2
      }
   ]
}
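
As you can see, the string “Software Engineer” is split into two tokens, and the tokens are lowercased: “software” and “engineer”. The string is stored in the index as these two tokens. The term filter does not analyze its input, so it looks for the single exact term “Software Engineer”, which does not exist in the index.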
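
To make the effect of analysis concrete, here is a small Python model of what happens. It is only a rough sketch: simple_analyze is a hypothetical stand-in that splits on non-alphanumeric characters and lowercases, which is much simpler than the real standard analyzer, and the inverted index is a plain dictionary rather than anything Elasticsearch exposes.

```python
import re
from collections import defaultdict


def simple_analyze(text):
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def build_inverted_index(designations):
    """Map each token to the set of document ids whose designation contains it."""
    index = defaultdict(set)
    for doc_id, designation in designations.items():
        for token in simple_analyze(designation):
            index[token].add(doc_id)
    return index


designations = {
    1: "Senior Software Engineer",
    2: "Software Engineer",
    3: "Technology specialist",
    4: "Tech lead",
    5: "Software Engineer",
}
index = build_inverted_index(designations)

# A term filter looks up its value as-is, without analyzing it first.
print(sorted(index.get("Software Engineer", set())))  # [] (no such token)
print(sorted(index.get("engineer", set())))           # [1, 2, 5]
```

The exact string “Software Engineer” is never a key in this index, while the single token “engineer” points to documents 1, 2 and 5.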

Now if you search for only “engineer”, you get 3 documents, since the strings “Software Engineer” and "Senior Software Engineer" both contain the term “engineer”.

GET /organization/employee/_search
{
  "query" :{
    "filtered": {
      "filter": {"term" :{
        "designation" : "engineer"
      }}
    }
  }
}


You will get the following response.

{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 1,
      "hits": [
         {
            "_index": "organization",
            "_type": "employee",
            "_id": "5",
            "_score": 1,
            "_source": {
               "id": 12,
               "firstName": "Sailaja",
               "lastName": "Navakotla",
               "designation": "Software Engineer",
               "mailId": "wxyasdf@wxyz.com",
               "hobbies": [
                  "climbing hills",
                  "shopping",
                  "travelling"
               ],
               "address": {
                  "street": "TNagar",
                  "city": "Chennai",
                  "state": "Tamilnadu",
                  "country": "India",
                  "PIN": "5609126"
               }
            }
         },
         {
            "_index": "organization",
            "_type": "employee",
            "_id": "1",
            "_score": 1,
            "_source": {
               "id": 358,
               "firstName": "Hari Krishna",
               "lastName": "Gurram",
               "designation": "Senior Software Engineer",
               "mailId": "abcdefabcdef@abcdef.com",
               "hobbies": [
                  "watching movies",
                  "Writing blogs",
                  "Reading philosophy"
               ],
               "address": {
                  "street": "Panchayat Office",
                  "city": "Ongole",
                  "state": "Andhra Pradesh",
                  "country": "India",
                  "PIN": "523169"
               }
            }
         },
         {
            "_index": "organization",
            "_type": "employee",
            "_id": "2",
            "_score": 1,
            "_source": {
               "id": 12345,
               "firstName": "Joel Babu",
               "lastName": "Chelli",
               "designation": "Software Engineer",
               "mailId": "wxyzwxyz@wxyz.com",
               "hobbies": [
                  "Playing games",
                  "shopping",
                  "watch movies"
               ],
               "address": {
                  "street": "Marthalli",
                  "city": "Bangalore",
                  "state": "Karnataka",
                  "country": "India",
                  "PIN": "560047"
               }
            }
         }
      ]
   }
}

But I want all employees whose designation is exactly “Software Engineer”.

It is simple: just tell Elasticsearch not to analyze the field designation. To do this, we need to provide a mapping for the index.

1. First, delete the existing type (this also removes the documents stored in it).
DELETE /organization/employee/

2. Add a custom mapping.

PUT /organization/_mapping/employee/
{
  "properties" : {"designation" : {"type" : "string", "index" : "not_analyzed"}}
}

The above snippet creates a mapping for the type “employee”.

"designation" : {"type" : "string", "index" : "not_analyzed"}

The above statement tells Elasticsearch not to analyze the field designation, so each value is indexed as a single, unmodified term.

3. Reindex the data into the type “employee”.

PUT /_bulk
{"create" : {"_index" : "organization", "_type" : "employee", "_id" : 1}}
{"id":358,"firstName":"Hari Krishna","lastName":"Gurram","designation":"Senior Software Engineer","mailId":"abcdefabcdef@abcdef.com","hobbies":["watching movies","Writing blogs","Reading philosophy"],"address":{"street":"Panchayat Office","city":"Ongole","state":"Andhra Pradesh","country":"India","PIN":"523169"}}
{"create" : {"_index" : "organization", "_type" : "employee", "_id" : 2}}
{"id":12345,"firstName":"Joel Babu","lastName":"Chelli","designation":"Software Engineer","mailId":"wxyzwxyz@wxyz.com","hobbies":["Playing games","shopping","watch movies"],"address":{"street":"Marthalli","city":"Bangalore","state":"Karnataka","country":"India","PIN":"560047"}}
{"create" : {"_index" : "organization", "_type" : "employee", "_id" : 3}}
{"id":765,"firstName":"Gopi","lastName":"battu","designation":"Technology specialist","mailId":"abc12345@wxyz.com","hobbies":["seeing places","watching movies","chat with friends"],"address":{"street":"Marthalli","city":"Bangalore","state":"Karnataka","country":"India","PIN":"560047"}}
{"create" : {"_index" : "organization", "_type" : "employee", "_id" : 4}}
{"id":75,"firstName":"Rama Krishna","lastName":"Gurram","designation":"Tech lead","mailId":"asdfgh@wxyz.com","hobbies":["Reading editorial news","climbing hills","chat with friends"],"address":{"street":"Marthalli","city":"Bangalore","state":"Karnataka","country":"India","PIN":"560047"}}
{"create" : {"_index" : "organization", "_type" : "employee", "_id" : 5}}
{"id":12,"firstName":"Sailaja","lastName":"Navakotla","designation":"Software Engineer","mailId":"wxyasdf@wxyz.com","hobbies":["climbing hills","shopping","travelling"],"address":{"street":"TNagar","city":"Chennai","state":"Tamilnadu","country":"India","PIN":"5609126"}}


4. Run the search query now for designation “Software Engineer”.

GET /organization/employee/_search
{
  "query":{
    "filtered": {
      "filter": {"term": {
        "designation": "Software Engineer"
      }}
    }
  }
}


You will get the following response.

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 1,
      "hits": [
         {
            "_index": "organization",
            "_type": "employee",
            "_id": "5",
            "_score": 1,
            "_source": {
               "id": 12,
               "firstName": "Sailaja",
               "lastName": "Navakotla",
               "designation": "Software Engineer",
               "mailId": "wxyasdf@wxyz.com",
               "hobbies": [
                  "climbing hills",
                  "shopping",
                  "travelling"
               ],
               "address": {
                  "street": "TNagar",
                  "city": "Chennai",
                  "state": "Tamilnadu",
                  "country": "India",
                  "PIN": "5609126"
               }
            }
         },
         {
            "_index": "organization",
            "_type": "employee",
            "_id": "2",
            "_score": 1,
            "_source": {
               "id": 12345,
               "firstName": "Joel Babu",
               "lastName": "Chelli",
               "designation": "Software Engineer",
               "mailId": "wxyzwxyz@wxyz.com",
               "hobbies": [
                  "Playing games",
                  "shopping",
                  "watch movies"
               ],
               "address": {
                  "street": "Marthalli",
                  "city": "Bangalore",
                  "state": "Karnataka",
                  "country": "India",
                  "PIN": "560047"
               }
            }
         }
      ]
   }
}
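
The behaviour of a not_analyzed field can be modelled in a few lines of Python. This is a simplified sketch, not how Elasticsearch is implemented: build_exact_index is a hypothetical helper that stores each whole designation as one term in a dictionary.

```python
from collections import defaultdict


def build_exact_index(designations):
    """Model a not_analyzed field: each whole value is stored as a single term."""
    index = defaultdict(set)
    for doc_id, designation in designations.items():
        index[designation].add(doc_id)
    return index


designations = {
    1: "Senior Software Engineer",
    2: "Software Engineer",
    3: "Technology specialist",
    4: "Tech lead",
    5: "Software Engineer",
}
exact_index = build_exact_index(designations)

print(sorted(exact_index.get("Software Engineer", set())))  # [2, 5]
print(sorted(exact_index.get("engineer", set())))           # [] (partial value)
print(sorted(exact_index.get("software engineer", set())))  # [] (case differs)
```

The trade-off is visible in the last two lines: once the field is not analyzed, a term lookup matches only the complete value with the same case, so searching for “engineer” alone no longer finds anything.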


Note:
When executing a filtered query, the filter is executed before the query. We will see how to use filters and queries together in the queries section.



