Thursday, 22 October 2015

Elasticsearch: custom mapping

Each document in Elasticsearch is stored in a type, which in turn is stored in an index. Each type has its own schema definition (its mapping), in which fields are mapped to their data types.

The core types in Elasticsearch are:
1. String
2. Number
3. Boolean
4. Date
5. Binary

When you index a document, Elasticsearch determines a suitable data type for each new field and adds it to the mapping of the type. This is called dynamic mapping.
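The idea of inferring a type from the first value seen can be sketched in a few lines of Python. This is only a simplified model for illustration, not Elasticsearch code: the function names are made up, and features such as date detection in strings and nested objects are left out.

```python
def infer_field_type(value):
    """Guess a core type for a JSON value, loosely mimicking dynamic mapping."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value: {value!r}")


def infer_mapping(document, existing=None):
    """Add a type for every field not yet mapped; mapped fields keep their type."""
    properties = dict(existing or {})
    for field, value in document.items():
        properties.setdefault(field, {"type": infer_field_type(value)})
    return properties


# The first document decides the type of each field.
mapping = infer_mapping({"id": 1, "salary": 25000})
print(mapping)

# A later document with a fractional salary does not change the mapped type.
mapping = infer_mapping({"id": 2, "salary": 38000.98}, existing=mapping)
print(mapping)
```

In this model the field salary stays long even after a fractional value arrives, which is the situation the salary example in this post runs into.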

Is dynamic mapping always good?
Not always. Let’s say I have a salary table (where employee ids are mapped to their salaries) and I want to insert the same data into Elasticsearch.

Id    Salary
1     25000
2     38000.98
3     43000.29
4     68000


Let me insert the first document into the type salary, as shown below.
PUT organization/salary/1
{
  "id" : 1,
  "salary" : 25000
}


The response looks like this.
{
       "_index": "organization",
       "_type": "salary",
       "_id": "1",
       "_version": 1,
       "created": true
}
Now check the mapping for type salary.
GET organization/_mapping/salary

You will get the following response.

{
   "organization": {
      "mappings": {
         "salary": {
            "properties": {
               "id": {
                  "type": "long"
               },
               "salary": {
                  "type": "long"
               }
            }
         }
      }
   }
}


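As you can see, “id” and “salary” are both mapped to the data type long. Because the first salary, 25000, has no fractional part, Elasticsearch chose long, which is not a good fit for later values such as 38000.98. If you want salary to be mapped to double, you can do so with a custom mapping.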
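As a minimal sketch, the request body for such a custom mapping could be built like this in Python. It assumes the index organization does not exist yet (or has been deleted first), because the type of a field that is already mapped generally cannot be changed in place. The helper name salary_index_body is hypothetical.

```python
import json


def salary_index_body():
    """Build a create-index body that maps salary to double instead of long."""
    return {
        "mappings": {
            "salary": {
                "properties": {
                    "id": {"type": "long"},
                    # Explicit type, so fractional salaries are kept as doubles.
                    "salary": {"type": "double"},
                }
            }
        }
    }


# Print the request in the same style as the other examples in this post.
print("PUT /organization")
print(json.dumps(salary_index_body(), indent=2))
```

The printed body can be sent in the same way as the other PUT requests shown in this post.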

Customize field mappings
We can specify the mappings for a type while creating the index itself.

For example, I want to create an index xyz, which has two types, employees and products.

employees
firstName : string (not_analyzed)
lastName : string (not_analyzed)
age : integer
dateOfBirth : date
description : string (analyzed)

products
id : string (not_analyzed)
noOfProductsAvailable : integer
description : string (analyzed)

PUT /xyz
{
  "mappings": {
    "employees" :{
      "properties" : {
        "firstName" :{
          "type" : "string",
          "index" : "not_analyzed"
        },
        "lastName" :{
          "type" : "string",
          "index" : "not_analyzed"
        },
        "age" :{
          "type" : "integer"
        },
        "dateOfBirth" :{
          "type" : "date"
        },
        "description" :{
          "type" : "string"
        }
      }
    },
    "products" : {
      "properties" :{
        "id" :{
          "type" : "string",
          "index" : "not_analyzed"
        },
        "noOfProductsAvailable" :{
          "type" : "integer"
        },
        "description" :{
          "type" : "string"
        }
      }
    }
  }
}
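When there are many fields, a body like the one above can be generated from a compact description of the types instead of being written by hand. The following Python sketch shows one way to do this; the SPEC layout and the helper names are assumptions made for this illustration and are not part of any Elasticsearch client.

```python
import json


def field_mapping(es_type, analyzed=True):
    """Return the mapping of one field; only strings get an "index" setting."""
    mapping = {"type": es_type}
    if es_type == "string" and not analyzed:
        mapping["index"] = "not_analyzed"
    return mapping


def build_index_body(types):
    """types has the shape {type_name: {field_name: (es_type, analyzed)}}."""
    return {
        "mappings": {
            type_name: {
                "properties": {
                    field: field_mapping(es_type, analyzed)
                    for field, (es_type, analyzed) in fields.items()
                }
            }
            for type_name, fields in types.items()
        }
    }


# Compact description of the two types of index xyz.
SPEC = {
    "employees": {
        "firstName": ("string", False),
        "lastName": ("string", False),
        "age": ("integer", True),
        "dateOfBirth": ("date", True),
        "description": ("string", True),
    },
    "products": {
        "id": ("string", False),
        "noOfProductsAvailable": ("integer", True),
        "description": ("string", True),
    },
}

body = build_index_body(SPEC)
print("PUT /xyz")
print(json.dumps(body, indent=2))
```

Analyzed strings get no "index" entry here, because analyzed is the default for string fields.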


Get the mappings of the types products and employees in index xyz.
GET /xyz/_mappings

You will get the following output.

{
   "xyz": {
      "mappings": {
         "employees": {
            "properties": {
               "age": {
                  "type": "integer"
               },
               "dateOfBirth": {
                  "type": "date",
                  "format": "dateOptionalTime"
               },
               "description": {
                  "type": "string"
               },
               "firstName": {
                  "type": "string",
                  "index": "not_analyzed"
               },
               "lastName": {
                  "type": "string",
                  "index": "not_analyzed"
               }
            }
         },
         "products": {
            "properties": {
               "description": {
                  "type": "string"
               },
               "id": {
                  "type": "string",
                  "index": "not_analyzed"
               },
               "noOfProductsAvailable": {
                  "type": "integer"
               }
            }
         }
      }
   }
}


Update mapping
Suppose I want to add a new field, joiningDate, to the type employees. This is simple: send a PUT request to the _mapping endpoint, as shown below.

PUT /xyz/_mapping/employees
{
  "properties" :{
    "joiningDate" :{
      "type" : "date"
    }
  }
}
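Conceptually, the new properties are merged into the mapping that already exists: new fields are added, while an attempt to give an existing field a different type is rejected. The Python sketch below models only this behaviour. Elasticsearch does the real merge on the server and checks more than the type, and the function name merge_properties is made up for this example.

```python
def merge_properties(existing, new):
    """Merge new field definitions into an existing "properties" dict."""
    merged = dict(existing)
    for field, definition in new.items():
        current = merged.get(field)
        if current is not None and current["type"] != definition["type"]:
            # Changing the type of a field that is already mapped is a conflict.
            raise ValueError(
                f"field {field!r} is already mapped as {current['type']!r}"
            )
        merged[field] = {**(current or {}), **definition}
    return merged


employees = {
    "age": {"type": "integer"},
    "firstName": {"type": "string", "index": "not_analyzed"},
}

# Adding a new field succeeds.
updated = merge_properties(employees, {"joiningDate": {"type": "date"}})
print(sorted(updated))

# Changing the type of an existing field is refused.
try:
    merge_properties(employees, {"age": {"type": "string"}})
except ValueError as error:
    print("rejected:", error)
```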


Now get the mappings for type employees.
GET /xyz/_mapping/employees

You will get the following response.

{
   "xyz": {
      "mappings": {
         "employees": {
            "properties": {
               "age": {
                  "type": "integer"
               },
               "dateOfBirth": {
                  "type": "date",
                  "format": "dateOptionalTime"
               },
               "description": {
                  "type": "string"
               },
               "firstName": {
                  "type": "string",
                  "index": "not_analyzed"
               },
               "joiningDate": {
                  "type": "date",
                  "format": "dateOptionalTime"
               },
               "lastName": {
                  "type": "string",
                  "index": "not_analyzed"
               }
            }
         }
      }
   }
}


Test mappings
GET /xyz/_analyze?field=firstName
{
  hari krishna gurram
}

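Since firstName is a not_analyzed field, you will get a single token in the response. The request body is analyzed as raw text, which is why the surrounding braces and newlines are part of the token.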

{
   "tokens": [
      {
         "token": "{\n  hari krishna gurram\n}\n",
         "start_offset": 0,
         "end_offset": 26,
         "type": "word",
         "position": 1
      }
   ]
}


GET /xyz/_analyze?field=description
{
  hari krishna gurram
}


Since description is an analyzed field, the text is split into separate tokens and you will get the following response.

{
   "tokens": [
      {
         "token": "hari",
         "start_offset": 4,
         "end_offset": 8,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "krishna",
         "start_offset": 9,
         "end_offset": 16,
         "type": "<ALPHANUM>",
         "position": 2
      },
      {
         "token": "gurram",
         "start_offset": 17,
         "end_offset": 23,
         "type": "<ALPHANUM>",
         "position": 3
      }
   ]
}
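The difference between the two responses can be reproduced with a small Python model. Splitting on word characters and lowercasing is only a rough stand-in for the default analyzer, not its actual algorithm, and the function name index_terms is an assumption made for this sketch.

```python
import re


def index_terms(text, index="analyzed"):
    """Return the tokens a field would produce, with their character offsets."""
    if index == "not_analyzed":
        # The whole value is kept as one token, exactly as it was sent.
        return [{"token": text, "start_offset": 0, "end_offset": len(text)}]
    if index == "analyzed":
        # Rough stand-in for an analyzer: split into words and lowercase them.
        return [
            {
                "token": match.group().lower(),
                "start_offset": match.start(),
                "end_offset": match.end(),
            }
            for match in re.finditer(r"\w+", text)
        ]
    raise ValueError(f"unknown index setting: {index!r}")


text = "hari krishna gurram"
print(index_terms(text, "not_analyzed"))
print(index_terms(text, "analyzed"))
```

The offsets differ from the responses above because the input here does not contain the surrounding braces and newlines.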


Mapping for Inner Objects
Suppose we want to create a mapping for the following kind of employee.

{
  "id" : "20",
  "name" : {
    "firstName" : "Hari",
    "middleName" : "Krishna",
    "lastName" : "Gurram"
  } 
}


As you can see, “name” is an inner object inside the employee object.

First delete the mapping currently associated with employee. Be aware that deleting a mapping also deletes all documents of that type.
DELETE /organization/_mapping/employee

PUT /organization/_mapping/employee
{
  "properties": {
    "id" : {
      "type": "integer"
    },
    "name" :{
      "type" : "object", 
      "properties": {
        "firstName" : {"type" : "string"},
        "middleName" : {"type" : "string"},
        "lastName" : {"type" : "string"}
      }
    }
  }
}


Since “name” is an inner object, I specified its type as object in the mapping.

Get the mapping of the type employee in index organization.
GET /organization/_mapping/employee

You will get the following response.

{
   "organization": {
      "mappings": {
         "employee": {
            "properties": {
               "id": {
                  "type": "integer"
               },
               "name": {
                  "properties": {
                     "firstName": {
                        "type": "string"
                     },
                     "lastName": {
                        "type": "string"
                     },
                     "middleName": {
                        "type": "string"
                     }
                  }
               }
            }
         }
      }
   }
}


How are inner fields referenced?
Inner fields are referenced using dot notation. For example, we can refer to firstName as ‘name.firstName’, lastName as ‘name.lastName’ and middleName as ‘name.middleName’.
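The naming scheme can be illustrated with a short Python function that flattens a nested document into dotted field paths. This is just a sketch of the convention; the function name flatten is hypothetical.

```python
def flatten(document, prefix=""):
    """Flatten nested objects into {dotted.path: value} pairs."""
    flat = {}
    for key, value in document.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into the inner object, extending the path.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


employee = {
    "id": "20",
    "name": {
        "firstName": "Hari",
        "middleName": "Krishna",
        "lastName": "Gurram",
    },
}

for path, value in flatten(employee).items():
    print(path, "=", value)
```

The printed paths, such as name.firstName, are the names you would use to refer to these fields in a query.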

Note
a. index field
By default, string data is passed through an analyzer before being indexed. If you don’t want a string to be analyzed, you can set its index attribute to ‘not_analyzed’.

{
    "description": {
        "type":     "string",
        "index":    "not_analyzed"
    }
}

The “index” attribute controls how a string will be indexed. It can contain one of three values.

Value           Description
analyzed        Analyze the field before indexing.
not_analyzed    Index this field, but don’t analyze it.
no              Don’t index this field, so it is not searchable.

b. analyzer field
Elasticsearch comes with a number of built-in analyzers, such as the Standard Analyzer, Simple Analyzer, Whitespace Analyzer, Stop Analyzer, Keyword Analyzer, Pattern Analyzer, Language Analyzers and Snowball Analyzer; you can also define a Custom Analyzer. You can specify which analyzer to use with the ‘analyzer’ field.

{
    "description": {
        "type":     "string",
        "analyzer": "english"
    }
}


