Each document in Elasticsearch is stored in a type, which in turn is stored
in an index. Each type has its own schema definition (its mapping), where fields are mapped to
corresponding data types.
The core types in Elasticsearch are:
1. String
2. Number
3. Boolean
4. Date
5. Binary
When you index a document, Elasticsearch determines the proper data type for each field and adds it to the mapping of the type. This is called dynamic mapping.
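To make the idea concrete, here is a minimal Python sketch of how a type could be guessed from a JSON value. It is only an imitation for illustration, not Elasticsearch's actual logic: the function name infer_type is hypothetical, and only one date pattern (YYYY-MM-DD) is tried.

```python
from datetime import datetime


def infer_type(value):
    """Guess a core type name for a JSON value (simplified, hypothetical rules)."""
    # bool must be checked before int, because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        try:
            # Assumption: only this one date pattern is recognised in the sketch
            datetime.strptime(value, "%Y-%m-%d")
            return "date"
        except ValueError:
            return "string"
    raise TypeError("unsupported value: %r" % (value,))


doc = {"id": 1, "salary": 25000}
inferred = {field: infer_type(value) for field, value in doc.items()}
print(inferred)
```

Note that a whole number such as 25000 is guessed as long, while 38000.98 would be guessed as double, so the guess depends on the value that arrives first.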
Is dynamic mapping always good?
Not always. Let’s say I have a salary table (where employee ids are mapped to their
salaries), and I want to insert the same data into Elasticsearch.
Id | Salary
1 | 25000
2 | 38000.98
3 | 43000.29
4 | 68000
Let me insert the first document into the type salary, as below.
PUT organization/salary/1 { "id" : 1, "salary" : 25000 }
The response looks like this.
{ "_index": "organization", "_type": "salary", "_id": "1", "_version": 1, "created": true }
GET organization/_mapping/salary
You will get the following response.
{ "organization": { "mappings": { "salary": { "properties": { "id": { "type": "long" }, "salary": { "type": "long" } } } } } }
As you can observe, “id” and “salary” are both mapped to the data type long, because the first salary value happened to be a whole number. If you want salary to be mapped to double, you can do so by using a custom mapping.
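As a rough illustration of such a custom mapping, the Python sketch below builds a request body that declares salary as double. The function name salary_index_body is hypothetical and no request is actually sent; it assumes the organization index is created fresh with this body, in the same request shape this post uses when creating an index with mappings.

```python
import json


def salary_index_body():
    """Build a create-index body that maps salary to double (illustrative only)."""
    return {
        "mappings": {
            "salary": {
                "properties": {
                    "id": {"type": "long"},
                    # Explicit type, so 38000.98 and 43000.29 keep their fractions
                    "salary": {"type": "double"},
                }
            }
        }
    }


body = salary_index_body()
# The body would be sent with: PUT /organization
print(json.dumps(body, indent=2))
```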
Customize field mappings
We can specify the mappings for a type while creating the index itself.
For example, I want to create an index xyz, which has two types, employees and products, with the following fields.
employees
firstName : string (not_analyzed)
lastName : string (not_analyzed)
age : int
dateOfBirth : date
description : string (analyzed)

products
id : string (not_analyzed)
noOfProductsAvailable : int
description : string (analyzed)
PUT /xyz { "mappings": { "employees" :{ "properties" : { "firstName" :{ "type" : "string", "index" : "not_analyzed" }, "lastName" :{ "type" : "string", "index" : "not_analyzed" }, "age" :{ "type" : "integer" }, "dateOfBirth" :{ "type" : "date" }, "description" :{ "type" : "string" } } }, "products" : { "properties" :{ "id" :{ "type" : "string", "index" : "not_analyzed" }, "noOfProductsAvailable" :{ "type" : "integer" }, "description" :{ "type" : "string" } } } } }
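Writing such a body by hand gets repetitive, so here is a small Python sketch that generates it from a compact field list. The helper names field_mapping and build_index_body and the tuple-based spec format are assumptions made for this example, not part of any Elasticsearch client.

```python
import json


def field_mapping(type_name, analyzed=True):
    """Return the mapping for one field; only strings carry the index option."""
    mapping = {"type": type_name}
    if type_name == "string" and not analyzed:
        mapping["index"] = "not_analyzed"
    return mapping


def build_index_body(spec):
    """Turn {type: {field: (type_name[, analyzed])}} into a create-index body."""
    return {
        "mappings": {
            type_name: {
                "properties": {
                    field: field_mapping(*args) for field, args in fields.items()
                }
            }
            for type_name, fields in spec.items()
        }
    }


spec = {
    "employees": {
        "firstName": ("string", False),
        "lastName": ("string", False),
        "age": ("integer",),
        "dateOfBirth": ("date",),
        "description": ("string",),
    },
    "products": {
        "id": ("string", False),
        "noOfProductsAvailable": ("integer",),
        "description": ("string",),
    },
}

index_body = build_index_body(spec)
# The body would be sent with: PUT /xyz
print(json.dumps(index_body, indent=2))
```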
Get the mappings of the types products and employees in the index xyz.
GET /xyz/_mappings
You will get the following output.
{ "xyz": { "mappings": { "employees": { "properties": { "age": { "type": "integer" }, "dateOfBirth": { "type": "date", "format": "dateOptionalTime" }, "description": { "type": "string" }, "firstName": { "type": "string", "index": "not_analyzed" }, "lastName": { "type": "string", "index": "not_analyzed" } } }, "products": { "properties": { "description": { "type": "string" }, "id": { "type": "string", "index": "not_analyzed" }, "noOfProductsAvailable": { "type": "integer" } } } } } }
Update mapping
Suppose I want to add a new field joiningDate to the type employees. How can I do that? It
is very simple: send a PUT request to the _mapping endpoint, like below.
PUT /xyz/_mapping/employees { "properties" :{ "joiningDate" :{ "type" : "date" } } }
Now get the mappings for the type employees.
GET /xyz/_mapping/employees
You will get the following response.
{ "xyz": { "mappings": { "employees": { "properties": { "age": { "type": "integer" }, "dateOfBirth": { "type": "date", "format": "dateOptionalTime" }, "description": { "type": "string" }, "firstName": { "type": "string", "index": "not_analyzed" }, "joiningDate": { "type": "date", "format": "dateOptionalTime" }, "lastName": { "type": "string", "index": "not_analyzed" } } } } } }
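The update behaves like a merge: new fields are added and existing ones stay as they were. The Python sketch below imitates that on plain dictionaries. The function put_mapping is hypothetical, and the conflict check is a simplification of the fact that Elasticsearch generally refuses to change the type of a field that is already mapped.

```python
import copy


def put_mapping(existing, new_properties):
    """Merge new field definitions into a copy of the existing properties."""
    merged = copy.deepcopy(existing)
    for field, definition in new_properties.items():
        current = merged.get(field)
        # Simplified rule: an already mapped field must keep its type
        if current is not None and current.get("type") != definition.get("type"):
            raise ValueError("cannot change type of existing field: " + field)
        merged.setdefault(field, {}).update(definition)
    return merged


existing = {
    "age": {"type": "integer"},
    "firstName": {"type": "string", "index": "not_analyzed"},
}

updated = put_mapping(existing, {"joiningDate": {"type": "date"}})
print(sorted(updated))
```

Calling put_mapping(existing, {"age": {"type": "string"}}) raises ValueError in this sketch, because age is already mapped as integer.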
Test mappings
GET /xyz/_analyze?field=firstName
{
hari krishna gurram
}
Since firstName is a not_analyzed field, the whole request body comes back as a single token in the response.
{ "tokens": [ { "token": "{\n hari krishna gurram\n}\n", "start_offset": 0, "end_offset": 26, "type": "word", "position": 1 } ] }
GET /xyz/_analyze?field=description { hari krishna gurram }
Since description is an analyzed field, the text is split into separate tokens and you will get the following response.
{ "tokens": [ { "token": "hari", "start_offset": 4, "end_offset": 8, "type": "<ALPHANUM>", "position": 1 }, { "token": "krishna", "start_offset": 9, "end_offset": 16, "type": "<ALPHANUM>", "position": 2 }, { "token": "gurram", "start_offset": 17, "end_offset": 23, "type": "<ALPHANUM>", "position": 3 } ] }
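The difference between the two responses can be imitated with a few lines of Python. This is only an approximation: the function analyze is hypothetical, and the regular expression with lowercasing stands in for the real analyzer, which follows more elaborate tokenization rules.

```python
import re


def analyze(text, index="analyzed"):
    """Return tokens with offsets, roughly like the _analyze responses."""
    if index == "not_analyzed":
        # The whole value is kept as one token, unchanged
        return [{"token": text, "start_offset": 0, "end_offset": len(text)}]
    # Approximation: split on runs of word characters and lowercase them
    return [
        {"token": m.group().lower(), "start_offset": m.start(), "end_offset": m.end()}
        for m in re.finditer(r"\w+", text)
    ]


text = "hari krishna gurram"
single = analyze(text, index="not_analyzed")
tokens = analyze(text, index="analyzed")
print(len(single), [t["token"] for t in tokens])
```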
Mapping for Inner Objects
Suppose we want to create a mapping for the following kind of employee.
{ "id" : "20", "name" : { "firstName" : "Hari", "middleName" : "Krishna", "lastName" : "Gurram" } }
As you can observe, “name” is an inner object inside the employee object.
First, delete any existing mapping associated with employee.
DELETE /organization/_mapping/employee
PUT /organization/_mapping/employee { "properties": { "id" : { "type": "integer" }, "name" :{ "type" : "object", "properties": { "firstName" : {"type" : "string"}, "middleName" : {"type" : "string"}, "lastName" : {"type" : "string"} } } } }
Since “name” is an inner object, I specified its type as object in the mapping.
Get the mapping for the type employee in the index organization.
GET /organization/_mapping/employee
You will get the following response.
{ "organization": { "mappings": { "employee": { "properties": { "id": { "type": "integer" }, "name": { "properties": { "firstName": { "type": "string" }, "lastName": { "type": "string" }, "middleName": { "type": "string" } } } } } } } }
How are inner fields referenced?
Inner fields are referenced using dot notation. For example, we can refer to firstName as
‘name.firstName’, lastName as ‘name.lastName’ and middleName as
‘name.middleName’.
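To see which dotted names a document produces, the following Python sketch flattens a nested object into dot-notation paths. The function flatten is a hypothetical helper written for this post; it only illustrates the naming scheme and says nothing about how Elasticsearch stores the fields internally.

```python
def flatten(obj, prefix=""):
    """Map every leaf value of a nested dict to its dot-notation path."""
    paths = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Inner object: recurse and extend the path with the parent name
            paths.update(flatten(value, path))
        else:
            paths[path] = value
    return paths


employee = {
    "id": "20",
    "name": {"firstName": "Hari", "middleName": "Krishna", "lastName": "Gurram"},
}

flat = flatten(employee)
print(sorted(flat))
```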
Note
1. index field
By default, string
type data is passed through an analyzer before being indexed. If you don’t want a
string to be analyzed, you can mark it as ‘not_analyzed’.
{
"description": {
"type": "string",
"index": "not_analyzed"
}
}
The “index”
attribute controls how a string will be indexed. It can contain one of three
values.
Value | Description
analyzed | Analyze the field before indexing.
not_analyzed | Index this field, but don’t analyze it.
no | Don’t index this field, so it is not searchable.
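What these three values mean for searching can be sketched with a toy inverted index in Python. Everything here is an assumption for illustration: the helper names index_terms and build_inverted_index are hypothetical, and the regular expression with lowercasing is only a stand-in for a real analyzer.

```python
import re
from collections import defaultdict


def index_terms(value, index):
    """Return the terms that would be stored for one string value."""
    if index == "no":
        return []  # nothing is indexed, so nothing can match a search
    if index == "not_analyzed":
        return [value]  # the exact value is the only term
    # "analyzed": stand-in analyzer that splits into lowercase words
    return [term.lower() for term in re.findall(r"\w+", value)]


def build_inverted_index(docs, index):
    """Map each term to the set of document ids that contain it."""
    inverted = defaultdict(set)
    for doc_id, value in docs.items():
        for term in index_terms(value, index):
            inverted[term].add(doc_id)
    return inverted


docs = {1: "Hari Krishna Gurram"}
terms_by_option = {
    option: sorted(build_inverted_index(docs, option))
    for option in ("analyzed", "not_analyzed", "no")
}
print(terms_by_option)
```

In this sketch a search for the term hari finds document 1 only with analyzed, the exact value "Hari Krishna Gurram" finds it only with not_analyzed, and nothing finds it with no.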
2. analyzer field
Elasticsearch comes with a number of built-in analyzers, such as the Standard Analyzer, Simple Analyzer,
Whitespace Analyzer, Stop Analyzer, Keyword Analyzer, Pattern Analyzer,
Language Analyzers and Snowball Analyzer, and you can also define a Custom Analyzer. You can specify
which analyzer to use for a field with the ‘analyzer’ attribute.
{
"description": {
"type": "string",
"analyzer": "english"
}
}