Wednesday, 2 December 2015

Elasticsearch: Java: Dis max query


Let’s say I maintain a “books” index with a “philosophy” type that stores all philosophy books. The following bulk request adds two books to it.

PUT /_bulk
{"create" : {"_index" : "books", "_type":"philosophy", "_id" : 1}}
{"title" : "Autobiography of Osho", "description" : "A professor of philosophy, he travelled throughout India during the 1960s as a public speaker. His outspoken criticism of politicians and the political mind, Mahatma Gandhi and institutionalised religion made him controversial."}
{"create" : {"_index" : "books", "_type":"philosophy", "_id" : 2}}
{"title" : "Osho philosophy", "description" : "Osho Autobiography is a book on philosophy. Osho travelled throughout India during the 1960s as a public speaker. Osho outspoken criticism of politicians and the political mind, Mahatma Gandhi and institutionalised religion made him controversial.Osho written many books on philosophy."}

Now search both the “title” and “description” fields for “Osho Autobiography” using a “bool” query.

GET /books/philosophy/_search
{
  "query":{
    "bool" :{
      "should": [
        {"match" : {"title" : "Osho Autobiography"}},
        {"match" : {"description" : "Osho Autobiography"}}
      ]
    }
  }
}

You will get the following response.

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0.45441824,
      "hits": [
         {
            "_index": "books",
            "_type": "philosophy",
            "_id": "2",
            "_score": 0.45441824,
            "_source": {
               "title": "Osho philosophy",
               "description": "Osho Autobiography is a book on philosophy. Osho travelled throughout India during the 1960s as a public speaker. Osho outspoken criticism of politicians and the political mind, Mahatma Gandhi and institutionalised religion made him controversial.Osho written many books on philosophy."
            }
         },
         {
            "_index": "books",
            "_type": "philosophy",
            "_id": "1",
            "_score": 0.27650413,
            "_source": {
               "title": "Autobiography of Osho",
               "description": "A professor of philosophy, he travelled throughout India during the 1960s as a public speaker. His outspoken criticism of politicians and the political mind, Mahatma Gandhi and institutionalised religion made him controversial."
            }
         }
      ]
   }
}

In the results, the book “Osho philosophy” ranks above the book “Autobiography of Osho”. This happens because of how the “bool” query scores the request:

{
  "query" : {
    "bool" : {
      "should" : [
        {"match" : {"title" : "Osho Autobiography"}},
        {"match" : {"description" : "Osho Autobiography"}}
      ]
    }
  }
}

1.   The “bool” query runs both of the queries in the should clause.
2.   It adds their scores together.
3.   It multiplies the total by the number of matching clauses.
4.   It divides the result by the total number of clauses (2 clauses here).
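
The four steps above can be sketched in plain Java. This is only an illustration of the arithmetic, not Elasticsearch's actual implementation; the class name, the method name and the per-clause scores are made up for the example.

```java
// Illustration of the four "bool" scoring steps; not the real Elasticsearch code.
class BoolShouldScoreSketch {

  /**
   * clauseScores holds one score per should clause.
   * A score of 0.0 marks a clause that did not match the document.
   */
  static double score(double[] clauseScores) {
    if (clauseScores.length == 0) {
      return 0.0;
    }
    double sum = 0.0;
    int matching = 0;
    for (double s : clauseScores) {
      if (s > 0.0) {
        sum += s;    // step 2: add the scores together
        matching++;
      }
    }
    // step 3: multiply by the number of matching clauses
    // step 4: divide by the total number of clauses
    return sum * matching / clauseScores.length;
  }

  public static void main(String[] args) {
    // Hypothetical scores in the order {title, description}.
    double[] doc1 = {0.75, 0.0};  // strong title match, no description match
    double[] doc2 = {0.25, 0.5};  // weaker matches in both fields

    System.out.println("doc1 = " + score(doc1)); // prints doc1 = 0.375
    System.out.println("doc2 = " + score(doc2)); // prints doc2 = 0.75
  }
}
```

Both hypothetical documents have the same summed score (0.75), but the document that matches only one of the two clauses ends up with half of it.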

Document 2 contains the word “Osho” in both fields, “title” and “description”, so both clauses match it. Document 1 matches only on its title, because its description contains neither “Osho” nor “Autobiography”. With one matching clause out of two, its score is halved, so document 2 ranks above document 1. But document 1 is the book on “Osho Autobiography”, so it should come before document 2.

In cases like this, the “dis_max” query is useful. It returns documents that match any of the given queries, and scores each document by its best matching query instead of combining the scores of all clauses.

GET /books/philosophy/_search
{
  "query":{
    "dis_max" :{
      "queries": [
        {"match" : {"title" : "Osho Autobiography"}},
        {"match" : {"description" : "Osho Autobiography"}}
      ]
    }
  }
}

You will get the following response.
{
   "took": 4,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0.06658022,
      "hits": [
         {
            "_index": "books",
            "_type": "philosophy",
            "_id": "1",
            "_score": 0.06658022,
            "_source": {
               "title": "Autobiography of Osho",
               "description": "A professor of philosophy, he travelled throughout India during the 1960s as a public speaker. His outspoken criticism of politicians and the political mind, Mahatma Gandhi and institutionalised religion made him controversial."
            }
         },
         {
            "_index": "books",
            "_type": "philosophy",
            "_id": "2",
            "_score": 0.03842633,
            "_source": {
               "title": "Osho philosophy",
               "description": "Osho Autobiography is a book on philosophy. Osho travelled throughout India during the 1960s as a public speaker. Osho outspoken criticism of politicians and the political mind, Mahatma Gandhi and institutionalised religion made him controversial.Osho written many books on philosophy."
            }
         }
      ]
   }
}
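
The ranking in the response above follows from the way “dis_max” picks the best clause. Here is a minimal sketch of that rule in plain Java, again with made-up class and method names and hypothetical per-clause scores. The tieBreaker argument models the optional “tie_breaker” parameter of the dis_max query, which this post does not use; it defaults to 0.0, meaning only the best clause counts.

```java
// Illustration of "dis_max" scoring; not the real Elasticsearch code.
class DisMaxScoreSketch {

  /**
   * clauseScores holds one score per query in the "queries" array.
   * A score of 0.0 marks a query that did not match the document.
   */
  static double score(double[] clauseScores, double tieBreaker) {
    double best = 0.0;
    double sum = 0.0;
    for (double s : clauseScores) {
      best = Math.max(best, s);
      sum += s;
    }
    // Best clause score, plus tieBreaker times the scores of the other clauses.
    return best + tieBreaker * (sum - best);
  }

  public static void main(String[] args) {
    // Hypothetical scores in the order {title, description}.
    double[] doc1 = {0.75, 0.0};  // strong title match, no description match
    double[] doc2 = {0.25, 0.5};  // weaker matches in both fields

    // With the default tie breaker of 0.0, only the best clause counts.
    System.out.println("doc1 = " + score(doc1, 0.0)); // prints doc1 = 0.75
    System.out.println("doc2 = " + score(doc2, 0.0)); // prints doc2 = 0.5
  }
}
```

With these numbers the document with the single strong title match now ranks first, which mirrors what happens to “Autobiography of Osho” in the response above.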


Using the Java API, you can write the same query as follows.

QueryBuilder builder = QueryBuilders
  .disMaxQuery()
  .add(QueryBuilders.matchQuery("title", "Osho Autobiography"))
  .add(QueryBuilders.matchQuery("description","Osho Autobiography"));

The above query produces the following JSON document.
{
  "dis_max" : {
    "queries" : [ {
      "match" : {
        "title" : {
          "query" : "Osho Autobiography",
          "type" : "boolean"
        }
      }
    }, {
      "match" : {
        "description" : {
          "query" : "Osho Autobiography",
          "type" : "boolean"
        }
      }
    } ]
  }
}

Following is the complete working application. Please note that to run it you require some model and utility classes, which you can get from the following location.


package com.self_learn.test;

import java.io.IOException;
import java.util.concurrent.ExecutionException;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

// Utility classes from this blog series (see the location mentioned above).
import com.self_learn.util.SearchUtil;
import com.self_learn.util.TransportClientUtil;

public class Main {
  private static final String clusterName = "my_cluster_1";
  private static final String _index = "books";
  private static final String _type = "philosophy";

  public static void main(String[] args) throws IOException,
      InterruptedException, ExecutionException {
    // Connect to the local cluster on the transport port.
    Client client = TransportClientUtil.getLocalTransportClient(
        clusterName, 9300);

    try {
      // dis_max over the title and description fields.
      QueryBuilder builder = QueryBuilders
          .disMaxQuery()
          .add(QueryBuilders.matchQuery("title", "Osho Autobiography"))
          .add(QueryBuilders.matchQuery("description",
              "Osho Autobiography"));

      // Print the generated JSON query.
      System.out.println(builder);

      // Run the search against books/philosophy and print the response.
      SearchResponse response = SearchUtil.getDocuments(client, builder,
          _index, _type);
      System.out.println(response);
    } finally {
      // Release the client even if the search fails.
      client.close();
    }
  }
}






