Wednesday, 30 March 2022

What is the role of partitioning in Hadoop MapReduce?

Partitioning is the process of splitting the mapper output (<key, value> pairs) consistently, so that all the values for a given key ‘k1’ always go to the same reducer.

 

Let me explain with an example. My organization ‘ABCCorp’ has offices in two cities, Bangalore and Chennai. I want to consolidate the login and logout times of employees by city. To do this, I created a MapReduce job with 3 mappers and 2 reducers (reducer1 handles the Bangalore employees' data and reducer2 handles the Chennai employees' data).

 


The catch is how to send the <key, value> pairs for Bangalore to reducer1 and the <key, value> pairs for Chennai to reducer2. This is the use case the partitioner addresses: it takes each mapper output record and determines the reducer instance that the <key, value> pair is sent to.


How to implement a custom partitioner?

You can implement a custom partitioner by extending the ‘Partitioner’ class.

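// Requires org.apache.hadoop.io.Text and org.apache.hadoop.mapreduce.Partitioner
public static class PatitionerByCity extends Partitioner<Text, Text> {

    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        /*
         * Send all Bangalore employees' data to reducer1 (partition 0) and all
         * other data to reducer2 (partition 1). The modulo keeps the result in
         * range when the job runs with fewer than two reducers.
         */
        if (key.toString().equalsIgnoreCase("Bangalore")) {
            return 0 % numPartitions;
        } else {
            return 1 % numPartitions;
        }
    }

}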

 

You can set this partitioner on the job using the setPartitionerClass method.

job.setPartitionerClass(PatitionerByCity.class);
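
To see what the partitioner above returns for different keys and reducer counts, here is a minimal sketch of the same rule in plain Java, with no Hadoop dependency. The class name CityPartitionSketch and the use of String keys instead of Text are assumptions made for illustration only.

```java
// Hypothetical stand-alone class: mirrors the getPartition rule shown above,
// but uses String keys so it runs without Hadoop on the classpath.
class CityPartitionSketch {

    // Bangalore -> partition 0, everything else -> partition 1.
    // The modulo keeps the index valid when only one reducer is configured.
    static int getPartition(String key, int numPartitions) {
        if (key.equalsIgnoreCase("Bangalore")) {
            return 0 % numPartitions;
        }
        return 1 % numPartitions;
    }

    public static void main(String[] args) {
        String[] keys = {"Bangalore", "Chennai", "bangalore"};
        for (String key : keys) {
            System.out.println(key
                    + " -> with 2 reducers: " + getPartition(key, 2)
                    + ", with 1 reducer: " + getPartition(key, 1));
        }
    }
}
```

With two reducers, every Bangalore key maps to index 0 and every other key to index 1. With a single reducer, both branches return 0, which is why the modulo is there.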

 

You can find a complete working application here.

https://self-learning-java-tutorial.blogspot.com/2016/01/hadoop-partitioner.html

 

What is the default partitioner in Hadoop?

‘HashPartitioner’ is the default partitioner. It assigns each record to a partition based on the hash value of its key.
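
The hash-based assignment can be sketched in plain Java as below. The formula (clear the sign bit of the hash code, then take the remainder by the number of reducers) follows what Hadoop's HashPartitioner does. The class name HashPartitionSketch and the String keys are assumptions for illustration; a real job hashes the key's own type (for example Text), so the actual partition numbers will differ.

```java
// Hypothetical stand-alone class illustrating hash-based partitioning.
class HashPartitionSketch {

    // Clearing the sign bit with Integer.MAX_VALUE avoids a negative index
    // when hashCode() is negative; the remainder maps it to [0, numReduceTasks).
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"Bangalore", "Chennai", "Bangalore"};
        for (String key : keys) {
            // The same key always prints the same partition index.
            System.out.println(key + " -> partition " + getPartition(key, 2));
        }
    }
}
```

Because the index depends only on the key, every occurrence of a key lands on the same reducer, but you do not control which reducer that is. That is the reason to write a custom partitioner when a specific routing is needed.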

 

Does partitioning happen on the mapper node?

Yes, partitioning always happens on the node that runs the mapper.

 

What is the order of the map, partition and reduce phases?

The partitioning phase executes after the map phase and before the reduce phase.
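
The ordering can be illustrated with a small in-memory simulation in plain Java. This is only a sketch under stated assumptions: the class PhaseOrderSketch, its method names and the "city,employee" input format are hypothetical, and the shuffle and sort that a real Hadoop job performs between partitioning and reducing are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of map -> partition -> reduce.
class PhaseOrderSketch {

    // Map phase: turn a raw line "city,employee" into a (city, employee) pair.
    static String[] map(String line) {
        return line.split(",", 2);
    }

    // Partition phase: choose the reducer index for a mapped key.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Reduce phase: count the records seen for each key in one partition.
    static Map<String, Integer> reduce(List<String[]> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] pair : pairs) {
            counts.merge(pair[0], 1, Integer::sum);
        }
        return counts;
    }

    // Runs the three phases in order and returns one result map per reducer.
    static List<Map<String, Integer>> run(List<String> lines, int numReducers) {
        List<List<String[]>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String line : lines) {
            String[] pair = map(line);                        // 1. map
            int index = partition(pair[0], numReducers);      // 2. partition
            partitions.get(index).add(pair);
        }
        List<Map<String, Integer>> results = new ArrayList<>();
        for (List<String[]> p : partitions) {
            results.add(reduce(p));                           // 3. reduce
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Bangalore,emp1", "Chennai,emp2", "Bangalore,emp3");
        List<Map<String, Integer>> results = run(lines, 2);
        for (int i = 0; i < results.size(); i++) {
            System.out.println("reducer " + i + " -> " + results.get(i));
        }
    }
}
```

Each record is partitioned right after it is mapped, and a reducer only ever sees the records that were assigned to its partition.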


 

 
