Tuesday, 29 March 2022

Hadoop: How to configure the number of reducers for a MapReduce job?

The ‘job.setNumReduceTasks(noOfReduceTasks)’ method configures the number of reducers for a MapReduce job.

 

Example

job.setNumReduceTasks(3);
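
For context, here is a minimal driver sketch showing where this call fits. The class name and the input/output arguments are hypothetical, and it deliberately relies on the default identity Mapper and Reducer so the example stays self-contained; in a real job you would plug in your own classes via setMapperClass and setReducerClass.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ReducerCountDriver {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "reducer count demo");
        job.setJarByClass(ReducerCountDriver.class);

        // No mapper/reducer classes are set, so the identity Mapper and
        // Reducer are used; replace them with your own implementations.

        // Run the job with three reduce tasks; the output directory will
        // contain three part-r-* files, one per reducer.
        job.setNumReduceTasks(3);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}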

 

The setNumReduceTasks(3) call configures three reducers. You can also set the number of reducers by passing the JVM argument below while launching the job. Note that mapred.reduce.tasks is the old property name; in Hadoop 2.x it is deprecated in favour of mapreduce.job.reduces, though both still work.

-Dmapred.reduce.tasks=3
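
One caveat: -D is a generic Hadoop option, and it only takes effect if the driver parses generic options, typically by implementing Tool and launching through ToolRunner. Below is a minimal sketch of that pattern (class name hypothetical, identity Mapper and Reducer again used for brevity):

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ConfigurableReducerDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects any -D overrides parsed by ToolRunner,
        // so -Dmapreduce.job.reduces=3 takes effect without a call to
        // setNumReduceTasks() here.
        Job job = Job.getInstance(getConf(), "configurable reducer demo");
        job.setJarByClass(ConfigurableReducerDriver.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new ConfigurableReducerDriver(), args));
    }
}

A job built this way could then be launched with, for example (jar name and paths hypothetical):

hadoop jar reducer-demo.jar ConfigurableReducerDriver -Dmapreduce.job.reduces=3 /input /output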

 


Why customize the number of reducers?

Let me explain with an example. Suppose you have a job that processes huge files using 1,000 mappers. The output generated by these mappers is also huge, and funnelling all of it to a single reducer machine for the final aggregation is not efficient: that one machine becomes the bottleneck. In this scenario, we can achieve better throughput by configuring more than one reducer, so the aggregation work is spread across several machines.

 

What is the default number of reducers for a Hadoop job?

One. If you do not configure it, mapreduce.job.reduces defaults to 1, so all of the map output is aggregated by a single reduce task.

Reference

https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapreduce/Job.html#setNumReduceTasks(int)

 

