The ‘job.setNumReduceTasks(noOfReduceTasks)’ method configures the number of reducers for a MapReduce job.
Example
job.setNumReduceTasks(3);
The snippet above configures three reducers. You can also set the number of reducers without recompiling by passing the following JVM argument when launching the job. (In Hadoop 2.x, mapred.reduce.tasks is deprecated in favor of mapreduce.job.reduces, though the old name is still honored.)
-Dmapred.reduce.tasks=3
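For context, here is a minimal driver sketch showing where setNumReduceTasks fits (the class name ReducerCountDriver is hypothetical; plug in your own Mapper and Reducer). Running the job through ToolRunner is what allows the -D override above to take effect:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ReducerCountDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "reducer-count-demo");
        job.setJarByClass(ReducerCountDriver.class);

        // Set your real Mapper/Reducer classes here; Hadoop falls back
        // to the identity implementations if none are configured.
        // job.setMapperClass(MyMapper.class);
        // job.setReducerClass(MyReducer.class);

        // Three reduce tasks -> three sorted output partitions
        // (part-r-00000, part-r-00001, part-r-00002).
        job.setNumReduceTasks(3);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner/GenericOptionsParser strip generic options such as
        // -Dmapred.reduce.tasks=3 before they reach run(), which is what
        // makes the command-line override described above work.
        System.exit(ToolRunner.run(new ReducerCountDriver(), args));
    }
}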
Why customize the number of reducers?
Let me explain with an example. Suppose you have a job that processes huge files using 1000 mappers. The output generated by these mappers is huge, and funneling all of it through a single reducer machine to produce the final aggregate is not performance efficient: one machine becomes the bottleneck for the entire shuffle and reduce phase. In this scenario, we can achieve better throughput by configuring more than one reducer, so the mapper output is partitioned and reduced in parallel.
What is the default number of reducers for a Hadoop job?
One.
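You can confirm this programmatically: Job.getNumReduceTasks() reads the mapreduce.job.reduces property and falls back to 1 when nothing has been configured. A minimal sketch:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class DefaultReducerCount {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "defaults-check");
        // Prints 1 unless overridden via configuration,
        // -Dmapred.reduce.tasks, or setNumReduceTasks().
        System.out.println(job.getNumReduceTasks());
    }
}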
Reference
https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapreduce/Job.html#setNumReduceTasks(int)