The ‘job.setNumReduceTasks(noOfReduceTasks)’ method configures the number of reducers for a MapReduce job.
Example
job.setNumReduceTasks(3);
The snippet above configures three reducers. You can also set the number of reducers without recompiling by passing the following JVM argument when launching the job. (In Hadoop 2.x, mapred.reduce.tasks is deprecated in favor of mapreduce.job.reduces, though the old name is still honored.)
-Dmapred.reduce.tasks=3
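For context, here is a minimal driver sketch showing where setNumReduceTasks fits (the class name ReducerCountDriver is hypothetical; plug in your own Mapper and Reducer). Running the job through ToolRunner is what allows the -D override above to take effect:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ReducerCountDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "reducer-count-demo");
        job.setJarByClass(ReducerCountDriver.class);

        // Set your real Mapper/Reducer classes here; Hadoop falls back
        // to the identity implementations if none are configured.
        // job.setMapperClass(MyMapper.class);
        // job.setReducerClass(MyReducer.class);

        // Three reduce tasks -> three sorted output partitions
        // (part-r-00000, part-r-00001, part-r-00002).
        job.setNumReduceTasks(3);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner/GenericOptionsParser strip generic options such as
        // -Dmapred.reduce.tasks=3 before they reach run(), which is what
        // makes the command-line override described above work.
        System.exit(ToolRunner.run(new ReducerCountDriver(), args));
    }
}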
Why customize the number of reducers?
Let me explain with an example. Suppose you have a job that processes huge files using 1000 mappers. The output generated by these mappers is huge, and funneling all of it through a single reducer machine to produce the final aggregate is not performance efficient: one machine becomes the bottleneck for the entire shuffle and reduce phase. In this scenario, we can achieve better throughput by configuring more than one reducer, so the mapper output is partitioned and reduced in parallel.
What is the default number of reducers for a Hadoop job?
One.
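You can confirm this programmatically: Job.getNumReduceTasks() reads the mapreduce.job.reduces property and falls back to 1 when nothing has been configured. A minimal sketch:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class DefaultReducerCount {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "defaults-check");
        // Prints 1 unless overridden via configuration,
        // -Dmapred.reduce.tasks, or setNumReduceTasks().
        System.out.println(job.getNumReduceTasks());
    }
}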
Reference
https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapreduce/Job.html#setNumReduceTasks(int)