Sorting is a phase that runs on the reducer machine: all the <key, value> pairs are sorted and grouped by key before the reducer starts processing them.
What is the advantage of the sorting phase?
Because the sorting phase sorts and groups the reducer's input by key, each key arrives together with all of its values, which makes it easy for the reducer to perform aggregate operations.
Before Sorting
(day, 2) (good, 2) (there, 1) (the, 1) (there, 2) (good, 3)
After sorting, the above data is transformed as shown below.
(day, [2]) (good, [2, 3]) (the, [1]) (there, [1, 2])
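The sort-and-group step can be sketched in plain Python. This is only an illustration of the idea, not how Hadoop implements it; the function name `sort_and_group` is hypothetical, and the input is the "Before Sorting" data from this post.

```python
from itertools import groupby
from operator import itemgetter


def sort_and_group(pairs):
    """Sort (key, value) pairs by key, then collect the values of each key."""
    # sorted() is stable, so values keep their arrival order within a key.
    ordered = sorted(pairs, key=itemgetter(0))
    # groupby() only merges adjacent items, which is why the sort comes first.
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]


before = [("day", 2), ("good", 2), ("there", 1),
          ("the", 1), ("there", 2), ("good", 3)]
after = sort_and_group(before)
print(after)
```

Each key appears once in the result, in sorted key order, with a list of every value that was emitted for it.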
What is the order of the mapper, partitioner, shuffle, sort and reducer phases?
Mapper -> Partitioner -> Shuffle -> Sort -> Reducer
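To make the order concrete, here is a minimal single-process sketch of a word count that walks through the five phases in sequence. It is a simplified assumption-laden model, not Hadoop code: every function name (`mapper`, `partitioner`, `shuffle`, `sort_and_group`, `reducer`, `run_job`) is hypothetical, the sample lines are made up, and `crc32` is used only to get a deterministic partition number.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter
from zlib import crc32


def mapper(line):
    # Mapper: emit (word, 1) for every word in the input line.
    return [(word, 1) for word in line.split()]


def partitioner(key, num_reducers):
    # Partitioner: decide which reducer receives this key.
    return crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapped_pairs, num_reducers):
    # Shuffle: move every pair to the reducer chosen by the partitioner.
    buckets = defaultdict(list)
    for key, value in mapped_pairs:
        buckets[partitioner(key, num_reducers)].append((key, value))
    return buckets


def sort_and_group(pairs):
    # Sort: order one reducer's pairs by key and group the values per key.
    ordered = sorted(pairs, key=itemgetter(0))
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]


def reducer(key, values):
    # Reducer: aggregate the grouped values of one key.
    return key, sum(values)


def run_job(lines, num_reducers=2):
    mapped = [pair for line in lines for pair in mapper(line)]
    buckets = shuffle(mapped, num_reducers)
    results = {}
    for reducer_id in sorted(buckets):
        for key, values in sort_and_group(buckets[reducer_id]):
            word, total = reducer(key, values)
            results[word] = total
    return results


counts = run_job(["good day there", "the good day", "there there"])
print(counts)
```

Because the partitioner sends every occurrence of a key to the same reducer, each reducer sees the complete value list for its keys after the sort step.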
Is the sorting phase done on the reducer machine?
Yes