Saturday, 2 January 2016

Hadoop: RecordReader

In the previous post, I explained InputSplit. An InputSplit defines a slice of work, but it doesn’t describe how to access the data. It is the RecordReader class that actually loads the data from the source and converts it into <key, value> pairs.

TextInputFormat provides LineRecordReader, which treats every line in the file as a value and the byte offset at which that line starts as the key. The RecordReader keeps reading lines from the file until the InputSplit is consumed. For each record the RecordReader produces, the framework invokes the map method of the mapper with that key and value.
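
To make the offset-as-key idea concrete, here is a minimal sketch in plain Java that does not use the Hadoop API at all. SimpleLineReader, readSplit, and the in-memory "file" are hypothetical names made up for illustration. The sketch assumes '\n' line endings and mimics the usual convention for line-oriented splits: a split that does not begin at offset 0 skips its first line, and every reader finishes the line it is in the middle of, so each line is read exactly once.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LineReaderSketch {

    /** Hypothetical stand-in for a record reader over one split of an in-memory "file". */
    static class SimpleLineReader {
        private final byte[] data;
        private final long end;   // first byte offset after this split
        private long pos;         // offset of the next unread byte
        private long key;         // byte offset where the current line starts
        private String value;     // contents of the current line, without '\n'

        SimpleLineReader(byte[] data, long start, long length) {
            this.data = data;
            this.end = start + length;
            this.pos = start;
            // A split that does not start at offset 0 skips its first line;
            // the reader of the previous split is responsible for that line.
            if (start != 0) {
                skipLine();
            }
        }

        /** Advances pos to just past the next '\n', or to the end of the data. */
        private void skipLine() {
            while (pos < data.length && data[(int) pos] != '\n') {
                pos++;
            }
            if (pos < data.length) {
                pos++; // step over the newline itself
            }
        }

        /** Reads the next line of this split; returns false when the split is consumed. */
        boolean nextKeyValue() {
            // Lines that start at or before `end` belong to this split.
            if (pos > end || pos >= data.length) {
                return false;
            }
            key = pos;
            int lineStart = (int) pos;
            skipLine();
            int lineEnd = (int) pos;
            if (lineEnd > lineStart && data[lineEnd - 1] == '\n') {
                lineEnd--; // drop the trailing newline from the value
            }
            value = new String(data, lineStart, lineEnd - lineStart, StandardCharsets.UTF_8);
            return true;
        }

        long getCurrentKey() { return key; }

        String getCurrentValue() { return value; }
    }

    /** Drains one split and returns its records as "offset=line" strings. */
    static List<String> readSplit(byte[] data, long start, long length) {
        List<String> records = new ArrayList<>();
        SimpleLineReader reader = new SimpleLineReader(data, start, length);
        while (reader.nextKeyValue()) {
            // In a real job, the framework would call map(key, value, context) here.
            records.add(reader.getCurrentKey() + "=" + reader.getCurrentValue());
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] file = "apple\nbanana\ncherry\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readSplit(file, 0, 8));   // prints [0=apple, 6=banana]
        System.out.println(readSplit(file, 8, 12));  // prints [13=cherry]
    }
}
```

The first split ends in the middle of "banana", yet its reader still returns the whole line, and the second reader discards the partial line it starts in. That is why a split boundary does not have to fall on a line boundary.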



