Thursday, 24 March 2022

Hadoop: fsck: print HDFS file blocks and their locations

The ‘hdfs fsck’ command checks the health of files in HDFS. It can also report the blocks that make up each file and the DataNodes that store them.

 

Help document for fsck

$hdfs fsck
Usage: DFSck <path> [-list-corruptfileblocks | [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]] [-maintenance]
	<path>	start checking from this path
	-move	move corrupted files to /lost+found
	-delete	delete corrupted files
	-files	print out files being checked
	-openforwrite	print out files opened for write
	-includeSnapshots	include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it
	-list-corruptfileblocks	print out list of missing blocks and files they belong to
	-blocks	print out block report
	-locations	print out locations for every block
	-racks	print out network topology for data-node locations

	-maintenance	print out maintenance state node details
	-blockId	print out which file this blockId belongs to, locations (nodes, racks) of this block, and other diagnostics info (under replicated, corrupted or not, etc)

Please Note:
	1. By default fsck ignores files opened for write, use -openforwrite to report such files. They are usually  tagged CORRUPT or HEALTHY depending on their block allocation status
	2. Option -includeSnapshots should not be used for comparing stats, should be used only for HEALTH check, as this may contain duplicates if the same file present in both original fs tree and inside snapshots.

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

$
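The usage line nests the reporting options as [-files [-blocks [-locations | -racks]]], so each deeper level of detail is written together with the flags of the levels above it. The following is a minimal Python sketch of that nesting. The helper name build_fsck_command and the detail-level names ("summary", "files", and so on) are made up for illustration and are not part of Hadoop. The sketch only builds the argument list and does not run the command.

```python
import shlex

# Flags for each detail level, following the nesting in the usage line:
# [-files [-blocks [-locations | -racks]]]
# The level names are hypothetical labels chosen for this sketch.
_DETAIL_FLAGS = {
    "summary": [],
    "files": ["-files"],
    "blocks": ["-files", "-blocks"],
    "locations": ["-files", "-blocks", "-locations"],
    "racks": ["-files", "-blocks", "-racks"],
}


def build_fsck_command(path, detail="summary"):
    """Return the argument list for an 'hdfs fsck' report on the given path."""
    if detail not in _DETAIL_FLAGS:
        raise ValueError("unknown detail level: %r" % (detail,))
    return ["hdfs", "fsck", path] + _DETAIL_FLAGS[detail]


if __name__ == "__main__":
    cmd = build_fsck_command("/user/cloudera/demo1/atlas.docx", "locations")
    # Quote each argument so the printed line can be pasted into a shell.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be handed to subprocess.run on a machine where the hdfs client is installed and configured.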

How to get the file blocks and their location report?

hdfs fsck /user/cloudera/demo1/atlas.docx -files -blocks -locations

 

The above command prints the blocks of the file /user/cloudera/demo1/atlas.docx and their locations.

$hdfs fsck /user/cloudera/demo1/atlas.docx -files -blocks -locations
Connecting to namenode via http://quickstart.cloudera:50070/fsck?ugi=cloudera&files=1&blocks=1&locations=1&path=%2Fuser%2Fcloudera%2Fdemo1%2Fatlas.docx
FSCK started by cloudera (auth:SIMPLE) from /10.0.2.15 for path /user/cloudera/demo1/atlas.docx at Wed Mar 23 23:54:46 PDT 2022
/user/cloudera/demo1/atlas.docx 6166736 bytes, 1 block(s):  OK
0. BP-1067413441-127.0.0.1-1508775264580:blk_1073742767_1943 len=6166736 Live_repl=1 [DatanodeInfoWithStorage[10.0.2.15:50010,DS-621c9e78-caa3-4a7b-bf10-3c8a1245cb51,DISK]]

Status: HEALTHY
 Total size:	6166736 B
 Total dirs:	0
 Total files:	1
 Total symlinks:		0
 Total blocks (validated):	1 (avg. block size 6166736 B)
 Minimally replicated blocks:	1 (100.0 %)
 Over-replicated blocks:	0 (0.0 %)
 Under-replicated blocks:	0 (0.0 %)
 Mis-replicated blocks:		0 (0.0 %)
 Default replication factor:	1
 Average block replication:	1.0
 Corrupt blocks:		0
 Missing replicas:		0 (0.0 %)
 Number of data-nodes:		1
 Number of racks:		1
FSCK ended at Wed Mar 23 23:54:46 PDT 2022 in 1 milliseconds


The filesystem under path '/user/cloudera/demo1/atlas.docx' is HEALTHY
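In the report above, the line that starts with "0." describes the file's only block. It shows the block pool ID, the block ID with its generation stamp, the block length in bytes, the number of live replicas, and one DatanodeInfoWithStorage entry per replica (DataNode address, storage ID, storage type). If you want to process such a line in a script, a minimal Python sketch could look like the one below. It assumes the exact line format shown in this post. Other Hadoop versions may print the line differently, and the function name parse_block_line is made up for illustration.

```python
import re

# Matches a block line in the format shown in this post, for example:
# 0. BP-...:blk_1073742767_1943 len=6166736 Live_repl=1 [DatanodeInfoWithStorage[...]]
BLOCK_LINE = re.compile(
    r"^(?P<index>\d+)\.\s+"
    r"(?P<pool>BP-[^:]+):(?P<block>blk_\d+)_(?P<genstamp>\d+)\s+"
    r"len=(?P<length>\d+)\s+"
    r"Live_repl=(?P<live>\d+)"
)

# Matches one replica entry: address, storage ID, storage type.
DATANODE = re.compile(
    r"DatanodeInfoWithStorage\[([^,\]]+),([^,\]]+),([^,\]]+)\]"
)

SAMPLE = (
    "0. BP-1067413441-127.0.0.1-1508775264580:blk_1073742767_1943 "
    "len=6166736 Live_repl=1 "
    "[DatanodeInfoWithStorage[10.0.2.15:50010,"
    "DS-621c9e78-caa3-4a7b-bf10-3c8a1245cb51,DISK]]"
)


def parse_block_line(line):
    """Parse one fsck block line into a dict, or return None if it does not match."""
    match = BLOCK_LINE.match(line.strip())
    if match is None:
        return None
    return {
        "index": int(match.group("index")),
        "pool": match.group("pool"),
        "block": match.group("block"),
        "genstamp": int(match.group("genstamp")),
        "length": int(match.group("length")),
        "live_replicas": int(match.group("live")),
        # One (address, storage_id, storage_type) tuple per replica.
        "locations": DATANODE.findall(line),
    }


if __name__ == "__main__":
    info = parse_block_line(SAMPLE)
    print(info["block"], info["length"], info["locations"])
```

Lines that are not block lines, such as "Status: HEALTHY", make the function return None, so the whole fsck output can be fed through it line by line.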

 


 

 

 
