From the HDFS Architecture Guide:
When a file is deleted by a user or an application, it is not immediately removed from
HDFS. Instead, HDFS first renames it to a file in the /trash directory. The
file can be restored quickly as long as it remains in /trash. A file remains in
/trash for a configurable amount of time. After the expiry of its life in
/trash, the NameNode deletes the file from the HDFS namespace. The deletion of
a file causes the blocks associated with the file to be freed. Note that there
could be an appreciable time delay between the time a file is deleted by a user
and the time of the corresponding increase in free space in HDFS.
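
The lifecycle described above (rename into trash, optional restore, expiry after a configurable interval) can be illustrated with a toy in-memory model. This is only a sketch for intuition, not HDFS code: the class ToyTrashFS, its methods, and the explicit "now" argument are all hypothetical names chosen for this example, and time is expressed in minutes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTrashFS:
    """Toy in-memory model of the trash lifecycle (hypothetical, not HDFS code)."""

    trash_interval_min: float  # retention time in minutes; 0 disables the trash
    files: dict = field(default_factory=dict)  # path -> contents
    trash: dict = field(default_factory=dict)  # path -> (contents, deletion time)

    def delete(self, path, now):
        """Remove a file from the namespace, keeping it in trash if enabled."""
        contents = self.files.pop(path)
        if self.trash_interval_min > 0:
            # "Rename" into trash. Deleting the same path again overwrites the
            # earlier entry, so only the latest copy is kept.
            self.trash[path] = (contents, now)
        # With an interval of 0 the file is simply dropped.

    def restore(self, path):
        """Undelete a file that is still in trash."""
        contents, _ = self.trash.pop(path)
        self.files[path] = contents

    def expire(self, now):
        """Permanently drop trash entries older than the interval."""
        expired = [
            path
            for path, (_, deleted_at) in self.trash.items()
            if now - deleted_at >= self.trash_interval_min
        ]
        for path in expired:
            del self.trash[path]
        return expired
```

For example, with ToyTrashFS(trash_interval_min=60), a file deleted at minute 10 can still be restored at minute 30, while calling expire(now=70) removes it for good. In the model, as in the description above, space is only reclaimed at expiry, not at the moment of deletion.
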
A user can undelete a file after deleting it as long as it remains in the /trash
directory. To do so, the user can navigate to the /trash directory and retrieve
the file. The /trash directory contains only the latest copy of the file that
was deleted. The /trash directory is just like any other directory, with one
special feature: HDFS applies specified policies to automatically delete files
from this directory. The current default trash interval is 0, which means
deleted files are removed immediately without being stored in trash. This value
is configurable through the fs.trash.interval parameter in core-site.xml.
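
As a rough illustration, enabling the trash might look like the following core-site.xml fragment. The value 1440 is an arbitrary example, not a recommendation; fs.trash.interval is specified in minutes, so 1440 would keep deleted files for about 24 hours.

```xml
<!-- Example only: keep deleted files in trash for 1440 minutes (24 hours). -->
<!-- A value of 0 disables the trash, so deletes take effect immediately. -->
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>
```
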
To empty the trash manually, use the expunge command. Usage:
hadoop fs [generic options] -expunge