Saturday, 2 January 2016

Hadoop: HDFS: Java listing files

You can get the contents of a directory by using the listStatus method of the FileSystem class.

FileStatus[] listStatus(Path f) throws FileNotFoundException, IOException;
List the statuses of the files/directories in the given path if the path is a directory.
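To make that contract concrete without a cluster, here is a rough local-filesystem analogy. It uses java.nio rather than Hadoop, and the class and method names are hypothetical. It assumes the behaviour described in the Hadoop FileSystem specification: a directory yields its direct children (no recursion), a plain file yields a single entry for itself, and a missing path raises FileNotFoundException. The children are sorted here only to keep the output stable.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical local analogy (java.nio, not the Hadoop API) of the
 * listStatus contract: directory -> direct children, file -> itself,
 * missing path -> FileNotFoundException.
 */
class ListStatusAnalogy {
    static List<Path> listStatus(Path p) throws IOException {
        if (!Files.exists(p)) {
            throw new FileNotFoundException("File " + p + " does not exist.");
        }
        if (!Files.isDirectory(p)) {
            return List.of(p); // a plain file lists as a single entry: itself
        }
        // Direct children only; sorted here just for stable, readable output
        try (Stream<Path> children = Files.list(p)) {
            return children.sorted().collect(Collectors.toList());
        }
    }
}
```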

The following is a step-by-step procedure to list the statuses of the files in a given directory.

Step 1: Set JAVA_HOME (if it is not already set).

Step 2: Set HADOOP_CLASSPATH as follows:
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar


Step 3: The following Java application lists all files in “/user/harikrishna_gurram”.
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListFiles {
 private static final String uri = "hdfs://localhost/user/harikrishna_gurram";
 private static final Configuration config = new Configuration();

 public static void listFiles() throws IOException {
  /* Get FileSystem object for given uri */
  FileSystem fs = FileSystem.get(URI.create(uri), config);

  /* List the direct children of the directory (not recursive) */
  FileStatus[] statuses = fs.listStatus(new Path(uri));

  for (FileStatus status : statuses) {
   System.out.println("*************************************");

   System.out.println("Access Time : " + status.getAccessTime());
   System.out.println("Block size : " + status.getBlockSize());
   System.out.println("Group : " + status.getGroup());
   System.out.println("Length : " + status.getLen());
   System.out.println("Modified Time : "
     + status.getModificationTime());
   System.out.println("Owner : " + status.getOwner());
   System.out.println("Path : " + status.getPath());
   System.out.println("Permission : " + status.getPermission());
   System.out.println("Replication factor : "
     + status.getReplication());
   System.out.println("Is Directory : " + status.isDirectory());
  }
 }

 public static void main(String[] args) throws IOException {
  listFiles();
 }
}
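The access and modification times printed above are raw milliseconds since the Unix epoch, and the length and block size are raw byte counts. A minimal sketch of hypothetical formatting helpers (StatusFormat is not part of Hadoop) that you could call from the loop, using only the JDK:

```java
import java.time.Instant;
import java.util.Locale;

/**
 * Hypothetical helpers for making the ListFiles output easier to read.
 * Assumes times are epoch milliseconds and sizes are byte counts, as
 * returned by FileStatus.getModificationTime() and getLen().
 */
class StatusFormat {
    /** Renders epoch milliseconds as an ISO-8601 UTC timestamp. */
    static String formatTime(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    /** Renders a byte count with 1024-based units and one decimal place. */
    static String humanSize(long bytes) {
        String[] units = {"B", "KB", "MB", "GB", "TB"};
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < units.length - 1) {
            value /= 1024;
            unit++;
        }
        // Locale.ROOT keeps '.' as the decimal separator everywhere
        return String.format(Locale.ROOT, "%.1f %s", value, units[unit]);
    }
}
```

For example, the Modified Time line could become `System.out.println("Modified Time : " + StatusFormat.formatTime(status.getModificationTime()));`. The 1024-based units are only a presentation choice.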

private static final String uri = "hdfs://localhost/user/harikrishna_gurram";

“uri” identifies the file or directory location in HDFS. The host details for the above URI are configured in the “hadoop-2.6.0/etc/hadoop/core-site.xml” file.

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://localhost/</value>
                <description>NameNode URI</description>
        </property>
</configuration>
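Because fs.defaultFS is set, an absolute path without a scheme, such as /user/harikrishna_gurram, would be qualified against hdfs://localhost/, while a fully qualified URI like the one in ListFiles is used as-is. As a rough analogy only (java.net.URI, not Hadoop's own path qualification logic), the hypothetical helper below shows that rule for absolute paths:

```java
import java.net.URI;

/**
 * Hypothetical analogy: qualify a path against a default filesystem URI.
 * Relative paths are out of scope: Hadoop resolves them against the
 * FileSystem's working directory, which URI.resolve knows nothing about.
 */
class DefaultFsResolution {
    static URI qualify(String defaultFs, String path) {
        // URI.resolve returns an already-absolute URI unchanged; otherwise
        // it borrows the scheme and authority (host:port) from the base.
        return URI.create(defaultFs).resolve(path);
    }
}
```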

Please refer to the Hadoop setup instructions here.

Step 4: Compile the above Java file.
$ hadoop com.sun.tools.javac.Main ListFiles.java
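`hadoop com.sun.tools.javac.Main` runs the JDK's javac entry point from tools.jar (which is why Step 2 puts tools.jar on HADOOP_CLASSPATH) with Hadoop's jars on the classpath. For comparison, a minimal sketch of the same idea through the standard javax.tools API. The class name is hypothetical, it must run on a JDK rather than a JRE, and the classpath you pass (for example the output of `hadoop classpath`) is an assumption about your setup:

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

/** Hypothetical helper: compile sources with an explicit classpath via javax.tools. */
class CompileWithClasspath {
    /** Returns javac's exit code (0 means success); class files land next to the sources. */
    static int compile(String classpath, String... sources) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("No system Java compiler; run on a JDK, not a JRE");
        }
        String[] args = new String[sources.length + 2];
        args[0] = "-classpath";
        args[1] = classpath;
        System.arraycopy(sources, 0, args, 2, sources.length);
        // null streams mean javac uses System.in/out/err
        return javac.run(null, null, null, args);
    }
}
```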

Step 5: Create a jar file.
$ jar cf list.jar ListFiles*class

Step 6: Run the jar file.

$ hadoop jar list.jar ListFiles