Saturday 2 January 2016

Hadoop: HDFS: Java querying file system

The FileSystem class provides the method "getFileStatus", which returns information about a file or directory in the file system.

FileStatus getFileStatus(Path f) throws IOException
Returns a FileStatus object that represents the path. From a FileStatus object you can read the block size, replication factor, symbolic link, permissions, owner, and group associated with the file, among other things.

The following step-by-step procedure gets information about a file in HDFS.

Step 1: Set JAVA_HOME (if it is not already set)

Step 2: Set HADOOP_CLASSPATH as follows
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar


Step 3: The following Java application gets information about a file in HDFS.

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FileInformation {
 private static final String uri = "hdfs://localhost/user/harikrishna_gurram/dummy.txt";
 private static final Configuration config = new Configuration();

 public static void printFileInfo() throws IOException {
  /* Get FileSystem object for given uri */
  FileSystem fs = FileSystem.get(URI.create(uri), config);

  /* Get FileStatus object */
  FileStatus status = fs.getFileStatus(new Path(uri));

  System.out.println("Access Time : " + status.getAccessTime());
  System.out.println("Block size : " + status.getBlockSize());
  System.out.println("Group : " + status.getGroup());
  System.out.println("Length : " + status.getLen());
  System.out.println("Modified Time : " + status.getModificationTime());
  System.out.println("Owner : " + status.getOwner());
  System.out.println("Path : " + status.getPath());
  System.out.println("Permission : " + status.getPermission());
  System.out.println("Replication factor : " + status.getReplication());
  System.out.println("Is Directory : " + status.isDirectory());

 }

 public static void main(String[] args) throws IOException {
  printFileInfo();
 }
}

String uri = "hdfs://localhost/user/harikrishna_gurram/dummy.txt";

“uri” locates the file in HDFS. The host details for this URI are configured in the “hadoop-2.6.0/etc/hadoop/core-site.xml” file.

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://localhost/</value>
                <description>NameNode URI</description>
        </property>
</configuration>
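To see which parts of the uri string correspond to the fs.defaultFS value, the string can be split with the JDK's java.net.URI class, the same class the application uses in URI.create(uri). This is only a minimal sketch: the class name UriParts and the helper describe are hypothetical, and it uses nothing from Hadoop.

```java
import java.net.URI;

// Hypothetical helper class, not part of Hadoop: splits an HDFS URI into its parts.
class UriParts {

    // Returns "scheme|host|port|path" for the given URI string.
    static String describe(String uri) {
        URI u = URI.create(uri);
        return u.getScheme() + "|" + u.getHost() + "|" + u.getPort() + "|" + u.getPath();
    }

    public static void main(String[] args) {
        // Prints: hdfs|localhost|-1|/user/harikrishna_gurram/dummy.txt
        System.out.println(describe("hdfs://localhost/user/harikrishna_gurram/dummy.txt"));
    }
}
```

The scheme and host (hdfs, localhost) match the fs.defaultFS value above, and the rest is the path of the file inside HDFS. The port is reported as -1 because the URI does not specify one.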

Please refer to the setup for hadoop here.

Step 4: Compile the above Java file.
$ hadoop com.sun.tools.javac.Main FileInformation.java

Step 5: Create the jar file.
$ jar cf info.jar FileInformation*class

Step 6: Run the jar file.
$ hadoop jar info.jar FileInformation

Sample output:
Access Time : 1434948877838
Block size : 134217728
Group : supergroup
Length : 10
Modified Time : 1434948878401
Owner : harikrishna_gurram
Path : hdfs://localhost/user/harikrishna_gurram/dummy.txt
Permission : rw-r--r--
Replication factor : 3
Is Directory : false
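The access and modification times are printed as milliseconds since the epoch (1 January 1970, UTC), the block size as a byte count, and the permission in symbolic form. The following is a rough sketch of how these raw values could be made easier to read. The class ReadableStatus and its helpers are hypothetical, it uses only the JDK, and the values are hard-coded from the sample output rather than read from a FileStatus object.

```java
import java.time.Instant;

// Hypothetical helper class, not part of Hadoop: formats raw FileStatus values.
class ReadableStatus {

    // Converts milliseconds since the epoch to an ISO-8601 timestamp in UTC.
    static String toIsoUtc(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    // Converts a byte count to whole mebibytes (1 MiB = 1024 * 1024 bytes).
    static long toMebibytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Converts a 9-character symbolic permission such as "rw-r--r--" to octal.
    // Assumption: only plain r, w, x and - characters (no sticky bit).
    static String toOctal(String symbolic) {
        StringBuilder octal = new StringBuilder();
        for (int i = 0; i < 9; i += 3) {
            int digit = 0;
            if (symbolic.charAt(i) == 'r') digit += 4;
            if (symbolic.charAt(i + 1) == 'w') digit += 2;
            if (symbolic.charAt(i + 2) == 'x') digit += 1;
            octal.append(digit);
        }
        return octal.toString();
    }

    public static void main(String[] args) {
        // Values copied from the sample output above.
        System.out.println(toIsoUtc(1434948877838L)); // 2015-06-22T04:54:37.838Z
        System.out.println(toMebibytes(134217728L));  // 128
        System.out.println(toOctal("rw-r--r--"));     // 644
    }
}
```

In the application itself, the same helpers could be applied to status.getAccessTime(), status.getBlockSize() and status.getPermission().toString().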


