Saturday 2 January 2016

Hadoop: HDFS: Java program to read data

In this post, I explain how to read data from HDFS using the Java API. The org.apache.hadoop.fs.FileSystem class is used to access and manage files and directories in HDFS. "FileSystem" is an abstract class, and Hadoop provides several implementations of it.

Following is the step-by-step procedure to read data from HDFS.

Step 1: Set JAVA_HOME (if it is not already set).

Step 2: Set HADOOP_CLASSPATH as follows. tools.jar contains the Java compiler class (com.sun.tools.javac.Main) that Step 4 invokes.
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar


Step 3: The following Java application reads data from HDFS.
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadFromHDFS {

 private static final String uri = "hdfs://localhost/user/harikrishna_gurram/dir1/sample.txt";
 private static final Configuration config = new Configuration();

 public static void printToConsole() throws IOException {
  /* Get the FileSystem for hdfs uri */
  FileSystem fs = FileSystem.get(URI.create(uri), config);

  FSDataInputStream in = null;
  try {
   /* Opens FSDataInputStream for given path */
   in = fs.open(new Path(uri));

   /* Copies data from FSDataInputStream to console */
   IOUtils.copyBytes(in, System.out, 4096, false);
  } finally {
   IOUtils.closeStream(in);
  }
 }

 public static void main(String[] args) throws IOException {
  printToConsole();
 }
}

String uri = "hdfs://localhost/user/harikrishna_gurram/dir1/sample.txt";

“uri” locates the file in HDFS. The host details for the above uri are configured in the “hadoop-2.6.0/etc/hadoop/core-site.xml” file.

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://localhost/</value>
                <description>NameNode URI</description>
        </property>
</configuration>
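To see how the uri lines up with fs.defaultFS, the following is a minimal sketch that splits it with the standard java.net.URI class. It needs no Hadoop libraries. The class name HdfsUriParts and its two helper methods are hypothetical names chosen for this illustration only.

```java
import java.net.URI;

public class HdfsUriParts {

    /** Returns the "scheme://authority/" part, which should match fs.defaultFS. */
    public static String fileSystemPart(String uri) {
        URI parsed = URI.create(uri);
        return parsed.getScheme() + "://" + parsed.getAuthority() + "/";
    }

    /** Returns the path of the file inside that file system. */
    public static String pathPart(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        String uri = "hdfs://localhost/user/harikrishna_gurram/dir1/sample.txt";

        // The part that identifies the NameNode (compare with fs.defaultFS)
        System.out.println(fileSystemPart(uri));

        // The part that identifies the file within HDFS
        System.out.println(pathPart(uri));
    }
}
```

For the uri used in this post, the first part is hdfs://localhost/, the same value configured for fs.defaultFS, and the second part is /user/harikrishna_gurram/dir1/sample.txt.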

Please refer to the setup of hadoop here.

FSDataInputStream in = fs.open(new Path(uri));
The above statement opens an FSDataInputStream for the given uri.

IOUtils.copyBytes(in, System.out, 4096, false);

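The above statement copies data from the FSDataInputStream to the console. The third argument, 4096, is the buffer size in bytes. The last argument, false, tells copyBytes not to close the streams when the copy finishes, which is why the finally block closes "in" explicitly with IOUtils.closeStream.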
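To make the arguments of copyBytes concrete, here is a plain-Java sketch of the kind of buffered copy loop such a call performs. It is not Hadoop's implementation: the class CopyLoopSketch and its copy method are hypothetical, the method returns a byte count purely for illustration, and a ByteArrayInputStream stands in for the FSDataInputStream so the example runs without a cluster.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class CopyLoopSketch {

    /**
     * Copies everything from in to out through a buffer of bufferSize bytes.
     * If close is true, both streams are closed afterwards.
     * Returns the number of bytes copied (an addition made for this sketch).
     */
    public static long copy(InputStream in, OutputStream out, int bufferSize, boolean close)
            throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        try {
            int read;
            // read() returns -1 once the end of the stream is reached
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
            out.flush();
        } finally {
            if (close) {
                in.close();
                out.close();
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the stream returned by fs.open(...)
        InputStream in = new ByteArrayInputStream(
                "sample data from a stand-in stream\n".getBytes(StandardCharsets.UTF_8));

        // close = false keeps System.out usable after the copy
        copy(in, System.out, 4096, false);
    }
}
```

Passing false for the last argument matters when the destination is System.out, because closing it would prevent any later output from the program.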

Step 4: Compile the above Java file.
$ hadoop com.sun.tools.javac.Main ReadFromHDFS.java

Step 5: Create the jar file.
$ jar cf read.jar ReadFromHDFS*class

Step 6: Run the jar file.
$ hadoop jar read.jar ReadFromHDFS


