In this post, I am going to explain how to read data from HDFS using the Java API. The org.apache.hadoop.fs.FileSystem class is used to access and manage files and directories in HDFS. "FileSystem" is an abstract class; Hadoop provides various implementations of it.
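To make the "abstract class with several implementations" idea concrete, here is a minimal plain-JDK sketch of picking an implementation by URI scheme. It is not Hadoop code: SketchFileSystem, LocalSketchFs and DistributedSketchFs are hypothetical names invented for illustration. In Hadoop itself, a file:// URI typically resolves to LocalFileSystem and an hdfs:// URI to DistributedFileSystem.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for an abstract file system type; NOT a Hadoop class.
abstract class SketchFileSystem {
    /** Short label identifying the concrete implementation. */
    abstract String name();

    // Maps a URI scheme to a factory for the matching implementation.
    private static final Map<String, Supplier<SketchFileSystem>> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("file", LocalSketchFs::new);
        REGISTRY.put("hdfs", DistributedSketchFs::new);
    }

    /** Returns the implementation registered for the scheme of the given URI. */
    static SketchFileSystem get(URI uri) {
        Supplier<SketchFileSystem> factory = REGISTRY.get(uri.getScheme());
        if (factory == null) {
            throw new IllegalArgumentException("No file system for scheme: " + uri.getScheme());
        }
        return factory.get();
    }

    public static void main(String[] args) {
        System.out.println(get(URI.create("file:///tmp/a.txt")).name());
        System.out.println(get(URI.create("hdfs://localhost/user/a.txt")).name());
    }
}

// Hypothetical implementation for the local disk.
class LocalSketchFs extends SketchFileSystem {
    String name() { return "local"; }
}

// Hypothetical implementation for a distributed store.
class DistributedSketchFs extends SketchFileSystem {
    String name() { return "distributed"; }
}
```

The caller only ever sees the abstract type, which is why the program in this post can work with a FileSystem reference without naming the concrete HDFS class.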
The following is the step-by-step procedure to read data from HDFS.
Step 1: Set JAVA_HOME (if it is not set already).
Step 2: Set HADOOP_CLASSPATH as follows, so that the hadoop command can find the Java compiler used in Step 4.
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
Step 3: The following Java application reads data from HDFS.
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadFromHDFS {
    private static final String uri = "hdfs://localhost/user/harikrishna_gurram/dir1/sample.txt";
    private static final Configuration config = new Configuration();

    public static void printToConsole() throws IOException {
        /* Get the FileSystem for hdfs uri */
        FileSystem fs = FileSystem.get(URI.create(uri), config);
        FSDataInputStream in = null;

        try {
            /* Opens FSDataInputStream for given path */
            in = fs.open(new Path(uri));

            /* Copies data from FSDataInputStream to console */
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }

    public static void main(String[] args) throws IOException {
        printToConsole();
    }
}
String uri = "hdfs://localhost/user/harikrishna_gurram/dir1/sample.txt";
“uri” identifies the location of the file in HDFS. The host part of this URI matches the NameNode URI configured in the “hadoop-2.6.0/etc/hadoop/core-site.xml” file.
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost/</value>
    <description>NameNode URI</description>
  </property>
</configuration>
Please refer to the setup of Hadoop here.
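To see how the URI string breaks down, here is a small sketch that uses only java.net.URI from the JDK, with no Hadoop classes. The class name HdfsUriParts and the describe helper are made up for this illustration. The scheme ("hdfs") tells Hadoop which file system to use, the host is the NameNode configured above, and the path is the location of the file inside HDFS.

```java
import java.net.URI;

// Illustrative helper, not part of Hadoop: splits a URI string into its parts.
class HdfsUriParts {
    /** Returns a one-line summary of the scheme, host, port and path of a URI. */
    static String describe(String uriString) {
        URI uri = URI.create(uriString);
        return "scheme=" + uri.getScheme()
                + " host=" + uri.getHost()
                + " port=" + uri.getPort()   // -1 means no port was given in the URI
                + " path=" + uri.getPath();
    }

    public static void main(String[] args) {
        System.out.println(describe("hdfs://localhost/user/harikrishna_gurram/dir1/sample.txt"));
    }
}
```

For the URI used in this post, this prints scheme=hdfs host=localhost port=-1 path=/user/harikrishna_gurram/dir1/sample.txt. Since the URI carries no port, the HDFS client falls back to its default NameNode port.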
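FSDataInputStream in = fs.open(new Path(uri));
The above statement opens the file at the given URI and returns an FSDataInputStream for reading it.
IOUtils.copyBytes(in, System.out, 4096, false);
The above statement copies data from the FSDataInputStream to the console using a 4096-byte buffer. The last argument, false, tells copyBytes not to close the streams when the copy finishes, which is why the finally block closes the input stream with IOUtils.closeStream(in).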
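To give a feel for what a buffered copy like the copyBytes call above does, here is a rough plain-JDK sketch of the same idea. It is not Hadoop's implementation: the class CopyBytesSketch and its copy method are made up for this post, and unlike IOUtils.copyBytes this version returns the number of bytes copied so the result is easy to check.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; mirrors the argument order of copyBytes(in, out, buffSize, close).
class CopyBytesSketch {
    /**
     * Copies everything from in to out through a buffer of buffSize bytes.
     * If close is true, both streams are closed once copying ends.
     * Returns the number of bytes copied (an addition made for this sketch).
     */
    static long copy(InputStream in, OutputStream out, int buffSize, boolean close)
            throws IOException {
        byte[] buffer = new byte[buffSize];
        long total = 0;
        try {
            int n;
            // read() returns -1 at end of stream
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
            out.flush();
        } finally {
            if (close) {
                in.close();
                out.close();
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello from a stand-in stream\n".getBytes(StandardCharsets.UTF_8);
        // An in-memory stream stands in for the FSDataInputStream here.
        InputStream in = new ByteArrayInputStream(data);
        // Passing false keeps System.out open for the rest of the program.
        copy(in, System.out, 4096, false);
    }
}
```

Passing false matters when the destination is System.out: closing it would silence any output printed later by the same program.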
Step 4: Compile the above Java file.
$ hadoop com.sun.tools.javac.Main ReadFromHDFS.java
Step 5: Create the jar file.
$ jar cf read.jar ReadFromHDFS*class
Step 6: Run the jar file.
$ hadoop jar read.jar ReadFromHDFS