Saturday, 2 January 2016

Hadoop: HDFS: Java program to write data

In this post, I am going to explain how to write data to HDFS using the Java API. The org.apache.hadoop.fs.FileSystem class is used to access and manage files/directories in HDFS. "FileSystem" is an abstract class; Hadoop provides various implementations of this class.

Following is the step-by-step procedure to write data to HDFS.

Step 1: Set JAVA_HOME (If it is not set already)

Step 2: Set HADOOP_CLASSPATH as follows
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar


Step 3: Following is the Java application that writes data to HDFS.
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

public class WriteToHDFS {
 /* Destination file in HDFS and source file on the local file system */
 private static final String uri = "hdfs://localhost/user/harikrishna_gurram/dummy.txt";
 private static final Configuration config = new Configuration();
 private static final String localFile = "/Users/harikrishna_gurram/file1.txt";

 /* Copies a file from the local file system to HDFS */
 public static void writeToHDFS() throws IOException {
  InputStream in = new BufferedInputStream(new FileInputStream(localFile));

  /* Get the FileSystem object for the given uri */
  FileSystem fs = FileSystem.get(URI.create(uri), config);

  /* Create the file in HDFS; progress() is called back as data is written */
  OutputStream out = fs.create(new Path(uri), new Progressable() {
   @Override
   public void progress() {
    System.out.print(".");
   }
  });

  /* Copy using a 4096-byte buffer; 'true' closes both streams when done */
  IOUtils.copyBytes(in, out, 4096, true);
 }

 public static void main(String[] args) throws IOException {
  writeToHDFS();
 }
}
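
The call IOUtils.copyBytes(in, out, 4096, true) in the program above copies everything from the input stream to the output stream through a 4096-byte buffer, and closes both streams at the end because the last argument is true. To make that behaviour concrete, here is a rough plain-Java sketch of the same idea. It is not Hadoop's implementation: the class name CopySketch, the copy helper, and its byte-count return value are made up for illustration, and in-memory streams stand in for the local file and the HDFS file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class CopySketch {
    /* Hypothetical helper (not a Hadoop API): copies all bytes from in to out
       through a buffer of bufferSize bytes and returns the number of bytes copied.
       When close is true, both streams are closed even if the copy fails. */
    static long copy(InputStream in, OutputStream out, int bufferSize, boolean close)
            throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        try {
            int read;
            /* read() returns -1 at end of stream */
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
            out.flush();
        } finally {
            if (close) {
                try {
                    in.close();
                } finally {
                    out.close();
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        /* In-memory streams are used here in place of the local file and HDFS */
        byte[] data = "hello hdfs".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), sink, 4096, true);
        System.out.println(copied + " bytes copied: "
                + new String(sink.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

In the real program the buffer size only affects how many bytes are moved per read/write call; the content that ends up in HDFS is the same.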

String uri = "hdfs://localhost/user/harikrishna_gurram/dummy.txt";

“uri” identifies the location of the file in HDFS. The host details for the above uri are configured in the “hadoop-2.6.0/etc/hadoop/core-site.xml” file.

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://localhost/</value>
                <description>NameNode URI</description>
        </property>
</configuration>
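
To see how the uri string lines up with the fs.defaultFS value, it can help to split it into its parts. The following is a small sketch that uses only java.net.URI from the JDK (no Hadoop classes); the class name UriParts and the describe helper are made up for illustration.

```java
import java.net.URI;

class UriParts {
    /* Hypothetical helper: returns "scheme|host|port|path" for a URI string */
    static String describe(String location) {
        URI u = URI.create(location);
        return u.getScheme() + "|" + u.getHost() + "|" + u.getPort() + "|" + u.getPath();
    }

    public static void main(String[] args) {
        URI u = URI.create("hdfs://localhost/user/harikrishna_gurram/dummy.txt");
        System.out.println("scheme = " + u.getScheme()); /* hdfs */
        System.out.println("host   = " + u.getHost());   /* localhost */
        System.out.println("port   = " + u.getPort());   /* -1, no port is given in the uri */
        System.out.println("path   = " + u.getPath());   /* /user/harikrishna_gurram/dummy.txt */
    }
}
```

The scheme and host ("hdfs" and "localhost") are the same as in the fs.defaultFS value, and the path is the location of the file inside HDFS. Note that URI.create throws IllegalArgumentException if the string contains a stray space, so the uri must be written without leading or trailing blanks.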

Please refer to the setup for hadoop here.

Step 4: Compile the above Java file.
$ hadoop com.sun.tools.javac.Main WriteToHDFS.java

Step 5: Create the jar file.
$ jar cf write.jar WriteToHDFS*class

Step 6: Run the jar file.
$ hadoop jar write.jar WriteToHDFS
..
$ hadoop fs -ls /user/harikrishna_gurram/dummy.txt

-rw-r--r--   3 harikrishna_gurram supergroup         10 2015-06-22 10:24 /user/harikrishna_gurram/dummy.txt

