Thursday 30 April 2015

Apache avro : Serializing and deserializing without code generation

In previous post, I explained, how to serialize and deserialize data using Apache avro. In previous post to perform serialization and deserialization, I generated Employee.java, by using “avro-tools” library. In this post, I am going to explain, how to serialize and deserialize the data without code generation.

It is three-step process.
   1.   Create Employee object from the schema file.
   2.   Serialize the employee object
   3.   Deserialize the employee object.

employee.avsc
{"namespace": "tutorial.model",
 "type": "record",
 "name": "Employee",
 "fields": [
     {"name": "firstName", "type": "string"},
     {"name": "lastName", "type": "string"},
     {"name": "age",  "type": "int"},
     {"name": "id",  "type": "string"},
     {"name" : "company", "type" : "string"}
 ]
}

Following code is used to create an Employee object from schema file.

/* Read schema definition and create schema object */
Schema schema = new Schema.Parser().parse(new File(schemaFileName));

/* Use the schema and create an employee object */
GenericRecord employee = new GenericData.Record(schema);

/* Define employee */
employee.put("firstName", "Hari krishna");
employee.put("lastName", "Gurram");
employee.put("age", 27);
employee.put("id", "E432123");
employee.put("company", "ABCD");

Follow the steps to implement complete Application.
Step 1: Create Eclipse maven project “avro_tutorial”.

File -> New -> Other

Select Maven Project and press Next

Select the check box “Create a simple project (Skip archetype selection) and press Next.

Give Group Id and Artifact Id as “avro_tutorial” and press Finish.

Step 2: Open “pom.xml” and update dependencies for avro.

pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
 <modelVersion>4.0.0</modelVersion>
 <groupId>avro_tutorial</groupId>
 <artifactId>avro_tutorial</artifactId>
 <version>0.0.1-SNAPSHOT</version>
 <properties>
  <avro_version>1.7.7</avro_version>
 </properties>

 <dependencies>
  <dependency>
   <groupId>org.apache.avro</groupId>
   <artifactId>avro</artifactId>
   <version>${avro_version}</version>
  </dependency>
 </dependencies>
</project>


Step 3: Create “EmployeeUtil.java” under the package “tutorial.main”.
package tutorial.main;

import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;

public class EmployeeUtil {
 public static void serializeEmployee(GenericRecord emp,
   String serializedFileName, String schemaFileName)
   throws IOException {

  /* Read schema definition and create schema object */
  Schema schema = new Schema.Parser().parse(new File(schemaFileName));

  DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(
    schema);
  DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(
    datumWriter);

  dataFileWriter.create(schema, new File(serializedFileName));
  dataFileWriter.append(emp);
  dataFileWriter.close();
 }

 public static GenericRecord deSerializeEmployee(String serializedFileName,
   String schemaFileName) throws IOException {

  /* Read schema definition and create schema object */
  Schema schema = new Schema.Parser().parse(new File(schemaFileName));

  // De-serialize employee from disk
  File file = new File(serializedFileName);

  // Deserialize users from disk
  DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>(
    schema);

  DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(
    file, datumReader);

  GenericRecord employee = null;
  while (dataFileReader.hasNext()) {
   /*
    * Reuse employee object by passing it to next(). This saves us from
    * allocating and garbage collecting many objects for files with
    * many items.
    */
   employee = dataFileReader.next(employee);
  }

  dataFileReader.close();
  return employee;
 }

 public static GenericRecord createEmployee(String schemaFileName)
   throws IOException {

  /* Read schema definition and create schema object */
  Schema schema = new Schema.Parser().parse(new File(schemaFileName));

  /* Use the schema and create an employee object */
  GenericRecord employee = new GenericData.Record(schema);

  /* Define employee */
  employee.put("firstName", "Hari krishna");
  employee.put("lastName", "Gurram");
  employee.put("age", 27);
  employee.put("id", "E432123");
  employee.put("company", "ABCD");

  return employee;
 }

}


Step 4: Create “Main.java”.

package tutorial.main;

import java.io.IOException;

import org.apache.avro.generic.GenericRecord;

public class Main {
 public static void main(String args[]) throws IOException {
  GenericRecord employee = EmployeeUtil.createEmployee("employee.avsc");

  EmployeeUtil.serializeEmployee(employee, "ser.out", "employee.avsc");

  GenericRecord employee1 = EmployeeUtil.deSerializeEmployee("ser.out", "employee.avsc");
  
  System.out.println(employee1);
 }
}

Run “Main.java”, you will get output like below.

{"firstName": "Hari krishna", "lastName": "Gurram", "age": 27, "id": "E432123", "company": "ABCD"}

Total project structure looks like below.
Prevoius                                                 Next                                                 Home

No comments:

Post a Comment