Thursday, 15 March 2018

How to parse an ATOM/RSS feed using Java?

In this post, you are going to learn the following things.
a.   What is ATOM publishing protocol?
b.   ATOM vs RSS
c.   Different libraries that support ATOM feed.
d.   Example of atom feed
e.   Parse ATOM feed using ROME library

What is Atom Publishing Protocol?
It is an application-level protocol for publishing and editing Web resources. The Atom Publishing Protocol is documented in RFC 5023. The schema/format of an Atom feed document is documented in RFC 4287.
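To make "publishing" a little more concrete: in RFC 5023, a client creates a new resource by sending an HTTP POST with an Atom entry to a collection URI. Below is a minimal sketch of such a request using the JDK's java.net.http API (Java 11 or later). The class name, the method name and the example.org collection URI are my own assumptions for illustration, the entry body is trimmed, and the request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class AtomPubRequestSketch {

    // Media type that RFC 5023 uses for a single Atom entry document.
    static final String ENTRY_MEDIA_TYPE = "application/atom+xml;type=entry";

    /** Builds (but does not send) a POST that asks a collection to create an entry. */
    static HttpRequest createEntryRequest(String collectionUri, String entryXml) {
        return HttpRequest.newBuilder(URI.create(collectionUri))
                .header("Content-Type", ENTRY_MEDIA_TYPE)
                .POST(HttpRequest.BodyPublishers.ofString(entryXml))
                .build();
    }

    public static void main(String[] args) {
        // Trimmed entry body; a real entry also carries id, updated and author.
        String entry = "<entry xmlns=\"http://www.w3.org/2005/Atom\">"
                + "<title>SWT: Slider Tutorial</title>"
                + "</entry>";
        // Hypothetical collection URI, used only for illustration.
        HttpRequest request = createEntryRequest("https://example.org/blog/entries", entry);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request would need an HttpClient and a server that actually implements the protocol, which is outside the scope of this post.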

ATOM Vs RSS
RSS and Atom are the two main standards of web syndication. The ATOM format was developed to overcome the flaws in the RSS feed format. An RSS file can have the extension .rss or .xml, whereas an Atom file can have the extension .atom or .xml.
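Since both formats can live in a .xml file, the extension alone does not tell you which one you have. The root element does: an Atom 1.0 document starts with a feed element in the http://www.w3.org/2005/Atom namespace, an RSS 0.9x/2.0 document starts with rss, and an RSS 1.0 document starts with rdf:RDF. Here is a rough sketch of that check with the plain JDK DOM parser. The class and method names are mine, and the returned labels are just illustrative strings.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class FeedFormatSniffer {

    private static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    /** Guesses the syndication format from the root element of the document. */
    static String detect(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Feeds usually come from the network, so refuse DOCTYPE declarations.
        // Side effect: a feed that declares a DOCTYPE is rejected with an exception.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

        Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();

        String name = root.getLocalName();
        if ("feed".equals(name) && ATOM_NS.equals(root.getNamespaceURI())) {
            return "Atom 1.0";
        }
        if ("rss".equals(name)) {
            return "RSS " + root.getAttribute("version");
        }
        if ("RDF".equals(name)) {
            return "RSS 1.0 (RDF)";
        }
        return "unknown";
    }
}
```

For example, FeedFormatSniffer.detect("<rss version=\"2.0\"><channel/></rss>") gives "RSS 2.0". A library like ROME does this detection for you, so the sketch is only meant to show where the difference between the formats is visible.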

Different libraries that support ATOM/RSS feed
The libraries below can be used to parse an ATOM feed.
a.   Abdera
b.   ROME

If you would like detailed information about the Abdera library, I suggest you go through my post on it.

Example of ATOM feed
A typical ATOM feed looks like the one below.

<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <id>tag:blogger.com,1999:blog-3062500619105519975</id>
  <updated>2016-12-17T20:29:02.425Z</updated>
  <title type="text">java tutorial : Blog to learn java programming</title>
  <subtitle type="text">Learners Blog</subtitle>
  <category term="Programming" scheme="scheme" label="Java Tutorial for beginners"/>
  <contributor>
    <name>Krishna</name>
    <email>Krishna@Krishna.com</email>
    <uri>https://self-learning-java-tutorial.blogspot.com</uri>
  </contributor>
  <generator uri="http://www.blogger.com" version="7.00">Blogger</generator>
  <icon>https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJHpypki-Y-JqQUorVvFkZkU0JWZKxkb138p8W-saAaVr4qaPKTgpkE7BV8sqq0elgI_pDejsHMxTEmY2KPLgwrkfyQXwWqkXYWRUJIKj_ci1JDFzVpKg1lddacb8SifWynpsWfmR_RH4j/s1600/1.bmp</icon>
  <link href="https://self-learning-java-tutorial.blogspot.com/feeds/posts/default" rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" title="" hreflang="" length="0"/>
  <link href="http://www.blogger.com/feeds/3062500619105519975/posts/default?alt=atom" rel="self" type="application/atom+xml" title="" hreflang="" length="0"/>
  <author>
    <name>hari krishna</name>
    <email>noreply@blogger.com</email>
    <uri>https://self-learning-java-tutorial.blogspot.com</uri>
  </author>
  <logo>https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJHpypki-Y-JqQUorVvFkZkU0JWZKxkb138p8W-saAaVr4qaPKTgpkE7BV8sqq0elgI_pDejsHMxTEmY2KPLgwrkfyQXwWqkXYWRUJIKj_ci1JDFzVpKg1lddacb8SifWynpsWfmR_RH4j/s1600/1.bmp</logo>
  <rights type="text">Copyrighted : Java Tutorial</rights>
  <entry>
    <id>tag:blogger.com,1999:blog-3062500619105519975.post-918659526208416960</id>
    <published>2016-12-17T20:29:02.445Z</published>
    <updated>2016-12-17T20:29:02.445Z</updated>
    <title type="text">SWT: Slider Tutorial</title>
    <content type="html">Slider class is used to define Slider widget</content>
    <author>
      <name>Hari Krishna Gurram</name>
      <email>noreply@blogger.com</email>
      <uri>https://self-learning-java-tutorial.blogspot.com</uri>
    </author>
    <author>
      <name>Rama Krishna Gurram</name>
      <email>noreply@blogger.com</email>
      <uri>https://self-learning-java-tutorial.blogspot.com</uri>
    </author>
    <contributor>
      <name>Ritweek Mehenty</name>
      <email>noreply@blogger.com</email>
      <uri>https://self-learning-java-tutorial.blogspot.com</uri>
    </contributor>
    <contributor>
      <name>Sailaja Navakotla</name>
      <email>noreply@blogger.com</email>
      <uri>https://self-learning-java-tutorial.blogspot.com</uri>
    </contributor>
    <category term="GUI" scheme="scheme" label="SWT for beginners"/>
    <rights type="text">Copyrighted : SWT Tutorial</rights>
    <summary type="text">SWT For beginners</summary>
  </entry>
</feed>
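Before moving to ROME, it may help to see the structure of this document from code. The sketch below does not use ROME. It walks a feed like the one above with the plain JDK DOM parser and collects, for every entry, the title and the names of all its author elements (an entry can have more than one, as in the example). The class name, method names and the output format are my own choices for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class AtomDomSketch {

    private static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    /** Returns one line per entry in the form "title | author1, author2". */
    static List<String> summarize(String atomXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DOCTYPE declarations, since feeds are untrusted input.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(atomXml)));

        List<String> lines = new ArrayList<>();
        NodeList entries = doc.getElementsByTagNameNS(ATOM_NS, "entry");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            String title = firstChildText(entry, "title");

            // Only direct <author> children count; <contributor> is skipped.
            List<String> authors = new ArrayList<>();
            for (Node n = entry.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (isAtomElement(n, "author")) {
                    authors.add(firstChildText((Element) n, "name"));
                }
            }
            lines.add(title + " | " + String.join(", ", authors));
        }
        return lines;
    }

    private static boolean isAtomElement(Node node, String localName) {
        return node.getNodeType() == Node.ELEMENT_NODE
                && ATOM_NS.equals(node.getNamespaceURI())
                && localName.equals(node.getLocalName());
    }

    /** Text of the first direct child with the given name, or "" if absent. */
    private static String firstChildText(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (isAtomElement(n, localName)) {
                return n.getTextContent().trim();
            }
        }
        return "";
    }
}
```

This works only for Atom 1.0 documents. Handling the RSS variants in the same way would need separate code for each format, which is the main reason to use a library such as ROME.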

Parse ATOM feed using ROME library
I am going to use the Maven dependency below to parse an Atom feed.

<!-- https://mvnrepository.com/artifact/com.rometools/rome -->
<dependency>
    <groupId>com.rometools</groupId>
    <artifactId>rome</artifactId>
    <version>1.9.0</version>
</dependency>

ROME is a set of Atom/RSS Java utilities that make it easy to work in Java with most syndication formats.

At the time of writing this article, ROME supports the RSS and ATOM formats below.

RSS 0.90,
RSS 0.91,
RSS 0.92,
RSS 0.93,
RSS 0.94,
RSS 1.0,
RSS 2.0,
ATOM 0.3 feed and
ATOM 1.0 feed

The step-by-step procedure below explains how to parse an Atom feed and get its entries.

Step 1: Create a URL instance from the Atom feed address.

String atomFeedURL = "https://self-learning-java-tutorial.blogspot.com/atom.xml";
URL feedUrl = new URL(atomFeedURL);

Step 2: Build the syndication feed (SyndFeed) from the feed URL.

SyndFeedInput input = new SyndFeedInput();
SyndFeed feed = input.build(new XmlReader(feedUrl));

Step 3: Get all the entries from the syndication feed and print the author name and title of each entry.

List<SyndEntry> entries = feed.getEntries();

for (SyndEntry syndEntry : entries) {
    String author = syndEntry.getAuthor();
    String title = syndEntry.getTitle();
    System.out.println("author : " + author);
    System.out.println("Title : " + title);
}

Find the complete working application below.


AtomParserDemo.java
package com.sample.util;

import java.net.URL;
import java.util.List;

import com.rometools.rome.feed.synd.SyndEntry;
import com.rometools.rome.feed.synd.SyndFeed;
import com.rometools.rome.io.SyndFeedInput;
import com.rometools.rome.io.XmlReader;

public class AtomParserDemo {

    private static final String ATOM_FEED_URL = "https://self-learning-java-tutorial.blogspot.com/atom.xml";

    public static void main(String[] args) throws Exception {
        URL feedUrl = new URL(ATOM_FEED_URL);

        // XmlReader opens the URL and detects the feed's character encoding.
        // try-with-resources closes the underlying stream when parsing is done.
        try (XmlReader reader = new XmlReader(feedUrl)) {
            SyndFeedInput input = new SyndFeedInput();
            SyndFeed feed = input.build(reader);

            List<SyndEntry> entries = feed.getEntries();

            // Print the author and title of every entry in the feed.
            for (SyndEntry syndEntry : entries) {
                String author = syndEntry.getAuthor();
                String title = syndEntry.getTitle();
                System.out.println("author : " + author);
                System.out.println("Title : " + title);
            }
        }
    }
}


Sample Output
author : hari krishna
Title : How to chain multiple different InputStreams into one InputStream
author : hari krishna
Title : hamcrest: hasItemInArray: Check the existence of item in the array
author : hari krishna
Title : Base64 encoding and decoding in Java
author : hari krishna
Title : Hamcrest: array: Match every element of array with specific matcher
author : hari krishna
Title : How to write multiple input streams to a file
author : hari krishna
Title : Hamcrest: Matchers to work with arrays
author : hari krishna
Title : Hamcrest: stringContainsInOrder: Matches a string against all the substrings
author : hari krishna
Title : HamCrest: isEmptyOrNullString: Match if the string is empty or null
author : hari krishna
Title : Hamcrest: isEmptyString: Match if the string is empty
author : hari krishna
Title : Hamcrest: equalToIgnoringWhiteSpace: check equality of strings by ignoring case and white spaces
author : hari krishna
Title : Hamcrest: equalToIgnoringCase: check equality of strings by ignoring case
author : hari krishna
Title : Hamcrest: endsWith: Check whether string ends with given string
author : hari krishna
Title : Hamcrest: startsWith: Check whether string starts with given string
author : hari krishna
Title : Hamcrest: containsString: match sub string
author : hari krishna
Title : Hamcrest: Matchers to work with strings
author : hari krishna
Title : Hamcrest: notNullValue : Creates a matcher that matches if examined object is not null
author : hari krishna
Title : Hamcrest: nullValue: Creates a matcher that matches if examined object is null
author : hari krishna
Title : Hamcrest: Match null values
author : hari krishna
Title : Hamcrest: Hello World Application
author : hari krishna
Title : Introduction to Hamcrest
author : hari krishna
Title : Hamcrest tutorial
author : hari krishna
Title : Kotlin: complementary variance annotation: in
author : hari krishna
Title : Kotlin: Generics
author : hari krishna
Title : Kotlin: Data classes:  Destructuring declarations



