Wednesday, 14 March 2018

Create XML sitemaps in Java

In this post, you are going to learn the following.
a.   What a sitemap is
b.   What the sitemap protocol is
c.   Who uses sitemaps
d.   How to generate a sitemap using the sitemapgen4j library

What is a sitemap?
A sitemap represents all the pages of a website. It is usually an XML file that lists all the URLs of your website. For example, the URL below represents all the web pages of my blog.

What is the sitemap protocol?
Google developed the sitemap protocol. Using it, web developers can tell search engines which pages of their sites are available for crawling.

A sample sitemap looks like this.
<?xml version='1.0' encoding='UTF-8'?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <url>
  <loc>https://self-learning-java-tutorial.blogspot.com/2018/03/base64-encoding-and-decoding-in-java.html</loc>
  <lastmod>2018-03-06T14:04:57Z</lastmod>
  <changefreq>daily</changefreq>
  <priority>0.8</priority>
 </url>
</urlset>
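
To make the structure above concrete, here is a minimal sketch that renders the same kind of <urlset> document by hand, using only the JDK. The class name UrlsetSketch and its methods are hypothetical names chosen for illustration; they are not part of sitemapgen4j or any other library. The sketch writes only the required <loc> tag and escapes the characters that the sitemap protocol requires to be entity-escaped.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not from any library): renders a minimal <urlset> document. */
final class UrlsetSketch {

    /** Escapes the five characters that must be entity-escaped inside a sitemap. */
    static String escapeXml(String value) {
        // '&' must be replaced first, otherwise the other entities get escaped twice.
        return value.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    /** Writes one <url> entry per page URL; only the required <loc> tag is emitted. */
    static String buildUrlset(List<String> pageUrls) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version='1.0' encoding='UTF-8'?>\n");
        xml.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (String pageUrl : pageUrls) {
            xml.append(" <url>\n");
            xml.append("  <loc>").append(escapeXml(pageUrl)).append("</loc>\n");
            xml.append(" </url>\n");
        }
        xml.append("</urlset>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildUrlset(Arrays.asList(
                "https://self-learning-java-tutorial.blogspot.com/2018/03/base64-encoding-and-decoding-in-java.html")));
    }
}
```

Building XML with a StringBuilder is fine for a sketch like this. For real sites, a library such as sitemapgen4j (covered later in this post) takes care of these details.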

A sitemap index file can refer to other sitemap files.
<?xml version='1.0' encoding='UTF-8'?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <sitemap>
  <loc>https://self-learning-java-tutorial.blogspot.com/sitemap.xml?page=1</loc>
 </sitemap>
 
 <sitemap>
  <loc>https://self-learning-java-tutorial.blogspot.com/sitemap.xml?page=2</loc>
 </sitemap>
</sitemapindex>
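
One common reason to use a sitemap index is that the protocol limits a single sitemap file to 50,000 URLs. The sketch below, under that assumption, splits a list of URLs into pages and renders an index that points to each page. The class SitemapIndexSketch is a hypothetical name, and the "/sitemap.xml?page=N" naming simply mirrors the sample above; your site may name its sitemap files differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not from any library): splits URLs into pages and renders an index. */
final class SitemapIndexSketch {

    /** The sitemap protocol allows at most 50,000 URLs in one sitemap file. */
    static final int MAX_URLS_PER_SITEMAP = 50_000;

    /** Splits the items into consecutive chunks of at most chunkSize elements. */
    static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    /**
     * Renders a <sitemapindex> with one <sitemap> entry per page.
     * Assumption: baseUrl contains no characters that need XML escaping.
     */
    static String buildIndex(String baseUrl, int sitemapCount) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version='1.0' encoding='UTF-8'?>\n");
        xml.append("<sitemapindex xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (int page = 1; page <= sitemapCount; page++) {
            xml.append(" <sitemap>\n");
            xml.append("  <loc>").append(baseUrl).append("/sitemap.xml?page=").append(page).append("</loc>\n");
            xml.append(" </sitemap>\n");
        }
        xml.append("</sitemapindex>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildIndex("https://self-learning-java-tutorial.blogspot.com", 2));
    }
}
```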

The table below summarizes the important tags of a sitemap file.

Tag             Description
<urlset>        Root tag of a sitemap file. All the URLs of your website are specified in this tag.
<url>           Information about one URL, such as its location and last modification date, is specified in this tag.
<sitemapindex>  Root tag of a sitemap index file. You can specify the locations of other sitemaps in this tag.
<sitemap>       Specifies the details of one sitemap inside <sitemapindex>.
<loc>           Specifies the location of the URL (or) sitemap.
<lastmod>       Specifies the last modification date of the URL. The date should be in W3C Datetime format (a profile of ISO 8601), for example 2018-03-06T14:04:57Z or 2018-03-06.
<changefreq>    Tells how frequently the page is likely to change. It must be one of: always, hourly, daily, weekly, monthly, yearly, never.
<priority>      Priority of the URL relative to other URLs of the same website. Valid values range from 0.0 to 1.0; the default is 0.5.

Tags <lastmod>, <changefreq> and <priority> are optional.
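
The value rules for the optional tags can be sketched in plain Java as below. This is only an illustration under the assumptions stated in the comments: SitemapFieldSketch and its members are hypothetical names, not a library API. It formats a <lastmod> value as a UTC timestamp, restricts <changefreq> to the seven allowed values, and rejects a <priority> outside 0.0 to 1.0.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

/** Hypothetical helper (not from any library): formats the optional sitemap tag values. */
final class SitemapFieldSketch {

    /** The seven values that <changefreq> accepts. */
    enum ChangeFrequency {
        ALWAYS, HOURLY, DAILY, WEEKLY, MONTHLY, YEARLY, NEVER;

        /** Sitemap files use the lower-case form, for example "daily". */
        String xmlValue() {
            return name().toLowerCase(Locale.ROOT);
        }
    }

    /** Formats a <lastmod> value as a UTC timestamp with second precision. */
    static String formatLastMod(Instant lastModified) {
        // Dropping the fraction of a second keeps the output in the yyyy-MM-ddTHH:mm:ssZ shape.
        return lastModified.truncatedTo(ChronoUnit.SECONDS).toString();
    }

    /** Formats a <priority> value with one decimal place; valid values are 0.0 to 1.0. */
    static String formatPriority(double priority) {
        // Written as a negated range check so that NaN is rejected as well.
        if (!(priority >= 0.0 && priority <= 1.0)) {
            throw new IllegalArgumentException("priority must be between 0.0 and 1.0: " + priority);
        }
        return String.format(Locale.ROOT, "%.1f", priority);
    }

    public static void main(String[] args) {
        System.out.println(formatLastMod(Instant.parse("2018-03-06T14:04:57Z")));
        System.out.println(ChangeFrequency.DAILY.xmlValue());
        System.out.println(formatPriority(0.8));
    }
}
```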

Who uses sitemaps?
Sitemaps are submitted to search engines. Search engines use the sitemaps while indexing the content of your website.
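
To give a rough idea of what a consumer of a sitemap does, the sketch below reads a sitemap and collects every <loc> value with the JDK's DOM parser. It is a simplified illustration, not how any particular search engine works; SitemapReaderSketch and extractLocations are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper (not from any library): lists the <loc> values of a sitemap. */
final class SitemapReaderSketch {

    private static final String SITEMAP_NAMESPACE = "http://www.sitemaps.org/schemas/sitemap/0.9";

    /** Returns the text of every <loc> element, in document order. */
    static List<String> extractLocations(String sitemapXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Sitemaps do not need a DOCTYPE, so reject it to avoid external entity attacks.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

            Document document = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(sitemapXml.getBytes(StandardCharsets.UTF_8)));

            // Works for both <urlset> and <sitemapindex>, since both use <loc>.
            NodeList locNodes = document.getElementsByTagNameNS(SITEMAP_NAMESPACE, "loc");
            List<String> locations = new ArrayList<>();
            for (int i = 0; i < locNodes.getLength(); i++) {
                locations.add(locNodes.item(i).getTextContent().trim());
            }
            return locations;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse sitemap", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version='1.0' encoding='UTF-8'?>"
                + "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
                + "<url><loc>https://self-learning-java-tutorial.blogspot.com/2018/03/base64-encoding-and-decoding-in-java.html</loc></url>"
                + "</urlset>";
        extractLocations(xml).forEach(System.out::println);
    }
}
```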

How to generate a sitemap using the sitemapgen4j library?
Sitemapgen4j is a Java library used to generate XML sitemaps.

I am going to use the Maven dependency below.

<!-- https://mvnrepository.com/artifact/com.github.dfabulich/sitemapgen4j -->
<dependency>
    <groupId>com.github.dfabulich</groupId>
    <artifactId>sitemapgen4j</artifactId>
    <version>1.0.6</version>
</dependency>

The step-by-step procedure below explains how to generate a simple sitemap.

Get an instance of 'WebSitemapGenerator'.
WebSitemapGenerator webSitemapGenerator = WebSitemapGenerator.builder(WEB_PAGE_URL, new File(LOCAL_FOLDER_PATH)).build();

Create a sitemap url using 'WebSitemapUrl' class.
WebSitemapUrl sitemapUrl1 = new WebSitemapUrl.Options(url).lastMod(modifiedDate).priority(priority).changeFreq(changeFrequency).build();

Add the sitemap url.
webSitemapGenerator.addUrl(sitemapUrl1);

Write the sitemap file to the local folder.
webSitemapGenerator.write();

The complete working application is below.


SitemapUtil.java
package com.sample.util;

import java.io.File;
import java.util.Date;

import com.redfin.sitemapgenerator.ChangeFreq;
import com.redfin.sitemapgenerator.WebSitemapGenerator;
import com.redfin.sitemapgenerator.WebSitemapUrl;

public class SitemapUtil {
 private static final String WEB_PAGE_URL = "https://self-learning-java-tutorial.blogspot.com";
 private static final String LOCAL_FOLDER_PATH = "C:\\Users\\krishna\\Miscellaneous";

 public static void main(String[] args) throws Exception {

  /* get the instance of WebSitemapGenerator */
  WebSitemapGenerator webSitemapGenerator = WebSitemapGenerator.builder(WEB_PAGE_URL, new File(LOCAL_FOLDER_PATH))
    .build();

  String url = "https://self-learning-java-tutorial.blogspot.com/2018/03/base64-encoding-and-decoding-in-java.html";
  double priority = 0.8;
  ChangeFreq changeFrequency = ChangeFreq.YEARLY;
  Date modifiedDate = new Date();

  /* Create an instance of WebSitemapUrl */
  WebSitemapUrl sitemapUrl1 = new WebSitemapUrl.Options(url).lastMod(modifiedDate).priority(priority)
    .changeFreq(changeFrequency).build();

  /* Add the urls to webSitemapGenerator */
  webSitemapGenerator.addUrl(sitemapUrl1);
  webSitemapGenerator.addUrl("https://self-learning-java-tutorial.blogspot.com/2018/03/how-to-write-multiple-input-streams-to.html");
 
  webSitemapGenerator.write();
 }
}


