Programming for beginners: Mahout: PearsonCorrelationSimilarity : Compute User similarity

PearsonCorrelationSimilarity is Mahout's implementation of the Pearson correlation, the most common measure of correlation in statistics. For users X and Y, the following values are calculated:

sumX2: sum of the squares of all of X's preference values

sumY2: sum of the squares of all of Y's preference values

sumXY: sum of the products of X's and Y's preference values over all items for which both X and Y express a preference

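The preference values are first centered, that is, each user's mean is subtracted so that the mean becomes 0 (without this step the formula would be a plain cosine measure rather than a Pearson correlation). The correlation is then: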

sumXY / sqrt(sumX2 * sumY2)
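Here is a minimal plain-Java sketch of this calculation, independent of Mahout. It assumes the two arrays already hold only the ratings for co-rated items (x[i] and y[i] refer to the same item) and it subtracts each user's mean before applying the formula, which is how Mahout's documentation describes the similarity as far as I know. The class name PearsonSketch and the choice to return NaN when a user has no variance are my own, not part of Mahout.

```java
final class PearsonSketch {

    // Pearson correlation of two equally long rating arrays, where x[i] and
    // y[i] are the two users' ratings for the same co-rated item.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("need two non-empty arrays of equal length");
        }
        int n = x.length;

        // Mean rating of each user over the co-rated items.
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Accumulate the three sums on the centered values.
        double sumXY = 0.0;
        double sumX2 = 0.0;
        double sumY2 = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sumXY += dx * dy;
            sumX2 += dx * dx;
            sumY2 += dy * dy;
        }

        // Assumption of this sketch: report NaN when either user rated
        // every co-rated item identically, since the ratio is undefined.
        double denominator = Math.sqrt(sumX2 * sumY2);
        return denominator == 0.0 ? Double.NaN : sumXY / denominator;
    }

    public static void main(String[] args) {
        // Ratings that rise together, then ratings that move in opposite directions.
        System.out.println(pearson(new double[] {1, 2, 3}, new double[] {2, 4, 6}));
        System.out.println(pearson(new double[] {1, 2, 3}, new double[] {6, 4, 2}));
    }
}
```

The first call should print a value at (or extremely close to) 1 and the second a value at -1, matching the two ends of the range.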

The Pearson correlation ranges from -1 to +1.

a. -1 indicates a perfect negative linear relationship between variables,

b. 0 indicates no linear relationship between variables, and

c. 1 indicates a perfect positive linear relationship between variables.

The following article explains Pearson correlation in more detail.

http://www.socialresearchmethods.net/kb/statcorr.php

Let’s say I have the following input data.

customer.csv

1,4,3
1,7,2
1,8,2
1,10,1
2,3,2
2,4,3
2,6,3
2,7,1
2,9,1
3,0,3
3,3,2
3,4,1
3,8,3
3,9,1
4,2,5
4,3,4
4,7,3
4,9,2
5,4,5
5,6,4
5,7,1
5,8,3

Each line has the form userID,itemID,rating. For example, 1,4,3 means customer 1 rated item 4 with a 3.
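To make the file layout concrete, here is a small plain-Java sketch that reads lines in this format into a map from user to (item to rating). It is only an illustration of the format, not how FileDataModel works internally; the class name RatingLineSketch and the method parse are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RatingLineSketch {

    // Parses lines of the form userID,itemID,rating into user -> (item -> rating).
    static Map<Long, Map<Long, Float>> parse(String csv) {
        Map<Long, Map<Long, Float>> ratingsByUser = new LinkedHashMap<>();
        for (String line : csv.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.trim().split(",");
            long userId = Long.parseLong(fields[0]);
            long itemId = Long.parseLong(fields[1]);
            float rating = Float.parseFloat(fields[2]);
            ratingsByUser
                .computeIfAbsent(userId, id -> new LinkedHashMap<>())
                .put(itemId, rating);
        }
        return ratingsByUser;
    }

    public static void main(String[] args) {
        // A few lines taken from customer.csv.
        String sample = "1,4,3\n1,7,2\n2,4,3\n2,7,1\n";
        System.out.println(parse(sample));
    }
}
```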

The following application computes the PearsonCorrelationSimilarity between two users.

import java.io.File;
import java.io.IOException;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;

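public class PearsonCorrelationSimilarityEx {
 // Path to the ratings file; change it to match your machine.
 private static final String DATA_FILE = "/Users/harikrishna_gurram/customer.csv";

 public static void main(String[] args) throws IOException, TasteException {

  // Load the userID,itemID,rating lines into a Mahout data model.
  DataModel model = new FileDataModel(new File(DATA_FILE));

  // Build the Pearson correlation similarity on top of the model.
  PearsonCorrelationSimilarity similarity = new PearsonCorrelationSimilarity(
    model);

  // Compare user 1 with user 2.
  System.out.println("Similarity between user1 and user2 is "
    + similarity.userSimilarity(1, 2));

 }
}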

Output

Similarity between user1 and user2 is 0.9999999999999998
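A result this close to 1 deserves a second look. Users 1 and 2 have only two items in common (4 and 7), and with just two shared ratings the correlation is always +1 or -1 as long as neither user gave both items the same rating. Here user 1 rated them 3 and 2, and user 2 rated them 3 and 1. Centered, these become (0.5, -0.5) and (1, -1), so sumXY = 1, sumX2 = 0.5, sumY2 = 2, and 1 / sqrt(0.5 * 2) = 1; the printed value differs from 1 only by floating-point rounding. The sketch below finds the shared items in plain Java; the class and method names are hypothetical, and the ratings are copied from customer.csv.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

final class CoRatedItemsSketch {

    // Ratings of user 1 from customer.csv (item -> rating).
    static Map<Long, Integer> user1() {
        Map<Long, Integer> ratings = new HashMap<>();
        ratings.put(4L, 3);
        ratings.put(7L, 2);
        ratings.put(8L, 2);
        ratings.put(10L, 1);
        return ratings;
    }

    // Ratings of user 2 from customer.csv (item -> rating).
    static Map<Long, Integer> user2() {
        Map<Long, Integer> ratings = new HashMap<>();
        ratings.put(3L, 2);
        ratings.put(4L, 3);
        ratings.put(6L, 3);
        ratings.put(7L, 1);
        ratings.put(9L, 1);
        return ratings;
    }

    // Items rated by both users, in ascending item order.
    static SortedSet<Long> coRated(Map<Long, Integer> a, Map<Long, Integer> b) {
        SortedSet<Long> shared = new TreeSet<>(a.keySet());
        shared.retainAll(b.keySet());
        return shared;
    }

    public static void main(String[] args) {
        Map<Long, Integer> first = user1();
        Map<Long, Integer> second = user2();
        // Print each shared item with both users' ratings.
        for (Long item : coRated(first, second)) {
            System.out.println("item " + item + ": user1=" + first.get(item)
                + ", user2=" + second.get(item));
        }
    }
}
```

With so little overlap the similarity says little about how alike the two users really are, so it is worth checking how many items a pair shares before trusting the number.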

Referred Articles

http://www.statisticshowto.com/what-is-the-pearson-correlation-coefficient/
http://onlinestatbook.com/2/describing_bivariate_data/pearson.html



Tuesday, 8 September 2015
