Mahout's PearsonCorrelationSimilarity is an implementation of Pearson correlation, the most common measure of correlation in statistics. For users X and Y, the following values are calculated:
sumX2: the sum of the squares of all of X's preference values
sumY2: the sum of the squares of all of Y's preference values
sumXY: the sum of the products of X's and Y's preference values, over all items for which both X and Y express a preference
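The correlation is then:
sumXY / sqrt(sumX2 * sumY2)
The implementation first centers the data, shifting each user's preference values so that their mean is 0. Without this step the same formula would compute cosine similarity rather than Pearson correlation.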
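The calculation can be sketched in plain Java as follows. This is not Mahout's code: the class PearsonSketch and its method pearson are hypothetical, and the sketch assumes the two users' ratings for their shared items have already been lined up in two arrays, so that x[i] and y[i] are both users' ratings of the same item. It centers each array by subtracting its mean and then applies sumXY / sqrt(sumX2 * sumY2).

```java
// Hypothetical helper, not part of Mahout: Pearson correlation of two users'
// ratings for the same items, given as aligned arrays (x[i] and y[i] are
// the two users' ratings of the same item).
class PearsonSketch {

    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("need two non-empty arrays of equal length");
        }
        int n = x.length;

        // Center the data: compute each user's mean rating.
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // sumX2, sumY2 and sumXY over the centered values.
        double sumX2 = 0, sumY2 = 0, sumXY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sumX2 += dx * dx;
            sumY2 += dy * dy;
            sumXY += dx * dy;
        }

        // 0/0 gives NaN when one user's centered values are all zero
        // (that user gave every item the same rating).
        return sumXY / Math.sqrt(sumX2 * sumY2);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};
        double[] y = {2, 1, 4, 3};
        System.out.println(pearson(x, y)); // prints 0.6
    }
}
```

Centering is done with an explicit second pass here for readability; a single-pass version that accumulates raw sums and corrects them afterwards gives the same result up to rounding.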
The result of Pearson correlation ranges from -1 to +1:
a. -1 indicates a perfect negative linear relationship between the variables,
b. 0 indicates no linear relationship between the variables, and
c. +1 indicates a perfect positive linear relationship between the variables.
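The result cannot leave this range because of the Cauchy-Schwarz inequality. Writing x_i and y_i for the centered preference values of the shared items, sumXY is bounded by the denominator of the formula:

```latex
\left| \sum_i x_i y_i \right| \le \sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}
\quad\Longrightarrow\quad
-1 \le \frac{\mathrm{sumXY}}{\sqrt{\mathrm{sumX2} \cdot \mathrm{sumY2}}} \le 1
```

Equality holds exactly when one centered vector is a constant multiple of the other, y_i = c x_i: the correlation is +1 when c > 0 and -1 when c < 0, which is what "perfect linear relationship" means here.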
The following article explains Pearson correlation.
Suppose I have the following input data.
customer.csv
1,4,3
1,7,2
1,8,2
1,10,1
2,3,2
2,4,3
2,6,3
2,7,1
2,9,1
3,0,3
3,3,2
3,4,1
3,8,3
3,9,1
4,2,5
4,3,4
4,7,3
4,9,2
5,4,5
5,6,4
5,7,1
5,8,3
Each record has the form userID,itemID,preference. For example, 1,4,3 means customer 1 rated item 4 with a value of 3.
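Mahout's FileDataModel takes care of reading this file. To make the format concrete, here is a minimal plain-Java sketch that parses the same records into per-user maps and lists the items two users have both rated; only those shared items contribute to sumXY. The class CustomerCsvSketch and its methods parse and commonItems are hypothetical illustrations, not part of Mahout.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not Mahout code: parses userID,itemID,preference
// records into a per-user map and finds the items two users both rated.
class CustomerCsvSketch {

    // userID -> (itemID -> preference); TreeMaps keep the output sorted.
    static Map<Long, Map<Long, Double>> parse(String csv) {
        Map<Long, Map<Long, Double>> prefs = new TreeMap<>();
        // Records may be separated by newlines or spaces.
        for (String record : csv.trim().split("\\s+")) {
            String[] fields = record.split(",");
            long userId = Long.parseLong(fields[0].trim());
            long itemId = Long.parseLong(fields[1].trim());
            double value = Double.parseDouble(fields[2].trim());
            prefs.computeIfAbsent(userId, k -> new TreeMap<>()).put(itemId, value);
        }
        return prefs;
    }

    // Items rated by both users.
    static Set<Long> commonItems(Map<Long, Double> a, Map<Long, Double> b) {
        Set<Long> common = new TreeSet<>(a.keySet());
        common.retainAll(b.keySet());
        return common;
    }

    public static void main(String[] args) {
        String csv = "1,4,3 1,7,2 1,8,2 1,10,1 2,3,2 2,4,3 2,6,3 2,7,1 2,9,1 "
                + "3,0,3 3,3,2 3,4,1 3,8,3 3,9,1 4,2,5 4,3,4 4,7,3 4,9,2 "
                + "5,4,5 5,6,4 5,7,1 5,8,3";
        Map<Long, Map<Long, Double>> prefs = parse(csv);
        System.out.println(prefs.get(1L));                                  // {4=3.0, 7=2.0, 8=2.0, 10=1.0}
        System.out.println(commonItems(prefs.get(1L), prefs.get(2L)));      // [4, 7]
    }
}
```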
import java.io.File;
import java.io.IOException;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;

public class PearsonCorrelationSimilarityEx {

    // Path to customer.csv (one userID,itemID,preference record per line)
    public static String dataFile = "/Users/harikrishna_gurram/customer.csv";

    public static void main(String[] args) throws IOException, TasteException {
        // Load all preferences from the CSV file
        DataModel model = new FileDataModel(new File(dataFile));

        // User-user similarity based on Pearson correlation over the model
        PearsonCorrelationSimilarity similarity = new PearsonCorrelationSimilarity(model);

        // Compare user 1 with user 2
        System.out.println("Similarity between user1 and user2 is "
                + similarity.userSimilarity(1, 2));
    }
}
Output
Similarity between user1 and user2 is 0.9999999999999998
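Working the formula by hand for users 1 and 2 shows where this value comes from, assuming (as in Mahout's implementation) that the means and sums are taken over the items both users rated. The two users share only items 4 and 7; user 1 rated them 3 and 2, and user 2 rated them 3 and 1:

```latex
\begin{aligned}
x &= (3,\ 2), & \bar{x} &= 2.5, & x - \bar{x} &= (0.5,\ -0.5) \\
y &= (3,\ 1), & \bar{y} &= 2,   & y - \bar{y} &= (1,\ -1) \\
\mathrm{sumXY} &= (0.5)(1) + (-0.5)(-1) = 1 \\
\mathrm{sumX2} &= 0.5^2 + (-0.5)^2 = 0.5 \\
\mathrm{sumY2} &= 1^2 + (-1)^2 = 2 \\
r &= \frac{\mathrm{sumXY}}{\sqrt{\mathrm{sumX2} \cdot \mathrm{sumY2}}} = \frac{1}{\sqrt{0.5 \cdot 2}} = 1
\end{aligned}
```

With only two shared items, the centered vectors always have the form (a, -a) and (b, -b), so the correlation is necessarily +1 or -1 (or undefined if either user gave both items the same rating). The printed 0.9999999999999998 is therefore 1 up to floating-point rounding, and a similarity computed from so few shared items likely says little about how alike the two users really are.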