Saturday, 12 September 2015

Mahout: TanimotoCoefficientSimilarity: Compute User similarity


TanimotoCoefficientSimilarity is based on the Tanimoto coefficient, also known as the extended Jaccard coefficient. For two users, the Tanimoto coefficient is the ratio of the number of items both users prefer (the intersection) to the number of items at least one of them prefers (the union). For background, see the article on the Tanimoto coefficient.


This similarity is used when users do not provide preference values, so the data only records which items each user likes.
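
In set notation, the ratio described above can be written as follows. The symbols A and B are introduced here only for illustration: A is the set of items preferred by the first user and B the set preferred by the second.

```latex
T(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}
```

The value ranges from 0 (no items in common) to 1 (identical item sets).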

Let’s say I have the following input data.

customer.csv
1,1
1,2
1,3
1,7
1,8
2,1
2,2
2,3
2,4
2,5
2,7
3,1
3,2
3,3
3,5
3,6
3,7
4,1
4,3
4,4
4,5
4,7
4,9
4,10
5,1
5,2
5,3
5,4
5,9


Each line is a customer,item pair. For example, 1,2 means customer 1 likes item 2.
import java.io.File;
import java.io.IOException;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;

public class TanimotoCoefficientSimilarityEx {
 // Path to the input file; change it to wherever customer.csv is stored.
 public static String dataFile = "/Users/harikrishna_gurram/customer.csv";

 public static void main(String[] args) throws IOException, TasteException {

  // Load the customer,item pairs from the file.
  DataModel model = new FileDataModel(new File(dataFile));

  TanimotoCoefficientSimilarity similarity = new TanimotoCoefficientSimilarity(
    model);

  // Compare the item sets of user 1 and user 2.
  System.out.println("Similarity between user1 and user2 is "
    + similarity.userSimilarity(1, 2));

 }
}


Output
Similarity between user1 and user2 is 0.5714285714285714
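
To see where this number comes from, the same ratio can be worked out by hand from customer.csv. User 1 likes items {1, 2, 3, 7, 8} and user 2 likes items {1, 2, 3, 4, 5, 7}. They share 4 items (1, 2, 3, 7), and together they cover 7 distinct items, so the coefficient is 4/7. The following is a minimal sketch using plain Java collections rather than Mahout. The class name TanimotoByHand and the helper tanimoto are hypothetical names used only for this illustration, and returning 0.0 for two empty sets is a choice made here, not a statement about how Mahout behaves.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class TanimotoByHand {

    // Hypothetical helper: size of the intersection divided by size of the union.
    // Returns 0.0 when both sets are empty (an assumption made for this sketch).
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b); // keep only the items present in both sets
        int unionSize = a.size() + b.size() - intersection.size();
        return unionSize == 0 ? 0.0 : (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        // Item sets copied from customer.csv for users 1 and 2.
        Set<Long> user1 = new HashSet<>(Arrays.asList(1L, 2L, 3L, 7L, 8L));
        Set<Long> user2 = new HashSet<>(Arrays.asList(1L, 2L, 3L, 4L, 5L, 7L));

        // 4 shared items out of 7 distinct items, i.e. 4/7.
        System.out.println("Tanimoto by hand: " + tanimoto(user1, user2));
    }
}
```

The result, 4/7 (about 0.5714), agrees with the value reported by userSimilarity(1, 2) above.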