Tuesday 15 September 2015

Mahout: CachingItemSimilarity : Compute item similarity

CachingItemSimilarity wraps another ItemSimilarity implementation and caches the results of its computations, so that asking again for the similarity of the same pair of items does not recompute the value. To clear the cached entries for a given item, call the clearCacheForItem(long itemID) method.
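
To make the idea concrete, here is a minimal plain-Java sketch of a caching wrapper. It is not Mahout's implementation: PairSimilarity and CachingPairSimilarity are hypothetical names invented for this illustration, the LRU eviction via LinkedHashMap is my own choice, and treating (a, b) and (b, a) as the same cache entry is an assumption. Only the method name clearCacheForItem is borrowed from the Mahout class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CachingSketch {

    /** Hypothetical stand-in for an item similarity; not Mahout's ItemSimilarity. */
    interface PairSimilarity {
        double similarity(long itemA, long itemB);
    }

    /** Wraps another PairSimilarity and remembers its results in a bounded map. */
    static class CachingPairSimilarity implements PairSimilarity {
        private final PairSimilarity delegate;
        private final Map<String, Double> cache;

        CachingPairSimilarity(PairSimilarity delegate, final int maxSize) {
            this.delegate = delegate;
            // accessOrder = true makes LinkedHashMap evict the least recently used entry
            this.cache = new LinkedHashMap<String, Double>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Double> eldest) {
                    return size() > maxSize;
                }
            };
        }

        // Assumption: similarity is symmetric, so order the pair before building the key.
        private static String key(long a, long b) {
            return Math.min(a, b) + ":" + Math.max(a, b);
        }

        @Override
        public double similarity(long itemA, long itemB) {
            String k = key(itemA, itemB);
            Double cached = cache.get(k);
            if (cached == null) {
                // Cache miss: ask the wrapped similarity and remember the answer.
                cached = delegate.similarity(itemA, itemB);
                cache.put(k, cached);
            }
            return cached;
        }

        /** Drops every cached pair that involves the given item. */
        void clearCacheForItem(long itemID) {
            String id = Long.toString(itemID);
            cache.keySet().removeIf(k -> {
                String[] parts = k.split(":");
                return parts[0].equals(id) || parts[1].equals(id);
            });
        }
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // A made-up similarity that counts how often it is really invoked.
        PairSimilarity slow = (a, b) -> {
            calls[0]++;
            return 1.0 / (1 + Math.abs(a - b));
        };
        CachingPairSimilarity cached = new CachingPairSimilarity(slow, 100);

        cached.similarity(4, 3);
        cached.similarity(3, 4); // served from the cache
        System.out.println("underlying calls: " + calls[0]);

        cached.clearCacheForItem(4);
        cached.similarity(4, 3); // recomputed after the cache was cleared
        System.out.println("underlying calls: " + calls[0]);
    }
}
```

Running the sketch prints "underlying calls: 1" and then "underlying calls: 2": the second lookup hits the cache, and the lookup after clearCacheForItem(4) goes back to the wrapped similarity.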

Suppose we have the following input data.


customer.csv
1,4,3
1,7,2
1,8,2
1,10,1
2,3,2
2,4,3
2,6,3
2,7,1
2,9,1
3,0,3
3,3,2
3,4,1
3,8,3
3,9,1
4,2,5
4,3,4
4,7,3
4,9,2
5,4,5
5,6,4
5,7,1
5,8,3


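Each line has the form customerID,itemID,rating. For example, 1,4,3 means customer 1 rated item 4 with a score of 3. Note that TanimotoCoefficientSimilarity, used in the program below, ignores the rating value and only considers which customers rated each item.
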
import java.io.File;
import java.io.IOException;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.similarity.CachingItemSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;

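public class CachingItemSimilarityEx {
 // Path to the input file; change this to wherever customer.csv lives.
 public static String dataFile = "/Users/harikrishna_gurram/customer.csv";

 public static void main(String[] args) throws IOException, TasteException {

  // Load the customerID,itemID,rating triples from the file.
  DataModel model = new FileDataModel(new File(dataFile));

  // The underlying similarity whose results will be cached.
  TanimotoCoefficientSimilarity similarity = new TanimotoCoefficientSimilarity(
    model);
  // Wrap it in a cache; the second argument (100) is the maximum cache size.
  CachingItemSimilarity cacheSimilarity = new CachingItemSimilarity(similarity, 100);

  long[] itemIds = { 3, 4, 6, 7, 8, 9, 10 };

  // Compare item 4 with each listed item; results are in the same order as itemIds.
  double[] distance = cacheSimilarity.itemSimilarities(4, itemIds);

  for (int i = 0; i < itemIds.length; i++) {
   System.out.println("distance between item 4 and " + itemIds[i]
     + " is " + distance[i]);
  }

 }
}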


Output

distance between item 4 and 3 is 0.4
distance between item 4 and 4 is 1.0
distance between item 4 and 6 is 0.5
distance between item 4 and 7 is 0.6
distance between item 4 and 8 is 0.75
distance between item 4 and 9 is 0.4
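distance between item 4 and 10 is 0.25

Although the program labels them "distance", these values are Tanimoto similarity scores: they range from 0 to 1, and a higher value means the two items were rated by more of the same customers. That is why item 4 compared with itself scores 1.0.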
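
The printed values can be checked by hand without Mahout. The sketch below assumes the Tanimoto coefficient of two items is the number of customers who rated both, divided by the number of customers who rated at least one of them, with ratings ignored. The class TanimotoByHand and its helper methods are hypothetical names for this illustration, and the rows are the customer.csv data embedded as strings.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TanimotoByHand {

    // Same rows as customer.csv: customerID,itemID,rating
    static final String[] ROWS = {
        "1,4,3", "1,7,2", "1,8,2", "1,10,1",
        "2,3,2", "2,4,3", "2,6,3", "2,7,1", "2,9,1",
        "3,0,3", "3,3,2", "3,4,1", "3,8,3", "3,9,1",
        "4,2,5", "4,3,4", "4,7,3", "4,9,2",
        "5,4,5", "5,6,4", "5,7,1", "5,8,3"
    };

    /** Maps each item ID to the set of customers who rated it (ratings are ignored). */
    static Map<Long, Set<Long>> customersByItem(String[] rows) {
        Map<Long, Set<Long>> result = new HashMap<>();
        for (String row : rows) {
            String[] fields = row.split(",");
            long customer = Long.parseLong(fields[0]);
            long item = Long.parseLong(fields[1]);
            result.computeIfAbsent(item, k -> new HashSet<>()).add(customer);
        }
        return result;
    }

    /** Size of the intersection divided by size of the union. */
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> common = new HashSet<>(a);
        common.retainAll(b);
        int union = a.size() + b.size() - common.size();
        // Two empty sets have nothing to compare; returning 0.0 is a choice made for this sketch.
        return union == 0 ? 0.0 : (double) common.size() / union;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> byItem = customersByItem(ROWS);
        for (long other : new long[] {3, 4, 6, 7, 8, 9, 10}) {
            System.out.println("similarity between item 4 and " + other + " is "
                    + tanimoto(byItem.get(4L), byItem.get(other)));
        }
    }
}
```

For example, item 4 was rated by customers 1, 2, 3 and 5, and item 3 by customers 2, 3 and 4. They share two customers out of five distinct ones, giving 2/5 = 0.4, which matches the first line of the output.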


