TanimotoCoefficientSimilarity
is based on the Tanimoto coefficient, also known as the extended Jaccard coefficient.
For two items, it is the number of customers who like both items divided by the number of customers who like at least one of them.
It is used
when users don’t provide preference values, so the data only records which customer is associated with which item.
Let’s say I
have the following input data.
customer.csv
1,1
1,2
1,3
1,7
1,8
2,1
2,2
2,3
2,4
2,5
2,7
3,1
3,2
3,3
3,5
3,6
3,7
4,1
4,3
4,4
4,5
4,7
4,9
4,10
5,1
5,2
5,3
5,4
5,9
1,2 means
customer 1 likes item 2.
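As a rough sketch of the arithmetic behind the coefficient, the formula and one worked case from customer.csv are written out below. The symbols U_A and U_B (the sets of customers who like items A and B) are my own notation, not something taken from the Mahout documentation.

```latex
T(A, B) = \frac{|U_A \cap U_B|}{|U_A| + |U_B| - |U_A \cap U_B|}

% Worked case for items 4 and 3 in customer.csv:
% U_4 = \{2, 4, 5\}, \quad U_3 = \{1, 2, 3, 4, 5\}, \quad U_4 \cap U_3 = \{2, 4, 5\}
T(4, 3) = \frac{3}{3 + 5 - 3} = \frac{3}{5} = 0.6
```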
import java.io.File;
import java.io.IOException;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;

public class TanimotoCoefficientSimilarityEx {

    public static String dataFile = "/Users/harikrishna_gurram/customer.csv";

    public static void main(String[] args) throws IOException, TasteException {
        // Load the customer,item pairs; no preference values are present.
        DataModel model = new FileDataModel(new File(dataFile));

        TanimotoCoefficientSimilarity similarity = new TanimotoCoefficientSimilarity(model);

        // Compare item 4 against each of these items in one call.
        long[] itemIds = { 3, 4, 6, 7, 8, 9, 10 };
        double[] distance = similarity.itemSimilarities(4, itemIds);

        // The values printed are similarities, even though the label says "distance".
        for (int i = 0; i < itemIds.length; i++) {
            System.out.println("distance between item 4 and " + itemIds[i]
                    + " is " + distance[i]);
        }
    }
}
Output
distance between item 4 and 3 is 0.6
distance between item 4 and 4 is 1.0
distance between item 4 and 6 is NaN
distance between item 4 and 7 is 0.4
distance between item 4 and 8 is NaN
distance between item 4 and 9 is 0.6666666666666666
distance between item 4 and 10 is 0.3333333333333333
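TanimotoCoefficientSimilarity returns NaN if the similarity is unknown. Here that happens for items 6 and 8, which have no customers in common with item 4. Despite the "distance" label, the printed values are similarities: a higher value means the two items are more alike.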
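To check these numbers by hand, here is a minimal sketch that computes the same coefficient with plain java.util sets and no Mahout dependency. The class name TanimotoByHand and its helper methods are hypothetical names made up for this illustration. Returning NaN when two items share no customers is an assumption chosen to match the output above, not something read from Mahout's source.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class TanimotoByHand {

    // The customer,item pairs from customer.csv.
    static final long[][] PAIRS = {
        {1, 1}, {1, 2}, {1, 3}, {1, 7}, {1, 8},
        {2, 1}, {2, 2}, {2, 3}, {2, 4}, {2, 5}, {2, 7},
        {3, 1}, {3, 2}, {3, 3}, {3, 5}, {3, 6}, {3, 7},
        {4, 1}, {4, 3}, {4, 4}, {4, 5}, {4, 7}, {4, 9}, {4, 10},
        {5, 1}, {5, 2}, {5, 3}, {5, 4}, {5, 9}
    };

    // Groups the pairs into a map from item id to the set of customers who like it.
    static Map<Long, Set<Long>> customersByItem(long[][] pairs) {
        Map<Long, Set<Long>> result = new TreeMap<>();
        for (long[] pair : pairs) {
            result.computeIfAbsent(pair[1], k -> new HashSet<>()).add(pair[0]);
        }
        return result;
    }

    // Size of the intersection divided by size of the union.
    // Assumption: NaN when the sets have nothing in common, to match the output above.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> common = new HashSet<>(a);
        common.retainAll(b);
        if (common.isEmpty()) {
            return Double.NaN;
        }
        int unionSize = a.size() + b.size() - common.size();
        return (double) common.size() / unionSize;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> byItem = customersByItem(PAIRS);
        Set<Long> item4 = byItem.get(4L);
        for (long other : new long[] { 3, 4, 6, 7, 8, 9, 10 }) {
            System.out.println("similarity between item 4 and " + other
                    + " is " + tanimoto(item4, byItem.get(other)));
        }
    }
}
```

For example, item 4 is liked by customers 2, 4 and 5, and item 7 by customers 1, 2, 3 and 4. They share two customers out of five in total, which gives the 0.4 seen in the output.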