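Mahout's EuclideanDistanceSimilarity computes the similarity between two users X and Y (or between two items) from the Euclidean "distance" between their ratings. A distance can be any non-negative number, so it is mapped into the range (0, 1] by computing similarity = 1 / (1 + distance). Identical ratings (distance 0) give a similarity of 1, and the similarity approaches 0 as the distance grows.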
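A minimal Java sketch of this mapping, to show how distances land in (0, 1]. The class DistanceToSimilarity and the method toSimilarity are illustrative names, not part of Mahout's API.

```java
public class DistanceToSimilarity {

    // Maps a non-negative distance into (0, 1]:
    // distance 0 gives 1.0, and the result approaches 0 as the distance grows.
    static double toSimilarity(double distance) {
        return 1.0 / (1.0 + distance);
    }

    public static void main(String[] args) {
        double[] distances = {0.0, 1.0, 2.0, 3.0};
        for (double d : distances) {
            // Prints 1.0, 0.5, 0.3333333333333333 and 0.25 respectively
            System.out.println("distance " + d + " -> similarity " + toSimilarity(d));
        }
    }
}
```

Because the denominator is always at least 1, the result can never exceed 1, and because it is finite for any finite distance, the result never reaches 0.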
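If $p = (p_1, p_2, \dots, p_n)$ and $q = (q_1, q_2, \dots, q_n)$ are two points, the Euclidean distance between p and q is computed as follows:

$$d(p, q) = \sqrt{(q_1 - p_1)^2 + (q_2 - p_2)^2 + \dots + (q_n - p_n)^2} = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$$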
For example, if $p = (1, 2)$ and $q = (3, 4)$, the distance is $\sqrt{(3 - 1)^2 + (4 - 2)^2} = \sqrt{8} \approx 2.8284271247461903$.
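The distance formula translates directly into a loop over the coordinates. This is a plain-Java sketch; EuclideanDistance and distance are illustrative names, and points are assumed to be double arrays of equal length.

```java
public class EuclideanDistance {

    // Square root of the sum of squared coordinate differences.
    // Both points must have the same dimension n.
    static double distance(double[] p, double[] q) {
        if (p.length != q.length) {
            throw new IllegalArgumentException("points must have the same dimension");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < p.length; i++) {
            double diff = q[i] - p[i];
            sumOfSquares += diff * diff;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        double[] p = {1, 2};
        double[] q = {3, 4};
        // (3 - 1)^2 + (4 - 2)^2 = 8, so this prints sqrt(8) = 2.8284271247461903
        System.out.println(distance(p, q));
    }
}
```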
Let's say I have the following input data in customer.csv:
1,4,3
1,7,2
1,8,2
1,10,1
2,3,2
2,4,3
2,6,3
2,7,1
2,9,1
3,0,3
3,3,2
3,4,1
3,8,3
3,9,1
4,2,5
4,3,4
4,7,3
4,9,2
5,4,5
5,6,4
5,7,1
5,8,3
Each line has the form customer,item,rating. For example, 1,4,3 means that customer 1 rated item 4 with a rating of 3.
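To see which customers rated which item, the same data can be grouped by item. This is a plain-Java sketch with no Mahout involved; RatingsByItem and groupByItem are hypothetical names, and the rows are inlined rather than read from customer.csv.

```java
import java.util.Map;
import java.util.TreeMap;

public class RatingsByItem {

    // Parses "customer,item,rating" lines into item -> (customer -> rating).
    // TreeMap keeps both levels sorted so the printout is easy to scan.
    static Map<Long, Map<Long, Double>> groupByItem(String[] lines) {
        Map<Long, Map<Long, Double>> byItem = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split(",");
            long customer = Long.parseLong(fields[0]);
            long item = Long.parseLong(fields[1]);
            double rating = Double.parseDouble(fields[2]);
            byItem.computeIfAbsent(item, k -> new TreeMap<>()).put(customer, rating);
        }
        return byItem;
    }

    public static void main(String[] args) {
        String[] lines = {
            "1,4,3", "1,7,2", "1,8,2", "1,10,1",
            "2,3,2", "2,4,3", "2,6,3", "2,7,1", "2,9,1",
            "3,0,3", "3,3,2", "3,4,1", "3,8,3", "3,9,1",
            "4,2,5", "4,3,4", "4,7,3", "4,9,2",
            "5,4,5", "5,6,4", "5,7,1", "5,8,3"
        };
        groupByItem(lines).forEach((item, ratings) ->
                System.out.println("item " + item + " rated by " + ratings));
    }
}
```

In this grouping, item 2 has a single entry: customer 4 with a rating of 5. That is the only customer whose ratings can be used when comparing item 2 with other items.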
import java.io.File;
import java.io.IOException;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;

public class EuclideanDistanceSimilarityEx {

    // Ratings file with one "customer,item,rating" entry per line.
    public static String dataFile = "/Users/harikrishna_gurram/customer.csv";

    public static void main(String[] args) throws IOException, TasteException {
        // Load the ratings into Mahout's file-backed data model.
        DataModel model = new FileDataModel(new File(dataFile));

        // Euclidean-distance-based similarity; values fall in (0, 1], or NaN if unknown.
        EuclideanDistanceSimilarity similarity = new EuclideanDistanceSimilarity(model);

        // Compare item 2 with each of these items in a single call.
        long[] itemIds = { 3, 4, 6, 7, 8, 9, 10 };
        double[] similarities = similarity.itemSimilarities(2, itemIds);

        for (int i = 0; i < itemIds.length; i++) {
            System.out.println("similarity between item 2 and " + itemIds[i] + " is " + similarities[i]);
        }
    }
}

Output

similarity between item 2 and 3 is 0.5
similarity between item 2 and 4 is NaN
similarity between item 2 and 6 is NaN
similarity between item 2 and 7 is 0.3333333333333333
similarity between item 2 and 8 is NaN
similarity between item 2 and 9 is 0.25
similarity between item 2 and 10 is NaN

If the similarity is unknown, EuclideanDistanceSimilarity returns Double.NaN. Here that happens for items 4, 6, 8 and 10: item 2 was rated only by customer 4, and customer 4 rated none of those items, so there is no shared customer to compare on. For item 3, customer 4 gave ratings of 5 (item 2) and 4 (item 3), so the distance is 1 and the similarity is 1 / (1 + 1) = 0.5.
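As a cross-check, the printed values can be reproduced by applying similarity = 1 / (1 + distance) directly to the same data, taking the distance only over customers who rated both items. This is a plain-Java sketch, not Mahout's implementation: ItemEuclideanSketch, byItem, similarity and DATA are hypothetical names. Mahout may apply additional normalisation internally; since every pair compared here shares at most one customer, the plain formula is enough to match the output above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ItemEuclideanSketch {

    // The rows of customer.csv, inlined for a self-contained example.
    static final String[] DATA = {
        "1,4,3", "1,7,2", "1,8,2", "1,10,1",
        "2,3,2", "2,4,3", "2,6,3", "2,7,1", "2,9,1",
        "3,0,3", "3,3,2", "3,4,1", "3,8,3", "3,9,1",
        "4,2,5", "4,3,4", "4,7,3", "4,9,2",
        "5,4,5", "5,6,4", "5,7,1", "5,8,3"
    };

    // item -> (customer -> rating)
    static Map<Long, Map<Long, Double>> byItem(String[] lines) {
        Map<Long, Map<Long, Double>> result = new HashMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            result.computeIfAbsent(Long.parseLong(f[1]), k -> new HashMap<>())
                  .put(Long.parseLong(f[0]), Double.parseDouble(f[2]));
        }
        return result;
    }

    // 1 / (1 + Euclidean distance) over customers who rated both items;
    // NaN when the two items have no customer in common.
    static double similarity(Map<Long, Double> a, Map<Long, Double> b) {
        double sumOfSquares = 0.0;
        int common = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                double diff = e.getValue() - other;
                sumOfSquares += diff * diff;
                common++;
            }
        }
        if (common == 0) {
            return Double.NaN;
        }
        return 1.0 / (1.0 + Math.sqrt(sumOfSquares));
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> ratings = byItem(DATA);
        long[] itemIds = {3, 4, 6, 7, 8, 9, 10};
        // Expected: 0.5, NaN, NaN, 0.3333333333333333, NaN, 0.25, NaN
        for (long id : itemIds) {
            double sim = similarity(ratings.get(2L),
                    ratings.getOrDefault(id, Collections.emptyMap()));
            System.out.println("similarity between item 2 and " + id + " is " + sim);
        }
    }
}
```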