CityBlockSimilarity
is implementation of City Block distance (Also known as Manhattan distance).
if x=(a,b)
and y=(c,d), the Manhatten distance between x and y is |a−c|+|b−d|.
x = (5, 3)
y = (6, 2)
Manhatten
distance = |5-6| + |3-2| = 1 + 1 = 2.
x1: 1.0, 3.2, 4.8, 0.1, 3.2, 0.6, 2.2, 1.1
x2: 0.1, 5.2, 1.9, 4.2, 1.9, 0.1, 0.1, 6.0
Manhatten
distance(MD) between x1 and x2 is
|1−0.1|+|3.2−5.2|+|4.8−1.9|+|0.1−4.2|+|3.2−1.9|+|0.6−0.1|+|2.2−0.1|+|1.1−6.0|
0.9+2+2.9+4.1+1.3+0.5+2.1+4.9
18.7
Following
example explains how to compute UserSimilarity between two users using CityBlockSimilarity.
Let’s say I
had following input data.
customer.csv
1,4,3 1,7,2 1,8,2 1,10,1 2,3,2 2,4,3 2,6,3 2,7,1 2,9,1 3,0,3 3,3,2 3,4,1 3,8,3 3,9,1 4,2,5 4,3,4 4,7,3 4,9,2 5,4,5 5,6,4 5,7,1 5,8,3
1,4,3 means customer
1 like item 4 and rated it 3
import java.io.File; import java.io.IOException; import org.apache.mahout.cf.taste.common.TasteException; import org.apache.mahout.cf.taste.impl.model.file.FileDataModel; import org.apache.mahout.cf.taste.impl.similarity.CityBlockSimilarity; import org.apache.mahout.cf.taste.model.DataModel; public class CityBlockSimilarityEx { public static String dataFile = "/Users/harikrishna_gurram/customer.csv"; public static void main(String args[]) throws IOException, TasteException { DataModel model = new FileDataModel(new File(dataFile)); CityBlockSimilarity similarity = new CityBlockSimilarity(model); System.out.println("Similarity between user1 and user2 is " + similarity.userSimilarity(1, 2)); } }
Output
Similarity between user1 and user2 is 0.16666666666666666
When you go
through the source code, it is implemented like following.
public double userSimilarity(long userID1, long userID2) throws TasteException { DataModel dataModel = getDataModel(); FastIDSet prefs1 = dataModel.getItemIDsFromUser(userID1); FastIDSet prefs2 = dataModel.getItemIDsFromUser(userID2); int prefs1Size = prefs1.size(); int prefs2Size = prefs2.size(); int intersectionSize = prefs1Size < prefs2Size ? prefs2.intersectionSize(prefs1) : prefs1.intersectionSize(prefs2); return doSimilarity(prefs1Size, prefs2Size, intersectionSize); } private static double doSimilarity(int pref1, int pref2, int intersection) { int distance = pref1 + pref2 - 2 * intersection; return 1.0 / (1.0 + distance); }
It computes
like following.
Step 1: finds out all the item ids for given user.
Item ids for user
1 are: 4, 7, 8, 10
Item ids for
user 2 are: 3, 4, 6, 7, 9
Total item
ids for user 1 prefs1Size = 4
Total item
ids for user 2 prefs2Size = 5
Common item
ids for user1 and user2 are (4, 7)
Total common
item ids are intersectionSize =2
Compute
similarity distance using following formula.
distance =
prefs1Size + prefs2Size - 2 * intersectionSize;
= 4 + 5 – (2 * 2)
= 9 – 4
= 5
To map
distance between 0 and 1, following logic is used.
Manhattan_distance
= 1.0 / (1.0 + distance);
= 1.0 / (1.0 + 5)
= 1.0 / 6.0
= 0.16666666666666666
No comments:
Post a Comment