Sometimes you need to generate recommendations from input data that has no preference values, that is, data like the following. Even if you do provide preference values, they are simply ignored.
User_id1 item_id1
User_id2 item_id2
User_id3 item_id3
User_id4 item_id4
This kind of data is called Boolean preference data, since it carries no preference value. To handle it, we need to choose a suitable similarity algorithm and recommender.
Choosing a similarity algorithm
For this example, I am going to use TanimotoCoefficientSimilarity, which is intended for "binary" data sets (the preference value doesn't matter here).
Choosing a recommender
I am going to use GenericBooleanPrefUserBasedRecommender here. (You can use GenericBooleanPrefItemBasedRecommender for item-based recommendations.)
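Before moving on to the data, it may help to see what the Tanimoto coefficient measures: the number of items two users share, divided by the number of items either of them has. The sketch below is plain Java and does not call Mahout. The class name TanimotoSketch and the two item sets are made up for illustration, and Mahout's own implementation may treat edge cases (such as users with nothing in common) differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Mahout.
class TanimotoSketch {

    // Tanimoto coefficient: |A intersect B| / |A union B|.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return Double.NaN; // undefined when neither user has any items
        }
        Set<Long> shared = new HashSet<>(a);
        shared.retainAll(b);
        int unionSize = a.size() + b.size() - shared.size();
        return (double) shared.size() / unionSize;
    }

    public static void main(String[] args) {
        // Made-up item sets for two users; ratings play no role.
        Set<Long> userA = new HashSet<>(Arrays.asList(1L, 2L, 4L));
        Set<Long> userB = new HashSet<>(Arrays.asList(2L, 4L, 5L, 9L));
        // 2 shared items out of 5 distinct items overall
        System.out.println(tanimoto(userA, userB)); // prints 0.4
    }
}
```

Because only set membership is used, two users who bought the same books get the same similarity no matter how they rated them.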
Let’s say I have the following input data.
Book id | Title
1 | Meet Big Brother
2 | Explore the Universe
3 | Memoir as metafiction
4 | A child-soldier's story
5 | Wicked good fun
6 | The 60s kids classic
7 | A short-form master
8 | Go down the rabbit hole
9 | Unseated a president
10 | An Irish-American Memoir
User id | Name
1 | Hari Krishna Gurram
2 | Gopi Battu
3 | Rama Krishna Gurram
4 | Sudheer Ganji
5 | Kiran Darsi
6 | Joel Chelli
7 | Sankalp Dubey
8 | Sunil Kumar
9 | Janaki Sriram
10 | Phalgun Garimella
11 | Reshmi George
12 | Sailaja Navakotla
13 | Aravind Phaneendra
14 | Keerthi Shetty
15 | Sujatha
16 | Vadiraj Kulakarni
17 | Arpan
18 | Suprabath Bisoi
19 | Sravani
20 | Gireesh Amara
The following CSV file contains the customers' purchases and their ratings of the books.
customer.csv
1,1,3
1,2,1
1,4,5
1,5,3
1,9,3
1,10,2
2,1,2
2,3,2
2,4,1
2,7,5
3,1,5
3,2,1
3,3,1
3,6,1
3,8,1
4,1,1
4,2,1
4,6,3
4,7,1
4,9,2
5,2,1
5,3,3
5,6,5
5,10,3
6,1,1
6,2,4
6,3,4
6,7,2
6,8,3
7,1,3
7,2,3
7,3,1
7,5,3
7,6,3
7,7,3
8,1,1
8,3,3
8,4,5
8,8,1
8,9,2
9,4,2
9,6,5
9,8,3
9,9,3
10,2,5
10,3,1
10,4,2
10,5,1
10,9,4
11,2,3
11,4,2
11,5,2
11,8,1
12,1,1
12,3,4
12,7,3
12,8,2
13,1,3
13,2,4
13,3,2
13,5,3
13,9,3
14,2,3
14,3,2
14,5,1
14,7,1
14,8,5
14,9,2
15,1,3
15,2,2
15,3,2
15,6,5
15,7,1
15,9,3
16,2,2
16,3,4
16,6,1
16,7,3
16,10,1
17,3,1
17,4,3
17,7,4
17,8,4
18,3,3
18,5,2
18,6,3
18,9,1
18,10,2
19,1,1
19,2,5
19,6,2
19,7,2
19,8,3
19,10,3
20,1,2
20,2,2
20,3,1
20,4,4
20,8,1
The record 20,8,1 means that user 20 liked item 8 and gave it a rating of 1.
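To make the Boolean view of these records concrete, here is a small sketch that groups records by user and keeps only the item IDs, discarding the rating in the third field. It is plain Java, independent of Mahout; the class name BooleanRecordsSketch and the method itemsByUser are hypothetical names chosen for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Mahout.
class BooleanRecordsSketch {

    // Maps each user ID to the set of item IDs that appear with it.
    static Map<Long, Set<Long>> itemsByUser(String[] records) {
        Map<Long, Set<Long>> result = new LinkedHashMap<>();
        for (String record : records) {
            String[] fields = record.trim().split(",");
            long userId = Long.parseLong(fields[0]);
            long itemId = Long.parseLong(fields[1]);
            // fields[2], the rating, is deliberately ignored
            result.computeIfAbsent(userId, k -> new TreeSet<>()).add(itemId);
        }
        return result;
    }

    public static void main(String[] args) {
        // The five records of user 20 from customer.csv
        String[] records = { "20,1,2", "20,2,2", "20,3,1", "20,4,4", "20,8,1" };
        System.out.println(itemsByUser(records)); // prints {20=[1, 2, 3, 4, 8]}
    }
}
```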
The following application finds recommendations and the most similar users for customer 1.
import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.UserBasedRecommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class GenericBooleanPrefUserBasedRecommenderEx {

    // Path to the input file; adjust it to your environment
    private static String input = "/Users/harikrishna_gurram/customer.csv";
    private static final int NEIGHBORHOOD_SIZE = 5;

    private static DataModel model = null;
    private static UserSimilarity similarity = null;
    private static UserNeighborhood neighborhood = null;
    private static UserBasedRecommender recommender = null;

    // Book titles, indexed by (book id - 1)
    private static String[] books = { "Meet Big Brother", "Explore the Universe",
            "Memoir as metafiction", "A child-soldier's story", "Wicked good fun",
            "The 60s kids classic", "A short-form master", "Go down the rabbit hole",
            "Unseated a president", "An Irish-American Memoir" };

    // User names, indexed by (user id - 1)
    private static String[] userNames = { "Hari Krishna Gurram", "Gopi Battu",
            "Rama Krishna Gurram", "Sudheer Ganji", "Kiran Darsi", "Joel Chelli",
            "Sankalp Dubey", "Sunil Kumar", "Janaki Sriram", "Phalgun Garimella",
            "Reshmi george", "Sailaja Navakotla", "Aravind Phaneendra", "Keerthi Shetty",
            "Sujatha", "Vadiraj Kulakarni", "Arpan", "Suprabath Bisoi", "Sravani",
            "Gireesh Amara" };

    public static void main(String[] args) throws IOException, TasteException {
        // Build the model, similarity, neighborhood and recommender
        model = new FileDataModel(new File(input));
        similarity = new TanimotoCoefficientSimilarity(model);
        neighborhood = new NearestNUserNeighborhood(NEIGHBORHOOD_SIZE, similarity, model);
        recommender = new GenericBooleanPrefUserBasedRecommender(model, neighborhood, similarity);

        // Ask for up to 5 recommendations for user 1
        List<RecommendedItem> recommendations = recommender.recommend(1, 5);

        System.out.println("Recommendations for customer " + userNames[0] + " are:");
        System.out.println("*************************************************");
        System.out.println("BookId\ttitle\t\testimated preference");

        for (RecommendedItem recommendation : recommendations) {
            int bookId = (int) recommendation.getItemID();
            float estimatedPref = recommender.estimatePreference(1, bookId);
            System.out.println(bookId + " " + books[bookId - 1] + "\t" + estimatedPref);
        }

        System.out.println("*************************************************");

        // Find the 5 users most similar to user 1
        long[] userIds = recommender.mostSimilarUserIDs(1, 5);
        System.out.println("Most similar users for " + userNames[0] + " are");
        for (long id : userIds) {
            System.out.println(id + " " + userNames[(int) id - 1]);
        }
    }
}

Output
Recommendations for customer Hari Krishna Gurram are:
*************************************************
BookId  title           estimated preference
3 Memoir as metafiction 1.5178572
8 Go down the rabbit hole 0.80357146
6 The 60s kids classic 0.375
7 A short-form master 0.375
*************************************************
Most similar users for Hari Krishna Gurram are
10 Phalgun Garimella
13 Aravind Phaneendra
11 Reshmi george
4 Sudheer Ganji
8 Sunil Kumar
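Note that an estimated preference can be greater than 1 here. The figures above are consistent with the recommender adding up the Tanimoto similarities of those neighbours that have the item, rather than averaging ratings. The sketch below reproduces them without Mahout under that assumption, using the item sets of user 1 and of the five similar users listed in the output (taken from customer.csv). The class BooleanEstimateSketch and its methods are hypothetical names for this illustration, not Mahout APIs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Mahout.
class BooleanEstimateSketch {

    // Build an item set from a list of item IDs.
    static Set<Long> items(long... ids) {
        Set<Long> set = new HashSet<>();
        for (long id : ids) {
            set.add(id);
        }
        return set;
    }

    // Tanimoto coefficient: shared items divided by items either user has.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> shared = new HashSet<>(a);
        shared.retainAll(b);
        return (double) shared.size() / (a.size() + b.size() - shared.size());
    }

    // Assumed scoring rule: sum the similarity of every neighbour that has the item.
    static double estimate(Set<Long> target, List<Set<Long>> neighbours, long itemId) {
        double total = 0.0;
        for (Set<Long> neighbour : neighbours) {
            if (neighbour.contains(itemId)) {
                total += tanimoto(target, neighbour);
            }
        }
        return total;
    }

    // Item sets of users 10, 13, 11, 4 and 8, read from customer.csv.
    static List<Set<Long>> neighbours() {
        return Arrays.asList(
                items(2, 3, 4, 5, 9),  // user 10
                items(1, 2, 3, 5, 9),  // user 13
                items(2, 4, 5, 8),     // user 11
                items(1, 2, 6, 7, 9),  // user 4
                items(1, 3, 4, 8, 9)); // user 8
    }

    public static void main(String[] args) {
        Set<Long> user1 = items(1, 2, 4, 5, 9, 10);
        for (long itemId : new long[] { 3, 8, 6, 7 }) {
            System.out.println(itemId + " " + estimate(user1, neighbours(), itemId));
        }
    }
}
```

For book 3, the neighbours that have it are users 10, 13 and 8, giving 4/7 + 4/7 + 3/8, which is about 1.5179 and matches the first row of the output. Books 8, 6 and 7 come out at roughly 0.8036, 0.375 and 0.375 in the same way.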