Apache Mahout is an open source project of the Apache Software Foundation that
produces free implementations of distributed or scalable machine learning
algorithms, focused primarily on collaborative filtering, clustering, and
classification. "Mahout" is a Hindi word for an elephant driver.
In this tutorial series, I briefly explain these three areas (collaborative
filtering, clustering, and classification); later posts cover them in detail.
Collaborative filtering
Collaborative filtering is a technique used by recommender systems. You can
see it on e-commerce sites, which show you recommendations while you shop.
Most e-commerce sites use collaborative filtering to recommend products to
users based on their past behavior. News websites likewise offer related news
based on the articles a user has read in the past.
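To make the idea concrete, here is a minimal sketch of user-based collaborative filtering in plain Python. It is not Mahout's API; the `ratings` data, the `cosine` and `recommend` helpers, and the choice of cosine similarity are all assumptions made for illustration. The sketch scores items a user has not rated yet by the ratings of similar users.

```python
from math import sqrt

# Hypothetical rating history (user -> {item: rating}); invented for illustration.
ratings = {
    "alice": {"book": 5, "lamp": 3, "pen": 4},
    "bob": {"book": 5, "lamp": 3, "mug": 5},
    "carol": {"lamp": 1, "mug": 2, "pen": 5},
}


def cosine(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings):
    """Return (predicted rating, item) pairs for items the user has not rated,
    best first. Each prediction is a similarity-weighted average of the
    ratings given by the other users."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    return sorted(((scores[i] / weights[i], i) for i in scores), reverse=True)


print(recommend("alice", ratings))
```

In this toy data, "alice" has not rated the mug, so it is the only candidate; its predicted rating leans toward the rating given by "bob", whose shared ratings match hers exactly.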
Clustering
Clustering falls under the unsupervised learning category. It is used to
uncover hidden relationships in huge data sets: it takes a large amount of
data as input and groups it into clusters based on the properties of the items.
For example, Google News groups news articles using clustering.
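As a rough illustration of grouping data into clusters, here is a small k-means sketch in plain Python. k-means is only one of many clustering algorithms, and this is not Mahout code; the sample `points`, the helper names, and the simplistic choice of the first `k` points as starting centroids are assumptions for the example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def kmeans(points, k, iterations=10):
    """Group points into k clusters.

    Assumption: the first k points serve as initial centroids, which keeps
    the sketch deterministic; real implementations choose them more carefully.
    """
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            mean(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so stop early
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D data with two visibly separate groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

No labels are supplied anywhere: the algorithm discovers the two groups from the coordinates alone, which is what makes clustering unsupervised.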
Classification
Classification falls under the supervised learning category. Here we train
the system on a large set of labeled input samples, and the system then
predicts labels for new data based on what it learned. The quality of the
results depends on the training samples.
For example, mail systems detect spam messages using this model.
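The spam example can be sketched with a tiny Naive Bayes classifier in plain Python. This is an illustration under stated assumptions, not how any particular mail system or Mahout itself works: the `training` messages, the "spam" and "ham" labels, and the `train` and `classify` helpers are all made up for the example.

```python
from collections import Counter
from math import log

# Hypothetical labeled training samples; invented for illustration.
training = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]


def train(samples):
    """Count words per label and messages per label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in samples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text,
    using add-one (Laplace) smoothing for words unseen under a label."""
    word_counts, label_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = log(label_counts[label] / total_messages)
        words_in_label = sum(word_counts[label].values())
        for word in text.split():
            count = word_counts[label][word]
            score += log((count + 1) / (words_in_label + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(training)
print(classify("cheap money", model))
print(classify("monday meeting", model))
```

Unlike clustering, every training message carries a label, and the predictions are only as good as those samples: with four messages the model knows very few words.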