In supervised learning, a model is trained on labelled data. That means each data point in the training data set has a known output value.
For example, suppose I want to build a machine learning model that predicts the price of a second-hand mobile phone. The following independent variables can be used to predict the price.
- Brand: Google, Apple, OnePlus, Samsung, etc.
- Model: iPhone 11, OnePlus 7, etc.
- Release year
- RAM of the phone
- Storage capacity
- Front camera quality
- Back camera quality
- Warranty status
- Price of the new version of the phone in the market
- Demand for the phone in the market
- Screen size
- Age of the phone
- Number of previous owners
- Repair history
- Features like 5G
- Operating system: Android, iOS
- Battery capacity
- Damages, if any
- Current price in the market, etc.
The independent variables above are used to derive a formula that predicts the dependent variable, the 'price' of the phone. The image below depicts this.
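To make the idea of "deriving a formula" concrete, here is a minimal sketch in Python that fits a straight line to a single independent variable, the age of the phone. The numbers are invented for illustration, and the names `ages`, `prices`, and `predict_price` are my own. A real model would use many more of the variables listed above.

```python
from statistics import mean

# Hypothetical labelled data: age of the phone (years) and its resale price.
ages = [1, 2, 3, 4, 5]
prices = [520, 410, 350, 250, 190]

# Ordinary least squares for one variable: price = intercept + slope * age
mean_age = mean(ages)
mean_price = mean(prices)
slope = sum((a - mean_age) * (p - mean_price) for a, p in zip(ages, prices)) / sum(
    (a - mean_age) ** 2 for a in ages
)
intercept = mean_price - slope * mean_age


def predict_price(age_years):
    """Predict the resale price of a phone from its age using the fitted line."""
    return intercept + slope * age_years


print(f"price = {intercept:.1f} + ({slope:.1f} * age)")
print("Predicted price of a 6 year old phone:", predict_price(6))
```

The fitted slope is negative here, which matches the intuition that older phones sell for less.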
Supervised learning is mainly used in the following areas.
a. Regression: Predicting a continuous value based on input features. In other words, predicting a dependent variable from one or more independent variables. Price prediction for a second-hand mobile phone is an example of regression. The following are some more examples of regression.
1. House price prediction
2. Stock price prediction
3. Weather forecasting
4. Demand forecasting
5. Website traffic prediction
6. Predicting sales of a company
7. Predicting success rate of cancer treatment
b. Classification: Predicting the category of the input data. The following are some examples of classification.
1. Image classification: Classifying images into categories such as cat, dog, tiger, human, hill, lake, etc.
2. Email spam detection
3. Language detection from text
4. Disease category detection (diabetes, kidney disease, etc.)
5. Text content categorization into topics such as business, politics, stock market, entertainment, etc.
6. Customer classification based on past purchase history
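As a rough illustration of classification, the sketch below labels an email as "spam" or "not spam" with a nearest-centroid classifier, one of the simplest possible approaches. The two features (number of links, number of exclamation marks), the sample values, and the function names are all assumptions made for this example. Real spam filters are far more sophisticated.

```python
from math import dist

# Hypothetical labelled data: (number of links, number of exclamation marks) -> label
training_data = [
    ((8, 5), "spam"),
    ((7, 6), "spam"),
    ((9, 4), "spam"),
    ((1, 0), "not spam"),
    ((0, 1), "not spam"),
    ((2, 1), "not spam"),
]


def fit_centroids(samples):
    """Compute the average feature vector (centroid) of each label."""
    sums = {}
    counts = {}
    for features, label in samples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(v / counts[label] for v in total)
        for label, total in sums.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = fit_centroids(training_data)
print(classify(centroids, (6, 5)))
print(classify(centroids, (1, 1)))
```

Unlike regression, the output is one of a fixed set of categories and not a number.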
Key terms in Supervised learning
- Independent variables: The input features used to predict the outcome.
- Dependent variable: The value we are trying to predict from the given independent variables, for example the price of a second-hand mobile phone.
- Training data: The labelled dataset used to train the machine learning model.
- Test data: A separate labelled dataset used to evaluate the model's performance.
- Overfitting: The model performs very well on the training data but does not work well on new data.
- Underfitting: The model is unable to capture the underlying patterns in the training data, so it performs poorly on both the training data and new data.
- Loss function: A mathematical function that quantifies the difference between the predicted values and the actual labels in the training data. The aim of training is to minimize this loss.
- Hyperparameters: Parameters that are set before training, for example the learning rate, the number of hidden layers in a neural network, the batch size, and the number of epochs (the number of times the model passes over the training data).
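Several of these terms can be seen working together in one small sketch. The code below splits a synthetic labelled dataset into training and test data, uses mean squared error as the loss function, and trains a one-variable linear model with gradient descent. The data, the function names, and the hyperparameter values (`learning_rate`, `epochs`) are assumptions chosen only for illustration.

```python
import random


def mse(weight, bias, data):
    """Loss function: mean squared error between predictions and labels."""
    return sum((weight * x + bias - y) ** 2 for x, y in data) / len(data)


def train(data, learning_rate, epochs):
    """Fit y = weight * x + bias with batch gradient descent."""
    weight, bias = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        # Gradients of the MSE loss with respect to weight and bias
        grad_w = sum(2 * (weight * x + bias - y) * x for x, y in data) / n
        grad_b = sum(2 * (weight * x + bias - y) for x, y in data) / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


random.seed(0)
# Synthetic labelled data: y = 3x + 2 plus a little noise
dataset = [(x / 10, 3 * (x / 10) + 2 + random.uniform(-0.1, 0.1)) for x in range(50)]
random.shuffle(dataset)
train_data, test_data = dataset[:40], dataset[40:]  # 80% training, 20% test

# Hyperparameters are chosen before training starts
weight, bias = train(train_data, learning_rate=0.05, epochs=1000)

train_loss = mse(weight, bias, train_data)
test_loss = mse(weight, bias, test_data)
print(f"weight={weight:.2f}, bias={bias:.2f}")
print(f"training loss={train_loss:.4f}, test loss={test_loss:.4f}")
```

Comparing the two losses is a simple diagnostic: a low training loss with a much higher test loss suggests overfitting, while high losses on both suggest underfitting.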