Machine Learning Basics: What Is Supervised Learning?

Written by Coursera Staff • Updated on

Explore the definition of supervised learning, its associated algorithms, its real-world applications, and how it varies from unsupervised learning.

[Featured Image] A businesswoman smiles as she learns about supervised learning and unsupervised learning as part of her workplace’s training on machine learning.

Supervised learning is a category within the machine learning realm defined by its use of models that train with labeled data to make predictions or classify new data. Within the labeled data, features exist as the input, and targets exist as the output. With these inputs and outputs, the model trains to discover the mapping between them to make accurate predictions on additional data sets. 

Various applications widely use supervised learning. Read on to further explore what it is and its many uses in today’s world. 

What is supervised learning?

Supervised learning, a subset of machine learning, involves training models on labeled data sets to predict characteristics of new, unseen data. Each input is paired with an output, meaning a corresponding output label exists for each input example in the labeled data. The model learns the overall relationship between inputs and outputs so that it can make accurate predictions on fresh input data.

Data collection and preprocessing, feature and model selection, model training, model evaluation, and prediction are some of the crucial steps in the supervised learning process. Supervised learning has various uses in fields including computer vision and finance. Examples include sentiment analysis, image recognition, and stock market prediction.
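The steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not a production workflow: the data values, the labels, and the choice of a 1-nearest-neighbor model are all assumptions made for the example.

```python
# Hypothetical labeled data: (height_cm, weight_kg) features with a size label.
# All values are made up for illustration.
labeled_data = [
    ((150, 50), "small"), ((155, 55), "small"), ((160, 58), "small"),
    ((152, 52), "small"), ((180, 85), "large"), ((185, 90), "large"),
    ((178, 82), "large"), ((190, 95), "large"),
]

# Data preparation: hold out one example per class for evaluation.
train = labeled_data[:3] + labeled_data[4:7]
test = [labeled_data[3], labeled_data[7]]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(features, training_set):
    """1-nearest-neighbor: return the label of the closest training example."""
    nearest = min(training_set, key=lambda pair: squared_distance(pair[0], features))
    return nearest[1]

# Model evaluation: fraction of held-out examples predicted correctly.
correct = sum(predict(features, train) == label for features, label in test)
accuracy = correct / len(test)

# Prediction: apply the model to a new, unlabeled input.
new_label = predict((158, 57), train)
print(accuracy, new_label)
```

Holding out test examples matters because accuracy measured on the training data itself tends to overstate how well a model handles unseen inputs.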

Data mining

Data mining involves using techniques to analyze large data sets to discover trends within the data. In the context of supervised learning, data mining is directed at a predefined target variable. Various algorithms and techniques exist within data mining that sift through large data sets to provide meaningful outputs.

Regression 

The purpose of regression, a form of supervised learning, is to forecast a continuous numerical output value from a set of input data. In regression, the model learns to map input data to a continuous output variable, such as predicting a stock price or housing price based on features such as location, size, and age.
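As a minimal sketch of regression, the following example fits a straight line to one feature using ordinary least squares. The house sizes and prices are made-up values chosen so that price is exactly three times size, and the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: size in square meters, price in thousands.
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

slope, intercept = fit_line(sizes, prices)

# Predict a continuous value for a size that was not in the training data.
predicted_price = slope * 90 + intercept
print(slope, intercept, predicted_price)
```

Real housing data would include several features and noise, so the fitted line would only approximate the prices rather than match them exactly.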

Classification

A different form of supervised learning is classification, which aims to forecast a categorical output variable from input features. In classification, the model learns to map input data to a set of discrete output categories, such as predicting whether an email is spam or not based on features such as sender, subject, and content.
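To illustrate classification, here is a small sketch of logistic regression trained with gradient descent on a single feature. The feature (a count of suspicious words per email), the data values, and the hyperparameters are assumptions made for the example.

```python
import math

# Hypothetical training data: suspicious-word counts and labels
# (1 = spam, 0 = not spam). The values are made up for illustration.
suspicious_word_counts = [0, 1, 2, 6, 7, 8]
is_spam = [0, 0, 0, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(xs, ys, learning_rate=0.1, epochs=2000):
    """Batch gradient descent on the average log loss."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(weight * x + bias) - y for x, y in zip(xs, ys)]
        weight -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        bias -= learning_rate * sum(errors) / n
    return weight, bias

def predict_category(x, weight, bias):
    """Map the predicted probability to one of two discrete categories."""
    return "spam" if sigmoid(weight * x + bias) >= 0.5 else "not spam"

weight, bias = train_logistic_regression(suspicious_word_counts, is_spam)
print(predict_category(1, weight, bias), predict_category(7, weight, bias))
```

The model outputs a probability, and the 0.5 threshold converts it into a category. That final step is what separates classification from regression.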

Supervised learning algorithms 

These algorithms are computational methods used to build and train models to make accurate forecasts based on labeled data. Below are some standard supervised learning algorithms:

Neural networks

This machine learning model uses multiple layers of connected nodes. These nodes learn to map input data to outputs in the form of a prediction. Training involves forward propagation, which computes a prediction from the inputs, and backpropagation, which adjusts the connection weights based on the prediction error. The structure is loosely inspired by the human brain. Neural networks have shown impressive results in areas like natural language processing and image recognition.
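Forward propagation and backpropagation can be sketched with a very small network. The example below assumes two inputs, two hidden nodes, one output, sigmoid activations, a squared error loss, and no bias terms. The starting weights and the single training example are made up.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical starting weights. w_hidden[j][i] connects input i to hidden node j.
w_hidden = [[0.1, -0.2], [0.4, 0.3]]
w_out = [0.2, -0.1]

def forward(x):
    """Forward propagation: inputs -> hidden layer -> output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def train_step(x, target, learning_rate=0.5):
    """One backpropagation update; returns the loss before the update."""
    hidden, output = forward(x)
    # Gradient of 0.5 * (output - target)^2 with respect to the output node's input.
    delta_out = (output - target) * output * (1 - output)
    # Propagate the error back to each hidden node using the current output weights.
    delta_hidden = [delta_out * w_out[j] * hidden[j] * (1 - hidden[j]) for j in range(2)]
    for j in range(2):
        w_out[j] -= learning_rate * delta_out * hidden[j]
        for i in range(2):
            w_hidden[j][i] -= learning_rate * delta_hidden[j] * x[i]
    return 0.5 * (output - target) ** 2

losses = [train_step([1.0, 0.5], target=1.0) for _ in range(200)]
print(losses[0], losses[-1])
```

Practical networks add bias terms, train on many examples, and rely on libraries that compute these gradients automatically.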

Decision trees

This tree-like structure is a helpful tool for making predictions based on input features. A decision tree repeatedly splits the input data into smaller groups based on the most informative features until it reaches a prediction at one of the tree’s leaf nodes. Applications for decision trees include credit risk assessment and medical diagnosis.
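A single split in a decision tree can be sketched as a search for the threshold that leaves the purest groups. The example below uses Gini impurity as the purity measure, which is one common choice. The credit scores and outcomes are made up.

```python
def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Try each observed value as a threshold and keep the purest split."""
    best_threshold, best_score = None, float("inf")
    for threshold in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        # Weighted average impurity of the two groups.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

# Hypothetical credit scores and loan outcomes.
credit_scores = [580, 600, 620, 700, 720, 750]
outcomes = ["default", "default", "default", "repaid", "repaid", "repaid"]

threshold, impurity = best_split(credit_scores, outcomes)
print(threshold, impurity)
```

A full tree would apply the same search again inside each resulting group, across all available features, until the groups are pure or a stopping rule applies.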

Bayesian logic

This probabilistic machine learning method uses Bayes' theorem to adjust a hypothesis' probability in light of fresh data. Uses and applications for Bayesian logic include finding conditional probabilities related to a client’s relative risk in a financial setting and calculating the accuracy of medical results.
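As a worked example of Bayes' theorem in the medical setting mentioned above, the sketch below updates the probability that a patient has a condition after a positive test. The prior and the test's error rates are made-up numbers.

```python
def posterior_probability(prior, true_positive_rate, false_positive_rate):
    """P(condition | positive test) computed with Bayes' theorem."""
    # Total probability of a positive test, with or without the condition.
    evidence = true_positive_rate * prior + false_positive_rate * (1 - prior)
    return true_positive_rate * prior / evidence

# Hypothetical numbers: 1% of patients have the condition, the test detects
# 95% of true cases, and it wrongly flags 5% of healthy patients.
posterior = posterior_probability(
    prior=0.01, true_positive_rate=0.95, false_positive_rate=0.05
)
print(round(posterior, 3))  # about 0.161
```

Even with a fairly accurate test, the posterior stays low here because the condition is rare, which is why the prior matters when interpreting results.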

Random forests

Random forests are an ensemble learning technique that combines multiple decision trees to increase the reliability and accuracy of predictions. Each decision tree trains on a different random subset of the input data and features, and the random forest then combines the separate trees' predictions to produce a final prediction.
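The ensemble idea can be sketched with very simple one-split trees. In this simplified illustration, each tree trains on a bootstrap sample (rows drawn with replacement) and the forest takes a majority vote. Because the made-up data has only one feature, the random feature selection used by real random forests is omitted, and the split rule is an assumption chosen for brevity.

```python
import random
from collections import Counter

def train_stump(sample):
    """Fit a one-split tree: threshold halfway between the two class means."""
    by_label = {}
    for value, label in sample:
        by_label.setdefault(label, []).append(value)
    means = {label: sum(vals) / len(vals) for label, vals in by_label.items()}
    if len(means) == 1:
        # The bootstrap sample contained a single class, so always predict it.
        only_label = next(iter(means))
        return lambda value: only_label
    low_label, high_label = sorted(means, key=means.get)
    threshold = (means[low_label] + means[high_label]) / 2
    return lambda value: low_label if value <= threshold else high_label

def train_forest(data, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(data) for _ in data]  # sample with replacement
        forest.append(train_stump(bootstrap))
    return forest

def predict_forest(forest, value):
    """Combine the trees' predictions by majority vote."""
    votes = Counter(tree(value) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical data: a single risk score with a two-class label.
data = [(1, "low risk"), (2, "low risk"), (3, "low risk"), (4, "low risk"),
        (7, "high risk"), (8, "high risk"), (9, "high risk"), (10, "high risk")]

forest = train_forest(data)
print(predict_forest(forest, 2), predict_forest(forest, 9))
```

Because every tree sees a slightly different sample, individual mistakes tend to be outvoted, which is the source of the added reliability.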

Linear discriminant analysis

Linear discriminant analysis (LDA) is a statistical method that finds a linear combination of features that best separates two or more classes in the input data. LDA is helpful as a preprocessing step and in applications such as face recognition.
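For two classes with two features, the separating direction can be computed directly. The sketch below follows the standard two-class formulation, in which the direction is the inverse of the within-class scatter matrix multiplied by the difference of the class means. The data points and function names are made up for illustration.

```python
def mean_vector(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter_matrix(points, mean):
    """Sum of outer products of each point's deviation from the class mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mean[0], p[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def lda_direction(class_a, class_b):
    mean_a, mean_b = mean_vector(class_a), mean_vector(class_b)
    sa, sb = scatter_matrix(class_a, mean_a), scatter_matrix(class_b, mean_b)
    # Within-class scatter is the sum of the per-class scatter matrices.
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mean_b[0] - mean_a[0], mean_b[1] - mean_a[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(point, direction):
    return point[0] * direction[0] + point[1] * direction[1]

# Hypothetical two-feature data for two classes.
class_a = [(1, 2), (2, 3), (3, 3), (2, 1)]
class_b = [(6, 5), (7, 8), (8, 7), (7, 6)]

direction = lda_direction(class_a, class_b)
projections_a = [project(p, direction) for p in class_a]
projections_b = [project(p, direction) for p in class_b]
print(max(projections_a), min(projections_b))
```

After projection, each point is reduced to a single number, and a simple threshold on that number separates the two classes in this example.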

Similarity learning

Similarity learning involves training a model to learn a similarity function between pairs of input data. You can use the results to aid in techniques such as clustering and anomaly detection, where discerning the relationships or distances between data points is vital. Similarity learning is helpful in various applications, including product recommendation and facial recognition systems.
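One common training objective for similarity learning is a contrastive loss, which penalizes similar pairs that are far apart and dissimilar pairs that are closer than a chosen margin. The sketch below shows one widely used form of that loss. It assumes some model has already turned each input into a numeric embedding, and the embeddings and margin values are made up.

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(embedding_a, embedding_b, is_similar, margin=1.0):
    """Loss for one labeled pair of embeddings."""
    distance = euclidean_distance(embedding_a, embedding_b)
    if is_similar:
        # Similar pairs are penalized for being far apart.
        return distance ** 2
    # Dissimilar pairs are penalized only when closer than the margin.
    return max(0.0, margin - distance) ** 2

# Hypothetical embeddings that are 5.0 units apart.
first, second = (0.0, 0.0), (3.0, 4.0)

print(contrastive_loss(first, second, is_similar=True))
print(contrastive_loss(first, second, is_similar=False, margin=6.0))
print(contrastive_loss(first, second, is_similar=False, margin=2.0))
```

Minimizing this loss over many labeled pairs pushes the model to place similar items close together, so that distance in the embedding space works as the learned similarity function.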

Examples of supervised learning

Supervised learning appears in a wide range of fields and industries. The following are some specific examples in use today:

Customer sentiment analysis

Customer sentiment analysis analyzes customer feedback, such as product reviews or social media posts, to determine the message's sentiment. The input data consists of text data, and the output labels are the sentiment categories, such as positive, negative, or neutral.

Logistic regression, which despite its name is typically used for classification, can help flag “potential mental health crisis” posts on social media, for instance. Additional examples of applicable algorithms in this case include support vector machines, which can handle classes that are not linearly separable, and neural networks, which can learn complex functions.

Spam detection

Spam detection identifies spam emails by analyzing the content and track record of the sender of each email. Various machine learning and deep learning techniques have shown the ability to determine whether an email is spam or not. Examples of algorithms relevant for use in spam detection include K-Nearest Neighbor (KNN), deep convolutional neural networks (Deep CNN), and Naïve Bayes.
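As a rough sketch of the Naïve Bayes approach, the example below counts how often each word appears in spam and non-spam ("ham") emails, then scores a new email by combining those word probabilities. The four training emails are made up, and the model uses word counts only, with add-one smoothing so that a word missing from one class does not zero out the score.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled emails.
training_emails = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]

def train_naive_bayes(examples):
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, class_counts, vocabulary

def classify(text, model):
    word_counts, class_counts, vocabulary = model
    total_emails = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Start from the log prior, then add a log likelihood per known word.
        score = math.log(class_counts[label] / total_emails)
        words_in_class = sum(word_counts[label].values())
        for word in text.split():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            smoothed = (word_counts[label][word] + 1) / (words_in_class + len(vocabulary))
            score += math.log(smoothed)
        scores[label] = score
    return max(scores, key=scores.get)

model = train_naive_bayes(training_emails)
print(classify("claim free money", model), classify("agenda for lunch", model))
```

Working with log probabilities avoids multiplying many small numbers together, which could otherwise underflow on longer emails.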

Predictive analytics

Predictive analytics leverages historical data to forecast what could happen in future instances. For example, a lender might use supervised learning to predict which customers will likely default on their loans based on factors like their credit history. The input data consists of historical data, and the output labels are binary values indicating whether the event of interest occurred. Predictive analytics models generally fall into three categories: classification, time series, and clustering, although clustering is an unsupervised technique.

Supervised vs. unsupervised learning

Supervised and unsupervised learning differ in whether the training data is labeled and in how the learning takes place. Below are some similarities and differences between the two:

Similarities:

  • Both approaches train models to make predictions or discover patterns in data.

  • Both use statistical techniques and algorithms to gain insights from data sets to forecast or classify data.

  • Both share the ultimate goal of extracting meaningful insights from data.

Differences:

  • Unsupervised learning finds hidden trends within the data itself, whereas supervised learning is generally useful for forecasting future events based on historical data.

  • In supervised learning, a model trains on labeled data to create forecasts about new data. It differs from unsupervised learning, where the model trains on unlabeled data to discover patterns and relationships that would otherwise not be visible.

  • Logistic regression, support vector machines, and decision trees are all common algorithms that qualify as supervised learning. Unsupervised learning algorithms include clustering, dimensionality reduction, and association rule mining.
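The contrast between the two approaches can be sketched on the same six numbers. In this minimal illustration, a simple k-means clustering routine discovers two groups without any labels, while a nearest-centroid classifier uses labels to learn named classes. The data values, label names, and function names are made up.

```python
def k_means_1d(values, iterations=10):
    """Unsupervised: find two cluster centers using only the values."""
    centroids = [min(values), max(values)]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            nearest = 0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

def nearest_centroid_fit(values, labels):
    """Supervised: compute one mean per labeled class."""
    groups = {}
    for v, label in zip(values, labels):
        groups.setdefault(label, []).append(v)
    return {label: sum(vs) / len(vs) for label, vs in groups.items()}

def nearest_centroid_predict(class_means, value):
    return min(class_means, key=lambda label: abs(value - class_means[label]))

values = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
labels = ["short", "short", "short", "long", "long", "long"]  # hypothetical labels

cluster_centers = k_means_1d(values)
class_means = nearest_centroid_fit(values, labels)
print(cluster_centers, nearest_centroid_predict(class_means, 7.0))
```

Both routines arrive at the same two centers here, but only the supervised model can attach a meaningful name to a new value, because only it was given labels.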

Next steps

From classifying objects in videos or images to identifying anomalies in data, supervised learning models address various business challenges with minimal human intervention or guidance. Learn more about supervised learning and machine learning by completing a course or earning a relevant certificate. For example, check out the Machine Learning Specialization by Stanford and DeepLearning.AI. This program covers fundamental skills and concepts related to machine learning and AI, allowing you to gain valuable experience to begin your career. The curriculum includes building machine learning and neural network models, training supervised models, and utilizing unsupervised learning processes.

Another relevant course worth exploring to challenge yourself is the IBM Machine Learning Professional Certificate. Within a few months, this program can help equip you with the necessary skills to pursue roles in machine learning. You will learn the intricacies of various machine learning algorithms, how to properly train a neural network, and the basics of collaborative filtering, among other concepts. 

