Sigmoid Activation Function: Deep Learning Basics

Written by Coursera Staff

Enter the exciting world of deep learning by learning about the sigmoid activation function, a common choice for binary classification problems. Explore its advantages and limitations, and how to decide if it is right for you.


In the field of deep learning and artificial intelligence, choosing the right activation function is important to ensure you are drawing the best insights from your data. By learning about common types, such as the sigmoid activation function, you can expand your model-building expertise and open up opportunities in the machine learning space. In this article, we will explore what deep learning is, the role the sigmoid activation function plays in neural networks, and how to know if it is the right function for your needs.

What is deep learning?

Deep learning is an artificial intelligence and machine learning method designed to learn and process information in ways similar to the human brain. It relies on an artificial neural network modeled after pathways in the brain: interconnected layers of artificial neurons known as nodes.

Deep learning is “deep” compared to other machine learning methods because of its extensive chain of data transformations, which can span dozens or even hundreds of layers. Within the neural network, you will find several different types of layers: an input layer that receives the data, several hidden layers that compute the operations, and an output layer that delivers the final result. Each layer consists of nodes, or “neurons,” each performing simple computations. This layered network of decisions and processing allows deep learning algorithms to perform well in complex fields, such as natural language processing, recommendation personalization, autonomous vehicle technology, and health care diagnostics.

Activation functions in deep learning

In the neural networks of your brain, neurons decide whether to fire based on whether an incoming signal surpasses a certain threshold. In artificial neural networks, activation functions play the same role: they determine whether a neuron should activate, and thereby which path the information takes. To do this, each artificial neuron applies an activation function to the sum of its weighted inputs. In this way, activation functions control the flow of information.
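For instance, here is a minimal sketch of a single artificial neuron in Python, using a simple threshold ("fire or don't fire") as the activation; the inputs, weights, and bias are made-up values, purely for illustration:

```python
def step_activation(signal, threshold=0.0):
    """Fire (output 1) only if the combined signal passes the threshold."""
    return 1.0 if signal > threshold else 0.0

def neuron_output(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs, then an activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step_activation(weighted_sum)

# Hypothetical inputs, weights, and bias for illustration
print(neuron_output([0.5, -1.2, 3.0], [0.4, 0.1, 0.8], bias=-0.5))  # 1.0
```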

One key role of activation functions is to introduce non-linear properties into the network. This non-linearity enables the algorithm to learn and perform more complex tasks than a simple linear regression could.

Without these functions, the neural network would essentially behave as a linear regression model, and the thousands of neurons representing potential pathways would serve no purpose. There would be no “decision” about which pathway information flows through, leaving the model unable to learn from complex, non-linear data, which makes up a significant portion of real-world data.
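You can verify this collapse directly: stacking two layers with no activation function between them produces exactly the same result as a single linear layer. A small NumPy sketch with made-up weights:

```python
import numpy as np

# Two "layers" with no activation function between them
W1 = np.array([[1.0, 2.0], [0.5, -1.0]])
W2 = np.array([[3.0, 0.0], [1.0, 1.0]])
x = np.array([2.0, -1.0])

two_layers = W2 @ (W1 @ x)      # pass x through both layers
one_layer = (W2 @ W1) @ x       # a single equivalent linear layer

print(np.allclose(two_layers, one_layer))  # True: the extra layer added nothing
```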

What is a sigmoid activation function?

The sigmoid activation function is a widely used nonlinear function in neural networks, often chosen for binary classification problems. Mathematically, you can represent this function as σ(x) = 1 / (1 + e^(-x)). In this equation, x represents the input, and e is Euler’s number.

A sigmoid smoothly varies from zero to one, tracing a characteristic “S”-shaped curve. The sigmoid function maps any real-valued number to a value between 0 and 1, making it particularly useful for expressing the probability of a certain outcome (0 meaning no chance, 1 meaning 100 percent certainty).
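A few sample values make this squashing behavior easy to see; here is a minimal Python sketch of the function itself:

```python
import math

def sigmoid(x):
    """The sigmoid function: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

for x in [-6, -2, 0, 2, 6]:
    print(f"sigmoid({x:+d}) = {sigmoid(x):.4f}")
# sigmoid(-6) = 0.0025, sigmoid(0) = 0.5000, sigmoid(+6) = 0.9975
```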

Sigmoid activation function and backpropagation

In neural network training, backpropagation is a central concept that allows the model to learn. During training, your model produces a predicted output, and in some cases (e.g., supervised learning), you can check this output against the correct or actual value. Once the algorithm compares its predicted output with the actual value, it may need to go back and fine-tune the neural pathways. This involves adjusting the network's weights to minimize the difference between the predicted output and the actual output.

During backpropagation, the algorithm calculates the gradient (or derivative) of the activation function at each point, and this gradient value is used to update the weights in the network. For the sigmoid function, the derivative is σ(x)(1 - σ(x)), which ranges between 0 and a maximum of 0.25 (reached at x = 0), influencing how quickly weights update.
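The derivative follows directly from the sigmoid itself, so it is cheap to compute during backpropagation. A brief sketch:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """sigma'(x) = sigma(x) * (1 - sigma(x)), which never exceeds 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [-4, -1, 0, 1, 4]:
    print(f"x = {x:+d}: gradient = {sigmoid_derivative(x):.4f}")
# The gradient peaks at 0.25 when x = 0 and shrinks toward 0 elsewhere
```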

Limitation of sigmoid function for backpropagation 

One common issue when using the sigmoid activation function in backpropagation is known as the “vanishing gradient problem.” When the input to the sigmoid function is extreme, such as very high or very low, the value of the gradient approaches zero. When backpropagating through many layers of a deep network, these small gradients multiply together and diminish to the point where they have little to no effect on the weights of earlier layers. When this happens, those layers effectively stop updating, and the model ceases to learn.
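You can see the problem with simple arithmetic: each sigmoid layer scales the gradient by at most 0.25, so the signal reaching early layers shrinks geometrically. A rough sketch (the layer count is arbitrary):

```python
# Best case: each sigmoid layer multiplies the gradient by at most 0.25
gradient = 1.0
for layer in range(1, 11):
    gradient *= 0.25
    print(f"after layer {layer:2d}: gradient <= {gradient:.2e}")
# After 10 layers the gradient is at most ~9.5e-07 -- effectively zero
```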

Examples of sigmoid activation functions

Professionals in diverse fields use sigmoid activation functions to classify inputs into one category or another. One of the most common uses of the sigmoid function is in logistic regression models. In logistic regression, the sigmoid function converts the linear regression output into a probability, indicating a binary outcome's likelihood (e.g., yes or no, 1 or 0).

In natural language processing, sigmoid activation can produce a model that determines a text's sentiment, categorizing it as positive or negative.

In health care models predicting the likelihood of a disease (such as cancer or heart disease), the sigmoid activation function can output the probability of the disease's presence.

In a neural network designed for binary classification, such as determining if an email is spam or not, the final output layer often uses a sigmoid function to represent the probability of one class over another.
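As an illustration, here is a minimal sketch of such a network in Keras; the input size and hidden-layer width are made-up values, and a real spam filter would be trained on featurized email text:

```python
import tensorflow as tf

# Toy binary classifier: the final sigmoid turns the network's score
# into a probability between 0 and 1 (here, the probability of "spam")
model = tf.keras.Sequential([
    tf.keras.Input(shape=(100,)),                    # 100 made-up input features
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # outputs P(spam)
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```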

Find the right activation function for you

For binary classification problems, especially when the output is required in the form of probabilities, the sigmoid function is a natural choice thanks to its zero to one output range. However, if you are working with multiple classifications—known as multi-class classification—you might prefer the softmax activation function, which extends the concept of sigmoid to multiple classes.
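A small sketch makes the relationship concrete: sigmoid turns one score into one probability, while softmax turns a list of scores into a list of probabilities that sum to 1 (the scores below are made up):

```python
import math

def sigmoid(x):
    """Binary case: one score becomes one probability."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(scores):
    """Multi-class case: n scores become n probabilities that sum to 1."""
    exps = [math.exp(s - max(scores)) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(1.2))               # ~0.77: probability of the positive class
print(softmax([1.2, 0.3, -0.8]))  # ~[0.65, 0.26, 0.09]: three classes, summing to 1
```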

If the vanishing gradient problem is a concern, one solution is to use a rectified linear unit (ReLU) rather than a sigmoid activation function. The ReLU's derivative is a constant 1 for any positive input, so gradients do not shrink as they pass backward through many layers. However, every activation function has its own disadvantages (a ReLU neuron can “die” and stop learning if its inputs stay negative), and switching to ReLU fundamentally changes the kind of function your network approximates. To find what’s best for you, take the time to learn about each method and how it might fit with your data.
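A minimal sketch of ReLU and its derivative, for comparison with the sigmoid examples above:

```python
def relu(x):
    """ReLU: 0 for negative inputs, the input itself otherwise."""
    return max(0.0, x)

def relu_derivative(x):
    """Exactly 1 for positive inputs, so gradients do not shrink layer by layer."""
    return 1.0 if x > 0 else 0.0

for x in [-2.0, 0.5, 3.0]:
    print(f"x = {x:+.1f}: relu = {relu(x):.1f}, gradient = {relu_derivative(x):.1f}")
```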

Explore deep learning further on Coursera

Comprehensive courses and specializations on the Coursera learning platform can help you expand your knowledge of exciting deep learning topics and methods. If you want a broad overview of topics offered at your own pace, the Deep Learning Specialization by DeepLearning.AI is a great way to explore different areas and identify topics you might want to pursue further. 


This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.