Activation Functions in Neural Networks and Their Types

Written by Coursera Staff

Explore the importance of activation functions in neural networks, and discover the different types of activation functions.


Neural networks are learning systems that utilise connected nodes to make predictions and understand data. Nodes in the input layer receive raw input values and pass them into the network, while nodes in the output layer transform their internal state into a final output value. At each node, an activation function takes the weighted sum of the node's inputs and transforms it into the signal the node sends onward; in the output layer, one or more neurons generate the final output signal in this way.

Activation functions play a crucial role in artificial neural networks because they allow the network to learn complex, nonlinear mappings between inputs and outputs.

Why do neural networks need activation functions? 

Activation functions are necessary for any neural network to capture high-dimensional, nonlinear patterns. Without them, the network's output would be limited to linear or first-degree polynomial functions. Applying an activation function to each neuron makes it nonlinear, allowing it to process more complex data. At each layer of the network, the weights are used to calculate a weighted total of the inputs, and a bias is then added. The bias shifts this weighted total, changing the point at which the neuron activates.

Without activation functions, neural networks would be confined to a linear regression model, and their performance and power would be significantly reduced. Activation functions are what allow artificial neural network approaches like deep learning, where a model has several hidden layers, to capture complex, high-dimensional data sets.
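To make the weighted-total-plus-bias idea concrete, here is a minimal sketch of a single neuron's forward pass in Python with NumPy. The input, weight, and bias values are made up purely for illustration.

```python
import numpy as np

def neuron_forward(x, w, b, activation):
    """Apply an activation function to the weighted sum of inputs plus bias."""
    z = np.dot(w, x) + b   # weighted total plus bias
    return activation(z)   # nonlinearity applied to the result

# Illustrative values only (not from the article)
x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([0.4, 0.1, -0.6])   # weights
b = 0.2                          # bias
print(neuron_forward(x, w, b, lambda z: max(0.0, z)))  # ReLU-style neuron -> 0.0
```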

Types of activation functions in neural networks

Activation functions in neural networks fall into three common categories: binary step, linear, and nonlinear. Here's an overview of the three:

1. Binary step activation function 

This activation function depends on a threshold value that determines whether a neuron should activate.

The activation function compares the input to the threshold: if the input exceeds it, the neuron activates; otherwise, the neuron is deactivated, and its output is not passed on to the following hidden layer.

As the name suggests, this activation function is suitable for binary classification but cannot handle problems with multiple classes.

The formula for the binary step function is:  

f(x) = 1, x >= 0, and f(x) = 0, x < 0
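As a quick sketch of this formula (the input values are illustrative), the binary step function can be written in NumPy like this:

```python
import numpy as np

def binary_step(x):
    """Return 1 where x >= 0 and 0 otherwise."""
    return np.where(x >= 0, 1, 0)

print(binary_step(np.array([-2.0, 0.0, 3.5])))  # [0 1 1]
```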

2. Linear activation function

The linear activation function is also known as the identity function. It simply outputs the same value it receives, leaving the weighted sum of the input unchanged.

Mathematically, it is defined as:

f(x) = x

A linear activation function has two major limitations: its derivative is a constant with no relation to the input, so backpropagation cannot be used meaningfully to update the weights, and stacking linear layers collapses the network into the equivalent of a single linear layer.
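A small sketch (with made-up weight matrices) shows why stacked linear layers collapse into one: composing two linear transformations is itself a single linear transformation.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # first "layer" weights (illustrative)
W2 = rng.normal(size=(2, 4))   # second "layer" weights (illustrative)
x = rng.normal(size=3)

two_layers = W2 @ (W1 @ x)     # two linear layers with identity activation
one_layer = (W2 @ W1) @ x      # a single equivalent layer

print(np.allclose(two_layers, one_layer))  # True
```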

3. Nonlinear activation function

Nonlinear activation functions address these significant limitations of the linear activation function:

  • They make backpropagation possible, the algorithm that gives deep learning its power. Because a nonlinear activation has a derivative that depends on its input, the network can determine how much each weight contributed to the prediction error and adjust those weights when the output does not match the desired result.

  • They enable the stacking of numerous layers of neurons: the output becomes a nonlinear combination of the input passed through many layers, and any output the network produces is the result of these stacked functional computations.

Different types of nonlinear activation functions for neural networks

You can use several nonlinear activation functions when working with neural networks. Following is a description of eight types of nonlinear activation functions:

1. Sigmoid

This nonlinear activation function is one of the most frequently employed. The sigmoid function maps any input to a value between 0 and 1. 

The formula for the sigmoid activation function is f(x) = 1 / (1 + e^(-x)).

The sigmoid function has a smooth, S-shaped curve. It is used widely in machine learning where predicting a probability is the desired outcome.
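A minimal NumPy sketch of the sigmoid formula above (the example inputs are purely illustrative):

```python
import numpy as np

def sigmoid(x):
    """Map any real input to the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # approx [0.0067 0.5 0.9933]
```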

2. Tanh

The tanh function is a hyperbolic tangent transform, an operation that computes the hyperbolic tangent of an input. It is similar to the sigmoid but is centred at the origin. As a result, the outputs from earlier layers can have different signs, and these outputs, ranging from -1 to 1, are provided as input to the following layer.

The tanh nonlinear activation function is represented by tanh(x) = (e^(2x) − 1) / (e^(2x) + 1).

Tanh has a steeper gradient than the sigmoid function. It is zero-centred, so its gradients are not constrained to move in a single direction. Because of these qualities, it is often used for speech recognition and language processing tasks.
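A short sketch of tanh using NumPy's built-in implementation, again with illustrative inputs:

```python
import numpy as np

def tanh(x):
    """Zero-centred squashing of inputs into the range (-1, 1)."""
    return np.tanh(x)  # equivalent to (e^(2x) - 1) / (e^(2x) + 1)

print(tanh(np.array([-2.0, 0.0, 2.0])))  # approx [-0.964 0. 0.964]
```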

3. ReLU 

The rectified linear unit, or ReLU, is a nonlinear activation function frequently employed in neural networks. The advantage of utilising the ReLU function is that not every neuron is activated at the same time: a neuron passes its value on only if the result of the linear transformation is positive, and it outputs 0 otherwise.

Mathematically, ReLU can be expressed as f(x) = max(0, x). 

Generally, ReLU performs better in deep learning than sigmoid and tanh because of its simplicity and quick learning.
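A minimal NumPy sketch of ReLU (inputs chosen only for illustration):

```python
import numpy as np

def relu(x):
    """Pass positive values through unchanged; clamp negative values to 0."""
    return np.maximum(0.0, x)

print(relu(np.array([-3.0, 0.0, 2.5])))  # [0.  0.  2.5]
```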

4. Leaky ReLU 

Leaky ReLU is a modified and improved version of the ReLU function. For negative values of x, instead of outputting zero, it outputs a very small linear component of x. This keeps a small gradient flowing for negative inputs, which helps prevent neurons from dying without the need to fine-tune many additional parameters.

In mathematics, leaky ReLU can be written as f(x) = 0.01x for x < 0 and f(x) = x for x >= 0, or equivalently f(x) = max(0.01x, x).
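A hedged NumPy sketch of this formula, with the conventional 0.01 slope and illustrative inputs:

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    """Like ReLU, but negative inputs keep a small slope instead of becoming 0."""
    return np.where(x >= 0, x, slope * x)

print(leaky_relu(np.array([-4.0, 0.0, 3.0])))  # [-0.04  0.    3.  ]
```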

5. Parametrised ReLU 

Parametrised ReLU (PReLU) is another version of ReLU that can perform better. It adds a new parameter, the slope of the negative part of the function, which fixes the issue of the gradient of ReLU being zero for negative values of x.

It is mathematically expressed as f(x) = x, x >= 0, and f(x) = ax, x < 0. 

The parametrised ReLU function addresses the issue of dead neurons in cases where the leaky ReLU function cannot send the necessary information to the following layer.
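A sketch of PReLU; here the slope a is passed in explicitly for illustration, although in practice it is a parameter learned during training:

```python
import numpy as np

def prelu(x, a):
    """PReLU: identity for x >= 0, slope a (learned in practice) for x < 0."""
    return np.where(x >= 0, x, a * x)

print(prelu(np.array([-4.0, 0.0, 3.0]), a=0.25))  # [-1.  0.  3.]
```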

6. Exponential linear unit 

The exponential linear unit (ELU) is another alternative version of ReLU. For negative values of x, instead of having a slope of zero like ReLU, ELU uses a smooth exponential curve controlled by a parameter a, while keeping a straight line for positive values. This smooth saturation of negative values can make the function more robust to noise.

It is mathematically defined as f(x) = x for x >= 0, and f(x) = a(e^x − 1) for x < 0.

ELU is a viable substitute for ReLU: where ReLU changes sharply at zero, ELU becomes smooth gradually until its output equals -a.

Using an exponential curve for negative input values avoids the dead ReLU problem and aids the network in shifting biases and weights in the desired direction.
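A minimal sketch of ELU with a = 1 (a common default; the inputs are illustrative):

```python
import numpy as np

def elu(x, a=1.0):
    """ELU: identity for x >= 0, a smooth exponential curve approaching -a for x < 0."""
    return np.where(x >= 0, x, a * (np.exp(x) - 1))

print(elu(np.array([-5.0, -1.0, 2.0])))  # approx [-0.993 -0.632  2.   ]
```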

7. Swish 

The Swish function is a newer activation function. It stands out because it is non-monotonic, meaning the function's value can decrease even as the input value rises. Google developed Swish, and it occasionally performs better than the ReLU function.

It is defined mathematically as f(x) = x * sigmoid(x), or equivalently f(x) = x / (1 + e^(-x)).
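A short NumPy sketch of Swish based on the formula above (inputs are illustrative); the slight dip below zero for negative inputs is what makes it non-monotonic:

```python
import numpy as np

def swish(x):
    """Swish: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

print(swish(np.array([-2.0, 0.0, 2.0])))  # approx [-0.238  0.     1.762]
```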

8. Softmax 

The softmax function can be thought of as a combination of several sigmoid curves. Since sigmoid functions yield values between 0 and 1 that can represent the probability of a data point belonging to a specific class, softmax extends this idea to produce a probability for every class, making it suitable for multiclass classification problems.

When you create a network or model for multiclass classification, the output layer will contain the same number of neurons as there are classes in the target.
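A minimal, numerically stable sketch of softmax over a vector of class scores (the scores are illustrative):

```python
import numpy as np

def softmax(z):
    """Turn a vector of class scores into probabilities that sum to 1."""
    shifted = z - np.max(z)   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # approx [0.659 0.242 0.099]
```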

Learn more with Coursera

To learn about neural networks in detail, you can opt for a course on Neural Networks and Deep Learning offered by DeepLearning.AI, which will help you understand the capabilities, challenges, and consequences of deep learning. Discover more about machine learning with the Machine Learning Specialization on Coursera.


This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.