Learn about the tanh activation function and the role it plays in machine learning and artificial intelligence. Explore features, limitations, and common applications.
The tanh activation function is a popular choice in the world of neural networks and deep learning. This function closely relates to the sigmoid activation function but has several differentiating factors that account for its strong performance and diverse applications. In this article, we will review deep learning and neural networks, explore the tanh activation function and its uses, and review indicators that the tanh function might be right for you.
Deep learning is a machine learning technique that uses artificial neural networks to mimic how human brains operate. When you have a thought or make a decision, the neurons in your brain fire in a particular pattern. This firing pattern differs depending on your actions and thoughts. For example, when you are excited, your neurons fire along a different pathway than when you are angry. This system of neural pathways allows humans to have complex thoughts, learn, and reason through different ideas, which has historically differentiated us from machines.
In recent years, advances in machine learning have made it possible to mimic these neural pathways, enabling machines to learn and independently process information to generate insights. This type of algorithm uses large data sets to recognize patterns, make predictions, and guide decision-making. This has led to the use of this technology in fields such as text processing, image analysis, autonomous driving, speech recognition, and research.
One key aspect of a neural network is that information can travel through several possible paths. Along each path, an activation function determines whether a neuron (node) will “fire” or not. This allows neural networks to compute highly complex decisions by determining which neurons should activate, influencing how the signal flows from input to output. Without these functions, a neural network would essentially become a linear regression model incapable of learning complex or non-linear relationships in the data.
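You can see this collapse numerically. The following sketch (using NumPy; the layer sizes and names are illustrative) shows that stacking two linear layers with no activation between them is equivalent to a single linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked "layers" with no activation function in between
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=(3,))

two_layers = W2 @ (W1 @ x)

# The composition collapses to one linear layer with weights W2 @ W1,
# so depth adds no expressive power without a non-linear activation.
one_layer = (W2 @ W1) @ x

print(np.allclose(two_layers, one_layer))  # True
```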
To introduce this non-linearity, you have several options for your function, including the sigmoid activation function, ReLU activation function, softmax activation function, and tanh activation function. Choosing the right activation function to train your model with will depend on which method's advantages and limitations best suit your goals.
The tanh, or hyperbolic tangent, activation function is a nonlinear function that outputs values in the range of (-1, 1). Mathematically, you can represent this function as:
tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z))
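As a quick check (a minimal sketch using NumPy), you can implement this formula directly and compare it against the library's built-in tanh:

```python
import numpy as np

def tanh_from_formula(z):
    """Hyperbolic tangent computed directly from its definition.
    Note: np.tanh is numerically safer for large |z|, where exp overflows."""
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

z = np.linspace(-3.0, 3.0, 7)
print(tanh_from_formula(z))                           # values squashed into (-1, 1)
print(np.allclose(tanh_from_formula(z), np.tanh(z)))  # True
```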
The tanh activation function is similar to the sigmoid activation function, which is another activation function that uses a sigmoid curve (an S-shaped curve) to model distributions. One difference here is that the sigmoid activation function models the probability output ranging from (0, 1), while the tanh function spans (-1, 1). This differentiation is due to the tanh function being centered around zero, which can sometimes accelerate the learning of neural network models. In fact, tanh is a scaled and shifted sigmoid: tanh(z) = 2σ(2z) - 1, where σ denotes the sigmoid function.
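You can verify this identity numerically (a short sketch; the sigmoid helper below is defined from its standard formula):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5.0, 5.0, 11)

# tanh(z) = 2 * sigmoid(2z) - 1: the sigmoid's (0, 1) range is
# rescaled and shifted to tanh's zero-centered (-1, 1) range.
print(np.allclose(np.tanh(z), 2.0 * sigmoid(2.0 * z) - 1.0))  # True
```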
The hard tanh activation function replaces the smooth tanh curve with a piecewise linear approximation to reduce computational load and improve speed. If you are prioritizing cost, the hard tanh function might be a choice you should consider. However, this function saturates for inputs whose absolute value exceeds one, which may pose a problem depending on your data.
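A minimal sketch of hard tanh (using NumPy; the function name is illustrative) makes the saturation behavior explicit:

```python
import numpy as np

def hard_tanh(z):
    """Piecewise linear approximation of tanh: the identity on [-1, 1],
    clipped to -1 below and +1 above (the saturation regions)."""
    return np.clip(z, -1.0, 1.0)

z = np.array([-2.5, -1.0, -0.3, 0.0, 0.7, 1.0, 3.0])
print(hard_tanh(z))  # [-1.  -1.  -0.3  0.   0.7  1.   1. ]
```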
The tanh activation function has similar limitations to the sigmoid function, including trouble with backpropagation. Backpropagation is the process in deep learning where a model retraces its flow through the network to see where it can improve the output. This involves adjusting the weight of each node to alter activation pathways, which requires taking a derivative at each step.
Tanh functions do not always backpropagate effectively when the value of the derivative reaches zero prematurely (known as the vanishing gradient problem), which stops the network's learning process. However, the tanh model struggles less with vanishing gradients when combined with ReLU models to create the ReLTanh activation function.
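You can see the vanishing gradient concretely. The derivative of tanh is 1 - tanh^2(z) (discussed further below), which peaks at 1 when z = 0 and shrinks toward zero as |z| grows, so saturated neurons pass almost no gradient backward:

```python
import numpy as np

def tanh_grad(z):
    """Derivative of tanh: 1 - tanh(z)^2, which lies in (0, 1]."""
    return 1.0 - np.tanh(z) ** 2

for z in [0.0, 1.0, 2.0, 5.0, 10.0]:
    print(f"z = {z:>4}: gradient = {tanh_grad(z):.2e}")
# The gradient shrinks from 1.00e+00 at z = 0 toward ~8.24e-09 at z = 10,
# illustrating how little gradient flows through saturated units.
```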
The tanh activation function is particularly suited to several applications. It may fit your needs in the following situations.
Since tanh outputs range from negative one to one (-1, 1) and are zero-centered, you might choose it for layers where having data centered around zero can accelerate learning.
Though you can still encounter the vanishing gradient problem with tanh functions in very deep networks, it is generally preferred over the sigmoid function in the hidden layers due to its efficiency in propagating gradients. The derivative of the tanh function is 1 - tanh^2(z), which means the gradient will be between zero and one (0, 1). The tanh function’s gradient generally stays larger than the gradient of the sigmoid function, making it less prone to this issue.
If you are using a sigmoid activation function and experience convergence issues, the tanh function might provide a solution to your problem. Tanh activation functions produce similar results to sigmoid functions while converging more quickly. This is because the output of tanh is zero-centered, meaning the weight updates have a directionally more consistent influence on the adjustments during training, often leading to faster convergence.
Long short-term memory (LSTM) networks are a type of recurrent neural network (RNN) designed to classify time series data while protecting against the vanishing gradient problem. Tanh has typically been a go-to activation when working with LSTM models, and tanh activations have shown high accuracy with this network type in training and testing data.
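To show where tanh appears inside an LSTM, here is a simplified single-time-step LSTM cell in NumPy (a sketch only; the weight layout and names are illustrative, not a production implementation). Tanh squashes both the candidate cell state and the output hidden state, while sigmoids drive the gates:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold per-gate parameters for the
    input (i), forget (f), output (o), and candidate (g) paths."""
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])  # input gate
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])  # forget gate
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])  # output gate
    g = np.tanh(W["g"] @ x + U["g"] @ h_prev + b["g"])  # tanh: candidate state
    c = f * c_prev + i * g       # new cell state mixes old memory and candidate
    h = o * np.tanh(c)           # tanh again: squash cell state into (-1, 1)
    return h, c

# Tiny usage example with random weights
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.normal(size=(n_hid, n_in)) for k in "ifog"}
U = {k: rng.normal(size=(n_hid, n_hid)) for k in "ifog"}
b = {k: np.zeros(n_hid) for k in "ifog"}
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, b)
print(h)  # hidden state values lie in (-1, 1)
```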
The tanh activation function is only the beginning. On the Coursera learning platform, you can explore a diverse range of machine learning, artificial intelligence, and deep learning concepts with expansive courses designed by academic leaders and industry professionals.
If you want to learn the basics of deep learning at your own pace, the Deep Learning Specialization by DeepLearning.AI is a course series that provides a broad overview of fundamental deep learning concepts. This can introduce you to modern methods and provide a platform to identify areas of particular interest and build skills needed for more advanced coursework.