Dropout in Neural Networks: Enhancing Model Robustness

Written by Coursera Staff

Explore the significance of dropout in neural networks, how it improves model generalization, and how it compares with other practical regularization techniques in machine learning.

Key takeaways

  • Dropout in neural networks is a machine learning technique that randomly drops certain nodes from the network during training in order to improve model generalization.

  • Dropout is a method of machine learning regularization, the process of reducing a model’s error rate on new, unseen data.

  • Utilizing dropout can improve machine learning tasks such as image classification and speech recognition, as well as overall algorithm performance.

  • You can implement dropout using tools such as the PyTorch framework, which also allows you to train your machine learning model for fault tolerance.

Learn more about the theoretical foundations of dropout in machine learning, how it benefits machine learning processes, and how it compares to other machine learning regularization techniques. Or, start learning with the Deep Learning Specialization. In as little as three months, you can explore how to build and train deep neural networks, identify key architecture parameters, implement vectorized neural networks, and apply deep learning to applications. By the end, you’ll have earned a shareable certificate to add to your professional profile.

What is dropout in a neural network?

Neural network dropout is a machine learning technique in which certain nodes, and their connections, are randomly dropped out of the network during training in order to improve regularization.

Modern machine learning occurs via neural networks: adaptive structures loosely modeled on the human brain. Neural networks consist of an input layer, at least one hidden layer, and an output layer, with each layer composed of nodes. Nodes are artificial neurons that transmit data within the network. Using these neural networks, machine learning and artificial intelligence (AI) models take in information, adapt to it, make mistakes on the way to a fuller understanding of it, and then predict outcomes in a human-like way.

Dropout is one method of machine learning regularization, the process of reducing a model’s error rate on new data. During dropout, nodes and their connections randomly “drop out” of a neural network’s input and hidden layers during the training process. Dropout is key to preventing overfitting, which occurs when a machine learning application absorbs too much noise (random, unhelpful fluctuations in the data), leading to incorrect learning during training. A noise-filled, overfitted machine learning program learns to predict what it’s been taught but not to make new predictions based on that data.

In short, dropout reduces the sort of machine learning overcomplication that, if not addressed, results in a model that hasn’t properly learned anything.
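To make this concrete, here is a minimal sketch of dropout in action using PyTorch (a framework the article returns to below); the tensor of ones is purely illustrative:

```python
import torch
import torch.nn as nn

# A dropout layer that zeroes each activation with probability p = 0.5
dropout = nn.Dropout(p=0.5)

x = torch.ones(1, 8)   # eight activations, all equal to 1.0

dropout.train()        # dropout is active only in training mode
print(dropout(x))      # roughly half the values are zeroed; survivors are
                       # scaled by 1 / (1 - p) = 2.0, so the expected
                       # activation is unchanged

dropout.eval()         # at evaluation time, dropout is a no-op
print(dropout(x))      # all ones: every node participates again
```

Because a different random set of nodes drops out on every training pass, no single node can be relied on too heavily, which is what discourages the memorization described above.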

Theoretical foundations of neural network dropout

It’s worth considering dropout in terms of ensemble learning. Ensemble learning is the technique of combining two or more different machine learning models, which typically results in better predictive performance. It’s a kind of two-heads-are-better-than-one theory.

The purpose of ensemble learning is to reduce error, which is often analyzed in terms of the bias-variance tradeoff. This concept involves three elements:

  • Bias: Bias refers to training errors, describable as the gap between a model’s predicted values and the true values. The higher the bias in a machine learning algorithm, the less accurately the model can make predictions. You address bias via optimization, which increases a training model’s accuracy. 

  • Variance: Variance occurs when an AI model is too sensitive to small fluctuations in a training data set. In other words, a variance-heavy model picks up too much noise, treating it as if it were the meaningful data values you’re trying to train on. This means it’s “thinking” based on error-ridden input data. Developers address this problem via generalization, which refers to a model’s capacity to apply what it has already learned to new data and still produce accurate output. 

  • Irreducible error: Large data sets contain an inherent amount of randomness, which results in further learning errors that can be difficult or impossible to reduce. 
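These three elements add up in a standard statistical decomposition. Written in notation the article itself doesn’t use, the expected squared error of a model f̂ predicting a value y looks like this (a textbook identity, not something specific to neural networks):

```latex
\mathbb{E}\left[\left(y - \hat{f}(x)\right)^2\right]
  = \underbrace{\mathrm{Bias}\!\left[\hat{f}(x)\right]^{2}}_{\text{addressed by optimization}}
  + \underbrace{\mathrm{Var}\!\left[\hat{f}(x)\right]}_{\text{addressed by generalization}}
  + \underbrace{\sigma^{2}}_{\text{irreducible error}}
```

Regularization techniques such as dropout mainly target the variance term.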

In terms of ensemble learning, what dropout does is reduce co-adaptation. Separate machine learning models trained together on the same data may adapt similarly; for example, both may learn to accept the same statistical noise, meaning neither learns properly. Because dropout is random, it alters the hidden units of the different models in the ensemble, which results in each model developing different capabilities. This is similar to how people with different educational backgrounds and experiences may try to solve the same problem together, each bringing their own distinct skills to the solution.

Read more: What Is Deep Reinforcement Learning? 

Does dropout improve accuracy?

Dropout generally improves the accuracy of machine learning algorithms. By turning off certain nodes during training, dropout forces the algorithm to go beyond memorizing the data and instead learn the patterns within it. This improves accuracy when the model moves beyond training data to work with new information.

Practical implementation of a dropout neural network 

Machine learning via neural networks works like an enormously complex autocomplete feature. An algorithm doesn’t “learn” the way a human being does. A machine learning system learns by taking in massive amounts of data, regularizing the learning process, and refining its output capabilities. For example, an ML system completes a sentence in a human-like way by determining the statistical likelihood of each candidate word following from the previous one.

You can picture this probability-based learning modality as a decision tree. Decision trees describe the actions a root node can take via decision nodes, which lead to leaf nodes. Decision tree branches are lines or arrows showing the flow of a choice from the root to a decision node. In effect, a branch is the program asking whether the answer to a node’s question is yes or no, resulting in a separate leaf node for each answer. 

The relative likelihood of roots, decisions, and leaves being logically connected is weighted. Weight is the numerical value assigned to the connections between nodes in a neural network. The higher the weight, the greater the statistical probability that two nodes connect, that is, that the premise of one leads logically to an appropriate conclusion in the other. Weights start out randomized, but as learning continues, their values are adjusted, resulting in greater prediction accuracy. 

Programmers often turn to dropout when they don’t see the accuracy they want from a training model. You determine dropout rates using hyperparameters, which are parameters set before the machine learning process begins. Specific hyperparameters to consider include:

  • Learning rate

  • Momentum, or how quickly an algorithm pushes past noise toward correct answers

  • Number of epochs, or how many times the AI will go through the entire data set

  • Number of decision tree branches 

Here are some tips for implementing dropout, illustrated in the code sketch that follows the list: 

  • Start with a 20 percent dropout rate; don’t go higher than 50 percent.

  • Pair a high learning rate with a momentum of between 0.9 and 0.99.

  • Constrain the maximum size of the network weights (a max-norm constraint) to around 4 or 5.

  • Utilize dropout on both the input and hidden layers.
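A minimal PyTorch sketch tying these tips to the hyperparameters above might look like the following. The layer sizes, learning rate, epoch count, and random data here are placeholder assumptions for illustration, not values from the article:

```python
import torch
import torch.nn as nn

# Dropout on both the input and hidden layers, per the tips above:
# 20 percent on the input, 50 percent (the suggested ceiling) on the hidden layer.
model = nn.Sequential(
    nn.Dropout(p=0.2),                # input-layer dropout
    nn.Linear(100, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),                # hidden-layer dropout
    nn.Linear(64, 10),
)

# High momentum, between 0.9 and 0.99, paired with a relatively high learning rate.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.95)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 100)              # placeholder training batch
y = torch.randint(0, 10, (32,))       # placeholder labels

for epoch in range(5):                # "number of epochs" hyperparameter
    model.train()                     # enables dropout
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

    # Max-norm constraint of 4 on each weight matrix, per the weight tip above.
    with torch.no_grad():
        for layer in model:
            if isinstance(layer, nn.Linear):
                layer.weight.data = layer.weight.renorm(p=2, dim=0, maxnorm=4.0)

model.eval()                          # disables dropout for inference
```

Note that calling model.eval() before inference is what switches dropout off; PyTorch handles the corresponding activation rescaling automatically during training.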

The PyTorch framework also allows you to train your machine learning model with fault tolerance. Fault tolerance is the idea that a machine learning program can learn to detect, and even replace, faulty information picked up by nodes without shutting down and rebooting. 

Read more: What Is PyTorch? 

Benefits of dropout in neural networks

Dropout temporarily removes nodes from a neural network to improve processing and keep the network from becoming overloaded with data. This regularization method prevents overfitting at a low computational cost. Dropout improves: 

  • Image classification

  • Algorithm performance 

  • Speech recognition

Other regularization techniques

To optimize your machine learning performance, you may want to consider pairing dropout with other regularization techniques, such as: 

Data augmentation

During data augmentation, you increase the size of your data training set by working artificial data samples into the training process. By exposing your machine learning algorithm to a diversity of uncommon data in addition to your original data sets, you train it to adapt to data variations in a more sophisticated way. 
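For image data, a minimal sketch of this idea using torchvision (the specific transforms and parameter values are illustrative choices, not prescriptions):

```python
from torchvision import transforms

# Augmentation pipeline applied on the fly during training: each epoch the
# model sees randomly flipped, rotated, and recolored variants of the same
# images, effectively enlarging the training set without new data collection.
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])
```

You would typically pass a pipeline like this to your training dataset while leaving the validation data untransformed.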

Early stopping

Early stopping allows you to end the training process early, before the model starts absorbing irrelevant inputs. Instead of allowing an automated learning program to continue unchecked, which could lead to overfitting, early stopping halts training when performance on held-out data starts to slip. 
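In code, early stopping is usually a small loop around training. In this sketch, train_one_epoch and validation_loss are hypothetical stand-ins for your own training and evaluation routines:

```python
best_loss = float("inf")
patience = 5                           # epochs to wait before giving up
epochs_without_improvement = 0

for epoch in range(100):
    train_one_epoch(model)             # hypothetical training step
    val_loss = validation_loss(model)  # hypothetical evaluation step

    if val_loss < best_loss:
        best_loss = val_loss
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break                      # performance has started to slip
```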

Noise injection

Noise injection prevents overfitting by adding noise to the input data during training. Typically, noise is bad, but by artificially injecting these fluctuations during training, you can teach your machine learning model to be relatively insensitive to them. Think of it in terms of exposure therapy: you’re introduced to a phobia by degrees, and by degrees, you learn not to be afraid of it. 
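A minimal sketch of input-level noise injection (the noise level of 0.1 is an illustrative assumption):

```python
import torch

def add_input_noise(x, std=0.1):
    """Gaussian noise injection: perturb each training input slightly so the
    model learns to be insensitive to small fluctuations."""
    return x + torch.randn_like(x) * std

clean_batch = torch.randn(32, 100)         # placeholder batch of inputs
noisy_batch = add_input_noise(clean_batch)
# Train on noisy_batch rather than clean_batch; evaluate on clean inputs.
```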

Explore career tips and emerging topics with our free resources

If you’d like to learn more about emerging topics before launching your career, consider subscribing to our LinkedIn newsletter, Career Chat. You can also explore more about neural networks and machine learning through our free resources.

With Coursera Plus, you can learn and earn credentials at your own pace from over 350 leading companies and universities. With a monthly or annual subscription, you’ll gain access to over 10,000 programs—just check the course page to confirm your selection is included.

Written by:

Editorial Team

Coursera’s editorial team comprises highly experienced professional editors, writers, and fact-checkers.

This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.