This article defines semantic segmentation, explains how it works, and compares it with other image segmentation techniques.
If you’ve ever used a filter on Instagram or TikTok, you’ve employed semantic segmentation from the palm of your hand. But this computer vision technique goes far beyond digital makeup and mustaches. You’ll find it hard at work in hospitals, farms, and even Teslas. In the following article, you’ll learn more about how semantic segmentation works, its importance, and how to do it yourself.
Semantic segmentation identifies, classifies, and labels each pixel within a digital image. Pixels are labeled according to the semantic features they have in common, such as color or placement. Semantic segmentation helps computer systems distinguish between objects in an image and understand their relationships. It’s one of three subcategories of image segmentation, alongside instance segmentation and panoptic segmentation.
Instance segmentation expands upon semantic segmentation by assigning class labels and differentiating between individual objects within those classes.
Example:
Semantic segmentation | Instance segmentation |
---|---|
Dogs | Yellow dog, brown dog |
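To make the difference concrete, here is a minimal sketch that starts from a toy semantic map (every dog pixel carries the same label) and splits it into separate instances by grouping touching pixels. This is only an illustration: the grid, the class id, and the `split_into_instances` helper are hypothetical, and real instance segmentation models predict instances directly rather than relying on connected regions, which would fail for overlapping objects.

```python
from collections import deque

# Toy semantic map: 0 = background, 1 = dog.
# Both dogs share the same class label, so they are indistinguishable here.
semantic_map = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]


def split_into_instances(label_map, target_class):
    """Give each 4-connected region of target_class its own instance id."""
    rows, cols = len(label_map), len(label_map[0])
    instance_map = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            # Skip pixels of other classes and pixels already assigned.
            if label_map[r][c] != target_class or instance_map[r][c]:
                continue
            next_id += 1
            instance_map[r][c] = next_id
            queue = deque([(r, c)])
            # Breadth-first flood fill over up/down/left/right neighbours.
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and label_map[ny][nx] == target_class
                            and instance_map[ny][nx] == 0):
                        instance_map[ny][nx] = next_id
                        queue.append((ny, nx))
    return instance_map, next_id


instance_map, dog_count = split_into_instances(semantic_map, target_class=1)
print(dog_count)
for row in instance_map:
    print(row)
```

The semantic map answers "which pixels are dog?", while the instance map also answers "which dog?".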
Panoptic segmentation is a hybrid technique combining semantic and instance segmentation for a unified, interpreted view; hence, the prefix pan, meaning “all.” The panoptic segmentation process places objects into the following two categories:
Things. In the context of computer vision, “things” are quantifiable objects with defined shapes, for example, vehicles, people, animals, and trees.
Stuff. “Stuff” describes objects lacking defined shapes that computer vision can identify by material or texture. Examples include bodies of water, mountain ranges, and the sky.
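One way to picture the combined output is as a pair of values per pixel: a class label for everything, plus an instance id that is only meaningful for "things." The sketch below assumes hypothetical class ids and a hand-written instance map; actual data sets and models use their own encodings.

```python
# Hypothetical class ids, chosen only for this illustration.
SKY, WATER, PERSON = 0, 1, 2
THING_CLASSES = {PERSON}  # countable objects; SKY and WATER count as "stuff"

semantic_map = [
    [SKY,    SKY,   SKY,   SKY],
    [PERSON, SKY,   SKY,   PERSON],
    [PERSON, WATER, WATER, PERSON],
]
# Instance ids for "thing" pixels; 0 where no instance applies.
instance_map = [
    [0, 0, 0, 0],
    [1, 0, 0, 2],
    [1, 0, 0, 2],
]


def to_panoptic(semantic, instances, thing_classes):
    """Pair each pixel's class with an instance id (always 0 for stuff)."""
    panoptic = []
    for sem_row, inst_row in zip(semantic, instances):
        row = []
        for cls, inst in zip(sem_row, inst_row):
            row.append((cls, inst if cls in thing_classes else 0))
        panoptic.append(row)
    return panoptic


panoptic_map = to_panoptic(semantic_map, instance_map, THING_CLASSES)
# Distinct (class, instance) segments present in the image.
segments = sorted({pixel for row in panoptic_map for pixel in row})
print(segments)
```

Here the sky and the water each form a single segment, while the two people stay separate even though they share a class.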
Image classification is commonly approached as supervised machine learning: models are trained to recognize objects in images using labeled example photos. This process initially depended upon raw pixel data. However, raw pixel values fluctuate with camera focus, lighting, and angle, and those variations are hard to correct for. Introducing a convolutional neural network (CNN) to this process made it possible for models to extract individual features and deduce what objects they represent.
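The feature extraction a CNN performs is built on convolution: sliding a small grid of weights (a kernel) across the image and recording how strongly each patch matches. The following is a minimal sketch with a hand-picked kernel; in a real CNN the kernel values are learned during training, and the `convolve2d` helper and toy image here are assumptions for illustration.

```python
def convolve2d(image, kernel):
    """Slide kernel over image (no padding) and sum the elementwise products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = sum(image[r + i][c + j] * kernel[i][j]
                        for i in range(k_rows) for j in range(k_cols))
            row.append(total)
        output.append(row)
    return output


# Toy grayscale image: dark region on the left, bright region on the right.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hand-picked kernel that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)
```

The feature map is large only where the edge sits, which is the kind of feature that stays recognizable when overall brightness shifts.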
Semantic segmentation models take this approach a step further. After passing input images through the neural network architecture, they create a color-coded map in which each color represents a different class label. These defined spatial features help computers identify boundaries between different objects and distinguish foreground items from the background.
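The color-coded map is simply a lookup from class label to color. A minimal sketch, assuming a hypothetical three-class palette (real data sets define their own class ids and colors):

```python
# Hypothetical class ids and RGB colors, for illustration only.
PALETTE = {
    0: (0, 0, 0),        # background -> black
    1: (128, 64, 128),   # road -> purple
    2: (220, 20, 60),    # person -> red
}

# Per-pixel class labels, as a segmentation model might output them.
label_map = [
    [0, 0, 2],
    [1, 1, 2],
]


def colorize(labels, palette):
    """Replace every class id with its RGB color."""
    return [[palette[class_id] for class_id in row] for row in labels]


color_map = colorize(label_map, PALETTE)
for row in color_map:
    print(row)
```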
1. Classification. Pixels in an image are assigned a class label representing particular objects.
2. Localization. Objects are outlined with a bounding box. A bounding box is a rectangle drawn around the outer extent of an object.
3. Segmentation. In the localized image, pixels are grouped using a segmentation mask. A segmentation mask reduces noise by separating one portion of an image from the rest. One way to visualize segmentation masking is to imagine sliding a piece of black construction paper with a hole cut out over an image to isolate specific portions.
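The three steps above can be illustrated on a toy image. The sketch below assumes the per-pixel labels are already available, then derives a bounding box from them and applies a mask, much like the construction-paper analogy. The `bounding_box` and `apply_mask` helpers and the sample values are hypothetical.

```python
# Toy grayscale image and its per-pixel class labels (1 = object of interest).
image = [
    [10, 11, 12, 13],
    [14, 90, 91, 15],
    [16, 92, 17, 18],
]
labels = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]


def bounding_box(labels, target_class):
    """Return (top, left, bottom, right), inclusive, or None if absent."""
    coords = [(r, c) for r, row in enumerate(labels)
              for c, value in enumerate(row) if value == target_class]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


def apply_mask(image, labels, target_class, fill=0):
    """Keep pixels of target_class and blank out everything else."""
    return [[pixel if label == target_class else fill
             for pixel, label in zip(img_row, lab_row)]
            for img_row, lab_row in zip(image, labels)]


box = bounding_box(labels, target_class=1)
masked = apply_mask(image, labels, target_class=1)
print(box)
for row in masked:
    print(row)
```

Note that the box covers the pixel at row 2, column 2 even though it is not part of the object; the mask excludes it. That gap is why a mask isolates an object more precisely than a box.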
Photography and social media filters. Many camera effects and filters on social media applications like Instagram and TikTok rely on semantic segmentation. For example, a filter has to locate the eyes before it can place sunglasses over them. Segmentation also supports camera features such as portrait mode, which separates a foreground subject from the background.
Medical imaging analyses. AI segmentation models trained on medical imagery can perform automated analysis to measure and detect anomalies on a pixel level. By highlighting and mapping anatomical features, segmentation enhances visualization for more precise identification of tumors and other irregularities.
Agriculture. Farmers employ AI and semantic segmentation to automate maintenance and manage the health of their crops. Computer vision technology helps farmers quickly detect at-risk portions of their fields to eradicate pests or contain infections.
Self-driving cars. Autonomous vehicles rely heavily on semantic segmentation to identify obstacles, analyze road conditions, and map surroundings.
Many different tools and models exist that you can use to perform semantic segmentation. If you’d like step-by-step guidance throughout your project, consider the Semantic Segmentation with Amazon Sagemaker Guided Project on Coursera. You’ll visualize and prepare data for model training via a split-screen web browser environment. To complete this advanced-level project, experience with Python programming, deep learning concepts, and AWS is required. Consider the resources in the following sections if you want to start a semantic segmentation project independently.
Data sets for semantic segmentation are typically huge and complex. The more diverse the labels in a data set, the more the model has to learn from. Here are a few commonly used segmentation data sets:
Microsoft Common Objects in Context (MS COCO). MS COCO is a large-scale data set used for captioning, key-point detection, object detection, and segmentation. It includes roughly 330,000 images with a wide variety of annotations that have been refined through community feedback.
Cityscapes Dataset. The central focus of this data set is the semantic understanding of city and street scenes. It includes 30 different classes, 25,000 annotated images, dense semantic segmentation, and instance segmentation for people and vehicles.
ScanNet. ScanNet is an RGB-D video data set with 2D and 3D data. It comprises 2.5 million indoor views in 1,513 scenes with semantic segmentation annotations and surface reconstructions.
Semantic segmentation models assign a class label to every pixel in an image. The list below includes a few popular segmentation models:
Pyramid Scene Parsing Network (PSPNet). PSPNet uses a pyramid parsing module to discern multi-level features for a more comprehensive context of an image. It’s capable of processing global and local information.
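Fully Convolutional Network (FCN). FCNs replace the dense (fully connected) layers found in traditional CNNs with convolutional layers. This lets them output a prediction for every pixel rather than one label per image, and it can shorten the training process.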
SegNet. SegNet is a semantic segmentation model comprising an encoder network, a decoder network, and a classification layer.
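Whatever the architecture, these models typically end by turning per-pixel class scores into a label map: each pixel takes the class with the highest score. The sketch below shows only that final step with made-up scores; the `argmax_labels` helper is hypothetical and stands in for what a framework would do with a single array operation.

```python
# Hypothetical model output: scores[class_id][row][col] for a 2x2 image.
scores = [
    [[0.9, 0.2], [0.1, 0.6]],   # class 0 (e.g., background)
    [[0.1, 0.7], [0.8, 0.3]],   # class 1 (e.g., object)
]


def argmax_labels(scores):
    """Pick, for every pixel, the class id with the highest score."""
    n_classes = len(scores)
    rows, cols = len(scores[0]), len(scores[0][0])
    return [[max(range(n_classes), key=lambda k: scores[k][r][c])
             for c in range(cols)]
            for r in range(rows)]


label_map = argmax_labels(scores)
for row in label_map:
    print(row)
```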
If you’re new to the field of computer vision, consider enrolling in an online course like Image Processing for Engineering and Science Specialization from MathWorks. You’ll gain a foundational understanding of image processing and analysis techniques.
DeepLearning.AI offers an intermediate-level course, Advanced Computer Vision with TensorFlow, to build upon your existing knowledge of image segmentation using TensorFlow.
If you’re ready to dive straight into a semantic segmentation project, the Guided Project Semantic Segmentation with Amazon Sagemaker walks you through the entire process.
Editorial Team
Coursera’s editorial team is comprised of highly experienced professional editors, writers, and fact...
This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.