15 posts tagged "segmentation"


· 15 min read
Gavin Gong

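This post covers a paper about the many kinds of decoders. Original paper: The Devil is in the Decoder: Classification, Regression and GANs

The concept of a "decoder" (sometimes also called a feature extractor) is tied, to a greater or lesser degree, to pixel-level problems such as classification and regression. Pixel-level tasks where decoders are used include:

  • Classification: semantic segmentation, edge detection.
  • Regression: human keypoint detection, depth prediction, colorization, super-resolution.
  • Synthesis: generating images with generative adversarial networks, and so on.

So the decoder is key to dense prediction problems (many pixel-level problems can be called dense).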

Abstract

Image semantic segmentation is more and more being of interest for computer vision and machine learning researchers. Many applications on the rise need accurate and efficient segmentation mechanisms: autonomous driving, indoor navigation, and even virtual or augmented reality systems to name a few. This demand coincides with the rise of deep learning approaches in almost every field or application target related to computer vision, including semantic segmentation or scene understanding. This paper provides a review on deep learning methods for semantic segmentation applied to various application areas. Firstly, we describe the terminology of this field as well as mandatory background concepts. Next, the main datasets and challenges are exposed to help researchers decide which are the ones that best suit their needs and their targets. Then, existing methods are reviewed, highlighting their contributions and their significance in the field. Finally, quantitative results are given for the described methods and the datasets in which they were evaluated, following up with a discussion of the results. At last, we point out a set of promising future works and draw our own conclusions about the state of the art of semantic segmentation using deep learning techniques.

I learned a great deal from this survey; if you have time, please read the original. This post is only a set of rough reading notes on it.

· 6 min read
Zerorains

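Original paper: Progressive Semantic Segmentation

Problem statement

Running semantic segmentation on large images can exhaust GPU memory. Under this memory limit, one can either downsample the image or split it into local patches. The former loses detail, while the latter lacks a global view.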
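The trade-off between the two workarounds can be sketched in plain Python. The function names `downsampled_size` and `tile_boxes`, the 4000 x 6000 image, and the tile size are illustrative assumptions, not values from the paper.

```python
import math


def downsampled_size(height, width, factor):
    """Size of the image after downsampling by `factor` (fine detail is lost)."""
    return math.ceil(height / factor), math.ceil(width / factor)


def tile_boxes(height, width, tile):
    """Split an image into non-overlapping (top, left, bottom, right) patches.

    Each patch fits in memory on its own, but the model never sees
    anything outside the patch, so the global view is lost.
    """
    boxes = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            bottom = min(top + tile, height)
            right = min(left + tile, width)
            boxes.append((top, left, bottom, right))
    return boxes


# Hypothetical 4000 x 6000 image.
small = downsampled_size(4000, 6000, factor=4)
boxes = tile_boxes(4000, 6000, tile=1024)
print(small)       # (1000, 1500)
print(len(boxes))  # 24
```

Downsampling by 4 leaves one sixteenth of the pixels, while tiling keeps every pixel but spreads it over 24 independent crops.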

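Post-processing to refine segmentation details

Classical methods

Conditional random fields (CRF) and guided filters (GF): both are slow, and the improvement they bring is only incremental.

The deep guided filter (DGF) can improve inference speed.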

· 10 min read
Gavin Gong

The DeepLab series contains three papers: DeepLab-v1, DeepLab-v2, and DeepLab-v3.

DeepLab-v1: Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

DeepLab-v2: DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

DeepLab-v3: Rethinking Atrous Convolution for Semantic Image Segmentation

Here we read these three papers together.

A follow-up appeared later as well:

DeepLab-v3+: Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

There are no plans to cover it here for now, though.

· 6 min read
Zerorains

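Paper: Cross-Dataset Collaborative Learning for Semantic Segmentation

Authors: Li Wang, Dong Li, Yousong Zhu, Lu Tian, Yi Shan

Venue: CVPR 2021

Main structure

DAB: Dataset-Aware Block

    As the basic computational unit of the network, it helps capture homogeneous representations and heterogeneous statistics across different datasets.

It consists mainly of one dataset-invariant convolution layer, multiple dataset-specific batch normalization layers, and one activation layer.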
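The layout of such a block can be sketched in plain Python. This is a toy under stated assumptions, not the authors' implementation: a shared linear layer stands in for the dataset-invariant convolution, batch normalization uses only the statistics of the current batch, and the class name `DatasetAwareBlock` and the dataset names `dataset_a` and `dataset_b` are hypothetical.

```python
import math


class DatasetAwareBlock:
    """Toy DAB: shared linear layer -> dataset-specific batch norm -> ReLU."""

    def __init__(self, weights, dataset_names, eps=1e-5):
        # One weight matrix shared by every dataset (stands in for the
        # dataset-invariant convolution). Shape: out_features x in_features.
        self.weights = weights
        self.eps = eps
        out_features = len(weights)
        # One set of affine BN parameters (gamma, beta) per dataset.
        self.bn = {
            name: {"gamma": [1.0] * out_features, "beta": [0.0] * out_features}
            for name in dataset_names
        }

    def forward(self, batch, dataset):
        # Shared layer: the same computation whichever dataset is active.
        hidden = [
            [sum(w * x for w, x in zip(row, sample)) for row in self.weights]
            for sample in batch
        ]
        # Dataset-specific batch norm: normalize each feature over the batch,
        # then apply the scale and shift that belong to this dataset.
        params = self.bn[dataset]
        n = len(hidden)
        out = [[0.0] * len(self.weights) for _ in range(n)]
        for j in range(len(self.weights)):
            column = [h[j] for h in hidden]
            mean = sum(column) / n
            var = sum((v - mean) ** 2 for v in column) / n
            for i, v in enumerate(column):
                normed = (v - mean) / math.sqrt(var + self.eps)
                y = params["gamma"][j] * normed + params["beta"][j]
                out[i][j] = max(0.0, y)  # ReLU activation
        return out


block = DatasetAwareBlock(
    weights=[[1.0, 0.0], [0.0, 1.0]],
    dataset_names=["dataset_a", "dataset_b"],
)
# Give one dataset its own BN shift to show that the branches are independent.
block.bn["dataset_b"]["beta"] = [1.0, 1.0]

batch = [[1.0, 2.0], [3.0, 6.0]]
out_a = block.forward(batch, "dataset_a")
out_b = block.forward(batch, "dataset_b")
```

The same input produces different outputs for the two dataset names, because only the normalization parameters differ while the weights stay shared.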

DAT: Dataset Alternation Training

Segmentation results:

image-20210505160138997

· 15 min read
Gavin Gong

This post is a reading of a survey paper. Original paper: A Review on Deep Learning Techniques Applied to Semantic Segmentation

Abstract:

Image semantic segmentation is more and more being of interest for computer vision and machine learning researchers. Many applications on the rise need accurate and efficient segmentation mechanisms: autonomous driving, indoor navigation, and even virtual or augmented reality systems to name a few. This demand coincides with the rise of deep learning approaches in almost every field or application target related to computer vision, including semantic segmentation or scene understanding. This paper provides a review on deep learning methods for semantic segmentation applied to various application areas. Firstly, we describe the terminology of this field as well as mandatory background concepts. Next, the main datasets and challenges are exposed to help researchers decide which are the ones that best suit their needs and their targets. Then, existing methods are reviewed, highlighting their contributions and their significance in the field. Finally, quantitative results are given for the described methods and the datasets in which they were evaluated, following up with a discussion of the results. At last, we point out a set of promising future works and draw our own conclusions about the state of the art of semantic segmentation using deep learning techniques.

I learned a great deal from this survey; if you have time, please read the original. This post is only a set of rough reading notes on it.

· 15 min read
Zerorains

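This post explains a paper on fast semantic segmentation. Paper: Fast-SCNN: Fast Semantic Segmentation Network

  • The network is designed mainly around a two-branch architecture
  • Overall idea of the paper: remove redundant convolution computation to increase speed

Abstract: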

The encoder-decoder framework is state-of-the-art for offline semantic image segmentation. Since the rise in autonomous systems, real-time computation is increasingly desirable. In this paper, we introduce fast segmentation convolutional neural network (Fast-SCNN), an above real-time semantic segmentation model on high resolution image data (1024 × 2048px) suited to efficient computation on embedded devices with low memory. Building on existing two-branch methods for fast segmentation, we introduce our ‘learning to downsample’ module which computes low-level features for multiple resolution branches simultaneously. Our network combines spatial detail at high resolution with deep features extracted at lower resolution, yielding an accuracy of 68.0% mean intersection over union at 123.5 frames per second on Cityscapes. We also show that large scale pre-training is unnecessary. We thoroughly validate our metric in experiments with ImageNet pre-training and the coarse labeled data of Cityscapes. Finally, we show even faster computation with competitive results on subsampled inputs, without any network modifications.

· 18 min read
Gavin Gong

image-20210601121147760

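"If we want the predicted segmentation map to be more accurate in boundary regions, we should not sample uniformly, but should instead favor the boundary regions of the image."

These are reading notes on a paper that improves segmentation quality at object edges. The method "treats segmentation as a rendering problem" and achieves good results. Original paper: PointRend: Image Segmentation as Rendering. Before reading these notes, make sure you are familiar with image segmentation; for a brief overview of segmentation techniques, see the other note on that topic.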
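The idea of favoring boundary regions over uniform sampling can be illustrated with a small sketch. It only shows an uncertainty criterion for a binary mask (probability closest to 0.5); the function name `most_uncertain_points` and the toy probability map are assumptions for illustration, and this is not PointRend's full point-selection procedure.

```python
def most_uncertain_points(prob_map, k):
    """Return the k (row, col) positions whose probability is closest to 0.5.

    For a binary mask, a prediction near 0.5 is the least confident, and
    such pixels tend to lie along object boundaries.
    """
    scored = [
        (abs(p - 0.5), r, c)
        for r, row in enumerate(prob_map)
        for c, p in enumerate(row)
    ]
    scored.sort()  # smallest distance to 0.5 first
    return [(r, c) for _, r, c in scored[:k]]


# Toy foreground-probability map: background on the left, object on the right.
prob_map = [
    [0.02, 0.10, 0.45, 0.90, 0.98],
    [0.01, 0.05, 0.55, 0.95, 0.99],
    [0.03, 0.20, 0.52, 0.85, 0.97],
]
points = most_uncertain_points(prob_map, k=3)
print(points)
```

All three selected points fall in column 2, the transition between background and object, whereas uniform sampling would spend most of its budget on pixels that are already confidently labeled.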

Abstract

We present a new method for efficient high-quality image segmentation of objects and scenes. By analogizing classical computer graphics methods for efficient rendering with over- and undersampling challenges faced in pixel labeling tasks, we develop a unique perspective of image segmentation as a rendering problem. From this vantage, we present the PointRend (Point-based Rendering) neural network module: a module that performs point-based segmentation predictions at adaptively selected locations based on an iterative subdivision algorithm. PointRend can be flexibly applied to both instance and semantic segmentation tasks by building on top of existing state-of-the-art models. While many concrete implementations of the general idea are possible, we show that a simple design already achieves excellent results. Qualitatively, PointRend outputs crisp object boundaries in regions that are over-smoothed by previous methods. Quantitatively, PointRend yields significant gains on COCO and Cityscapes, for both instance and semantic segmentation. PointRend's efficiency enables output resolutions that are otherwise impractical in terms of memory or computation compared to existing approaches. Code has been made available at this https URL.

· 12 min read
Zerorains

Paper: RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features

Authors: Gang Zhang, Xin Lu, Jingru Tan, Jianmin Li, Zhaoxiang Zhang, Quanquan Li, Xiaolin Hu

Venue: CVPR 2021

Code: https://github.com/zhanggang001/RefineMask

Original abstract

The two-stage methods for instance segmentation, e.g. Mask R-CNN, have achieved excellent performance recently. However, the segmented masks are still very coarse due to the downsampling operations in both the feature pyramid and the instance-wise pooling process, especially for large objects. In this work, we propose a new method called RefineMask for high-quality instance segmentation of objects and scenes, which incorporates fine-grained features during the instance-wise segmenting process in a multi-stage manner. Through fusing more detailed information stage by stage, RefineMask is able to refine high-quality masks consistently. RefineMask succeeds in segmenting hard cases such as bent parts of objects that are over-smoothed by most previous methods and outputs accurate boundaries. Without bells and whistles, RefineMask yields significant gains of 2.6, 3.4, 3.8 AP over Mask R-CNN on COCO, LVIS, and Cityscapes benchmarks respectively at a small amount of additional computational cost. Furthermore, our single-model result outperforms the winner of the LVIS Challenge 2020 by 1.3 points on the LVIS test-dev set and establishes a new state-of-the-art.

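Summary

Even though two-stage instance segmentation networks such as Mask R-CNN already perform very well, their segmentation masks are still very coarse, especially for large objects, because of the downsampling operations in the feature pyramid and in the instance-wise pooling process.

This paper proposes RefineMask for high-quality instance segmentation of objects and scenes. It incorporates fine-grained features in a multi-stage manner during the instance-wise segmentation process. By fusing more detailed information stage by stage, RefineMask consistently refines its output into high-quality masks.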

· 10 min read
Gavin Gong

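BiSeNet aims at faster real-time semantic segmentation. In semantic segmentation, high spatial resolution and a large receptive field are hard to obtain together, especially under real-time constraints. Existing methods usually speed up inference by using small input images or lightweight backbone models. However, small images lose much of the spatial information in the original image, and lightweight models damage spatial information because of channel pruning. BiSeNet combines a Spatial Path (SP) and a Context Path (CP) to address the loss of spatial information and the shrinking receptive field, respectively.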

Semantic segmentation requires both rich spatial information and sizeable receptive field. However, modern approaches usually compromise spatial resolution to achieve real-time inference speed, which leads to poor performance. In this paper, we address this dilemma with a novel Bilateral Segmentation Network (BiSeNet). We first design a Spatial Path with a small stride to preserve the spatial information and generate high-resolution features. Meanwhile, a Context Path with a fast downsampling strategy is employed to obtain sufficient receptive field. On top of the two paths, we introduce a new Feature Fusion Module to combine features efficiently. The proposed architecture makes a right balance between the speed and segmentation performance on Cityscapes, CamVid, and COCO-Stuff datasets. Specifically, for a 2048x1024 input, we achieve 68.4% Mean IOU on the Cityscapes test dataset with speed of 105 FPS on one NVIDIA Titan XP card, which is significantly faster than the existing methods with comparable performance.

Original paper: BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation. After reading it, you will notice that many ideas in this paper are inspired by SENet (Squeeze-and-Excitation Networks).
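The gap between a small-stride path and a fast-downsampling path can be made concrete with some simple arithmetic on feature-map sizes. The output strides used here (8 for the Spatial Path, 32 for the Context Path) are assumptions for illustration and should be checked against the paper; `feature_map_size` is a hypothetical helper.

```python
def feature_map_size(height, width, output_stride):
    """Spatial size of a feature map downsampled by `output_stride`."""
    return height // output_stride, width // output_stride


# Input size quoted in the abstract: 2048 x 1024 (width x height).
height, width = 1024, 2048

# Assumed output strides: 8 for the Spatial Path, 32 for the Context Path.
spatial = feature_map_size(height, width, output_stride=8)
context = feature_map_size(height, width, output_stride=32)

print(spatial)  # (128, 256)
print(context)  # (32, 64)
# Ratio of spatial positions kept by the two paths.
print((spatial[0] * spatial[1]) // (context[0] * context[1]))  # 16
```

Under these assumed strides the Spatial Path keeps 16 times as many spatial positions, which is why a separate fusion step is needed before the two feature maps can be combined.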

· 17 min read
Gavin Gong

Mingyuan Fan, Shenqi Lai, Junshi Huang, Xiaoming Wei, Zhenhua Chai, Junfeng Luo, Xiaolin Wei

image-20210719132305088

BiSeNet has been proved to be a popular two-stream network for real-time segmentation. However, its principle of adding an extra path to encode spatial information is time-consuming, and the backbones borrowed from pretrained tasks, e.g., image classification, may be inefficient for image segmentation due to the deficiency of task-specific design. To handle these problems, we propose a novel and efficient structure named Short-Term Dense Concatenate network (STDC network) by removing structure redundancy. Specifically, we gradually reduce the dimension of feature maps and use the aggregation of them for image representation, which forms the basic module of STDC network. In the decoder, we propose a Detail Aggregation module by integrating the learning of spatial information into low-level layers in single-stream manner. Finally, the low-level features and deep features are fused to predict the final segmentation results. Extensive experiments on Cityscapes and CamVid dataset demonstrate the effectiveness of our method by achieving promising trade-off between segmentation accuracy and inference speed. On Cityscapes, we achieve 71.9% mIoU on the test set with a speed of 250.4 FPS on NVIDIA GTX 1080Ti, which is 45.2% faster than the latest methods, and achieve 76.8% mIoU with 97.0 FPS while inferring on higher resolution images.

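Before reading this post, please read BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation first.

The paper notes that BiSeNet has proved to be a good two-stream network for real-time segmentation. However, in BiSeNet:

  • Adding a separate network path just for spatial information is computationally very time-consuming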
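  • The pretrained lightweight backbone used for the context path is taken directly from other tasks (for example classification and object detection), so it is not very efficient for segmentation.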

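Therefore, the authors propose the Short-Term Dense Concatenate network (STDC network) to replace the context path in BiSeNet. Its core idea is to remove redundant structure and further speed up segmentation. Specifically, the paper gradually reduces the dimension of the feature maps and aggregates them for image representation, which forms the basic module of the STDC network. In the decoder, it also proposes a Detail Aggregation module that integrates the learning of spatial information into the low-level layers in a single-stream manner, replacing the spatial path in BiSeNet. Finally, the low-level features and deep features are fused to predict the final segmentation result.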
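The "gradually reduce the dimension, then aggregate" idea can be sketched as a channel-width calculation. The halving rule below, with the last block repeating the previous width so that the concatenation adds up to the target width, is an assumption made for illustration; `stdc_block_channels` is a hypothetical helper, not code from the paper.

```python
def stdc_block_channels(out_channels, num_blocks=4):
    """Channel width of each block when widths are halved step by step.

    Assumed rule: block i outputs out_channels / 2**i channels, except the
    last block, which repeats the previous width so that concatenating all
    block outputs gives exactly `out_channels` channels.
    """
    widths = [out_channels // 2 ** i for i in range(1, num_blocks)]
    widths.append(widths[-1])
    return widths


widths = stdc_block_channels(256)
print(widths)       # [128, 64, 32, 32]
print(sum(widths))  # 256
```

Later blocks are cheap because they are narrow, yet the concatenated output still has the full channel width expected by the next stage.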

image-20210719100139212

Note: the part inside the red dashed box in the figure above is the newly proposed STDC network. ARM stands for Attention Refinement Module and FFM for Feature Fusion Module. Both modules already existed in BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation.

If you are interested, please read the original paper, Rethinking BiSeNet For Real-time Semantic Segmentation.