
3 posts tagged "decoder"


· 15 min read
Gavin Gong

This paper surveys and compares a wide variety of decoders. Original paper: The Devil is in the Decoder: Classification, Regression and GANs.

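The notion of a decoder is bound up, to varying degrees, with pixel-level classification, regression, and related problems. (The decoder sits on top of the encoder, which is sometimes called the feature extractor.) Decoders are applied to the following pixel-level tasks: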

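  • Classification: semantic segmentation, edge detection.
  • Regression: human keypoint detection, depth prediction, colorization, super-resolution.
  • Synthesis: generating images with generative adversarial networks, and similar tasks.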

Decoders are therefore central to dense prediction, a term that covers most problems in which an output is required for every pixel.
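To make "dense" concrete, here is a minimal sketch of the decoder's job in a pixel-level classification task: take a coarse score map and produce one label per pixel. Everything in it is an assumption for illustration. The function names are hypothetical, the random array stands in for an encoder output, and nearest-neighbor repetition is used only as the simplest possible upsampling, not as a decoder studied in the paper.

```python
import numpy as np

def upsample_nearest(features, factor):
    """Repeat each spatial cell `factor` times along height and width."""
    return features.repeat(factor, axis=0).repeat(factor, axis=1)

def dense_classify(low_res_scores, factor):
    """Turn a coarse (h, w, num_classes) score map into one label per pixel."""
    full_res_scores = upsample_nearest(low_res_scores, factor)
    return full_res_scores.argmax(axis=-1)

rng = np.random.default_rng(0)
# Hypothetical encoder output: a 4 x 6 grid of scores for 3 classes.
low_res_scores = rng.normal(size=(4, 6, 3))

labels = dense_classify(low_res_scores, factor=8)
print(labels.shape)  # (32, 48): one class index per pixel
```

A regression task such as depth prediction would follow the same pattern but skip the `argmax` and keep one real value per pixel.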

Abstract

Image semantic segmentation is more and more being of interest for computer vision and machine learning researchers. Many applications on the rise need accurate and efficient segmentation mechanisms: autonomous driving, indoor navigation, and even virtual or augmented reality systems to name a few. This demand coincides with the rise of deep learning approaches in almost every field or application target related to computer vision, including semantic segmentation or scene understanding. This paper provides a review on deep learning methods for semantic segmentation applied to various application areas. Firstly, we describe the terminology of this field as well as mandatory background concepts. Next, the main datasets and challenges are exposed to help researchers decide which are the ones that best suit their needs and their targets. Then, existing methods are reviewed, highlighting their contributions and their significance in the field. Finally, quantitative results are given for the described methods and the datasets in which they were evaluated, following up with a discussion of the results. At last, we point out a set of promising future works and draw our own conclusions about the state of the art of semantic segmentation using deep learning techniques.

I learned a great deal from this survey; if you have time, please read the original. This post is only a set of rough reading notes.

· 22 min read
Gavin Gong

This paper presents the theory behind a data-dependent decoder along with its experimental evaluation. Original paper: Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation.

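In recent years, common semantic segmentation methods have used encoder-decoder architectures for pixel-wise prediction. The last layer of these decoders is typically a bilinear upsampling step that restores the prediction to the original image resolution. The paper shows empirically that this data-independent bilinear upsampling can lead to sub-optimal results.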

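The paper therefore proposes a data-dependent upsampling, called DUpsampling, to replace bilinear upsampling. The new method exploits the redundancy in the label space of semantic segmentation and can recover the pixel-wise prediction from low-resolution CNN outputs. Its main advantage is that, with a relatively low-resolution feature map (such as 1/16 or 1/32 of the input size), it achieves better segmentation accuracy while significantly reducing computational complexity. Two properties make this possible: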

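  • The new upsampling layer has a much stronger reconstruction capability.
  • The DUpsampling-based decoder can flexibly use almost arbitrary combinations of the CNN encoder's features.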

The experiments also show that DUpsampling performs strongly and needs no post-processing.
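The redundancy in the label space can be illustrated with a small experiment. The sketch below is not the paper's implementation: it assumes a synthetic label map, cuts it into r x r blocks, one-hot flattens each block, and uses plain PCA (via SVD) as a stand-in for the learned compression and reconstruction matrices. All function names are hypothetical.

```python
import numpy as np

def label_blocks(label_map, r, num_classes):
    """Cut an (H, W) label map into r x r blocks and one-hot flatten each block."""
    H, W = label_map.shape
    one_hot = np.eye(num_classes)[label_map]            # (H, W, C)
    blocks = one_hot.reshape(H // r, r, W // r, r, num_classes)
    blocks = blocks.transpose(0, 2, 1, 3, 4)            # (H/r, W/r, r, r, C)
    return blocks.reshape(-1, r * r * num_classes)      # one row per block

def fit_projection(vectors, k):
    """PCA via SVD: return the mean and the top-k principal directions, shape (k, N)."""
    mean = vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(vectors - mean, full_matrices=False)
    return mean, vt[:k]

def reconstruct(vectors, mean, P):
    """Compress each block to k numbers with P, then expand back with P.T."""
    codes = (vectors - mean) @ P.T                      # low-dimensional codes
    return codes @ P + mean                             # linear "upsampling"

# Synthetic label map (an assumption): three classes in large regions.
label_map = np.zeros((64, 64), dtype=int)
label_map[:, 20:45] = 1
label_map[40:, :] = 2

v = label_blocks(label_map, r=8, num_classes=3)         # (64, 192)
errors = {}
for k in (1, 4, 16):
    mean, P = fit_projection(v, k)
    errors[k] = np.mean((v - reconstruct(v, mean, P)) ** 2)
    print(k, errors[k])
```

Each 8 x 8 block is a 192-dimensional vector, yet this map contains only five distinct block patterns, so four components are already enough to reconstruct every block up to floating-point error. Real label maps are less regular, but the same idea applies: a fixed bilinear kernel cannot take advantage of this structure, whereas a projection fitted to the labels can.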

Abstract

Recent semantic segmentation methods exploit encoder-decoder architectures to produce the desired pixel-wise segmentation prediction. The last layer of the decoders is typically a bilinear upsampling procedure to recover the final pixel-wise prediction. We empirically show that this oversimple and data-independent bilinear upsampling may lead to sub-optimal results. In this work, we propose a data-dependent upsampling (DUpsampling) to replace bilinear, which takes advantages of the redundancy in the label space of semantic segmentation and is able to recover the pixel-wise prediction from low-resolution outputs of CNNs. The main advantage of the new upsampling layer lies in that with a relatively lower-resolution feature map such as 1/16 or 1/32 of the input size, we can achieve even better segmentation accuracy, significantly reducing computation complexity. This is made possible by 1) the new upsampling layer's much improved reconstruction capability; and more importantly 2) the DUpsampling based decoder's flexibility in leveraging almost arbitrary combinations of the CNN encoders' features. Experiments demonstrate that our proposed decoder outperforms the state-of-the-art decoder, with only 20% of computation. Finally, without any post-processing, the framework equipped with our proposed decoder achieves new state-of-the-art performance on two datasets: 88.1% mIOU on PASCAL VOC with 30% computation of the previously best model; and 52.5% mIOU on PASCAL Context.

If you have time, please read the original. This post is only a set of rough reading notes.

· 10 min read
Gavin Gong

The DeepLab series comprises three papers: DeepLab-v1, DeepLab-v2, and DeepLab-v3.

DeepLab-v1: Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

DeepLab-v2: DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

DeepLab-v3: Rethinking Atrous Convolution for Semantic Image Segmentation

Here we read the three papers together.

A follow-up appeared later:

DeepLab-v3+: Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

There are no plans to cover it here for now.