5 posts tagged "survey"

· 15 min read
Gavin Gong

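This post covers a paper about decoders of many kinds. Original paper: The Devil is in the Decoder: Classification, Regression and GANs

The decoder (as opposed to the encoder, which is sometimes called the feature extractor) is tied, to one degree or another, to pixel-level classification, regression, and related problems. Decoders are applied to the following pixel-level tasks:

  • Classification: semantic segmentation, edge detection.
  • Regression: human keypoint detection, depth prediction, colorization, super-resolution.
  • Synthesis: image generation with generative adversarial networks, and so on.

The decoder is therefore the key component in dense prediction problems (many pixel-level problems can be described as dense).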

I learned a great deal from this paper; if you have time, please read the original. This post is only a set of rough reading notes on it.

· 102 min read
Sonder

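This post covers a survey of adversarial attacks on neural networks. The survey describes in great detail how adversarial attacks have developed so far, together with the existing attack and defense algorithms. Original paper: Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey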

Deep learning is at the heart of the current rise of machine learning and artificial intelligence. In the field of Computer Vision, it has become the workhorse for applications ranging from self-driving cars to surveillance and security. Whereas deep neural networks have demonstrated phenomenal success (often beyond human capabilities) in solving complex problems, recent studies show that they are vulnerable to adversarial attacks in the form of subtle perturbations to inputs that lead a model to predict incorrect outputs. For images, such perturbations are often too small to be perceptible, yet they completely fool the deep learning models. Adversarial attacks pose a serious threat to the success of deep learning in practice. This fact has lead to a large influx of contributions in this direction. This article presents the first comprehensive survey on adversarial attacks on deep learning in Computer Vision. We review the works that design adversarial attacks, analyze the existence of such attacks and propose defenses against them. To emphasize that adversarial attacks are possible in practical conditions, we separately review the contributions that evaluate adversarial attacks in the real-world scenarios. Finally, we draw on the literature to provide a broader outlook of the research direction.
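
To make the idea of a "subtle perturbation" concrete, here is a minimal sketch of a single step of the fast gradient sign method (FGSM), one well-known attack in this area, applied to a toy logistic-regression classifier. The weights, the input, and the step size `eps` are hypothetical values chosen for illustration; they do not come from the survey, and real attacks target deep networks rather than a four-feature linear model.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """One FGSM step: shift every input coordinate by eps in the direction
    that increases the cross-entropy loss for the true label y."""
    p = predict(w, b, x)
    # For logistic regression, d(loss)/dx = (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical toy model and input (assumptions, not from the survey).
w = [2.0, -3.0, 1.0, 0.5]
b = 0.1
x = [0.2, 0.1, 0.1, 0.2]
y = 1  # true label

x_adv = fgsm(w, b, x, y, eps=0.1)
clean_p = predict(w, b, x)      # above 0.5: classified as class 1
adv_p = predict(w, b, x_adv)    # below 0.5: classified as class 0
print(f"clean p(class 1) = {clean_p:.3f}, adversarial p(class 1) = {adv_p:.3f}")
```

No coordinate moves by more than `eps`, yet the predicted class flips, which is the behavior the abstract describes.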

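This post is mainly a translation of the paper, with my own understanding and explanation of some of the algorithms added. It took me about a week to read, and it really is an excellent survey.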

· 22 min read
Gavin Gong

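This post covers a paper on the theory and evaluation of data-dependent decoders. The original paper is Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation

In recent years, common semantic segmentation methods have used encoder-decoder architectures for pixel-wise prediction. The last layer of the decoder is typically a bilinear upsampling step that restores the prediction to the original resolution. The paper shows that this data-independent bilinear upsampling can lead to sub-optimal results.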
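
As a reminder of why bilinear upsampling is "data-independent", here is a minimal pure-Python sketch (corner-aligned, single channel). The function name and the tiny input are assumptions made for illustration. The interpolation weights depend only on pixel positions, so nothing in this layer is learned from data.

```python
def bilinear_upsample(grid, out_h, out_w):
    """Resize a 2-D list of floats with bilinear interpolation.
    Corner-aligned: the four corner values of the input are preserved."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for i in range(out_h):
        # Map the output row back to a fractional input row.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Blend the four neighbours with position-only weights.
            top = (1 - wx) * grid[y0][x0] + wx * grid[y0][x1]
            bottom = (1 - wx) * grid[y1][x0] + wx * grid[y1][x1]
            row.append((1 - wy) * top + wy * bottom)
        out.append(row)
    return out


low_res = [[0.0, 1.0],
           [2.0, 3.0]]
up = bilinear_upsample(low_res, 3, 3)
for r in up:
    print(r)
```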

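The paper therefore proposes a data-dependent upsampling method, called "DUpsampling", to replace bilinear upsampling. The new method exploits the redundancy in the label space of semantic segmentation and can recover the pixel-wise prediction from the low-resolution outputs of a CNN. Its main advantage is that it achieves even better segmentation accuracy from relatively low-resolution feature maps (such as 1/16 or 1/32 of the input size), which significantly reduces computational complexity. This is made possible by two properties:

  • The new upsampling layer has a much stronger reconstruction capability
  • The DUpsampling-based decoder can flexibly leverage almost arbitrary combinations of the CNN encoder's features

The paper's experiments also show that a framework equipped with DUpsampling performs strongly (88.1% mIOU on PASCAL VOC and 52.5% mIOU on PASCAL Context) without any post-processing.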
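
The following is a minimal sketch of the idea as I understand it, not the authors' implementation: each low-resolution feature vector is multiplied by a matrix and the result is rearranged into a `ratio` x `ratio` block of per-class scores. The function name `dupsample`, the argument layout, and the hand-written `weight` matrix are all assumptions for illustration; in the paper the matrix is learned from data rather than fixed.

```python
def dupsample(features, weight, ratio, num_classes):
    """features: H x W x C nested lists.
    weight: C x (ratio * ratio * num_classes) nested lists.
    Returns (H * ratio) x (W * ratio) x num_classes nested lists."""
    h, w = len(features), len(features[0])
    out = [[None] * (w * ratio) for _ in range(h * ratio)]
    for i in range(h):
        for j in range(w):
            x = features[i][j]
            # Linear projection of one feature vector.
            v = [sum(x[c] * weight[c][k] for c in range(len(x)))
                 for k in range(len(weight[0]))]
            # Rearrange the projected vector into a ratio x ratio block.
            for di in range(ratio):
                for dj in range(ratio):
                    start = (di * ratio + dj) * num_classes
                    out[i * ratio + di][j * ratio + dj] = v[start:start + num_classes]
    return out


# Hypothetical 1 x 2 feature map with 2 channels, upsampled 2x to one class.
features = [[[1.0, 0.0], [0.0, 2.0]]]
weight = [[1.0, 2.0, 3.0, 4.0],
          [0.5, 0.5, 0.5, 0.5]]
result = dupsample(features, weight, ratio=2, num_classes=1)
for r in result:
    print(r)
```

Unlike bilinear interpolation, the output here depends on a matrix that can be fitted to the data, which is what makes the upsampling "data-dependent".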

Abstract

Recent semantic segmentation methods exploit encoder-decoder architectures to produce the desired pixel-wise segmentation prediction. The last layer of the decoders is typically a bilinear upsampling procedure to recover the final pixel-wise prediction. We empirically show that this oversimple and data-independent bilinear upsampling may lead to sub-optimal results. In this work, we propose a data-dependent upsampling (DUpsampling) to replace bilinear, which takes advantages of the redundancy in the label space of semantic segmentation and is able to recover the pixel-wise prediction from low-resolution outputs of CNNs. The main advantage of the new upsampling layer lies in that with a relatively lower-resolution feature map such as 1/16 or 1/32 of the input size, we can achieve even better segmentation accuracy, significantly reducing computation complexity. This is made possible by 1) the new upsampling layer's much improved reconstruction capability; and more importantly 2) the DUpsampling based decoder's flexibility in leveraging almost arbitrary combinations of the CNN encoders' features. Experiments demonstrate that our proposed decoder outperforms the state-of-the-art decoder, with only 20% of computation. Finally, without any post-processing, the framework equipped with our proposed decoder achieves new state-of-the-art performance on two datasets: 88.1% mIOU on PASCAL VOC with 30% computation of the previously best model; and 52.5% mIOU on PASCAL Context.

If you have time, please read the original paper. This post is only a set of rough reading notes on it.

· 33 min read
Gavin Gong

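This post covers a survey of dynamic neural networks. The original paper, "Dynamic Neural Networks: A Survey", mainly covers:

  • Concepts (Introduction)
  • Common dynamic neural networks
    • Instance-wise Dynamic Networks
    • Spatial-wise Dynamic Networks
    • Temporal-wise Dynamic Networks
  • Inference and training (Inference and Training)
  • Common applications and representative work (Applications)

The paper gives a fairly systematic summary of dynamic neural networks, a topic that has attracted many researchers in recent years.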

Abstract

Dynamic neural network is an emerging research topic in deep learning. Compared to static models which have fixed computational graphs and parameters at the inference stage, dynamic networks can adapt their structures or parameters to different inputs, leading to notable advantages in terms of accuracy, computational efficiency, adaptiveness, etc. In this survey, we comprehensively review this rapidly developing area by dividing dynamic networks into three main categories: 1) instance-wise dynamic models that process each instance with data-dependent architectures or parameters; 2) spatial-wise dynamic networks that conduct adaptive computation with respect to different spatial locations of image data and 3) temporal-wise dynamic models that perform adaptive inference along the temporal dimension for sequential data such as videos and texts. The important research problems of dynamic networks, e.g., architecture design, decision making scheme, optimization technique and applications, are reviewed systematically. Finally, we discuss the open problems in this field together with interesting future research directions.

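Research on dynamic neural networks has grown in recent years. Compared with traditional static networks, whose computational graph is fixed, dynamic networks can adapt their structure or parameters to the specific input, which gives them advantages in both speed and accuracy. One way to put it: "when the input is easy, a dynamic network can be fast; when the input is hard, a dynamic network can be highly accurate".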
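
One simple way to picture "fast on easy inputs, accurate on hard inputs" is early exiting, a form of instance-wise dynamic depth. The sketch below is an illustration under stated assumptions: `cheap_stage` and `expensive_stage` are hypothetical stand-ins for a shallow and a deep classifier, and the confidence threshold of 0.9 is arbitrary.

```python
def early_exit_predict(x, stages, threshold=0.9):
    """Run classifiers from cheapest to most expensive and stop at the first
    one whose top-class probability reaches `threshold`.
    Returns (predicted_class, number_of_stages_run)."""
    if not stages:
        raise ValueError("at least one stage is required")
    probs, depth = None, 0
    for depth, stage in enumerate(stages, start=1):
        probs = stage(x)
        if max(probs) >= threshold:
            break  # confident enough: skip the remaining stages
    return probs.index(max(probs)), depth


def cheap_stage(x):
    # Hypothetical shallow model: confident only far from the boundary at 0.
    p = 0.95 if abs(x) > 1.0 else 0.6
    return [p, 1 - p] if x < 0 else [1 - p, p]


def expensive_stage(x):
    # Hypothetical deep model: always confident.
    return [0.99, 0.01] if x < 0 else [0.01, 0.99]


stages = [cheap_stage, expensive_stage]
easy = early_exit_predict(3.0, stages)    # exits after the cheap stage
hard = early_exit_predict(-0.2, stages)   # falls through to the expensive stage
print(easy, hard)
```

The easy input uses one stage and the hard input uses two, so the computation spent depends on the input.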

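The paper gives an overview of how dynamic neural networks are "dynamic" and of the advantages this brings.

I learned a great deal from this survey; if you have time, please read the original. This post is only a set of rough reading notes on it.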

· 15 min read
Gavin Gong

This post is a reading of a survey paper. Original paper: A Review on Deep Learning Techniques Applied to Semantic Segmentation

Abstract:

Image semantic segmentation is more and more being of interest for computer vision and machine learning researchers. Many applications on the rise need accurate and efficient segmentation mechanisms: autonomous driving, indoor navigation, and even virtual or augmented reality systems to name a few. This demand coincides with the rise of deep learning approaches in almost every field or application target related to computer vision, including semantic segmentation or scene understanding. This paper provides a review on deep learning methods for semantic segmentation applied to various application areas. Firstly, we describe the terminology of this field as well as mandatory background concepts. Next, the main datasets and challenges are exposed to help researchers decide which are the ones that best suit their needs and their targets. Then, existing methods are reviewed, highlighting their contributions and their significance in the field. Finally, quantitative results are given for the described methods and the datasets in which they were evaluated, following up with a discussion of the results. At last, we point out a set of promising future works and draw our own conclusions about the state of the art of semantic segmentation using deep learning techniques.

I learned a great deal from this survey; if you have time, please read the original. This post is only a set of rough reading notes on it.