Welcome to the Ministry of Magic Log (魔法部日志)

December 31, 2023 · 40 min read

Rubbish CVer | Poor LaTeX speaker | Half stack developer | Lurking self-styled expert of the mechanical keyboard community

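If you have read this far, you are ready to start exploring the laws that govern this field and where those laws come from. For an undergraduate, the best way in is to make a habit of reading papers in the field. Around the second semester of my sophomore year, my friends and I began reading papers together and writing notes. These notes are shallow and naive, and some contain misunderstandings; every beginning is hard. We still wanted to collect them, and that is how the Ministry of Magic Log began. When I created the folder, I named it "unlimited paper works": before becoming a rational skeptic, one should first master the field. We are prepared for a long-term commitment, and we hope to do simple things surprisingly well.

Joining the Ministry of Magic Log is not hard. Warm up by reading the short guide below and you are ready to start (the following content has passed the grammar checker PaperCube).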

How to Read and Comprehend Scientific Research Articles

Scientific articles are how scholars and researchers communicate with each other. Reading them makes you take an active part in your own comprehension, because you have to ask how the researchers explain their ideas. Books, websites, papers, and scientific magazines are all good places to start.

This tutorial will discuss:

How to read a scientific article
How to find the main points of an article
How to take effective notes

The Devil is in the Decoder - Classification, Regression and GANs

December 31, 2023 · 15 min read

Gavin Gong

Rubbish CVer | Poor LaTeX speaker | Half stack developer | Lurking self-styled expert of the mechanical keyboard community

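These are notes on a paper about many kinds of decoders. Original paper: The Devil is in the Decoder: Classification, Regression and GANs.

The concept of a decoder (sometimes also called a feature extractor) is tied, to a greater or lesser degree, to pixel-level classification, regression, and similar problems. Decoders are applied to the following pixel-level tasks:

Classification: semantic segmentation, edge detection.
Regression: human keypoint detection, depth prediction, colorization, super-resolution.
Synthesis: generating images with generative adversarial networks, and so on.

The decoder is therefore key to dense prediction problems (many pixel-level problems can be called dense).

Abstract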

Image semantic segmentation is more and more being of interest for computer vision and machine learning researchers. Many applications on the rise need accurate and efficient segmentation mechanisms: autonomous driving, indoor navigation, and even virtual or augmented reality systems to name a few. This demand coincides with the rise of deep learning approaches in almost every field or application target related to computer vision, including semantic segmentation or scene understanding. This paper provides a review on deep learning methods for semantic segmentation applied to various application areas. Firstly, we describe the terminology of this field as well as mandatory background concepts. Next, the main datasets and challenges are exposed to help researchers decide which are the ones that best suit their needs and their targets. Then, existing methods are reviewed, highlighting their contributions and their significance in the field. Finally, quantitative results are given for the described methods and the datasets in which they were evaluated, following up with a discussion of the results. At last, we point out a set of promising future works and draw our own conclusions about the state of the art of semantic segmentation using deep learning techniques.

I learned a great deal from this survey; please read the original if you have time. This post is only a set of rough reading notes on it.

Threat of Adversarial Attacks on Deep Learning in Computer Vision - A Survey

December 31, 2023 · 102 min read

Sonder

life is but a span, I use python

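This is a survey of adversarial attacks on neural networks. It describes in great detail the current state of adversarial attacks and the existing attack and defense algorithms. Original paper: Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey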

Deep learning is at the heart of the current rise of machine learning and artificial intelligence. In the field of Computer Vision, it has become the workhorse for applications ranging from self-driving cars to surveillance and security. Whereas deep neural networks have demonstrated phenomenal success (often beyond human capabilities) in solving complex problems, recent studies show that they are vulnerable to adversarial attacks in the form of subtle perturbations to inputs that lead a model to predict incorrect outputs. For images, such perturbations are often too small to be perceptible, yet they completely fool the deep learning models. Adversarial attacks pose a serious threat to the success of deep learning in practice. This fact has lead to a large influx of contributions in this direction. This article presents the first comprehensive survey on adversarial attacks on deep learning in Computer Vision. We review the works that design adversarial attacks, analyze the existence of such attacks and propose defenses against them. To emphasize that adversarial attacks are possible in practical conditions, we separately review the contributions that evaluate adversarial attacks in the real-world scenarios. Finally, we draw on the literature to provide a broader outlook of the research direction.

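This post is mainly a translation of the paper, with my own understanding and explanations of some algorithms added. It took me about a week to read, and it is a really good survey.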
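To make the idea of a small perturbation that changes the prediction concrete, here is a minimal sketch of the fast gradient sign method (FGSM), one of the attacks covered by the survey. It is applied to a toy logistic-regression model rather than a deep network; the weights, the input, and the step size `eps` are made-up values for illustration and do not come from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of class 1 under a toy logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm(w, b, x, y, eps):
    """One FGSM step: shift every input coordinate by eps in the direction
    that increases the cross-entropy loss for the true label y."""
    p = predict(w, b, x)
    # For logistic regression with cross-entropy, d(loss)/dx_i = (p - y) * w_i.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


w, b = [2.0, -3.0, 1.0], 0.0      # hypothetical model parameters
x, y = [0.3, 0.1, 0.2], 1         # clean input whose true label is 1
x_adv = fgsm(w, b, x, y, eps=0.1)

print(predict(w, b, x))      # above 0.5: classified as class 1
print(predict(w, b, x_adv))  # below 0.5: the perturbed input is misclassified
```

Each coordinate moves by at most `eps`, yet the logit shifts by `eps` times the sum of the absolute weights. With image-sized inputs that sum has many more terms, which is one intuition for why perturbations too small to see can still flip a prediction.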

Progressive Semantic Segmentation

December 31, 2023 · 6 min read

Zerorains

life is but a span, I use python

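Original paper: Progressive Semantic Segmentation

Problem Description

Running semantic segmentation on a large image can exhaust GPU memory. Under this memory limit, one can either downsample the image or split it into local patches. The former loses detail, and the latter lacks a global view.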
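The patch-splitting option can be sketched as follows. This is only an illustration of tiling an image into boxes that each fit in memory, not the method proposed in the paper; the function name `tile_boxes` and the image and patch sizes are assumptions chosen for the example.

```python
def tile_boxes(height, width, patch):
    """Return (top, left, bottom, right) boxes that tile a height x width image.

    Boxes on the bottom and right borders are clipped to the image size.
    """
    boxes = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            bottom = min(top + patch, height)
            right = min(left + patch, width)
            boxes.append((top, left, bottom, right))
    return boxes


# Hypothetical ultra-high-resolution image and patch size.
boxes = tile_boxes(height=5000, width=7000, patch=1024)
print(len(boxes))   # 35 patches, each at most 1024 x 1024 pixels
print(boxes[-1])    # (4096, 6144, 5000, 7000): the clipped corner patch
```

Each box can be segmented independently, which bounds peak memory by the patch size. The cost is the one named above: a network that sees a single box has no access to the rest of the image.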

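Post-processing to Refine Segmentation Details

Classical Methods

Conditional random fields (CRF) and guided filters (GF): both are slow, and the improvement they bring is incremental.

The deep guided filter (DGF) can improve inference speed.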

Decoders Matter for Semantic Segmentation - Data-Dependent Decoding Enables Flexible Feature Aggregation

December 31, 2023 · 22 min read

Gavin Gong

Rubbish CVer | Poor LaTeX speaker | Half stack developer | Lurking self-styled expert of the mechanical keyboard community

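These are notes on a paper covering the theory and experiments behind a data-dependent decoder. Original paper: Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation.

In recent years, common semantic segmentation methods have used an encoder-decoder architecture for pixel-wise prediction. The last layer of the decoder is typically a bilinear upsampling step that restores the prediction to the original resolution. The paper shows that this data-independent bilinear upsampling can lead to sub-optimal results.

The paper therefore proposes a data-dependent upsampling, called "DUpsampling", to replace bilinear upsampling. The new method exploits the redundancy in the label space of semantic segmentation and can recover the pixel-wise prediction from low-resolution CNN outputs. With relatively low-resolution feature maps it achieves more accurate segmentation while significantly reducing computational complexity. In other words:

The new upsampling layer has a very strong reconstruction capability.
The method is compatible with almost arbitrary combinations of the CNN encoder's features.

The paper also shows experimentally that DUpsampling performs strongly without any post-processing.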
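As a rough sketch of what "data-dependent upsampling" can look like in code, the snippet below projects every low-resolution feature vector with a matrix and rearranges the result into a block of per-pixel class scores. It reflects my reading of the idea, not the authors' implementation: in the paper the projection is learned, whereas here `weight` is random, and the function name, shapes, and use of NumPy are assumptions for illustration.

```python
import numpy as np


def dupsample(features, weight, ratio, num_classes):
    """Upsample an (h, w, c) feature map to (h*ratio, w*ratio, num_classes).

    weight has shape (c, ratio * ratio * num_classes). Unlike bilinear
    interpolation, the mapping depends on parameters that can be learned.
    """
    h, w, _ = features.shape
    # Linear projection of every low-resolution feature vector.
    projected = features @ weight                       # (h, w, r*r*n)
    # Each projected vector becomes an r x r block of class scores.
    blocks = projected.reshape(h, w, ratio, ratio, num_classes)
    # Interleave the blocks spatially: (h, r, w, r, n) -> (h*r, w*r, n).
    return blocks.transpose(0, 2, 1, 3, 4).reshape(h * ratio, w * ratio, num_classes)


rng = np.random.default_rng(0)
features = rng.normal(size=(2, 3, 4))        # hypothetical low-resolution CNN output
weight = rng.normal(size=(4, 2 * 2 * 5))     # stand-in for the learned projection
scores = dupsample(features, weight, ratio=2, num_classes=5)
print(scores.shape)                          # (4, 6, 5)
labels = scores.argmax(axis=-1)              # pixel-wise prediction
```

Because the projection is a single matrix product per location, it can be applied to a coarse feature map, which is where the computational saving described above comes from.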

Abstract

Recent semantic segmentation methods exploit encoder-decoder architectures to produce the desired pixel-wise segmentation prediction. The last layer of the decoders is typically a bilinear upsampling procedure to recover the final pixel-wise prediction. We empirically show that this oversimple and data-independent bilinear upsampling may lead to sub-optimal results. In this work, we propose a data-dependent upsampling (DUpsampling) to replace bilinear, which takes advantages of the redundancy in the label space of semantic segmentation and is able to recover the pixel-wise prediction from low-resolution outputs of CNNs. The main advantage of the new upsampling layer lies in that with a relatively lower-resolution feature map such as 1/16 or 1/32 of the input size, we can achieve even better segmentation accuracy, significantly reducing computation complexity. This is made possible by 1) the new upsampling layer's much improved reconstruction capability; and more importantly 2) the DUpsampling based decoder's flexibility in leveraging almost arbitrary combinations of the CNN encoders' features. Experiments demonstrate that our proposed decoder outperforms the state-of-the-art decoder, with only 20% of computation. Finally, without any post-processing, the framework equipped with our proposed decoder achieves new state-of-the-art performance on two datasets: 88.1% mIOU on PASCAL VOC with 30% computation of the previously best model; and 52.5% mIOU on PASCAL Context.

Please read the original if you have time. This post is only a set of rough reading notes on it.

HLA-Face Joint High-Low Adaptation for Low Light Face Detection

December 31, 2023 · 16 min read

PommesPeter

I want to be strong. But it seems so hard.

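These are notes on a paper about low-light face detection. Original paper: HLA-Face: Joint High-Low Adaptation for Low Light Face Detection.

The paper makes full use of existing normal-light data and explores how to adapt face detectors from normal light to low light. The challenge is that the gap between normal and low light is too large and complex at both the pixel level and the object level, so most existing low-light enhancement and adaptation methods do not achieve the desired performance.
The paper uses DARK FACE as its benchmark and adapts from existing normal-light images to low-light conditions, without needing dark face labels.
One gap is in pixel-level appearance, such as insufficient illumination, camera noise, and color bias. The other is the object-level semantic difference between normal-light and low-light scenes, including but not limited to the presence of street lights, vehicle headlights, and billboards. Traditional low-light enhancement methods [5,6] are designed to improve visual quality, so they cannot close the semantic gap.
By brightening low-light images and distorting normal-light images, the paper builds intermediate states that lie between normal and low light.

Abstract: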

Face detection in low light scenarios is challenging but vital to many practical applications, e.g., surveillance video, autonomous driving at night. Most existing face detectors heavily rely on extensive annotations, while collecting data is time-consuming and laborious. To reduce the burden of building new datasets for low light conditions, we make full use of existing normal light data and explore how to adapt face detectors from normal light to low light. The challenge of this task is that the gap between normal and low light is too huge and complex for both pixel-level and object-level. Therefore, most existing low-light enhancement and adaptation methods do not achieve desirable performance. To address the issue, we propose a joint High-Low Adaptation (HLA) framework. Through a bidirectional low-level adaptation and multi-task high-level adaptation scheme, our HLA-Face outperforms state-of-the-art methods even without using dark face labels for training. Our project is publicly available at: [https://daooshee.github.io/HLA-Face-Website/](https://daooshee.github.io/HLA-Face-Website/)

DeepLab Series

December 31, 2023 · 10 min read

Gavin Gong

Rubbish CVer | Poor LaTeX speaker | Half stack developer | Lurking self-styled expert of the mechanical keyboard community

The DeepLab series covered here comprises three papers: DeepLab-v1, DeepLab-v2, and DeepLab-v3.

DeepLab-v1: Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

DeepLab-v2: DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

DeepLab-v3: Rethinking Atrous Convolution for Semantic Image Segmentation

We read the three together here.

A follow-up appeared later:

DeepLab-v3+: Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

There are no plans to cover it here for now.

Cross-Dataset Collaborative Learning for Semantic Segmentation

December 31, 2023 · 6 min read

Zerorains

life is but a span, I use python

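Paper: Cross-Dataset Collaborative Learning for Semantic Segmentation
Authors: Li Wang, Dong Li, Yousong Zhu, Lu Tian, Yi Shan
Venue: CVPR 2021

Main Structure

DAB: Dataset-Aware Block

    The basic computational unit of the network. It helps capture homogeneous representations and heterogeneous statistics across different datasets.

    It consists of one dataset-invariant convolution layer, several dataset-specific batch normalization layers, and one activation layer.

DAT: Dataset Alternation Training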
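The DAB description above (one shared convolution, one batch normalization per dataset, one activation) can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the authors' code: the shared convolution is a 1x1 convolution written as a matrix product, normalization uses the statistics of the current batch only, and the class name `DatasetAwareBlock` and all sizes are hypothetical.

```python
import numpy as np


class DatasetAwareBlock:
    """Shared convolution -> dataset-specific batch norm -> ReLU (illustrative)."""

    def __init__(self, in_ch, out_ch, num_datasets, seed=0):
        rng = np.random.default_rng(seed)
        # One set of convolution weights shared by every dataset
        # (assumed 1x1 here, so it reduces to a matrix product).
        self.weight = rng.normal(scale=0.1, size=(in_ch, out_ch))
        # One set of batch-norm scale and shift parameters per dataset.
        self.gamma = np.ones((num_datasets, out_ch))
        self.beta = np.zeros((num_datasets, out_ch))

    def __call__(self, x, dataset_id, eps=1e-5):
        # x: (batch, height, width, in_ch); every sample comes from one dataset.
        y = x @ self.weight
        mean = y.mean(axis=(0, 1, 2))
        var = y.var(axis=(0, 1, 2))
        y = (y - mean) / np.sqrt(var + eps)
        # Only this step depends on which dataset the batch came from.
        y = self.gamma[dataset_id] * y + self.beta[dataset_id]
        return np.maximum(y, 0.0)  # ReLU activation


block = DatasetAwareBlock(in_ch=3, out_ch=8, num_datasets=2)
batch = np.random.default_rng(1).normal(size=(4, 16, 16, 3))
out_first = block(batch, dataset_id=0)    # uses the first dataset's batch norm
out_second = block(batch, dataset_id=1)   # same convolution, second dataset's batch norm
print(out_first.shape)                    # (4, 16, 16, 8)
```

The split mirrors the description: the convolution weights are where a representation common to all datasets would be learned, while the per-dataset normalization parameters absorb the statistics that differ between datasets.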

Segmentation results:

Dynamic Neural Networks - A Survey

December 31, 2023 · 33 min read

Gavin Gong

Rubbish CVer | Poor LaTeX speaker | Half stack developer | Lurking self-styled expert of the mechanical keyboard community

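This paper is a survey of dynamic neural networks. The original paper, "Dynamic Neural Networks: A Survey", mainly covers:

Concepts (Introduction)
Common dynamic neural networks
- Instance-wise Dynamic Networks
- Spatial-wise Dynamic Networks
- Temporal-wise Dynamic Networks
Inference and Training
Common applications and representative work (Applications)

The paper gives a fairly systematic summary of dynamic neural networks, which have attracted many researchers in recent years.

Abstract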

Dynamic neural network is an emerging research topic in deep learning. Compared to static models which have fixed computational graphs and parameters at the inference stage, dynamic networks can adapt their structures or parameters to different inputs, leading to notable advantages in terms of accuracy, computational efficiency, adaptiveness, etc. In this survey, we comprehensively review this rapidly developing area by dividing dynamic networks into three main categories: 1) instance-wise dynamic models that process each instance with data-dependent architectures or parameters; 2) spatial-wise dynamic networks that conduct adaptive computation with respect to different spatial locations of image data and 3) temporal-wise dynamic models that perform adaptive inference along the temporal dimension for sequential data such as videos and texts. The important research problems of dynamic networks, e.g., architecture design, decision making scheme, optimization technique and applications, are reviewed systematically. Finally, we discuss the open problems in this field together with interesting future research directions.

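Research on dynamic neural networks has been growing in recent years. Compared with traditional static networks, which have a fixed computational graph, dynamic networks can adapt their structure or parameters to the specific input, which gives them advantages in both speed and accuracy. One way to put it: "when the input is simple, a dynamic network can be very fast; when the input is complex, it can be very accurate."

The paper outlines how dynamic neural networks are "dynamic" and what advantages this brings.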
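One simple way to see "fast on easy inputs, accurate on hard inputs" is early exiting, an instance-wise scheme in which cheap classifiers answer first and costlier ones run only when confidence is low. The sketch below is a toy illustration of that control flow; the two stage functions and the threshold are hypothetical and are not taken from the survey.

```python
def early_exit_predict(x, stages, threshold=0.9):
    """Run stages from cheapest to costliest and stop at the first confident one.

    Each stage maps an input to (label, confidence). Returns the label, its
    confidence, and how many stages were actually executed.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    for depth, stage in enumerate(stages, start=1):
        label, confidence = stage(x)
        if confidence >= threshold:
            break  # confident enough: skip the remaining, costlier stages
    return label, confidence, depth


def cheap(x):
    """Hypothetical fast classifier: only confident far from the boundary at 0."""
    return x > 0, min(1.0, abs(x))


def expensive(x):
    """Hypothetical slow classifier that is always confident."""
    return x > 0, 1.0


print(early_exit_predict(2.0, [cheap, expensive]))   # (True, 1.0, 1): easy input, one stage
print(early_exit_predict(0.1, [cheap, expensive]))   # (True, 1.0, 2): hard input, both stages
```

The amount of computation depends on the input: the easy input exits after the first stage, and the hard one pays for the second stage.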

I learned a great deal from this survey; please read the original if you have time. This post is only a set of rough reading notes on it.

Feature Pyramid Networks for Object Detection

December 31, 2023 · 11 min read

Gavin Gong

Rubbish CVer | Poor LaTeX speaker | Half stack developer | Lurking self-styled expert of the mechanical keyboard community

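This note was written by VisualDust.

Original paper: Feature Pyramid Networks for Object Detection.

This is the paper widely known as FPN. FPN is relatively early work (note that it is only one way of fusing multi-scale features; it was published early, at CVPR 2017, and was considered very advanced at the time), and it had many highlights. FPN mainly addresses the multi-scale problem in object detection: by changing only simple network connections, it greatly improves small-object detection while adding almost no computation to the original model.

Abstract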

Feature pyramids are a basic component in recognition systems for detecting objects at different scales. But recent deep learning object detectors have avoided pyramid representations, in part because they are compute and memory intensive. In this paper, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost. A top-down architecture with lateral connections is developed for building high-level semantic feature maps at all scales. This architecture, called a Feature Pyramid Network (FPN), shows significant improvement as a generic feature extractor in several applications. Using FPN in a basic Faster R-CNN system, our method achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles, surpassing all existing single-model entries including those from the COCO 2016 challenge winners. In addition, our method can run at 5 FPS on a GPU and thus is a practical and accurate solution to multi-scale object detection. Code will be made publicly available.

This paper strongly influenced many later network designs, and reading the original is recommended. These are only rough reading notes on it.
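The top-down pathway with lateral connections described in the abstract can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: it assumes the backbone features have already been projected to a common channel count, that each level has half the resolution of the previous one, and it leaves out the smoothing convolution applied after merging. The function names are made up for the example.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (h, w, c) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def top_down(laterals):
    """Merge feature maps ordered from fine to coarse.

    Starts from the coarsest map and, at each finer level, adds the upsampled
    coarser result to that level's lateral feature map.
    """
    outputs = [laterals[-1]]
    for lateral in reversed(laterals[:-1]):
        outputs.append(lateral + upsample2x(outputs[-1]))
    return outputs[::-1]  # back to fine -> coarse order


# Hypothetical three-level pyramid with constant values so the sums are easy to read.
c3 = np.full((8, 8, 4), 1.0)
c4 = np.full((4, 4, 4), 2.0)
c5 = np.full((2, 2, 4), 3.0)
p3, p4, p5 = top_down([c3, c4, c5])
print(p3.shape, p3[0, 0, 0])   # (8, 8, 4) 6.0
print(p4.shape, p4[0, 0, 0])   # (4, 4, 4) 5.0
```

The finest output receives a contribution from every coarser level, which is how high-level semantics reach the high-resolution maps used for small objects.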
