## How much Position Information Do Convolutional Neural Networks Encode?

In contrast to fully connected networks, Convolutional Neural Networks (CNNs) achieve efficiency by learning weights associated with local filters with a finite spatial extent. An implication of this is that a filter may know what it is looking at, but not where it is positioned in the image. Information concerning absolute position is inherently useful, and it is reasonable to assume that deep CNNs may implicitly learn to encode this information if there is a means to do so. In this paper, we test this hypothesis, revealing the surprising degree of absolute position information that is encoded in commonly used neural networks. A comprehensive set of experiments shows the validity of this hypothesis and sheds light on how and where this information is represented, while offering clues to where positional information is derived from in deep CNNs.

Comments: Accepted to ICLR 2020
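Why could a stack of local filters encode absolute position at all? One candidate mechanism that is easy to illustrate is zero padding: near the image border, part of every filter window falls on padded zeros, so even a constant image produces different responses at corners, edges, and interior. The toy sketch below shows this under simple assumptions (single channel, stride 1, an all-ones 3×3 kernel, hypothetical helper names). It is an illustration of the mechanism, not the paper's experimental setup; without padding ("valid" convolution) the same input yields a position-independent output.

```python
def conv_same_zero_pad(img, kernel):
    """Stride-1 cross-correlation; pixels outside the image are treated as zeros."""
    h, w, k = len(img), len(img[0]), len(kernel)
    p = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for u in range(k):
                for v in range(k):
                    y, x = i + u - p, j + v - p
                    if 0 <= y < h and 0 <= x < w:  # padded positions contribute 0
                        s += kernel[u][v] * img[y][x]
            out[i][j] = s
    return out


def conv_valid(img, kernel):
    """Stride-1 cross-correlation without padding (output shrinks by k - 1)."""
    h, w, k = len(img), len(img[0]), len(kernel)
    return [
        [sum(kernel[u][v] * img[i + u][j + v] for u in range(k) for v in range(k))
         for j in range(w - k + 1)]
        for i in range(h - k + 1)
    ]


image = [[1.0] * 5 for _ in range(5)]  # constant image: no content cue at all
ones = [[1.0] * 3 for _ in range(3)]
same = conv_same_zero_pad(image, ones)
print(same[0][0], same[0][2], same[2][2])  # corner, edge, interior -> 4.0 6.0 9.0
print({v for row in conv_valid(image, ones) for v in row})  # -> {9.0}
```

With zero padding the output alone reveals whether a pixel sits at a corner, an edge, or the interior, which is exactly the kind of absolute-position signal deeper layers could build on.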

## Axiomatic Attribution for Deep Networks

We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.
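The "few calls to the standard gradient operator" amount to averaging gradients along the straight path from a baseline x′ to the input x and scaling by (x − x′). The sketch below approximates that path integral with a midpoint Riemann sum; the midpoint rule, the step count, and the toy model with a hand-written gradient are our assumptions (in a real network `grad_fn` would be an autograd call). It also checks the completeness property that Integrated Gradients is known to satisfy: attributions sum to F(x) − F(x′).

```python
def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate IG_i = (x_i - x'_i) * integral_0^1 dF/dx_i(x' + a(x - x')) da.

    grad_fn(point) must return the gradient of the model output at `point`.
    The integral is approximated with a midpoint Riemann sum of `steps` terms.
    """
    n = len(x)
    summed = [0.0] * n
    for s in range(steps):
        a = (s + 0.5) / steps  # midpoint of the s-th sub-interval of [0, 1]
        point = [b + a * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_fn(point)
        for i in range(n):
            summed[i] += g[i]
    return [(xi - b) * g_sum / steps for xi, b, g_sum in zip(x, baseline, summed)]


# Toy differentiable "model" with a hand-written gradient (hypothetical example).
def model(v):
    return v[0] ** 2 + 3.0 * v[1] + v[0] * v[1]


def model_grad(v):
    return [2.0 * v[0] + v[1], 3.0 + v[0]]


x, baseline = [1.0, 2.0], [0.0, 0.0]
attributions = integrated_gradients(model_grad, x, baseline)
# Completeness: the attributions should sum to model(x) - model(baseline).
print(attributions, sum(attributions), model(x) - model(baseline))
```

Because this toy model is quadratic, its gradient varies linearly along the path and the midpoint rule is exact up to rounding; for real networks more steps give a closer approximation.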

## MobileNetV2 - Inverted Residuals and Linear Bottlenecks

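• The paper improves performance mainly through three structural modifications to a lightweight feature-extraction network.
• Overall idea: use low-dimensional tensors while still extracting enough features.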

In this paper we describe a new mobile architecture, MobileNetV2, that improves the state-of-the-art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3. MobileNetV2 is based on an inverted residual structure where the shortcut connections are between the thin bottleneck layers. The intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design. Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on ImageNet classification, COCO object detection [2], and VOC image segmentation [3]. We evaluate the trade-offs between accuracy and number of operations measured by multiply-adds (MAdd), as well as actual latency and the number of parameters.
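The MAdd side of that trade-off can be made concrete for a single block. The sketch below counts multiply-adds for a stride-1 inverted residual block: a 1×1 expansion from d′ to t·d′ channels, a k×k depthwise convolution on the expanded tensor, and a linear 1×1 projection to d″ channels with no non-linearity. It assumes no biases and does not count activations or the shortcut addition; the function name and block shape are hypothetical. It checks the per-layer sum against the closed form h·w·d′·t·(d′ + k² + d″), which, as we recall, is the expression the MobileNetV2 paper gives.

```python
def inverted_residual_madds(h, w, d_in, d_out, t, k=3):
    """Multiply-adds of one stride-1 inverted residual block (biases ignored).

    1x1 expansion d_in -> t*d_in (ReLU6), k x k depthwise on the expanded
    tensor (ReLU6), then a linear 1x1 projection t*d_in -> d_out with no
    non-linearity (the "linear bottleneck").
    """
    hidden = t * d_in
    expand = h * w * d_in * hidden       # pointwise expansion
    depthwise = h * w * hidden * k * k   # one k x k filter per expanded channel
    project = h * w * hidden * d_out     # pointwise linear projection
    return expand + depthwise + project


# Illustrative block shape (hypothetical): 56x56 map, 24 -> 24 channels, expansion 6.
h, w, d_in, d_out, t = 56, 56, 24, 24, 6
total = inverted_residual_madds(h, w, d_in, d_out, t)
closed_form = h * w * d_in * t * (d_in + 3 * 3 + d_out)
print(total, closed_form)
```

The expensive work happens on the expanded t·d′ channels, while the tensors passed between blocks (and along the shortcut) stay thin.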

## Fast-SCNN - Fast Semantic Segmentation Network

• The network is designed mainly around a two-branch architecture.
• Overall idea: reduce redundant convolution computation to increase speed.

The encoder-decoder framework is state-of-the-art for offline semantic image segmentation. Since the rise in autonomous systems, real-time computation is increasingly desirable. In this paper, we introduce fast segmentation convolutional neural network (Fast-SCNN), an above real-time semantic segmentation model on high resolution image data (1024 × 2048px) suited to efficient computation on embedded devices with low memory. Building on existing two-branch methods for fast segmentation, we introduce our ‘learning to downsample’ module which computes low-level features for multiple resolution branches simultaneously. Our network combines spatial detail at high resolution with deep features extracted at lower resolution, yielding an accuracy of 68.0% mean intersection over union at 123.5 frames per second on Cityscapes. We also show that large scale pre-training is unnecessary. We thoroughly validate our metric in experiments with ImageNet pre-training and the coarse labeled data of Cityscapes. Finally, we show even faster computation with competitive results on subsampled inputs, without any network modifications.
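The core of a two-branch design is that the low-resolution "deep" branch must be brought back to the resolution of the "spatial detail" branch before the two are combined. The single-channel toy below shows only that step: nearest-neighbour upsampling followed by element-wise addition. The upsampling mode, the function names, and the toy values are our assumptions. As we understand it, the actual Fast-SCNN fusion module works on multi-channel tensors and applies learned convolutions to the branches before summing; those are omitted here.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling: every value becomes a factor x factor patch."""
    return [[v for v in row for _ in range(factor)] for row in fmap for _ in range(factor)]


def fuse_branches(high_res, low_res, factor):
    """Add upsampled deep (low-resolution) features to spatial-detail features."""
    up = upsample_nearest(low_res, factor)
    if len(up) != len(high_res) or len(up[0]) != len(high_res[0]):
        raise ValueError("branch resolutions do not match after upsampling")
    return [[a + b for a, b in zip(r_hi, r_up)] for r_hi, r_up in zip(high_res, up)]


detail = [[0.1 * (i + j) for j in range(4)] for i in range(4)]  # 4x4 "spatial detail" branch
deep = [[1.0, 2.0], [3.0, 4.0]]                                   # 2x2 "deep feature" branch
fused = fuse_branches(detail, deep, factor=2)
for row in fused:
    print(row)
```

Addition keeps the fused map at the cost of one extra operation per element, which fits the real-time budget that motivates the design.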

## MobileNets - Efficient Convolutional Neural Networks for Mobile Vision Applications

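• The paper proposes an efficient neural network for mobile and embedded devices.
• It introduces a convolution module with a low operation count, the depthwise separable convolution (Depthwise Separable Convolution, hereafter DSC).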
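Where the DSC savings come from can be checked with the cost accounting used in the MobileNets paper (stride 1, biases ignored): a standard D_K×D_K convolution from M to N channels on a D_F×D_F map costs D_K·D_K·M·N·D_F·D_F multiply-adds, while a DSC costs D_K·D_K·M·D_F·D_F for the depthwise step plus M·N·D_F·D_F for the 1×1 pointwise step. Their ratio is 1/N + 1/D_K², so 3×3 kernels save roughly 8 to 9 times the computation. The layer shape below is hypothetical.

```python
def standard_conv_cost(d_k, m, n, d_f):
    """Multiply-adds of a d_k x d_k convolution, m -> n channels, on a d_f x d_f map."""
    return d_k * d_k * m * n * d_f * d_f


def dsc_cost(d_k, m, n, d_f):
    """Depthwise d_k x d_k filter per input channel, then a 1x1 pointwise conv m -> n."""
    depthwise = d_k * d_k * m * d_f * d_f
    pointwise = m * n * d_f * d_f
    return depthwise + pointwise


d_k, m, n, d_f = 3, 32, 64, 112  # illustrative layer shape (hypothetical)
ratio = dsc_cost(d_k, m, n, d_f) / standard_conv_cost(d_k, m, n, d_f)
print(ratio, 1 / n + 1 / d_k ** 2)  # the two expressions agree
```

For large N the ratio approaches 1/D_K², which is why the savings are governed mainly by the kernel size.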

We present a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depthwise separable convolutions to build lightweight deep neural networks. We introduce two simple global hyper-parameters that efficiently trade off between latency and accuracy. These hyper-parameters allow the model builder to choose the right sized model for their application based on the constraints of the problem. We present extensive experiments on resource and accuracy tradeoffs and show strong performance compared to other popular models on ImageNet classification. We then demonstrate the effectiveness of MobileNets across a wide range of applications and use cases including object detection, fine-grained classification, face attributes and large scale geo-localization.
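The two global hyper-parameters are, in the MobileNets paper, a width multiplier α that thins every layer (M → αM, N → αN) and a resolution multiplier ρ that shrinks the input (D_F → ρD_F), giving a per-layer cost of D_K·D_K·αM·ρD_F·ρD_F + αM·αN·ρD_F·ρD_F. The sketch below evaluates that formula; rounding the scaled sizes to integers and the function name are our assumptions. The dominant pointwise term shrinks by about α²ρ², the depthwise term by about αρ².

```python
def mobilenet_layer_cost(d_k, m, n, d_f, alpha=1.0, rho=1.0):
    """Multiply-adds of one depthwise separable layer under width multiplier alpha
    and resolution multiplier rho; returns (depthwise, pointwise)."""
    m_a, n_a = round(alpha * m), round(alpha * n)  # thinner layers
    f_r = round(rho * d_f)                         # smaller input resolution
    depthwise = d_k * d_k * m_a * f_r * f_r
    pointwise = m_a * n_a * f_r * f_r
    return depthwise, pointwise


shape = (3, 32, 64, 112)  # illustrative layer shape (hypothetical)
full = mobilenet_layer_cost(*shape)
small = mobilenet_layer_cost(*shape, alpha=0.5, rho=0.5)
print(full, small)
```

Halving both multipliers here cuts the pointwise cost by 16× and the depthwise cost by 8×, which is how one architecture yields a family of models for different latency budgets.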

## Abstract

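• The GCT module is a universally applicable gated transformation unit that can be optimized jointly with the network weights.
• Unlike SENet, which learns implicitly through fully connected layers, GCT uses interpretable variables to explicitly model the relationships between channels, determining whether they compete or cooperate.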
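As we read the GCT paper, the explicit modeling uses three per-channel scalars α, γ, β in three steps: a global context embedding s_c = α_c·‖x_c‖₂, a channel normalization ŝ_c = √C·s_c / ‖s‖₂, and a gate x̂_c = x_c·(1 + tanh(γ_c·ŝ_c + β_c)). The pure-Python sketch below follows those formulas for one sample; the placement of ε and the toy inputs are our assumptions. With γ = β = 0 the gate is exactly 1, so the module starts as an identity mapping.

```python
import math


def gct(x, alpha, gamma, beta, eps=1e-5):
    """Gated Channel Transformation on one sample (sketch).

    x: list of C channels, each a flat list of H*W activations.
    alpha, gamma, beta: one trainable scalar per channel.
    """
    c = len(x)
    # Global context embedding: alpha-scaled l2 norm of each channel.
    s = [a * math.sqrt(sum(v * v for v in ch) + eps) for a, ch in zip(alpha, x)]
    # Channel normalization: relates each channel's embedding to all the others.
    norm = math.sqrt(sum(v * v for v in s) + eps)
    s_hat = [math.sqrt(c) * v / norm for v in s]
    # Gating adaptation: 1 + tanh(.) lies in (0, 2); above 1 boosts a channel
    # relative to the rest, below 1 suppresses it.
    gates = [1.0 + math.tanh(g * sh + b) for g, sh, b in zip(gamma, s_hat, beta)]
    return [[gate * v for v in ch] for gate, ch in zip(gates, x)]


x = [[1.0, -2.0, 3.0, 0.5], [0.2, 0.1, -0.4, 0.3]]  # C = 2 channels, H*W = 4
identity = gct(x, alpha=[1.0, 1.0], gamma=[0.0, 0.0], beta=[0.0, 0.0])
gated = gct(x, alpha=[1.0, 1.0], gamma=[2.0, -2.0], beta=[0.0, 0.0])
print(identity == x, gated)
```

Every quantity here (α, γ, β, ŝ) is a scalar per channel that can be inspected directly, which is the interpretability advantage over SENet's fully connected weights.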

## Introduction

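• A single convolutional layer operates only on the local neighborhood of each spatial position in the feature map, which can cause local ambiguity. There are two common ways to address this: increasing network depth, as in VGG and ResNet, or increasing network width to capture more global information, e.g., GENet makes extensive use of neighborhood embeddings and SENet models channel relationships from globally embedded information.
• However, the fc layers used in SENet cause the following problems:
1. Because of the fc layers, SE blocks cannot be applied to every layer if the parameter count is to stay manageable.
2. The fc-layer parameters are complex, which makes it hard to analyze the correlations between channels; this is effectively implicit learning.
3. Placing the module after certain layers causes problems.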
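Problem 1 can be quantified. An SE block's two fc layers (C → C/r → C) hold about 2C²/r weights, which grows quadratically with the channel count, while GCT (as we read the paper) holds only 3C scalars (α, γ, β). The sketch assumes SENet's commonly used reduction ratio r = 16, ignores biases, and uses hypothetical channel widths for four network stages.

```python
def se_fc_params(c, r=16):
    """Weights of SE's two fc layers, C -> C/r -> C (biases ignored)."""
    return 2 * c * (c // r)


def gct_params(c):
    """GCT keeps three scalars per channel: alpha, gamma, beta."""
    return 3 * c


for c in (256, 512, 1024, 2048):  # hypothetical channel widths of four stages
    print(c, se_fc_params(c), gct_params(c))
```

At 2048 channels one SE block needs about 85 times as many parameters as one GCT module, which is why SE blocks are usually inserted only once per residual block rather than after every convolution.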

## Abstract

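• CBAM (Convolutional Block Attention Module) is a simple and effective attention module for feed-forward convolutional neural networks.
• It is a mixed-domain attention mechanism that sequentially infers attention maps along two dimensions: channel and spatial.
• CBAM is a lightweight, general-purpose module that can be seamlessly integrated into any CNN.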
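The sequential channel-then-spatial inference can be sketched as follows. Channel attention passes average- and max-pooled channel descriptors through a shared two-layer MLP, sums them, and applies a sigmoid; spatial attention pools across channels (average and max), convolves the resulting 2-channel map, and applies a sigmoid. Each map multiplies the features in turn. This follows our reading of the CBAM paper (which uses a 7×7 spatial kernel); the weights, toy sizes, and helper names are hypothetical, and biases are omitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def shared_mlp(v, w1, w2):
    """Two-layer MLP C -> C/r -> C with ReLU; biases omitted for brevity."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(feat, w1, w2):
    """feat: C x H x W nested lists -> one weight in (0, 1) per channel."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    mx = [max(map(max, ch)) for ch in feat]
    return [sigmoid(a + m) for a, m in zip(shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2))]


def spatial_attention(feat, kernel):
    """Pool across channels, convolve the 2-channel map with kernel (2 x k x k), zero padding."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    avg = [[sum(feat[ch][i][j] for ch in range(c)) / c for j in range(w)] for i in range(h)]
    mx = [[max(feat[ch][i][j] for ch in range(c)) for j in range(w)] for i in range(h)]
    pooled, k = (avg, mx), len(kernel[0])
    p = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for ch in range(2):
                for u in range(k):
                    for v in range(k):
                        y, x = i + u - p, j + v - p
                        if 0 <= y < h and 0 <= x < w:
                            s += kernel[ch][u][v] * pooled[ch][y][x]
            out[i][j] = sigmoid(s)
    return out


def cbam(feat, w1, w2, kernel):
    """Channel attention first, then spatial attention on the refined features."""
    mc = channel_attention(feat, w1, w2)
    refined = [[[mc[c] * v for v in row] for row in ch] for c, ch in enumerate(feat)]
    ms = spatial_attention(refined, kernel)
    return [[[ms[i][j] * v for j, v in enumerate(row)] for i, row in enumerate(ch)]
            for ch in refined]


C, H, W, r, k = 4, 5, 5, 2, 7  # toy sizes; 7x7 spatial kernel as in the paper
feat = [[[float((c + 1) * (i - j)) for j in range(W)] for i in range(H)] for c in range(C)]
w1 = [[0.0] * C for _ in range(C // r)]
w2 = [[0.0] * (C // r) for _ in range(C)]
kernel = [[[0.0] * k for _ in range(k)] for _ in range(2)]
# With all-zero weights every attention value is sigmoid(0) = 0.5, so out = 0.25 * feat.
out = cbam(feat, w1, w2, kernel)
```

The added parameters are only the small MLP and one k×k×2 kernel, which is what makes the module cheap to drop into an existing CNN.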

## 介绍​

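• Thanks to their rich representational power, convolutional neural networks (CNNs) have significantly improved performance on vision tasks; current work focuses mainly on three important factors of a network: depth, width, and cardinality.
• Beyond these factors, the paper investigates a different aspect of architecture design: attention.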

## Boundary IoU - Improving Object-Centric Image Segmentation Evaluation

The key parts are Sections 3.4 and 5.1; most of the rest consists of comparative experiments.

# Abstract

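• Proposes Boundary IoU, a new segmentation evaluation measure based on boundary quality;
• Boundary IoU is significantly more sensitive than the standard Mask IoU measure to boundary errors on large objects, and does not over-penalize errors on smaller objects;
• It is better suited than other measures as a metric for evaluating segmentation.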
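Boundary IoU compares only the parts of the ground-truth mask G and the predicted mask P that lie within distance d of their own contours: |(G_d ∩ G) ∩ (P_d ∩ P)| / |(G_d ∩ G) ∪ (P_d ∩ P)|. The sketch below implements that definition on small binary masks. Measuring distance with the Chebyshev (square-window) metric, treating pixels outside the image as background, taking d directly in pixels, and scoring two empty boundaries as 1.0 are our assumptions; helper names are hypothetical.

```python
def inner_boundary(mask, d):
    """Foreground pixels within Chebyshev distance d of background.

    mask: H x W nested lists of 0/1. Pixels outside the image count as background.
    """
    h, w = len(mask), len(mask[0])
    band = set()
    for i in range(h):
        for j in range(w):
            if not mask[i][j]:
                continue
            near_background = any(
                not (0 <= i + di < h and 0 <= j + dj < w) or not mask[i + di][j + dj]
                for di in range(-d, d + 1)
                for dj in range(-d, d + 1)
            )
            if near_background:
                band.add((i, j))
    return band


def boundary_iou(gt, pred, d):
    g, p = inner_boundary(gt, d), inner_boundary(pred, d)
    union = g | p
    return len(g & p) / len(union) if union else 1.0  # both empty: treated as a match


def mask_iou(gt, pred):
    g = {(i, j) for i, row in enumerate(gt) for j, v in enumerate(row) if v}
    p = {(i, j) for i, row in enumerate(pred) for j, v in enumerate(row) if v}
    union = g | p
    return len(g & p) / len(union) if union else 1.0


def square(h, w, top, left, size):
    return [[int(top <= i < top + size and left <= j < left + size) for j in range(w)]
            for i in range(h)]


gt = square(20, 20, 2, 2, 16)
pred = square(20, 20, 2, 3, 16)  # same object, shifted right by one pixel
print(mask_iou(gt, pred), boundary_iou(gt, pred, 1))  # ~0.88 vs ~0.33
```

A one-pixel shift barely moves Mask IoU for this large square but sharply lowers Boundary IoU. When d is large relative to the object, every foreground pixel falls inside the band and Boundary IoU reduces to Mask IoU, which is why small objects are not over-penalized.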

# Convolution

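1. Spatial agnostic: the same kernel for different positions
• Pros: parameter sharing, translation equivariance
• Cons: the parameters cannot adapt flexibly; the kernel size cannot be too large, so the receptive field can only be enlarged, and long-range relationships captured, by stacking layers
2. Channel specific: different kernels for different channels
• Pros: fully extracts the information in each channel
• Cons: redundancy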

The convolution kernel has size C_out,C_in,K,K (no batch dimension: the same kernel is applied to every sample).
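Both properties can be seen in a plain convolution with a C_out×C_in×K×K weight. The sketch below computes a stride-1, zero-padded cross-correlation (what deep learning frameworks call convolution) and checks translation equivariance: shifting an interior impulse by one pixel shifts every output channel by one pixel, because the same kernel is used at every position. The weights and helper names are arbitrary illustrative choices.

```python
def conv2d(x, weight):
    """Stride-1 cross-correlation with zero padding of K // 2.

    x: C_in x H x W nested lists; weight: C_out x C_in x K x K.
    The same weight tensor is used at every (i, j): spatial agnostic.
    Each output channel has its own C_in x K x K slice: channel specific.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(weight[0][0])
    p = k // 2
    out = []
    for w_out in weight:  # one output channel per C_in x K x K kernel
        plane = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                s = 0.0
                for ci in range(c_in):
                    for u in range(k):
                        for v in range(k):
                            y, xx = i + u - p, j + v - p
                            if 0 <= y < h and 0 <= xx < w:
                                s += w_out[ci][u][v] * x[ci][y][xx]
                plane[i][j] = s
        out.append(plane)
    return out


def impulse(h, w, i, j):
    """Single-channel input that is 1.0 at (i, j) and 0.0 elsewhere."""
    return [[[1.0 if (r, c) == (i, j) else 0.0 for c in range(w)] for r in range(h)]]


weight = [  # C_out = 2, C_in = 1, K = 3 (arbitrary illustrative values)
    [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]],
    [[[0.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 0.0]]],
]
a = conv2d(impulse(7, 7, 2, 2), weight)
b = conv2d(impulse(7, 7, 3, 3), weight)
# Translation equivariance: shifting the input by (1, 1) shifts the output by (1, 1).
print(all(a[c][i][j] == b[c][i + 1][j + 1] for c in range(2) for i in range(6) for j in range(6)))
```

The parameter count, C_out·C_in·K², is independent of H and W, which is the flip side of the spatial-agnostic design: the kernel cannot specialize to a position.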

# Involution

1. Spatial specific: kernels are privatized for different positions
2. Channel agnostic: the kernel is shared across different channels

The involution kernel has size B,G,K·K,H,W: one K×K kernel per spatial position, shared by the channels in each of the G groups.
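The aggregation step of involution, for one sample, mirrors the kernel layout G,K·K,H,W: output channel c at position (i, j) is a weighted sum over the K×K window using the kernel stored at (i, j) for c's group. In the paper, as we understand it, these kernels are generated from the input feature at each position by a small learned bottleneck; that generation step is omitted here, and the kernel is passed in directly. Helper names, toy sizes, and kernel values are hypothetical.

```python
import math


def involution(x, kernel, groups):
    """Stride-1 involution aggregation with zero padding of K // 2.

    x: C x H x W input (nested lists).
    kernel: G x (K*K) x H x W, i.e. a separate K x K kernel at every (i, j)
    (spatial specific), shared by all C // G channels of a group (channel agnostic).
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    kk = len(kernel[0])
    k = math.isqrt(kk)
    if k * k != kk or c % groups:
        raise ValueError("kernel must hold K*K taps and C must be divisible by G")
    p, per_group = k // 2, c // groups
    out = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    for ch in range(c):
        g = ch // per_group  # channels of one group share kernels
        for i in range(h):
            for j in range(w):
                s = 0.0
                for idx in range(kk):
                    u, v = divmod(idx, k)  # tap (u, v) of the K x K window
                    y, xx = i + u - p, j + v - p
                    if 0 <= y < h and 0 <= xx < w:
                        s += kernel[g][idx][i][j] * x[ch][y][xx]
                out[ch][i][j] = s
    return out


C, H, W, G, K = 4, 3, 3, 2, 3
x = [[[float(ch * 10 + i * 3 + j) for j in range(W)] for i in range(H)] for ch in range(C)]
# Identity kernels everywhere (only the centre tap is 1) ...
kernel = [[[[1.0 if idx == K * K // 2 else 0.0 for _ in range(W)] for _ in range(H)]
           for idx in range(K * K)] for _ in range(G)]
# ... except that group 1 doubles its input at position (0, 0) only.
kernel[1][K * K // 2][0][0] = 2.0
result = involution(x, kernel, G)
```

Changing one position's kernel affects only that position, and it affects every channel in the group at once: the inverse of convolution's spatial-agnostic, channel-specific trade-off.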