
· One min read
Gavin Gong

Md Amirul Islam, Sen Jia, Neil D. B. Bruce

In contrast to fully connected networks, Convolutional Neural Networks (CNNs) achieve efficiency by learning weights associated with local filters with a finite spatial extent. An implication of this is that a filter may know what it is looking at, but not where it is positioned in the image. Information concerning absolute position is inherently useful, and it is reasonable to assume that deep CNNs may implicitly learn to encode this information if there is a means to do so. In this paper, we test this hypothesis revealing the surprising degree of absolute position information that is encoded in commonly used neural networks. A comprehensive set of experiments show the validity of this hypothesis and shed light on how and where this information is represented while offering clues to where positional information is derived from in deep CNNs.

Comments: Accepted to ICLR 2020
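The paper examines zero padding at the image borders as one source of this absolute-position signal. The NumPy sketch below illustrates the idea. It convolves a constant image with a fixed 3×3 averaging kernel, so the image content carries no cues at all. With zero padding, the outputs still differ near the borders. With replicate padding, the difference disappears. The kernel, the image size, and the helper `conv2d_same` are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def conv2d_same(image, kernel, pad_mode="constant"):
    """2D cross-correlation with 'same' output size.

    pad_mode="constant" pads with zeros; "edge" replicates border pixels.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode=pad_mode)
    h, w = image.shape
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

image = np.ones((5, 5))   # constant input: no content cues at all
kernel = np.ones((3, 3))  # a position-agnostic filter

zero_padded = conv2d_same(image, kernel, "constant")
edge_padded = conv2d_same(image, kernel, "edge")

print(zero_padded)  # corners 4, other border pixels 6, interior 9
print(edge_padded)  # every output is 9: position is no longer visible
```

The same filter is applied everywhere, yet with zero padding its response alone reveals whether a pixel lies at a corner, on an edge, or in the interior.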




· One min read

Mukund Sundararajan, Ankur Taly, Qiqi Yan

We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms---Sensitivity and Implementation Invariance that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.


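This post introduces a neural network visualization method, Integrated Gradients, from around 2016-2017.

Here, visualization means the following. Given an input $x$ and a model $F(x)$, we want to find which components of $x$ most influence the model's prediction, or equivalently, rank the components of $x$ by importance. The technical term for this is attribution. A naive approach uses the gradient $\nabla_{x}F(x)$ directly as the importance score for each component of $x$. Integrated Gradients improves on this approach.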
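Integrated Gradients attributes feature $i$ by integrating the gradient along the straight path from a baseline $x'$ to the input $x$:

$\mathrm{IG}_i(x) = (x_i - x'_i)\int_0^1 \frac{\partial F(x' + \alpha(x - x'))}{\partial x_i}\,d\alpha$

The sketch below approximates this integral with a midpoint Riemann sum and checks completeness: the attributions should sum to $F(x) - F(x')$. Several choices are assumptions for illustration: the toy model `F`, its hand-written gradient `grad_F`, the all-zeros baseline, and the step count. In practice the gradient would come from a framework's automatic differentiation.

```python
import numpy as np

def F(x):
    # Hypothetical toy "model": F(x) = x0^2 + 3 * x0 * x1
    return x[0] ** 2 + 3 * x[0] * x[1]

def grad_F(x):
    # Analytic gradient of the toy model
    return np.array([2 * x[0] + 3 * x[1], 3 * x[0]])

def integrated_gradients(x, baseline, grad_fn, steps=50):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        total += grad_fn(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

x = np.array([1.0, 2.0])
baseline = np.zeros_like(x)

ig = integrated_gradients(x, baseline, grad_F)
print(ig)                             # approximately [4. 3.]
print(ig.sum(), F(x) - F(baseline))   # completeness: both approximately 7.0
print(grad_F(x) * x)                  # plain gradient * input: [8. 6.]
```

Plain gradient × input sums to 14 here, not 7. The gradient at $x$ alone does not account for how $F$ changes along the way from the baseline, and path integration corrects for this.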

· One min read

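This post covers a paper on a lightweight backbone network. Original paper: (MobileNetV2: Inverted Residuals and Linear Bottlenecks)

  • The paper improves network performance through three structural modifications to a lightweight feature-extraction network.
  • Overall idea: use low-dimensional tensors while still extracting sufficiently rich features.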


In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3. MobileNetV2 is based on an inverted residual structure where the shortcut connections are between the thin bottleneck layers. The intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design. Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on ImageNet classification, COCO object detection [2], and VOC image segmentation [3]. We evaluate the trade-offs between accuracy and number of operations measured by multiply-adds (MAdd), as well as actual latency and the number of parameters.
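The following NumPy forward pass makes the inverted residual with linear bottleneck concrete for a single stride-1 block. The block has four parts:

  1. A 1×1 expansion with ReLU6.
  2. A 3×3 depthwise convolution with ReLU6.
  3. A linear 1×1 projection with no activation.
  4. A shortcut between the thin bottlenecks.

This is a sketch under stated assumptions. Batch normalization and stride-2 blocks are omitted, and the random weights are illustrative. The expansion factor `t = 6` matches the paper's default as we understand it.

```python
import numpy as np

def relu6(x):
    return np.clip(x, 0.0, 6.0)

def pointwise(x, weight):
    # 1x1 convolution: x is (C_in, H, W), weight is (C_out, C_in)
    return np.einsum("oc,chw->ohw", weight, x)

def depthwise3x3(x, weight):
    # Depthwise 3x3 convolution, stride 1, zero padding 1; weight is (C, 3, 3)
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for du in range(3):
        for dv in range(3):
            out += weight[:, du, dv][:, None, None] * padded[:, du:du + h, dv:dv + w]
    return out

def inverted_residual(x, w_expand, w_dw, w_project):
    """Stride-1 MobileNetV2-style block, batch norm omitted (sketch)."""
    hidden = relu6(pointwise(x, w_expand))      # expand: C -> t*C channels
    hidden = relu6(depthwise3x3(hidden, w_dw))  # spatial filtering per channel
    out = pointwise(hidden, w_project)          # linear bottleneck: no activation
    if out.shape == x.shape:
        out = out + x                           # shortcut between thin bottlenecks
    return out

rng = np.random.default_rng(0)
c, t, size = 8, 6, 10
x = rng.standard_normal((c, size, size))
w_expand = rng.standard_normal((t * c, c)) * 0.1
w_dw = rng.standard_normal((t * c, 3, 3)) * 0.1
w_project = rng.standard_normal((c, t * c)) * 0.1

y = inverted_residual(x, w_expand, w_dw, w_project)
print(y.shape)  # (8, 10, 10)
```

The final projection has no ReLU. This reflects the abstract's point that non-linearities in the narrow layers would discard information.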

· One min read

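This post covers a paper on fast semantic segmentation. Paper title: Fast-SCNN: Fast Semantic Segmentation Network

  • The network is built on a two-branch (two-stream) architecture.
  • Overall idea: reduce redundant convolution computation to increase speed.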


The encoder-decoder framework is state-of-the-art for offline semantic image segmentation. Since the rise in autonomous systems, real-time computation is increasingly desirable. In this paper, we introduce fast segmentation convolutional neural network (Fast-SCNN), an above real-time semantic segmentation model on high resolution image data (1024 × 2048px) suited to efficient computation on embedded devices with low memory. Building on existing two-branch methods for fast segmentation, we introduce our ‘learning to downsample’ module which computes low-level features for multiple resolution branches simultaneously. Our network combines spatial detail at high resolution with deep features extracted at lower resolution, yielding an accuracy of 68.0% mean intersection over union at 123.5 frames per second on Cityscapes. We also show that large scale pre-training is unnecessary. We thoroughly validate our metric in experiments with ImageNet pre-training and the coarse labeled data of Cityscapes. Finally, we show even faster computation with competitive results on subsampled inputs, without any network modifications.
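The two-branch idea can be sketched by tracking spatial shapes. As we read the paper's architecture table, the three parts of the network work as follows:

  1. The "learning to downsample" module reduces the input to 1/8 resolution.
  2. The global feature extractor continues down to 1/32.
  3. A feature fusion module upsamples the deep features back to 1/8 and adds them to the high-resolution branch.

The NumPy sketch below keeps only that skeleton, and several stand-ins are assumptions:

  • Average pooling replaces the strided convolutions.
  • Nearest-neighbour upsampling plus addition replaces the fusion module, which in the paper also applies convolutions before the sum.
  • The input is a quarter-scale 256×512 random image.

```python
import numpy as np

def downsample(x, factor):
    # Stand-in for strided convolutions: average pooling by `factor`
    c, h, w = x.shape
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))

def upsample_nearest(x, factor):
    # Nearest-neighbour upsampling along both spatial axes
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fast_scnn_skeleton(image):
    high_res = downsample(image, 8)                  # learning to downsample: 1/8
    low_res = downsample(high_res, 4)                # global feature extractor: 1/32
    fused = high_res + upsample_nearest(low_res, 4)  # fusion back at 1/8
    # Channel counts stay at 3 here because pooling does not change them
    return high_res.shape, low_res.shape, fused.shape

image = np.random.default_rng(0).standard_normal((3, 256, 512))
print(fast_scnn_skeleton(image))
# ((3, 32, 64), (3, 8, 16), (3, 32, 64))
```

The shallow branch preserves spatial detail at 1/8 resolution, while the deep branch does its heavy computation at 1/32. The deep branch therefore processes 16 times fewer pixels.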

· One min read

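This post covers a paper on a lightweight backbone network. Original paper: (MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications)

  • The paper proposes an efficient neural network for mobile and embedded devices.
  • The paper proposes a convolution module that needs far fewer operations: depthwise separable convolution (Depthwise Separable Convolution, hereafter DSC).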


We present a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depthwise separable convolutions to build light weight deep neural networks. We introduce two simple global hyper-parameters that efficiently trade off between latency and accuracy. These hyper-parameters allow the model builder to choose the right sized model for their application based on the constraints of the problem. We present extensive experiments on resource and accuracy tradeoffs and show strong performance compared to other popular models on ImageNet classification. We then demonstrate the effectiveness of MobileNets across a wide range of applications and use cases including object detection, finegrain classification, face attributes and large scale geo-localization.
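The savings from DSC follow from counting multiply-adds. Consider a $D_K \times D_K$ kernel, $M$ input channels, $N$ output channels, and a $D_F \times D_F$ feature map:

  • A standard convolution costs $D_K^2 M N D_F^2$.
  • The depthwise plus pointwise pair costs $D_K^2 M D_F^2 + M N D_F^2$.

The ratio between the two is $1/N + 1/D_K^2$. The sketch below evaluates these formulas, including the width multiplier $\alpha$ and resolution multiplier $\rho$, which are the two global hyper-parameters mentioned in the abstract. The example layer sizes are arbitrary.

```python
def standard_conv_madds(dk, m, n, df):
    # Mult-adds of a standard dk x dk convolution producing a df x df map
    return dk * dk * m * n * df * df

def separable_conv_madds(dk, m, n, df, alpha=1.0, rho=1.0):
    """Mult-adds of depthwise + pointwise conv with width/resolution multipliers."""
    m, n, df = alpha * m, alpha * n, rho * df
    depthwise = dk * dk * m * df * df  # one dk x dk filter per input channel
    pointwise = m * n * df * df        # 1x1 convolution mixing channels
    return depthwise + pointwise

dk, m, n, df = 3, 512, 512, 14
std = standard_conv_madds(dk, m, n, df)
sep = separable_conv_madds(dk, m, n, df)
print(sep / std, 1 / n + 1 / dk ** 2)  # the two ratios agree
```

With a 3×3 kernel, the cost is roughly 8 to 9 times lower, and the pointwise term dominates the remaining cost.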

· One min read

Paper: Gated Channel Transformation for Visual Recognition

Authors: Zongxin Yang, Linchao Zhu, Yu Wu, and Yi Yang



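  • The GCT module is a generally applicable gated transformation unit that can be optimized jointly with the network weights.
  • SENet learns channel relationships implicitly through fully connected layers. GCT instead models them explicitly with interpretable variables, which determine whether channels compete or cooperate.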



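  • A single convolutional layer only operates on the local context around each spatial position of the feature map, which can lead to local ambiguity. There are usually two ways to address this:
    1. Increase network depth, as in VGG and ResNet.
    2. Increase network width to capture more global information. For example, GENet makes extensive use of neighborhood embedding, and SENet models channel relationships using global embedding information.
  • However, the fc layers in SENet cause the following three problems:
    1. To keep the parameter count down, the fc-based module cannot be applied to every layer.
    2. The fc parameters are complex, which makes it hard to analyze the correlations between channels; this is effectively implicit learning.
    3. Placing the module after certain layers causes problems.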
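As we understand the GCT paper, the module has three steps with per-channel parameters $\alpha$, $\gamma$, $\beta$:

  1. A global context embedding $s_c = \alpha_c \lVert x_c \rVert_2$.
  2. A channel normalization $\hat{s}_c = \sqrt{C}\, s_c / \lVert s \rVert_2$.
  3. A gating adaptation $\hat{x}_c = x_c\,[1 + \tanh(\gamma_c \hat{s}_c + \beta_c)]$.

Because $\gamma$ and $\beta$ start at zero, the block begins as an identity mapping. The NumPy sketch below follows that formulation; the epsilon value and the parameter initialization are assumptions.

```python
import numpy as np

def gct(x, alpha, gamma, beta, eps=1e-5):
    """Gated Channel Transformation on one feature map x of shape (C, H, W)."""
    c = x.shape[0]
    # Global context embedding: scaled L2 norm of each channel
    s = alpha * np.sqrt(np.sum(x ** 2, axis=(1, 2)) + eps)
    # Channel normalization across all channels
    s_hat = np.sqrt(c) * s / np.sqrt(np.sum(s ** 2) + eps)
    # Gating adaptation: 1 + tanh(...) lies in (0, 2); the sign of gamma
    # decides whether a channel competes (> 0) or cooperates (< 0)
    gate = 1.0 + np.tanh(gamma * s_hat + beta)
    return x * gate[:, None, None]

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6, 6))
alpha, gamma, beta = np.ones(4), np.zeros(4), np.zeros(4)

y = gct(x, alpha, gamma, beta)
print(np.allclose(y, x))  # True: zero gamma/beta give an identity mapping
```

Only three parameters per channel are involved, and each has an explicit role. This contrasts with the two fc matrices in SENet.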

· One min read

Paper: CBAM: Convolutional Block Attention Module

Authors: Sanghyun Woo, Jongchan Park, Joon-Young Lee, and In So Kweon, Korea Advanced Institute of Science and Technology, Daejeon, Korea


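  • CBAM (Convolutional Block Attention Module) is a simple and effective attention module for feed-forward convolutional neural networks.
  • It is a mixed-domain attention mechanism that infers attention maps sequentially along two dimensions: channel and spatial.
  • CBAM is a lightweight, general-purpose module that can be seamlessly integrated into any CNN.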
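The following NumPy sketch shows the sequential channel-then-spatial attention. As described in the CBAM paper, each attention step works as follows:

  • Channel attention passes average-pooled and max-pooled channel descriptors through a shared two-layer MLP, sums the results, and applies a sigmoid.
  • Spatial attention concatenates channel-wise average and max maps, then applies a 7×7 convolution and a sigmoid.

The random weights and the reduction ratio `r = 2` are illustrative assumptions; the paper uses a larger ratio. Batch dimensions are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(f, w0, w1):
    # f: (C, H, W); shared MLP maps C -> C/r (ReLU) -> C
    def mlp(v):
        return w1 @ np.maximum(w0 @ v, 0.0)
    avg = f.mean(axis=(1, 2))
    mx = f.max(axis=(1, 2))
    return sigmoid(mlp(avg) + mlp(mx))  # shape (C,)

def spatial_attention(f, kernel):
    # kernel: (2, k, k) applied to [average map; max map] over channels
    pooled = np.stack([f.mean(axis=0), f.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    p = k // 2
    padded = np.pad(pooled, ((0, 0), (p, p), (p, p)))
    h, w = pooled.shape[1:]
    out = np.zeros((h, w))
    for i in range(k):
        for j in range(k):
            out += np.einsum("c,chw->hw", kernel[:, i, j], padded[:, i:i + h, j:j + w])
    return sigmoid(out)  # shape (H, W)

def cbam(f, w0, w1, kernel):
    f = f * channel_attention(f, w0, w1)[:, None, None]  # F' = Mc(F) * F
    return f * spatial_attention(f, kernel)[None, :, :]  # F'' = Ms(F') * F'

rng = np.random.default_rng(0)
c, r = 8, 2
f = rng.standard_normal((c, 12, 12))
w0 = rng.standard_normal((c // r, c))
w1 = rng.standard_normal((c, c // r))
kernel = rng.standard_normal((2, 7, 7)) * 0.1

out = cbam(f, w0, w1, kernel)
print(out.shape)  # (8, 12, 12)
```

Both attention maps lie in (0, 1), so the module rescales features without changing their shape. This is why it can be dropped into an existing CNN.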



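  • Convolutional neural networks (CNNs) have significantly improved performance on vision tasks thanks to their rich representational power. Current work focuses mainly on three important factors of a network: depth, width, and cardinality.
  • Networks have grown along each of these factors:
    1. Depth: from LeNet to residual networks, networks have become deeper and their representations richer.
    2. Width: GoogLeNet showed that width is another important factor for improving performance.
    3. Cardinality: Xception and ResNeXt increase the cardinality of the network. This gives stronger representational power than increasing depth or width, while saving parameters (as stated in the ResNeXt paper).
  • Beyond these factors, this paper investigates a different aspect of architecture design: attention.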

· One min read

Paper: Boundary IoU: Improving Object-Centric Image Segmentation Evaluation

Authors: Bowen Cheng, Ross Girshick, Piotr Dollár, Alexander C. Berg, Alexander Kirillov



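As its name suggests, Boundary IoU is the IoU computed over the regions near the boundary contours.

The key sections are 3.4 and 5.1; the rest is mostly comparative experiments.


  • Proposes Boundary IoU, a new segmentation evaluation measure based on boundary quality;
  • Boundary IoU is significantly more sensitive than standard mask IoU to boundary errors on large objects, and does not over-penalize errors on smaller objects;
  • It is better suited than other measures for evaluating segmentation.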
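Boundary IoU works in two steps. First, it keeps only the mask pixels within distance $d$ of each contour. Then it computes IoU on those boundary regions:

$\frac{|(G_d \cap G) \cap (P_d \cap P)|}{|(G_d \cap G) \cup (P_d \cap P)|}$

Here $G$ and $P$ are the ground-truth and predicted masks, and $G_d$, $P_d$ are the pixels within distance $d$ of their contours. The NumPy sketch below rests on a few assumptions:

  • The inner boundary region is approximated as the mask minus its erosion by $d$ iterations of a 3×3 square.
  • $d = 2$ is arbitrary; the paper sets $d$ relative to the image diagonal.
  • Returning 1.0 for two empty boundaries is a convention chosen for the sketch.

```python
import numpy as np

def erode(mask, iterations):
    """Binary erosion with a 3x3 square; pixels outside the image count as 0."""
    m = mask.astype(bool)
    h, w = m.shape
    for _ in range(iterations):
        p = np.pad(m, 1, constant_values=False)
        eroded = np.ones_like(m)
        for di in range(3):
            for dj in range(3):
                eroded &= p[di:di + h, dj:dj + w]
        m = eroded
    return m

def boundary_region(mask, d):
    # Mask pixels within d (chessboard distance) of the contour
    mask = mask.astype(bool)
    return mask & ~erode(mask, d)

def boundary_iou(gt, pred, d=2):
    g, p = boundary_region(gt, d), boundary_region(pred, d)
    union = np.logical_or(g, p).sum()
    # Assumption: two empty boundaries count as a perfect match
    return np.logical_and(g, p).sum() / union if union else 1.0

gt = np.zeros((20, 20), dtype=bool)
gt[4:16, 4:16] = True
pred = np.zeros_like(gt)
pred[5:17, 4:16] = True  # same square shifted down by one pixel

print(boundary_iou(gt, gt))    # 1.0
print(boundary_iou(gt, pred))  # strictly between 0 and 1
```

Interior pixels are excluded from both masks. A small boundary shift therefore weighs more heavily than it would under mask IoU, where the large, correct interior of a big object dilutes the error.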

· One min read

Paper: Involution: Inverting the Inherence of Convolution for Visual Recognition

Authors: Duo Li, Jie Hu, Changhu Wang, Xiangtai Li, Qi She, Lei Zhu, Tong Zhang, Qifeng Chen; The Hong Kong University of Science and Technology, ByteDance AI Lab, Peking University, Beijing University of Posts and Telecommunications


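Standard convolution has two inherent properties:
  1. Spatial agnostic: the same kernel for different positions
    • Pros: parameter sharing, translation equivariance
    • Cons: the parameters cannot adapt to content, and the kernel cannot be very large, so the receptive field can only be enlarged (and long-range relationships captured) by stacking layers
  2. Channel specific: different kernels for different channels
    • Pros: fully extracts the information in each channel
    • Cons: redundancy across channels

The convolution kernel has shape (C_out, C_in, K, K); the same kernel is shared by every sample in the batch.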



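Involution inverts both properties:
  1. Spatial specific: a separate kernel for each position
  2. Channel agnostic: the kernel is shared across different channels

The involution kernel has shape (B, G, K×K, H, W): one K×K kernel per group G at every position of every sample.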
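The following NumPy sketch computes involution for one sample. It follows the per-pixel kernel generation described in the paper:

  • At each position, the feature vector passes through two 1×1 linear maps with a ReLU between them.
  • The result is reshaped into a K×K kernel for each of G groups.
  • Every channel in a group shares that kernel.

The sketch makes several assumptions. The paper's batch norm is omitted, and stride 1 is assumed. The weights, `k = 3`, `groups = 2`, and the reduction ratio `r = 2` are illustrative. The output for one sample has shape (C, H, W), and the generated kernels have shape (G, K×K, H, W).

```python
import numpy as np

def involution(x, w_reduce, w_span, k=3, groups=2):
    """Involution for one sample x of shape (C, H, W), stride 1.

    w_reduce: (C // r, C) and w_span: (groups * k * k, C // r) are the two
    1x1 maps that generate a k x k kernel per group at every pixel.
    """
    c, h, w = x.shape
    # Kernel generation from each pixel's own feature vector
    hidden = np.maximum(np.einsum("rc,chw->rhw", w_reduce, x), 0.0)
    kernels = np.einsum("or,rhw->ohw", w_span, hidden)
    kernels = kernels.reshape(groups, k * k, h, w)  # (G, K*K, H, W)

    pad = k // 2
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    xg = padded.reshape(groups, c // groups, h + 2 * pad, w + 2 * pad)
    out = np.zeros((groups, c // groups, h, w))
    for idx in range(k * k):
        du, dv = divmod(idx, k)
        patch = xg[:, :, du:du + h, dv:dv + w]  # neighbours at this offset
        # One weight per (group, position), shared by all channels in the group
        out += kernels[:, idx][:, None] * patch
    return out.reshape(c, h, w)

rng = np.random.default_rng(0)
c, size, r = 8, 6, 2
x = rng.standard_normal((c, size, size))
w_reduce = rng.standard_normal((c // r, c))
w_span = rng.standard_normal((2 * 3 * 3, c // r))

y = involution(x, w_reduce, w_span)
print(y.shape)  # (8, 6, 6), same as the input for stride 1
```

The two inverted properties appear directly in the sketch. The kernels vary with (h, w), which is spatial specificity. The kernels are indexed only by group, never by channel, which is channel invariance.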

· One min read
Gavin Gong



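These are reading notes on a paper that improves boundary quality in image segmentation. The method "treats segmentation as a rendering problem" and achieves good results. Original paper: PointRend: Image Segmentation as Rendering. Before reading these notes, make sure you understand the basics of image segmentation; for a brief overview of segmentation techniques, see another note.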


We present a new method for efficient high-quality image segmentation of objects and scenes. By analogizing classical computer graphics methods for efficient rendering with over- and undersampling challenges faced in pixel labeling tasks, we develop a unique perspective of image segmentation as a rendering problem. From this vantage, we present the PointRend (Point-based Rendering) neural network module: a module that performs point-based segmentation predictions at adaptively selected locations based on an iterative subdivision algorithm. PointRend can be flexibly applied to both instance and semantic segmentation tasks by building on top of existing state-of-the-art models. While many concrete implementations of the general idea are possible, we show that a simple design already achieves excellent results. Qualitatively, PointRend outputs crisp object boundaries in regions that are over-smoothed by previous methods. Quantitatively, PointRend yields significant gains on COCO and Cityscapes, for both instance and semantic segmentation. PointRend's efficiency enables output resolutions that are otherwise impractical in terms of memory or computation compared to existing approaches. Code has been made available at this https URL.
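The adaptive subdivision step can be sketched for a single binary mask. The current coarse prediction is upsampled by 2×. Then the N points whose probability is closest to 0.5, which are the most uncertain ones, are picked for the point head to re-predict. Only the point-selection logic is shown, under these assumptions:

  • Nearest-neighbour upsampling replaces the bilinear interpolation used in the paper.
  • The point head itself is omitted.
  • The toy probabilities are made up.

```python
import numpy as np

def upsample2x(prob):
    # Nearest-neighbour 2x upsampling (the paper uses bilinear interpolation)
    return prob.repeat(2, axis=0).repeat(2, axis=1)

def most_uncertain_points(prob, n):
    """Return (row, col) coordinates of the n points with prob closest to 0.5."""
    uncertainty = -np.abs(prob - 0.5)  # higher value = more uncertain
    flat = np.argsort(uncertainty, axis=None)[::-1][:n]
    rows, cols = np.unravel_index(flat, prob.shape)
    return np.stack([rows, cols], axis=1)

coarse = np.array([[0.9, 0.8],
                   [0.5, 0.1]])
fine = upsample2x(coarse)  # 4 x 4 grid
points = most_uncertain_points(fine, n=4)
print(points)  # the four cells upsampled from the 0.5 entry (order may vary)
```

Repeating these two steps concentrates computation on ambiguous boundary pixels. The confident interior is never recomputed, which is why high output resolutions stay affordable.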