
5 posts tagged with "inductive-bias"


· 9 min read
PommesPeter

Paper: GLADNet: Low-Light Enhancement Network with Global Awareness

Authors: Wenjing Wang, Chen Wei, Wenhan Yang, Jiaying Liu

Code: https://github.com/weichen582/GLADNet

This is a paper on low-light enhancement with neural networks. In brief, the method:

  • first estimates the illumination of the image and adjusts the original image according to the estimate;
  • reconstructs image details during the adjustment, so that the result looks more natural.

Abstract

In this paper, we address the problem of low-light enhancement. Our key idea is to first calculate a global illumination estimation for the low-light input, then adjust the illumination under the guidance of the estimation and supplement the details using a concatenation with the original input. Considering that, we propose a GLobal illumination-Aware and Detail-preserving Network (GLADNet). The input image is rescaled to a certain size and then put into an encoder-decoder network to generate global priori knowledge of the illumination. Based on the global prior and the original input image, a convolutional network is employed for detail reconstruction. For training GLADNet, we use a synthetic dataset generated from RAW images. Extensive experiments demonstrate the superiority of our method over other compared methods on the real low-light images captured in various conditions.

This paper addresses the problem of low-light enhancement. The key idea: given a low-light input image, first estimate the global illumination; then, guided by that estimate, adjust the brightness, concatenating with the original image to supplement details. The proposed GLADNet first resizes the input to a fixed size and passes it through an encoder-decoder network to produce a global illumination prior; the prior and the original image are then fed into a convolutional network for detail reconstruction.
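To make the two-step data flow concrete, here is a minimal PyTorch sketch of a GLADNet-style pipeline. The layer counts, channel widths, and the fixed rescale size (96×96 here) are illustrative assumptions, not the configuration from the authors' repository:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalIlluminationEstimator(nn.Module):
    """Step 1: encoder-decoder run on a fixed-size rescaled input to
    produce a global illumination prior (hypothetical layer sizes)."""
    def __init__(self, ch=64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.dec = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        h, w = x.shape[-2:]
        x = F.interpolate(x, size=(96, 96), mode='nearest')  # rescale to a fixed size
        prior = self.dec(self.enc(x))
        return F.interpolate(prior, size=(h, w), mode='nearest')  # back to input size

class DetailReconstruction(nn.Module):
    """Step 2: convolutional network fed with the prior concatenated
    with the original input, so details lost in step 1 can be restored."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch + 3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, prior, x):
        return self.net(torch.cat([prior, x], dim=1))

x = torch.rand(1, 3, 400, 600)            # low-light input
prior = GlobalIlluminationEstimator()(x)  # global illumination estimation
out = DetailReconstruction()(prior, x)    # detail reconstruction, same size as x
```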

· 17 min read
Gavin Gong

Mingyuan Fan, Shenqi Lai, Junshi Huang, Xiaoming Wei, Zhenhua Chai, Junfeng Luo, Xiaolin Wei


BiSeNet has been proved to be a popular two-stream network for real-time segmentation. However, its principle of adding an extra path to encode spatial information is time-consuming, and the backbones borrowed from pretrained tasks, e.g., image classification, may be inefficient for image segmentation due to the deficiency of task-specific design. To handle these problems, we propose a novel and efficient structure named Short-Term Dense Concatenate network (STDC network) by removing structure redundancy. Specifically, we gradually reduce the dimension of feature maps and use the aggregation of them for image representation, which forms the basic module of STDC network. In the decoder, we propose a Detail Aggregation module by integrating the learning of spatial information into low-level layers in a single-stream manner. Finally, the low-level features and deep features are fused to predict the final segmentation results. Extensive experiments on the Cityscapes and CamVid datasets demonstrate the effectiveness of our method by achieving a promising trade-off between segmentation accuracy and inference speed. On Cityscapes, we achieve 71.9% mIoU on the test set with a speed of 250.4 FPS on NVIDIA GTX 1080Ti, which is 45.2% faster than the latest methods, and achieve 76.8% mIoU with 97.0 FPS while inferring on higher resolution images.

Before reading this post, please read BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation first.

The paper notes that BiSeNet has proven to be a solid two-stream network for real-time segmentation. However, in BiSeNet:

  • Opening a separate network path just for spatial information is computationally expensive.
  • The pretrained lightweight backbone is borrowed directly from other tasks (e.g., classification and object detection), which is not very efficient for segmentation because it lacks task-specific design.

The authors therefore propose the Short-Term Dense Concatenate network (STDC network) to replace the context path in BiSeNet. The core idea is to remove redundant structure and further speed up segmentation. Concretely, the channel dimension of the feature maps is reduced step by step, and the feature maps are aggregated for image representation, forming the basic module of the STDC network (see the sketch below). In the decoder, a Detail Aggregation module integrates the learning of spatial information into the low-level layers in a single-stream manner, replacing the spatial path in BiSeNet. Finally, the low-level features and deep features are fused to predict the segmentation result.
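To make "reduce the dimension step by step and aggregate" concrete, here is a minimal sketch of a stride-1 STDC module as described in the paper: each ConvX block halves the channel width (1/2, 1/4, 1/8, 1/8 of the output width), and all intermediate feature maps are concatenated into one representation:

```python
import torch
import torch.nn as nn

class ConvX(nn.Module):
    """Conv + BN + ReLU, the basic unit of the STDC module."""
    def __init__(self, cin, cout, k=3):
        super().__init__()
        self.conv = nn.Conv2d(cin, cout, k, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(cout)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

class STDCModule(nn.Module):
    """Stride-1 STDC module: channel widths shrink block by block,
    and every intermediate map contributes to the output."""
    def __init__(self, cin, cout):
        super().__init__()
        self.b1 = ConvX(cin, cout // 2, k=1)   # wide, shallow
        self.b2 = ConvX(cout // 2, cout // 4)
        self.b3 = ConvX(cout // 4, cout // 8)
        self.b4 = ConvX(cout // 8, cout // 8)  # narrow, deep

    def forward(self, x):
        x1 = self.b1(x)
        x2 = self.b2(x1)
        x3 = self.b3(x2)
        x4 = self.b4(x3)
        # 1/2 + 1/4 + 1/8 + 1/8 = 1, so the output has cout channels
        return torch.cat([x1, x2, x3, x4], dim=1)

m = STDCModule(64, 256)
print(m(torch.rand(1, 64, 56, 56)).shape)  # torch.Size([1, 256, 56, 56])
```

The paper's stride-2 variant additionally downsamples inside the block; the sketch keeps only the stride-1 case.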

[Figure: overall architecture of the STDC segmentation network]

Note: the part inside the red dashed box in the figure above is the newly proposed STDC network; ARM is the Attention Refinement Module, and FFM is the Feature Fusion Module. Both are designs that already existed in BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation.
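For readers who do not want to go back to the BiSeNet paper, here is a minimal sketch of the ARM as described there: global average pooling produces a per-channel attention vector that re-weights the input features (FFM is built similarly, but first fuses two inputs):

```python
import torch
import torch.nn as nn

class AttentionRefinementModule(nn.Module):
    """BiSeNet's ARM: global context -> channel weights -> re-weighting."""
    def __init__(self, ch):
        super().__init__()
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),           # global average pooling
            nn.Conv2d(ch, ch, 1, bias=False),  # 1x1 conv on the pooled vector
            nn.BatchNorm2d(ch),
            nn.Sigmoid(),                      # per-channel weights in (0, 1)
        )

    def forward(self, x):
        return x * self.attn(x)                # broadcast over H and W

arm = AttentionRefinementModule(128).eval()    # eval mode so BN runs on batch 1
print(arm(torch.rand(1, 128, 28, 28)).shape)   # torch.Size([1, 128, 28, 28])
```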

If you are interested, read the original paper, Rethinking BiSeNet For Real-time Semantic Segmentation.

· 21 min read

A hierarchical local Vision Transformer: a general-purpose backbone that reaches SOTA on a wide range of downstream tasks. Best Paper Award!

Paper: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Authors: Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo

Code: https://github.com/microsoft/Swin-Transformer

Introduction

Since AlexNet, CNNs have been widely used as the backbone in computer vision. Network architectures in natural language processing, by contrast, evolved along a different path, and the Transformer is now the mainstream design.

The Transformer was designed for sequence modeling and transduction tasks and is known for attending to long-range dependencies in the data. Its enormous success in NLP has drawn research into adapting it to computer vision, and recent experiments show promising results on image classification and joint vision-language modeling.

The main contributions of this paper are:

  1. A hierarchical Transformer that can serve as a general-purpose backbone for computer vision and reaches SOTA on a variety of downstream tasks;
  2. Shifted windows, which make the computational complexity linear in the input image size (see the sketch after this list).
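A minimal sketch of why shifted windows give linear complexity: self-attention is restricted to fixed M×M windows, so the cost per window is constant and the number of windows grows linearly with H×W; alternating layers cyclically shift the feature map so that information flows across window borders. Shapes here are illustrative:

```python
import torch

def window_partition(x, M):
    """(B, H, W, C) -> (num_windows*B, M*M, C). Attention inside each
    window costs O(M^4 * C); the window count scales linearly with H*W."""
    B, H, W, C = x.shape
    x = x.view(B, H // M, M, W // M, M, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, M * M, C)

x = torch.rand(1, 56, 56, 96)    # feature map of one (hypothetical) stage
win = window_partition(x, M=7)   # 8*8 = 64 windows of 7*7 = 49 tokens
print(win.shape)                 # torch.Size([64, 49, 96])

# Cyclic shift by M//2 used by the next layer (W-MSA -> SW-MSA), so the
# new windows straddle the old window borders:
shifted = torch.roll(x, shifts=(-3, -3), dims=(1, 2))
```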

· 10 min read
PuQing

Paper: DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation

Authors: Xing Shen, Jirui Yang, Chunbo Wei, Bing Deng, Jianqiang Huang, Xiansheng Hua, Xiaoliang Cheng, Kewei Liang

Repository: https://github.com/calmevtime/DCTNet

Abstract

Binary grid masks are widely used in instance segmentation. In Mask R-CNN, for example, the network predicts the mask on a 28×28 grid, as illustrated in the paper.

In general, however, a low-resolution grid cannot capture fine details, while a high-resolution grid greatly increases training complexity. To solve this, the paper proposes a new mask representation that uses the discrete cosine transform (DCT) to encode the high-resolution binary grid mask into a compact vector. The method is called DCT-Mask.

The method can be integrated into most pixel-based instance segmentation frameworks with little effort. It needs no pre-processing or pre-training and has almost no cost in speed.

Introduction

Mask R-CNN downsamples the ground-truth (GT) mask to 28×28 and then reconstructs it by upsampling. As the paper shows, the low-resolution binary grid mask cannot capture detailed features and introduces bias during upsampling.

The paper compares the method with and without DCT: from left to right, the GT; the resized GT; the reconstruction based on the resized GT; and finally the error between the reconstruction and the original GT.

So even if the predicted mask is correct, the reconstructed mask carries a systematic error. One remedy is to raise the resolution of the binary grid mask, but experiments in the paper show that higher resolutions yield worse average precision (AP) than 28×28.
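A minimal sketch of the encode/decode idea: for illustration it keeps a top-left low-frequency block rather than the paper's zig-zag ordering, and the sizes are hypothetical (a 128×128 mask compressed to a 300-D vector):

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(m):
    return dct(dct(m, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(c):
    return idct(idct(c, axis=0, norm='ortho'), axis=1, norm='ortho')

def encode_mask(mask, n_keep=300):
    """High-resolution binary mask -> compact DCT coefficient vector."""
    coef = dct2(mask.astype(np.float32))
    k = int(np.ceil(np.sqrt(n_keep)))        # side of the kept block
    return coef[:k, :k].flatten()[:n_keep]

def decode_mask(vec, shape, n_keep=300):
    """Zero-pad the coefficients back, invert the DCT, re-binarize."""
    k = int(np.ceil(np.sqrt(n_keep)))
    block = np.zeros((k, k), dtype=np.float32)
    block.flat[:n_keep] = vec
    coef = np.zeros(shape, dtype=np.float32)
    coef[:k, :k] = block
    return (idct2(coef) >= 0.5).astype(np.uint8)

mask = np.zeros((128, 128), dtype=np.uint8)
mask[32:96, 40:100] = 1                   # toy instance mask
vec = encode_mask(mask)                   # 300 floats instead of 128*128 bits
recon = decode_mask(vec, mask.shape)
print(vec.shape, (recon == mask).mean())  # (300,), agreement close to 1.0
```

Because natural masks concentrate their energy in low frequencies, a few hundred coefficients can represent a 128×128 mask, preserving far more boundary detail than a 28×28 binary grid.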

· 19 min read
Gavin Gong

Md Amirul Islam, Sen Jia, Neil D. B. Bruce

In contrast to fully connected networks, Convolutional Neural Networks (CNNs) achieve efficiency by learning weights associated with local filters with a finite spatial extent. An implication of this is that a filter may know what it is looking at, but not where it is positioned in the image. Information concerning absolute position is inherently useful, and it is reasonable to assume that deep CNNs may implicitly learn to encode this information if there is a means to do so. In this paper, we test this hypothesis revealing the surprising degree of absolute position information that is encoded in commonly used neural networks. A comprehensive set of experiments show the validity of this hypothesis and shed light on how and where this information is represented while offering clues to where positional information is derived from in deep CNNs.

Comments: Accepted to ICLR 2020

Introduction

Classic CNN models are considered spatially-agnostic, which is why capsule networks and recurrent networks have been used to model relative spatial relationships within learned feature layers. It remains unclear whether CNNs capture the absolute spatial information that matters for position-dependent tasks (e.g., semantic segmentation and salient object detection). In the paper's examples, the regions identified as most salient tend to lie near the image center; when saliency detection is run on a cropped version of the image, the most salient region shifts even though the visual features have not changed.

This paper investigates the role of absolute position by running a series of randomization tests, under the hypothesis that CNNs can indeed learn to encode position information as a decision cue. The experiments show that position information is learned implicitly from the commonly used padding operation (zero padding); the sketch below illustrates the effect.
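A minimal toy demonstration of that mechanism: on a constant input, a zero-padded convolution responds differently near the borders, so distance-to-border (absolute position) leaks into the features, while the unpadded convolution is position-agnostic:

```python
import torch
import torch.nn as nn

x = torch.ones(1, 1, 8, 8)                 # constant input: no content cues

conv_pad = nn.Conv2d(1, 1, 3, padding=1, bias=False)
conv_nopad = nn.Conv2d(1, 1, 3, padding=0, bias=False)
nn.init.constant_(conv_pad.weight, 1.0)    # all-ones 3x3 kernel
nn.init.constant_(conv_nopad.weight, 1.0)

with torch.no_grad():
    y_pad = conv_pad(x)[0, 0]              # corners 4.0, edges 6.0, interior 9.0
    y_nopad = conv_nopad(x)[0, 0]          # 9.0 everywhere: no position signal

print(y_pad)     # values encode distance to the image border
print(y_nopad)   # constant; position cannot be recovered
```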