2 posts tagged "frequency-domain"

· 21 min read
Gavin Gong

Kai Xu, Minghai Qin, Fei Sun, Yuhao Wang, Yen-Kuang Chen, Fengbo Ren

Deep neural networks have achieved remarkable success in computer vision tasks. Existing neural networks mainly operate in the spatial domain with fixed input sizes. For practical applications, images are usually large and have to be downsampled to the predetermined input size of neural networks. Even though the downsampling operations reduce computation and the required communication bandwidth, it removes both redundant and salient information obliviously, which results in accuracy degradation. Inspired by digital signal processing theories, we analyze the spectral bias from the frequency perspective and propose a learning-based frequency selection method to identify the trivial frequency components which can be removed without accuracy loss. The proposed method of learning in the frequency domain leverages identical structures of the well-known neural networks, such as ResNet-50, MobileNetV2, and Mask R-CNN, while accepting the frequency-domain information as the input. Experiment results show that learning in the frequency domain with static channel selection can achieve higher accuracy than the conventional spatial downsampling approach and meanwhile further reduce the input data size. Specifically for ImageNet classification with the same input size, the proposed method achieves 1.41% and 0.66% top-1 accuracy improvements on ResNet-50 and MobileNetV2, respectively. Even with half input size, the proposed method still improves the top-1 accuracy on ResNet-50 by 1%. In addition, we observe a 0.8% average precision improvement on Mask R-CNN for instance segmentation on the COCO dataset.

Comments: Accepted to CVPR 2020

· 10 min read
PuQing

Paper title: DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation

Authors: Xing Shen, Jirui Yang, Chunbo Wei, Bing Deng, Jianqiang Huang, Xiansheng Hua, Xiaoliang Cheng, Kewei Liang

Repository: https://github.com/calmevtime/DCTNet

Abstract

Binary grid masks are widely used in instance segmentation. Mask R-CNN [1], for example, predicts each mask on a $28\times 28$ grid, as shown in the figure below.

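In general, however, a low-resolution grid cannot capture fine detail, while a high-resolution grid greatly increases training complexity. To address this, the paper proposes a new mask representation that uses the discrete cosine transform (DCT) to encode a high-resolution binary grid mask into a compact vector. The method is called DCT-Mask.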

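The method is easy to integrate into most pixel-based instance segmentation methods. It requires no preprocessing or pre-training and has almost no impact on speed.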
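To make the idea of encoding a high-resolution binary mask as a compact DCT vector concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the 128×128 mask size, the vector length of 300, the ordering of coefficients by `u + v`, and the helper names (`dct_matrix`, `low_freq_order`, `encode_mask`, `decode_mask`) are all choices made for this example.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix D, so that D @ x is the 1-D DCT of x."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # sample index
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d

def low_freq_order(n):
    """Flat indices of an n x n grid, sorted from low to high frequency.

    Assumption: frequencies are ranked by u + v, a zigzag-like ordering.
    """
    u, v = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.argsort((u + v).ravel(), kind="stable")

def encode_mask(mask, vec_len):
    """2-D DCT of a square binary mask, keeping the first vec_len coefficients."""
    n = mask.shape[0]
    d = dct_matrix(n)
    coeffs = d @ mask.astype(np.float64) @ d.T
    return coeffs.ravel()[low_freq_order(n)[:vec_len]]

def decode_mask(vec, n):
    """Zero-fill the dropped coefficients, invert the DCT, threshold at 0.5."""
    d = dct_matrix(n)
    flat = np.zeros(n * n)
    flat[low_freq_order(n)[:len(vec)]] = vec
    recon = d.T @ flat.reshape(n, n) @ d
    return recon >= 0.5

# Hypothetical ground-truth mask: a filled circle on a 128 x 128 grid.
n = 128
yy, xx = np.mgrid[:n, :n]
mask = (yy - 64) ** 2 + (xx - 64) ** 2 <= 40 ** 2

vec = encode_mask(mask, 300)   # 16384 pixels -> 300 numbers
recon = decode_mask(vec, n)
iou = (mask & recon).sum() / (mask | recon).sum()
print(f"vector length: {vec.size}, reconstruction IoU: {iou:.3f}")
```

Because the transform is orthonormal, keeping all `n * n` coefficients reproduces the mask exactly; the compactness comes from the fact that most of a mask's energy sits in the low-frequency coefficients.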

Introduction

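As the figure above shows, Mask R-CNN downsamples the ground truth (GT) to $28\times 28$ and then upsamples it to reconstruct the mask. As the figure below shows, a low-resolution binary grid mask cannot capture fine detail and introduces bias during upsampling.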

The figure above compares the results with and without DCT. From left to right: the GT; the resized GT; the reconstruction from the resized GT; and the error between the reconstruction and the original GT.
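The resize-then-reconstruct comparison can be reproduced numerically with a small sketch. This is only an approximation of the pipeline described here: block averaging and nearest-neighbour upsampling stand in for whatever interpolation a real implementation uses, and the 112×112 circular ground truth and the helper names `downsample` and `upsample` are hypothetical.

```python
import numpy as np

def downsample(mask, factor):
    """Block-average a square binary mask by `factor`, then re-binarize at 0.5."""
    n = mask.shape[0] // factor
    blocks = mask.astype(np.float64).reshape(n, factor, n, factor)
    return blocks.mean(axis=(1, 3)) >= 0.5

def upsample(mask, factor):
    """Nearest-neighbour upsampling: repeat every pixel `factor` times per axis."""
    return np.repeat(np.repeat(mask, factor, axis=0), factor, axis=1)

# Hypothetical ground truth: a filled circle on a 112 x 112 grid (112 = 4 * 28).
size, grid = 112, 28
factor = size // grid
yy, xx = np.mgrid[:size, :size]
gt = (yy - 56) ** 2 + (xx - 56) ** 2 <= 35 ** 2

low = downsample(gt, factor)    # the 28 x 28 training target
recon = upsample(low, factor)   # best case: the 28 x 28 prediction is perfect
error = gt ^ recon              # pixels where the reconstruction disagrees with GT
iou = (gt & recon).sum() / (gt | recon).sum()
print(f"mismatched pixels: {error.sum()}, IoU: {iou:.3f}")
```

Even with a perfect 28×28 prediction, the reconstruction disagrees with the ground truth along the boundary, so the IoU stays below 1.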

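So even when the predicted mask is correct, the reconstructed mask still carries a systematic error. One remedy is to increase the resolution of the binary grid mask, but experiments show that the average precision (AP) at higher resolutions is worse than at $28\times 28$; see the figure below.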