2 posts tagged "frequency-domain" | 工具箱的深度学习记事簿 (Toolbox's Deep Learning Notebook)

Learning in the Frequency Domain

December 31, 2023 · 21 min read

Rubbish CVer | Poor LaTeX speaker | Half stack developer | Lurking pseudo-expert of the keyboard community

Kai Xu, Minghai Qin, Fei Sun, Yuhao Wang, Yen-Kuang Chen, Fengbo Ren

Deep neural networks have achieved remarkable success in computer vision tasks. Existing neural networks mainly operate in the spatial domain with fixed input sizes. For practical applications, images are usually large and have to be downsampled to the predetermined input size of neural networks. Even though the downsampling operations reduce computation and the required communication bandwidth, they remove both redundant and salient information obliviously, which results in accuracy degradation. Inspired by digital signal processing theories, we analyze the spectral bias from the frequency perspective and propose a learning-based frequency selection method to identify the trivial frequency components which can be removed without accuracy loss. The proposed method of learning in the frequency domain leverages identical structures of the well-known neural networks, such as ResNet-50, MobileNetV2, and Mask R-CNN, while accepting the frequency-domain information as the input. Experiment results show that learning in the frequency domain with static channel selection can achieve higher accuracy than the conventional spatial downsampling approach and meanwhile further reduce the input data size. Specifically for ImageNet classification with the same input size, the proposed method achieves 1.41% and 0.66% top-1 accuracy improvements on ResNet-50 and MobileNetV2, respectively. Even with half input size, the proposed method still improves the top-1 accuracy on ResNet-50 by 1%. In addition, we observe a 0.8% average precision improvement on Mask R-CNN for instance segmentation on the COCO dataset.
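To make "frequency-domain information as the input" concrete, here is a minimal NumPy sketch of one common way to build such an input. It assumes a JPEG-style 8×8 block DCT on a single colour component, and it regroups the coefficients so that each of the 64 frequencies (u, v) becomes its own channel. The abstract does not specify these details, so treat them as assumptions. The paper *learns* which channels to keep. The hypothetical `select_low_freq` helper below just keeps a fixed low-frequency subset as a stand-in for that static selection.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C: C @ x is the 1-D DCT of a length-n vector x."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # the DC row uses a different normalisation
    return c

def block_dct_channels(img: np.ndarray, block: int = 8) -> np.ndarray:
    """JPEG-style block DCT of a single-channel (H, W) image.

    Each block x block tile gets a 2-D DCT; coefficients of the same frequency (u, v)
    are gathered into one channel. Returns shape (block * block, H // block, W // block),
    with channel index = u * block + v.
    """
    h, w = img.shape
    assert h % block == 0 and w % block == 0, "H and W must be multiples of block"
    c = dct_matrix(block)
    # (H, W) -> (H/b, W/b, b, b): one tile per spatial position
    tiles = img.reshape(h // block, block, w // block, block).transpose(0, 2, 1, 3)
    coeffs = c @ tiles @ c.T  # 2-D DCT (C T C^T) of every tile, broadcast over positions
    return coeffs.reshape(h // block, w // block, block * block).transpose(2, 0, 1)

def select_low_freq(channels: np.ndarray, max_order: int, block: int = 8) -> np.ndarray:
    """Hypothetical static selection: keep only channels whose frequency satisfies u + v < max_order."""
    keep = [u * block + v for u in range(block) for v in range(block) if u + v < max_order]
    return channels[keep]

rng = np.random.default_rng(0)
img = rng.random((32, 48))          # stand-in for one colour component, e.g. Y
channels = block_dct_channels(img)  # 64 frequency channels on a 4 x 6 grid
selected = select_low_freq(channels, max_order=3)
print(channels.shape, selected.shape)  # (64, 4, 6) (6, 4, 6)
```

The spatial size shrinks by 8 in each direction while the channel count grows to 64. This is why such an input can replace the first downsampling layers of a standard backbone. Dropping channels then reduces the input data size without discarding whole pixels.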

Comments: Accepted to CVPR 2020

DCT-Mask - Discrete Cosine Transform Mask Representation for Instance Segmentation

December 31, 2023 · 10 min read

PuQing

intro * new

Paper: DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation
Authors: Xing Shen, Jirui Yang, Chunbo Wei, Bing Deng, Jianqiang Huang, Xiansheng Hua, Xiaoliang Cheng, Kewei Liang
Repository: https://github.com/calmevtime/DCTNet

Abstract

Binary grid masks are widely used in instance segmentation. For example, Mask R-CNN ¹ predicts each mask on a $28\times 28$ grid, as shown in the figure below.

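However, a low-resolution grid is generally too coarse to capture details, while a high-resolution grid greatly increases training complexity. To address this, the paper proposes a new mask representation called DCT-Mask: it uses the discrete cosine transform (DCT) to encode a high-resolution binary grid mask into a compact vector.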
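The encoding idea can be sketched in NumPy. A square binary mask is transformed with a 2-D DCT, and only the first few coefficients are kept, taken in JPEG-style zig-zag order so that low frequencies come first. Decoding scatters the vector back into place, zero-fills the remaining coefficients, applies the inverse DCT and thresholds at 0.5. The 128×128 resolution, the 300-coefficient vector length, the zig-zag ordering and the helper names are assumptions chosen for illustration; this is not the authors' implementation.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C: C @ x is the 1-D DCT of a length-n vector x."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def zigzag_indices(n: int) -> list:
    """(row, col) positions of an n x n grid in JPEG-style zig-zag order, low frequencies first."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        diag = [(i, s - i) for i in range(max(0, s - n + 1), min(s, n - 1) + 1)]
        order.extend(diag if s % 2 else diag[::-1])
    return order

def encode_mask(mask: np.ndarray, n_coeffs: int) -> np.ndarray:
    """Encode a square binary mask as its first n_coeffs zig-zag-ordered 2-D DCT coefficients."""
    n = mask.shape[0]
    c = dct_matrix(n)
    coeffs = c @ mask.astype(float) @ c.T  # forward 2-D DCT
    rows, cols = zip(*zigzag_indices(n)[:n_coeffs])
    return coeffs[list(rows), list(cols)]

def decode_mask(vec: np.ndarray, n: int) -> np.ndarray:
    """Scatter vec back to its zig-zag positions (all other coefficients = 0),
    apply the inverse 2-D DCT and threshold at 0.5 to get a binary mask."""
    c = dct_matrix(n)
    coeffs = np.zeros((n, n))
    rows, cols = zip(*zigzag_indices(n)[:len(vec)])
    coeffs[list(rows), list(cols)] = vec
    return (c.T @ coeffs @ c) >= 0.5  # inverse 2-D DCT, valid because C is orthonormal

N = 128                                             # assumed mask resolution
yy, xx = np.mgrid[:N, :N]
mask = (yy - 64) ** 2 + (xx - 60) ** 2 < 40 ** 2    # synthetic disk-shaped mask
vec = encode_mask(mask, 300)                        # assumed vector length
rec = decode_mask(vec, N)
iou = (mask & rec).sum() / (mask | rec).sum()
print(vec.shape, float(iou))
```

The network then only has to regress a short vector instead of 128 × 128 pixel logits. Because a mask's energy is concentrated in the low frequencies, truncating the high-frequency tail loses little of the shape.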

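The method can easily be integrated into most pixel-based instance segmentation methods. It requires no pre-processing or pre-training, and it has almost no impact on speed.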

Introduction

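As shown in the figure above, Mask R-CNN downsamples the ground-truth (GT) mask to $28\times 28$ and then upsamples it to reconstruct the full-resolution mask. As the figure below shows, a low-resolution binary grid mask cannot capture fine details and introduces errors during upsampling.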

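The figure above compares results with and without DCT. From left to right: the GT mask; the resized GT; the mask reconstructed from the resized GT; and the error between the reconstruction and the original GT.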
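The systematic error of the binary-grid round trip can be reproduced on a toy example. The NumPy sketch below downsamples a synthetic mask to a $28\times 28$ binary grid and upsamples it back, then counts the pixels that change. It makes two simplifying assumptions. Downsampling is average pooling followed by thresholding at 0.5, and upsampling is nearest-neighbour repetition. A 112×112 mask size is assumed so that pooling by 4 is exact. The real Mask R-CNN pipeline works on RoI crops and interpolates, so this only illustrates the effect, not its exact magnitude. The `grid_roundtrip` helper is hypothetical.

```python
import numpy as np

def grid_roundtrip(mask: np.ndarray, grid: int = 28) -> np.ndarray:
    """Simplified binary-grid round trip: downsample an (N, N) mask to grid x grid by
    average pooling + thresholding at 0.5, then upsample by nearest-neighbour repetition."""
    n = mask.shape[0]
    f = n // grid
    assert n == f * grid, "N must be a multiple of grid in this simplified sketch"
    coarse = mask.reshape(grid, f, grid, f).mean(axis=(1, 3)) >= 0.5  # binary grid x grid mask
    return np.repeat(np.repeat(coarse, f, axis=0), f, axis=1)

N = 112                                            # assumed: 4 x 28, so pooling is exact
yy, xx = np.mgrid[:N, :N]
mask = (yy - 56) ** 2 + (xx - 52) ** 2 < 40 ** 2   # synthetic disk-shaped "GT" mask
rec = grid_roundtrip(mask)
wrong = int((rec != mask).sum())                   # pixels lost or added by the round trip
iou = (mask & rec).sum() / (mask | rec).sum()
print(wrong, float(iou))
```

Even with a perfect $28\times 28$ prediction, the curved boundary comes back as a staircase. Only masks whose edges happen to align with the grid cells survive the round trip exactly.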

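So even if the predicted mask is correct, the reconstructed mask still carries a systematic error. One solution is to increase the resolution of the binary grid mask. However, experiments show that at higher resolutions the average precision (AP) is worse than at $28\times 28$; see the figure below.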
