6 posts tagged "light-weight"


MobileNetV2 - Inverted Residuals and Linear Bottlenecks

December 31, 2023 · 17 min read

PommesPeter

I want to be strong. But it seems so hard.

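This post walks through a paper on a lightweight backbone network. Original paper: MobileNetV2: Inverted Residuals and Linear Bottlenecks.

The paper improves network performance mainly through three structural changes to lightweight feature-extraction networks.
Overall idea of the paper: obtain sufficiently rich features from low-dimensional tensors.

Abstract: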

In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3. MobileNetV2 is based on an inverted residual structure where the shortcut connections are between the thin bottleneck layers. The intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design. Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on ImageNet classification, COCO object detection [2], VOC image segmentation [3]. We evaluate the trade-offs between accuracy, and number of operations measured by multiply-adds (MAdd), as well as actual latency, and the number of parameters.
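To make the "thin bottlenecks, wide expansion layer" idea from the abstract concrete, the rough sketch below counts the multiply-adds of one inverted residual block (1×1 expansion, k×k depthwise convolution, 1×1 linear projection) and compares it with a dense k×k convolution run at the expanded width. The function names, the example shapes and the expansion factor of 6 are illustrative assumptions, and the count ignores biases, normalization and stride.

```python
def inverted_residual_madds(h, w, c_in, c_out, expand=6, k=3):
    """Approximate multiply-adds of one stride-1 inverted residual block."""
    c_mid = c_in * expand                       # width of the expanded layer
    expand_cost = h * w * c_in * c_mid          # 1x1 pointwise expansion
    depthwise_cost = h * w * c_mid * k * k      # kxk depthwise conv, one filter per channel
    project_cost = h * w * c_mid * c_out        # 1x1 linear projection back to a thin layer
    return expand_cost + depthwise_cost + project_cost


def dense_conv_madds(h, w, c_in, c_out, k=3):
    """Approximate multiply-adds of a dense kxk convolution, for comparison."""
    return h * w * c_in * c_out * k * k


if __name__ == "__main__":
    # Hypothetical example: a 56x56 feature map with 24-channel bottlenecks.
    block = inverted_residual_madds(56, 56, 24, 24)
    # A dense 3x3 conv operating directly at the expanded width (24 * 6 = 144).
    dense = dense_conv_madds(56, 56, 144, 144)
    print(f"inverted residual block: {block:,} MAdds")
    print(f"dense 3x3 conv at expanded width: {dense:,} MAdds")
```

The block filters features at the expanded width but only pays the dense channel-mixing cost in the two 1×1 layers, which is why the shortcut can connect the thin layers cheaply.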

MobileNets - Efficient Convolutional Neural Networks for Mobile Vision Applications

December 31, 2023 · 9 min read

PommesPeter

I want to be strong. But it seems so hard.

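This post walks through a paper on a lightweight backbone network. Original paper: MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.

The paper proposes an efficient neural network for mobile and embedded devices.
The paper proposes a convolution module with a low operation count, the depthwise separable convolution (DSC for short).

Abstract: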

We present a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depthwise separable convolutions to build light weight deep neural networks. We introduce two simple global hyper-parameters that efficiently trade off between latency and accuracy. These hyper-parameters allow the model builder to choose the right sized model for their application based on the constraints of the problem. We present extensive experiments on resource and accuracy tradeoffs and show strong performance compared to other popular models on ImageNet classification. We then demonstrate the effectiveness of MobileNets across a wide range of applications and use cases including object detection, finegrain classification, face attributes and large scale geo-localization.
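The saving from a depthwise separable convolution can be checked with a little arithmetic. The sketch below compares the multiply-adds of a standard convolution with those of a DSC (depthwise step plus 1×1 pointwise step), and applies the two global hyper-parameters as simple scalings of channel count and feature-map size. The function names, the symbols (`d_k` kernel size, `m` input channels, `n` output channels, `d_f` feature-map side) and the example layer are assumptions chosen for illustration.

```python
def standard_conv_madds(d_k, m, n, d_f):
    """Multiply-adds of a standard d_k x d_k convolution on a d_f x d_f map."""
    return d_k * d_k * m * n * d_f * d_f


def dsc_madds(d_k, m, n, d_f):
    """Multiply-adds of a depthwise separable convolution."""
    depthwise = d_k * d_k * m * d_f * d_f   # one d_k x d_k filter per input channel
    pointwise = m * n * d_f * d_f           # 1x1 conv that mixes channels
    return depthwise + pointwise


def scaled_dsc_madds(d_k, m, n, d_f, alpha=1.0, rho=1.0):
    """DSC cost after thinning channels by alpha and shrinking the map by rho."""
    return dsc_madds(d_k, int(alpha * m), int(alpha * n), int(rho * d_f))


if __name__ == "__main__":
    # Hypothetical layer: 3x3 kernel, 512 -> 512 channels, 14x14 feature map.
    std = standard_conv_madds(3, 512, 512, 14)
    dsc = dsc_madds(3, 512, 512, 14)
    print(f"standard: {std:,}  separable: {dsc:,}  ratio: {dsc / std:.4f}")
    print(f"with alpha=0.5: {scaled_dsc_madds(3, 512, 512, 14, alpha=0.5):,}")
```

The ratio works out to 1/n + 1/d_k², so with 3×3 kernels the separable form needs roughly 8 to 9 times fewer multiply-adds than the standard convolution.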

BiSeNet - Bilateral Segmentation Network for Real-time Semantic Segmentation

December 31, 2023 · 10 min read

Gavin Gong

Rubbish CVer | Poor LaTex speaker | Half stack developer | Lurking armchair expert of the keyboard community

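BiSeNet aims at faster real-time semantic segmentation. In semantic segmentation, spatial resolution and receptive field are hard to reconcile, especially in the real-time setting, where existing methods usually gain speed by using small input images or a lightweight backbone. Small images, however, lose much of the spatial information of the original image, while lightweight models damage spatial information by pruning channels. BiSeNet combines a Spatial Path (SP) and a Context Path (CP) to address the loss of spatial information and the shrinking receptive field, respectively.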

Semantic segmentation requires both rich spatial information and sizeable receptive field. However, modern approaches usually compromise spatial resolution to achieve real-time inference speed, which leads to poor performance. In this paper, we address this dilemma with a novel Bilateral Segmentation Network (BiSeNet). We first design a Spatial Path with a small stride to preserve the spatial information and generate high-resolution features. Meanwhile, a Context Path with a fast downsampling strategy is employed to obtain sufficient receptive field. On top of the two paths, we introduce a new Feature Fusion Module to combine features efficiently. The proposed architecture makes a right balance between the speed and segmentation performance on Cityscapes, CamVid, and COCO-Stuff datasets. Specifically, for a 2048x1024 input, we achieve 68.4% Mean IOU on the Cityscapes test dataset with speed of 105 FPS on one NVIDIA Titan XP card, which is significantly faster than the existing methods with comparable performance.

Original paper: BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation. After reading it, you will notice that many of its ideas are inspired by SENet (Squeeze-and-Excitation Networks).
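The trade-off between the two paths can be illustrated with simple resolution arithmetic. The sketch below computes the feature-map size each path would produce for the 2048x1024 input quoted in the abstract. The stride values (8 for the "small stride" Spatial Path, 32 for the "fast downsampling" Context Path) are assumptions for illustration, not figures stated in this post.

```python
def feature_map_size(width, height, stride):
    """Spatial size of a feature map after downsampling by `stride`."""
    return width // stride, height // stride


INPUT_SIZE = (2048, 1024)   # input resolution quoted in the abstract
SPATIAL_STRIDE = 8          # assumed total stride of the Spatial Path
CONTEXT_STRIDE = 32         # assumed total stride of the Context Path

spatial = feature_map_size(*INPUT_SIZE, SPATIAL_STRIDE)   # (256, 128)
context = feature_map_size(*INPUT_SIZE, CONTEXT_STRIDE)   # (64, 32)

if __name__ == "__main__":
    print("Spatial Path output:", spatial)
    print("Context Path output:", context)
    # Number of spatial positions the high-resolution path keeps per context position.
    print("positions ratio:", (spatial[0] * spatial[1]) // (context[0] * context[1]))
```

Under these assumed strides, the Spatial Path keeps 16 times as many spatial positions as the Context Path, while the heavily downsampled path sees a much larger portion of the image per position.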

Rethinking BiSeNet For Real-time Semantic Segmentation

December 31, 2023 · 17 min read

Gavin Gong

Rubbish CVer | Poor LaTex speaker | Half stack developer | Lurking armchair expert of the keyboard community

Mingyuan Fan, Shenqi Lai, Junshi Huang, Xiaoming Wei, Zhenhua Chai, Junfeng Luo, Xiaolin Wei

BiSeNet has been proved to be a popular two-stream network for real-time segmentation. However, its principle of adding an extra path to encode spatial information is time-consuming, and the backbones borrowed from pretrained tasks, e.g., image classification, may be inefficient for image segmentation due to the deficiency of task-specific design. To handle these problems, we propose a novel and efficient structure named Short-Term Dense Concatenate network (STDC network) by removing structure redundancy. Specifically, we gradually reduce the dimension of feature maps and use the aggregation of them for image representation, which forms the basic module of STDC network. In the decoder, we propose a Detail Aggregation module by integrating the learning of spatial information into low-level layers in single-stream manner. Finally, the low-level features and deep features are fused to predict the final segmentation results. Extensive experiments on Cityscapes and CamVid dataset demonstrate the effectiveness of our method by achieving promising trade-off between segmentation accuracy and inference speed. On Cityscapes, we achieve 71.9% mIoU on the test set with a speed of 250.4 FPS on NVIDIA GTX 1080Ti, which is 45.2% faster than the latest methods, and achieve 76.8% mIoU with 97.0 FPS while inferring on higher resolution images.

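Before reading this post, please read BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation first.

The paper notes that BiSeNet has proved to be a solid two-stream network for real-time segmentation. In BiSeNet, however: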

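Dedicating a separate network path to spatial information is computationally very time-consuming.
The pretrained lightweight backbone used for the context path is taken directly from other tasks (e.g., classification and object detection), so it is not very efficient for segmentation.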

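The authors therefore propose the Short-Term Dense Concatenate network (STDC network) to replace the context path in BiSeNet. Its core idea is to remove redundant structure and further speed up segmentation. Specifically, the paper gradually reduces the dimension of the feature maps and aggregates them for image representation, which forms the basic module of the STDC network. In the decoder, a Detail Aggregation module integrates the learning of spatial information into the low-level layers in a single-stream manner, replacing the spatial path in BiSeNet. Finally, the low-level features and deep features are fused to predict the final segmentation result.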

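Note: in the figure above, the part inside the red dashed box is the newly proposed STDC network; ARM denotes the Attention Refinement Module and FFM denotes the Feature Fusion Module. Both modules already existed in the design of BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation.

If you are interested, please read the original paper, Rethinking BiSeNet For Real-time Semantic Segmentation.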
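The "gradually reduce the dimension, then aggregate" idea behind the basic STDC module can be sketched as a channel-width schedule. The version below assumes each successive block halves its output width, the last block repeats the previous width, and the module output is the concatenation of all block outputs. The function name and the default of four blocks are illustrative assumptions.

```python
def stdc_block_channels(out_channels, num_blocks=4):
    """Output width of each block in one STDC-style module (a sketch)."""
    if num_blocks < 2:
        raise ValueError("this sketch needs at least two blocks")
    # Block i (1-based) outputs out_channels / 2**i channels ...
    widths = [out_channels // 2 ** (i + 1) for i in range(num_blocks - 1)]
    # ... except the last block, which repeats the previous width so that
    # the concatenated output has out_channels channels in total.
    widths.append(widths[-1])
    return widths


if __name__ == "__main__":
    widths = stdc_block_channels(256)
    print("per-block widths:", widths)
    print("concatenated width:", sum(widths))
```

With this schedule the deeper blocks, which have larger receptive fields, are given fewer channels, and the concatenation still yields the full requested width.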

You Only Look One-level Feature

December 31, 2023 · 17 min read

Gavin Gong

Rubbish CVer | Poor LaTex speaker | Half stack developer | Lurking armchair expert of the keyboard community

Qiang Chen, Yingming Wang, Tong Yang, Xiangyu Zhang, Jian Cheng, Jian Sun

This paper revisits feature pyramids networks (FPN) for one-stage detectors and points out that the success of FPN is due to its divide-and-conquer solution to the optimization problem in object detection rather than multi-scale feature fusion. From the perspective of optimization, we introduce an alternative way to address the problem instead of adopting the complex feature pyramids - utilizing only one-level feature for detection. Based on the simple and efficient solution, we present You Only Look One-level Feature (YOLOF). In our method, two key components, Dilated Encoder and Uniform Matching, are proposed and bring considerable improvements. Extensive experiments on the COCO benchmark prove the effectiveness of the proposed model. Our YOLOF achieves comparable results with its feature pyramids counterpart RetinaNet while being 2.5× faster. Without transformer layers, YOLOF can match the performance of DETR in a single-level feature manner with 7× less training epochs. With an image size of 608×608, YOLOF achieves 44.3 mAP running at 60 fps on 2080Ti, which is 13% faster than YOLOv4. Code is available at this https URL.

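This post refers to the paper as YOLOF. At the time of writing, state-of-the-art two-stage and one-stage object detectors widely use multi-scale feature fusion, and FPN has almost become a taken-for-granted component of the network.

In this paper the authors revisit the FPN module and point out that its two benefits are its divide-and-conquer solution and its multi-scale feature fusion. The paper studies these two benefits on one-stage detectors, runs experiments on RetinaNet that decouple them and examine the role each plays, and concludes that multi-scale feature fusion may contribute less than commonly assumed.

Finally, the authors propose YOLOF, an object detection network that does not use FPN. Its main innovations are: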

Dilated Encoder
Uniform Matching

The network matches the accuracy of RetinaNet while running 2.5× faster.
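As a rough illustration of what Uniform Matching aims for, the sketch below gives every ground-truth box the same number of positive anchors by picking its k nearest anchors. This is a simplified reading and not the paper's implementation: the Euclidean distance between centers, the value k=4, and the function name are all assumptions, and any additional filtering of matches is omitted.

```python
import math


def uniform_matching(gt_centers, anchor_centers, k=4):
    """Return, for each ground-truth center, the indices of its k nearest anchors."""
    matches = []
    for gx, gy in gt_centers:
        by_distance = sorted(
            range(len(anchor_centers)),
            key=lambda i: math.hypot(anchor_centers[i][0] - gx,
                                     anchor_centers[i][1] - gy),
        )
        matches.append(by_distance[:k])   # same number of positives for every box
    return matches


if __name__ == "__main__":
    # Hypothetical 4x4 grid of anchor centers on a single feature level (stride 32).
    anchors = [(16 + 32 * i, 16 + 32 * j) for j in range(4) for i in range(4)]
    ground_truth = [(20, 20), (100, 100)]
    for center, idx in zip(ground_truth, uniform_matching(ground_truth, anchors)):
        print(center, "->", [anchors[i] for i in idx])
```

Because each box receives exactly k positives regardless of its size, small and large objects contribute equally to training on a single feature level.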

PP-LCNet - A Lightweight CPU Convolutional Neural Network

December 31, 2023 · 4 min read

AsTheStarsFall


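An optimized combination of lightweight tricks.

Paper: PP-LCNet: A Lightweight CPU Convolutional Neural Network
Authors: Cheng Cui, Tingquan Gao, Shengyu Wei, Yuning Du...
Code: https://github.com/PaddlePaddle/PaddleClas

Abstract

Summarizes a set of techniques that improve accuracy while leaving latency almost unchanged;
Proposes PP-LCNet, a lightweight CPU network built around the MKLDNN acceleration strategy.

Introduction

Current lightweight networks are not fast enough on Intel CPUs with MKLDNN enabled. The paper considers the following three basic questions:

How can the network be pushed to learn stronger features without increasing latency?
What are the key elements for improving the accuracy of lightweight models on CPU?
How can different strategies be combined effectively to design lightweight models for CPU?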
