
5 posts tagged with "detection"


· 16 min read
PommesPeter

This is a paper on low-light face detection. Original paper: HLA-Face: Joint High-Low Adaptation for Low Light Face Detection.

  • The paper makes full use of existing normal-light data and explores how to adapt face detectors from normal light to low light. The challenge is that the gap between normal and low light is too large and complex at both the pixel level and the object level, so most existing low-light enhancement and adaptation methods fail to reach the desired performance.
  • The paper uses DARK FACE as its benchmark and adapts existing normal-illumination images toward low illumination, without requiring any labels.
  • One gap is pixel-level appearance, e.g., insufficient illumination, camera noise, and color bias. The other is the object-level semantic difference between normal and low-light scenes, including but not limited to the presence of street lamps, vehicle headlights, and billboards. Traditional low-light enhancement methods [5, 6] are designed to improve visual quality and therefore cannot close the semantic gap.
  • By brightening the low-light images and distorting the normal-light images, intermediate states lying between normal and low light are constructed (see the sketch after this list).
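
The bidirectional low-level adaptation can be pictured as two simple pixel-space transforms. Below is a minimal NumPy sketch of that idea only: a gamma curve to brighten low-light images, and a darken-plus-noise transform that pushes normal-light images toward the dark domain. The gamma values and noise level here are illustrative assumptions; the paper's actual transforms are learned, not fixed.

```python
import numpy as np

def brighten(img_low, gamma=0.3):
    """Lift a low-light image with gamma correction (gamma < 1 brightens
    dark pixels). Illustrative stand-in for the paper's learned brightening."""
    return np.clip(img_low, 0.0, 1.0) ** gamma

def distort(img_normal, gamma=2.5, noise_std=0.05, rng=None):
    """Push a normal-light image toward the low-light domain: darken it
    (gamma > 1) and add Gaussian pixel noise as a rough camera-noise stand-in."""
    rng = rng if rng is not None else np.random.default_rng(0)
    dark = np.clip(img_normal, 0.0, 1.0) ** gamma
    return np.clip(dark + rng.normal(0.0, noise_std, dark.shape), 0.0, 1.0)

# Both transformed sets land between the two original domains,
# shrinking the gap the detector has to be adapted across.
low = np.random.rand(64, 64, 3) * 0.1   # toy low-light image in [0, 0.1]
normal = np.random.rand(64, 64, 3)      # toy normal-light image in [0, 1]
mid_from_low, mid_from_normal = brighten(low), distort(normal)
```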

Abstract:

Face detection in low light scenarios is challenging but vital to many practical applications, e.g., surveillance video, autonomous driving at night. Most existing face detectors heavily rely on extensive annotations, while collecting data is time-consuming and laborious. To reduce the burden of building new datasets for low light conditions, we make full use of existing normal light data and explore how to adapt face detectors from normal light to low light. The challenge of this task is that the gap between normal and low light is too huge and complex for both pixel-level and object-level. Therefore, most existing low-light enhancement and adaptation methods do not achieve desirable performance. To address the issue, we propose a joint High-Low Adaptation (HLA) framework. Through a bidirectional low-level adaptation and multi-task high-level adaptation scheme, our HLA-Face outperforms state-of-the-art methods even without using dark face labels for training. Our project is publicly available at: https://daooshee.github.io/HLA-Face-Website/

· 11 min read
Gavin Gong

This note was written by VisualDust.

Original paper: Feature Pyramid Networks for Object Detection

This is the well-known FPN paper. It is relatively early work (note that this paper presents just one way of fusing multi-scale features; it appeared early, at CVPR 2017, and was considered very advanced at the time) and had many highlights. FPN mainly tackles the multi-scale problem in object detection: with a simple change to the network connections, it substantially improves small-object detection performance while adding essentially no computation to the original model.
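
The abstract below describes a top-down pathway with lateral connections. As a concrete reference, here is a minimal PyTorch sketch of that neck; the channel counts and the four-level setup are illustrative assumptions matching a ResNet-style backbone, not the paper's full detection pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPN(nn.Module):
    """Minimal FPN neck: 1x1 lateral convs align channels, the top-down
    path upsamples and adds, and 3x3 convs smooth the merged maps."""

    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            for _ in in_channels)

    def forward(self, feats):  # feats: [C2, C3, C4, C5], fine -> coarse
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        # Top-down: upsample the coarser map and add it to the lateral below.
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        return [s(p) for s, p in zip(self.smooth, laterals)]  # [P2..P5]

# Toy usage with ResNet-like channel counts and strides:
feats = [torch.randn(1, c, s, s)
         for c, s in zip((256, 512, 1024, 2048), (64, 32, 16, 8))]
pyramid = FPN()(feats)  # four maps, each with 256 channels
```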

Abstract

Feature pyramids are a basic component in recognition systems for detecting objects at different scales. But recent deep learning object detectors have avoided pyramid representations, in part because they are compute and memory intensive. In this paper, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost. A top-down architecture with lateral connections is developed for building high-level semantic feature maps at all scales. This architecture, called a Feature Pyramid Network (FPN), shows significant improvement as a generic feature extractor in several applications. Using FPN in a basic Faster R-CNN system, our method achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles, surpassing all existing single-model entries including those from the COCO 2016 challenge winners. In addition, our method can run at 5 FPS on a GPU and thus is a practical and accurate solution to multi-scale object detection. Code will be made publicly available.

This paper strongly influenced many later network designs, and reading the original is recommended. What follows is only a cursory reading note.

· 17 min read
PommesPeter

This is a paper on a lightweight backbone network. Original paper: MobileNetV2: Inverted Residuals and Linear Bottlenecks.

  • The paper improves network performance through three structural modifications to a lightweight feature-extraction network.
  • The overall idea: extract a sufficient number of features from low-dimensional tensors (see the sketch after this list).
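
For reference, here is a minimal PyTorch sketch of the inverted residual block with a linear bottleneck that the abstract below describes. The expansion factor of 6 follows the paper; the rest of the configuration is simplified for illustration.

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Expand (1x1) -> depthwise (3x3) -> project (1x1, no activation):
    the narrow projection stays linear to preserve information, and the
    shortcut connects the thin bottleneck ends."""

    def __init__(self, in_ch, out_ch, stride=1, expand=6):
        super().__init__()
        hidden = in_ch * expand
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),          # expand
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride, 1,
                      groups=hidden, bias=False),             # depthwise
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False),         # linear project
            nn.BatchNorm2d(out_ch),                           # no ReLU here
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out

y = InvertedResidual(32, 32)(torch.randn(1, 32, 56, 56))  # residual applies
```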

Abstract:

In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3. MobileNetV2 is based on an inverted residual structure where the shortcut connections are between the thin bottleneck layers. The intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design. Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on ImageNet classification, COCO object detection [2], VOC image segmentation [3]. We evaluate the trade-offs between accuracy, and number of operations measured by multiply-adds (MAdd), as well as actual latency, and the number of parameters.

· 9 min read
PommesPeter

This is a paper on a lightweight backbone network. Original paper: MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.

  • The paper proposes an efficient neural network for mobile and embedded devices.
  • It introduces a convolution module with a low operation count: the depthwise separable convolution (hereafter DSC; see the sketch after this list).
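
As a concrete reference, here is a minimal PyTorch sketch of a DSC block: a 3×3 depthwise convolution (one filter per channel) followed by a 1×1 pointwise convolution that mixes channels. The BatchNorm/ReLU arrangement follows the common MobileNet layout, and the cost estimate in the comment uses the paper's multiply-add accounting.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Per output position the cost is roughly D_K^2 * M + M * N
    multiply-adds, versus D_K^2 * M * N for a standard convolution
    (D_K = kernel size, M/N = input/output channels), i.e. about
    8-9x cheaper for 3x3 kernels."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride, 1,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn1, self.bn2 = nn.BatchNorm2d(in_ch), nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.bn1(self.depthwise(x)))   # spatial filtering
        return self.relu(self.bn2(self.pointwise(x)))  # channel mixing

y = DepthwiseSeparableConv(32, 64)(torch.randn(1, 32, 112, 112))
```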

Abstract:

We present a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depthwise separable convolutions to build light weight deep neural networks. We introduce two simple global hyper-parameters that efficiently trade off between latency and accuracy. These hyper-parameters allow the model builder to choose the right sized model for their application based on the constraints of the problem. We present extensive experiments on resource and accuracy tradeoffs and show strong performance compared to other popular models on ImageNet classification. We then demonstrate the effectiveness of MobileNets across a wide range of applications and use cases including object detection, finegrain classification, face attributes and large scale geo-localization.

· 9 min read
PommesPeter

Paper: Low-Light Enhancement Network with Global Awareness

Authors: Wenjing Wang, Chen Wei, Wenhan Yang, Jiaying Liu

Code: https://github.com/weichen582/GLADNet

This is a paper on low-light enhancement with neural networks.

  • First estimate the illumination of the image, then adjust the original image according to the estimate.
  • During adjustment, the image details are reconstructed to produce a more natural result.

Abstract

In this paper, we address the problem of low-light enhancement. Our key idea is to first calculate a global illumination estimation for the low-light input, then adjust the illumination under the guidance of the estimation and supplement the details using a concatenation with the original input. Considering that, we propose a GLobal illumination-Aware and Detail-preserving Network (GLADNet). The input image is rescaled to a certain size and then put into an encoder-decoder network to generate global priori knowledge of the illumination. Based on the global prior and the original input image, a convolutional network is employed for detail reconstruction. For training GLADNet, we use a synthetic dataset generated from RAW images. Extensive experiments demonstrate the superiority of our method over other compared methods on the real low-light images captured in various conditions.

This paper addresses the problem of low-light enhancement. The key idea: given a low-light input, first estimate the global illumination, then adjust the brightness under the guidance of that estimate, and concatenate the result with the original image to supplement details. The proposed GLADNet resizes the input image to a fixed size and feeds it through an encoder-decoder network to generate a global illumination prior; the prior and the original image are then fed into a convolutional network for detail reconstruction.
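
To make the two-stage flow concrete, here is a minimal PyTorch sketch under stated assumptions: the layer widths, the 96×96 estimation size, and the three-layer detail branch are illustrative choices, not GLADNet's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalDetailNet(nn.Module):
    """Stage 1: an encoder-decoder on a fixed-size downscaled input
    produces a global illumination prior. Stage 2: conv layers on
    [prior, original] reconstruct the details lost to rescaling."""

    def __init__(self, ch=32, est_size=96):
        super().__init__()
        self.est_size = est_size
        self.encoder = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1))
        self.detail = nn.Sequential(
            nn.Conv2d(6, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1))

    def forward(self, x):
        small = F.interpolate(x, size=(self.est_size,) * 2,
                              mode="bilinear", align_corners=False)
        prior = self.decoder(self.encoder(small))      # global estimate
        prior = F.interpolate(prior, size=x.shape[-2:],
                              mode="bilinear", align_corners=False)
        return self.detail(torch.cat([prior, x], dim=1))  # detail branch

out = GlobalDetailNet()(torch.randn(1, 3, 400, 600))  # same size as input
```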