
2 posts tagged with "dynamic-neural-network"


· 33 min read
Gavin Gong

This post is about a survey of dynamic neural networks. The original paper, "Dynamic Neural Networks: A Survey", mainly covers:

  • Concepts (Introduction)
  • Common dynamic neural networks
    • Instance-wise Dynamic Networks
    • Spatial-wise Dynamic Networks
    • Temporal-wise Dynamic Networks
  • Inference and Training
  • Common applications and representative work (Applications)

The paper gives a fairly systematic summary of dynamic neural networks, a topic that has attracted many researchers in recent years.

Abstract

Dynamic neural network is an emerging research topic in deep learning. Compared to static models which have fixed computational graphs and parameters at the inference stage, dynamic networks can adapt their structures or parameters to different inputs, leading to notable advantages in terms of accuracy, computational efficiency, adaptiveness, etc. In this survey, we comprehensively review this rapidly developing area by dividing dynamic networks into three main categories: 1) instance-wise dynamic models that process each instance with data-dependent architectures or parameters; 2) spatial-wise dynamic networks that conduct adaptive computation with respect to different spatial locations of image data and 3) temporal-wise dynamic models that perform adaptive inference along the temporal dimension for sequential data such as videos and texts. The important research problems of dynamic networks, e.g., architecture design, decision making scheme, optimization technique and applications, are reviewed systematically. Finally, we discuss the open problems in this field together with interesting future research directions.

Research on dynamic neural networks has grown in recent years. Compared with traditional static neural networks, whose computational graphs are fixed, dynamic neural networks can adapt their structures or parameters to the specific input, which gives them advantages in both speed and accuracy. One way to put it: "on simple inputs, a dynamic neural network can be fast; on complex inputs, it can be highly accurate."
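
To make "adapting computation to the input" concrete, below is a minimal sketch of an instance-wise dynamic network in PyTorch: a hypothetical early-exit classifier that returns from a cheap head when it is confident, and otherwise continues through a deeper block. The layer sizes, the confidence threshold, and all names here are my own illustrative assumptions, not code from the survey.

```python
import torch
import torch.nn as nn


class EarlyExitNet(nn.Module):
    """Toy instance-wise dynamic network with one early exit.

    Easy inputs stop at the first classifier once its confidence
    crosses a threshold; hard inputs pay for the deeper block.
    """

    def __init__(self, num_classes: int = 10, threshold: float = 0.9):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
        self.exit1 = nn.Linear(256, num_classes)   # cheap early classifier
        self.block2 = nn.Sequential(nn.Linear(256, 256), nn.ReLU())
        self.exit2 = nn.Linear(256, num_classes)   # full-depth classifier
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.block1(x)
        logits1 = self.exit1(h)
        # At inference time, skip the deeper block when the early exit
        # is already confident about every sample in the batch.
        if not self.training:
            confidence = logits1.softmax(dim=-1).amax(dim=-1)
            if bool((confidence > self.threshold).all()):
                return logits1
        return self.exit2(self.block2(h))


net = EarlyExitNet().eval()
print(net(torch.randn(2, 784)).shape)  # torch.Size([2, 10])
```

This is exactly the "fast on simple inputs, accurate on complex inputs" trade-off: the computational graph executed at inference depends on the data.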

The paper gives an overview of how dynamic neural networks are "dynamic", and of the advantages this dynamism brings.

I benefited greatly from reading this survey; if you have time, please read the original. This post is only my rough notes from reading it.

· 12 min read

A closer look at Local Attention from the perspectives of sparse connectivity, weight sharing, and dynamic weights.

Paper: Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight

Authors: Qi Han, Zejia Fan, Qi Dai, Lei Sun, Ming-Ming Cheng, Jiaying Liu, Jingdong Wang

Code: https://github.com/Atten4Vis/DemystifyLocalViT/

Introduction

The paper's main findings are as follows:

  1. The Local Attention used in Local Transformers exploits existing regularization schemes, namely sparse connectivity and weight sharing, along with dynamic weight prediction, improving performance without adding model complexity or extra training data;

  2. Local Attention and (dynamic) depth-wise convolution are similar in sparse connectivity but differ in weight sharing and dynamic weight prediction; a toy comparison is sketched after this list.

    Experimental results show that the forms of regularization and the dynamic weight prediction schemes adopted by Local Attention and (dynamic) depth-wise convolution achieve similar performance.

  3. In addition, the paper proposes a relation graph to connect convolution and attention, as well as the MLP-based methods.

    The relation graph shows that these methods essentially exploit different patterns of sparse connectivity and weight sharing, optionally using dynamic weight prediction, as a form of model regularization.
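
To illustrate finding 2, below is a toy comparison, on a 1-D single-channel signal, of where the two operations get their weights: depth-wise convolution applies one static kernel shared across all positions, while local attention predicts a separate, softmax-normalized weight vector for each position from the input itself. The shapes, the elementwise scoring, and the variable names are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

# Toy 1-D, single-channel signal: (batch, channels, length).
x = torch.randn(1, 1, 8)
K = 3  # size of the local window

# Depth-wise convolution: one static kernel, shared by every position.
w_conv = torch.randn(1, 1, K)
y_conv = F.conv1d(x, w_conv, padding=K // 2)

# Local attention: weights come from the input itself, so each position
# gets its own softmax-normalized weights over its K-sized window.
windows = F.unfold(x.unsqueeze(-1), kernel_size=(K, 1),
                   padding=(K // 2, 0))   # (1, K, 8): one window per position
scores = windows * x                      # score each neighbor against the query
attn = scores.softmax(dim=1)              # dynamic, position-specific weights
y_attn = (attn * windows).sum(dim=1, keepdim=True)

print(y_conv.shape, y_attn.shape)  # torch.Size([1, 1, 8]) for both
```

Both operations share the same sparse (local) connectivity, so their outputs have identical shapes; the difference is whether the window weights are learned constants shared across positions or predicted per input and per position.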