Towards efficient deep learning in computer vision via network sparsity and distillation

Keywords: pruning, artificial intelligence, deep learning, machine learning, computer science, inference, artificial neural networks, distillation
Author
Huan Wang
Identifier
DOI: 10.17760/d20659759
Abstract

Artificial intelligence (AI), empowered by deep learning, has been profoundly transforming the world. However, the excessive size of these models remains a central obstacle that limits their broader utility. Modern neural networks commonly consist of millions of parameters, with foundation models extending to billions. This rapid expansion in model size introduces many challenges, including high training cost, sluggish inference speed, excessive energy consumption, and negative environmental implications such as increased CO2 emissions. Addressing these challenges necessitates efficient deep learning (EDL). This dissertation focuses on two overarching approaches, network sparsity (a.k.a. pruning) and knowledge distillation, to enhance the efficiency of deep learning models in the context of computer vision. Network pruning eliminates redundant parameters in a model while preserving its performance. Knowledge distillation enhances the performance of a target model, referred to as the "student", by leveraging guidance from a stronger model, known as the "teacher"; this improves the target model without reducing its size.

The dissertation starts with the background and motivation for more efficient deep learning models over the past several years, in the context of the rise of foundation models. Then, the basic concepts, goals, and challenges of EDL are introduced along with its major sub-methods. After that, the major part of the dissertation is dedicated to the proposed efficiency algorithms based on pruning and distillation across a variety of applications.

For the pruning part, the dissertation first presents an effective pruning algorithm, GReg [27], for image classification, which taps into a growing regularization strategy. Then, to understand the real progress of network pruning, a fairness principle is introduced to fairly compare different pruning methods [32]. This investigation points to the central role of network trainability in pruning, which has been largely overlooked by prior works. A trainability-preserving pruning approach, TPP [28], is then proposed to demonstrate the merits of maintaining trainability during pruning. A short survey [33] of an emerging pruning paradigm, pruning at initialization, follows, discussing its potential and its connections to conventional pruning after training. The GReg algorithm is further extended to a low-level vision task, single image super-resolution (SR), to explore the differences between applying pruning in low-level vision (SR) and high-level vision (image classification). Three efficient SR approaches (ASSL [29], GASSL [30], SRP [34]) are introduced.

For the distillation part, the dissertation first focuses on the interaction between knowledge distillation and data augmentation in image classification [35], with a proven proposition presented to rigorously characterize what defines the "goodness" of a data augmentation scheme in distillation. Next, the dissertation shows how distillation can significantly improve the inference efficiency of novel view synthesis in 3D vision; both static scenes [31] and dynamic scenes [36] are considered. Finally, SnapFusion [37] demonstrates a systematic efficiency optimization of deep models that jointly utilizes pruning and distillation, achieving unprecedentedly fast text-to-image generation based on diffusion models.
Finally, a comprehensive summary, along with takeaways and an outlook on future work, concludes the dissertation. Major takeaways include: (1) there is no panacea for efficient deep learning across all tasks; solutions are usually case-by-case; (2) there is a clear trend that efficiency solutions for future models (especially large models) will feature systematic optimization and co-design along many axes (e.g., hardware, system, and algorithm); (3) profiling is always a good starting point for understanding the problem and building the right efficiency portfolio. --Author's abstract
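The abstract describes GReg as a pruning algorithm built on a "growing regularization" strategy. As a rough illustration of that general idea, the sketch below (a minimal PyTorch toy under my own assumptions, not the paper's actual algorithm) selects the lowest-L1-norm filters of a convolution and shrinks them with an L2 penalty whose coefficient grows over training, so they can later be removed with little performance loss:

    import torch
    import torch.nn as nn

    def select_prune_filters(conv: nn.Conv2d, ratio: float) -> torch.Tensor:
        # Rank output filters by L1 norm; the smallest `ratio` fraction is slated for pruning.
        scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
        n_prune = int(ratio * scores.numel())
        return torch.argsort(scores)[:n_prune]

    def growing_reg_penalty(conv: nn.Conv2d, idx: torch.Tensor, lam: float) -> torch.Tensor:
        # L2 penalty applied only to the filters slated for removal.
        return lam * conv.weight[idx].pow(2).sum()

    conv = nn.Conv2d(3, 16, 3, padding=1)
    head = nn.Linear(16, 10)
    opt = torch.optim.SGD(list(conv.parameters()) + list(head.parameters()), lr=0.01)
    idx = select_prune_filters(conv, ratio=0.5)

    lam, lam_step, lam_max = 0.0, 1e-4, 1e-1  # the "growing" schedule
    for step in range(1000):
        x = torch.randn(8, 3, 32, 32)                  # stand-in for real images
        y = torch.randint(0, 10, (8,))                 # stand-in for real labels
        feat = conv(x).mean(dim=(2, 3))                # global average pooling
        loss = nn.functional.cross_entropy(head(feat), y)
        loss = loss + growing_reg_penalty(conv, idx, lam)
        opt.zero_grad()
        loss.backward()
        opt.step()
        lam = min(lam + lam_step, lam_max)             # penalty grows until it saturates

    # After training, the penalized filters have near-zero norm and can be
    # physically removed with little to no accuracy drop.

The point of the gradually growing coefficient, as opposed to zeroing weights in one shot, is that the network can adapt while the selected filters decay; this is the general intuition the abstract attributes to the growing regularization strategy.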
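Similarly, the teacher-student setup described in the abstract is most commonly instantiated as the classic temperature-scaled distillation loss of Hinton et al. (2015). The sketch below shows that generic formulation as an illustrative baseline, not necessarily the exact objective used in the dissertation's distillation chapters (names like distillation_loss, T, and alpha are my own):

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
        # Soften both distributions with temperature T; the KL term pulls the
        # student toward the teacher's soft targets. The T^2 factor keeps the
        # soft-term gradients comparable in magnitude across temperatures.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
        hard = F.cross_entropy(student_logits, labels)  # usual supervised term
        return alpha * soft + (1 - alpha) * hard

    # Minimal usage: the teacher is frozen; only the student receives gradients.
    teacher = torch.nn.Linear(32, 10).eval()
    student = torch.nn.Linear(32, 10)
    x = torch.randn(8, 32)
    labels = torch.randint(0, 10, (8,))
    with torch.no_grad():
        teacher_logits = teacher(x)
    loss = distillation_loss(student(x), teacher_logits, labels)
    loss.backward()

Note that the student's size is unchanged, matching the abstract's point that distillation improves the target model without shrinking it.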