An Empirical Study of Pre-Trained Model Reuse in the Hugging Face Deep Learning Model Registry

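Reuse, Computer science, Software portability, Software engineering, Artificial intelligence, Standardization, Data science, Machine learning, Software, Dependency (UML), Engineering, Operating system, Programming language, Waste management
Authors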
Wenxin Jiang,Nicholas Synovic,Matt Hyatt,Taylor R. Schorlemmer,Rohan Sethi,Yung‐Hsiang Lu,George K. Thiruvathukal,James C. Davis
Identifier
DOI:10.1109/icse48619.2023.00206
Abstract

Deep Neural Networks (DNNs) are being adopted as components in software systems. Creating and specializing DNNs from scratch has grown increasingly difficult as state-of-the-art architectures grow more complex. Following the path of traditional software engineering, machine learning engineers have begun to reuse large-scale pre-trained models (PTMs) and fine-tune these models for downstream tasks. Prior works have studied reuse practices for traditional software packages to guide software engineers towards better package maintenance and dependency management. We lack a similar foundation of knowledge to guide behaviors in pre-trained model ecosystems. In this work, we present the first empirical investigation of PTM reuse. We interviewed 12 practitioners from the most popular PTM ecosystem, Hugging Face, to learn the practices and challenges of PTM reuse. From this data, we model the decision-making process for PTM reuse. Based on the identified practices, we describe useful attributes for model reuse, including provenance, reproducibility, and portability. Three challenges for PTM reuse are missing attributes, discrepancies between claimed and actual performance, and model risks. We substantiate these identified challenges with systematic measurements in the Hugging Face ecosystem. Our work informs future directions on optimizing deep learning ecosystems through automated measurement of useful attributes and potential attacks, and we envision future research on infrastructure and standardization for model registries.
