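Computer science
Stateful firewall
Node (physics)
Enhanced Data Rates for GSM Evolution
Container (type theory)
Edge computing
Edge device
Distributed computing
Analytics
Inference
Security token
Cluster (spacecraft)
Computer network
Cluster analysis
Data mining
Artificial intelligence
Operating system
Engineering
Cloud computing
Transport engineering
Mechanical engineering
Structural engineering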
Authors
Suchanat Mangkhangcharoen, Jason Haga, Prapaporn Rattanatamrong
Identifier
DOI:10.1109/hpcc-dss-smartcity-dependsys53884.2021.00299
Abstract
Many current IoT applications deployed at the edge use deep learning (DL) for real-time processing and analytics. Not only inference but also training is moving to edge devices. Migrating DL applications and datasets among these devices is necessary in scenarios such as node failure, user mobility, or node collaboration (e.g., distributed training). Container technologies and Kubernetes (K8s) are increasingly adopted to manage infrastructure at the edge. Unfortunately, K8s has no built-in mechanism for migrating stateful containers between its cluster nodes; the cluster's master node generally launches a fresh container on another node to replace the failed one. While there is an existing mechanism for migrating a Pod between K8s nodes, no past work has investigated the migration of DL datasets and containerized DL applications among K8s cluster nodes. In this paper, we present 1) a comprehensive study of the effectiveness and limitations of existing checkpointing mechanisms for containerized DL applications and 2) a comparative performance study of several approaches to migrating DL datasets and applications in a K8s cluster. Our results show that migrating the state of DL applications and restoring them from their previous state enables faster recovery (reducing training time by 10 to 73 percent) than re-running these models from the beginning, regardless of the percentage of epochs completed. Additionally, our experimental results show that transferring a dataset between K8s workers using a K8s persistent volume with kubectl cp is generally suitable and efficient. However, when network latency is high, using our customized middleware with a feedback controller to migrate data in parallel can reduce total migration time compared to the K8s persistent volume approach alone.
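The recovery benefit described in the abstract comes from resuming at the last saved epoch instead of restarting at epoch zero. The following is a minimal sketch of that idea using application-level checkpointing in plain Python. It is not the authors' method: the abstract does not name the checkpointing mechanisms studied, and the `train` function, the JSON checkpoint layout, and the `stop_after` parameter are all hypothetical names introduced for illustration.

```python
import json
import os
import tempfile


def train(total_epochs, ckpt_path, stop_after=None):
    """Run a toy training loop, saving state after every epoch.

    Returns the number of epochs executed in this call.
    All names here are illustrative assumptions, not from the paper.
    """
    # Resume from the checkpoint if one exists; otherwise start fresh.
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)
    else:
        state = {"epoch": 0, "weight": 0.0}

    executed = 0
    while state["epoch"] < total_epochs:
        if stop_after is not None and executed >= stop_after:
            break  # simulate a node failure or a migration trigger
        state["weight"] += 0.5  # stand-in for one epoch of real training
        state["epoch"] += 1
        executed += 1
        # Write to a temp file, then rename, so a crash mid-write
        # cannot leave a corrupt checkpoint behind.
        tmp = ckpt_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, ckpt_path)
    return executed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "ckpt.json")
        first = train(10, path, stop_after=4)  # "source node" runs 4 epochs
        second = train(10, path)               # "destination node" resumes
        print(first, second)
```

In this sketch the second call executes only the remaining epochs, which is the effect the abstract attributes to restoring a migrated state. In a K8s setting the checkpoint file would presumably live on storage that both nodes can reach, such as a persistent volume, but that wiring is outside the scope of the sketch.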