Inference
Computer science
Software deployment
Software portability
Deep learning
Edge computing
Flexibility (engineering)
Edge device
Artificial intelligence
Distributed computing
Throughput
Embedded system
Computer architecture
Machine learning
Computer engineering
Enhanced Data Rates for GSM Evolution (EDGE)
Software engineering
Cloud computing
Wireless
Operating system
Statistics
Mathematics
Authors
Ishrak Jahan Ratul,Yuxiao Zhou,Kecheng Yang
Source
Journal: Electronics
Publisher: MDPI AG
Date: 2025-07-25
Volume/Issue: 14(15), Article 2977
Cited by: 4
Identifier
DOI:10.3390/electronics14152977
Abstract
Deep learning (DL) continues to play a pivotal role in a wide range of intelligent systems, including autonomous machines, smart surveillance, industrial automation, and portable healthcare technologies. These applications often demand low-latency inference and efficient resource utilization, especially when deployed on embedded or edge devices with limited computational capacity. As DL models become increasingly complex, selecting the right inference framework is essential to meeting performance and deployment goals. In this work, we conduct a comprehensive comparison of five widely adopted inference frameworks: PyTorch, ONNX Runtime, TensorRT, Apache TVM, and JAX. All experiments are performed on the NVIDIA Jetson AGX Orin platform, a high-performance computing solution tailored for edge artificial intelligence workloads. The evaluation considers several key performance metrics, including inference accuracy, inference time, throughput, memory usage, and power consumption. Each framework is tested using a wide range of convolutional and transformer models and analyzed in terms of deployment complexity, runtime efficiency, and hardware utilization. Our results show that certain frameworks offer superior inference speed and throughput, while others provide advantages in flexibility, portability, or ease of integration. We also observe meaningful differences in how each framework manages system memory and power under various load conditions. This study offers practical insights into the trade-offs associated with deploying DL inference on resource-constrained hardware.
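The abstract does not reproduce the authors' benchmarking code, but the metrics it lists (inference time and throughput under GPU load) are typically measured with a warm-up phase followed by a synchronized timing loop. The sketch below is a minimal, hypothetical illustration of that methodology in PyTorch, one of the five frameworks compared; the choice of ResNet-50, batch size 8, and the iteration counts are arbitrary stand-ins, not values from the paper.

```python
# Minimal latency/throughput timing sketch (illustrative, not the authors' code).
import time

import torch
import torchvision.models as models

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.resnet50(weights=None).eval().to(device)
batch = torch.randn(8, 3, 224, 224, device=device)  # batch size 8: arbitrary example

with torch.no_grad():
    # Warm-up: exclude one-time costs (cuDNN autotuning, allocator growth) from timing.
    for _ in range(10):
        model(batch)
    if device == "cuda":
        torch.cuda.synchronize()  # ensure warm-up kernels have finished

    n_iters = 100
    start = time.perf_counter()
    for _ in range(n_iters):
        model(batch)
    if device == "cuda":
        torch.cuda.synchronize()  # wait for all queued kernels before stopping the clock
    elapsed = time.perf_counter() - start

latency_ms = elapsed / n_iters * 1000
throughput = n_iters * batch.shape[0] / elapsed
print(f"mean latency: {latency_ms:.2f} ms/batch, throughput: {throughput:.1f} images/s")
```

The explicit `torch.cuda.synchronize()` calls matter because CUDA kernel launches are asynchronous: without them, the timer would measure only the time to enqueue work, not the time the GPU actually spends computing.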