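Computer science
Extreme learning machine
Hardware acceleration
Latency (audio)
Field-programmable gate array
Embedded system
Acceleration
Artificial intelligence
Low latency (capital markets)
Throughput
Process (computing)
Gate array
ARM architecture
Artificial neural network
Wireless
Operating system
Telecommunications
Physics
Classical mechanics
Computer network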
Authors
Amin Safaei, Q. M. Jonathan Wu, Thangarajah Akilan, Yimin Yang
Identifier
DOI:10.1109/tcad.2018.2878162
Abstract
Machine learning algorithms such as those for object classification in images, video content analysis, and human action recognition are used to extract meaningful information from data recorded by image sensors and cameras. Among the existing machine learning algorithms for such purposes, extreme learning machines (ELMs) and online sequential ELMs (OS-ELMs) are well known for their computational efficiency and performance when processing large datasets. The latter approach was derived from the ELM approach and optimized for real-time application. However, OS-ELM classifiers are computationally demanding, and the existing state-of-the-art computing platforms are not efficient enough for embedded systems, especially for applications with strict requirements in terms of low power consumption, high throughput, and low latency. This paper presents the implementation of an ELM/OS-ELM in a customized system-on-a-chip field-programmable gate array-based architecture to ensure efficient hardware acceleration. The acceleration process comprises parallel extraction, deep pipelining, and efficient shared memory communication.
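For readers unfamiliar with the relationship between ELM and OS-ELM mentioned in the abstract, the following is a minimal software sketch of the commonly described formulation: a fixed random hidden layer, output weights obtained by least squares, and a recursive chunk-by-chunk update in the online sequential case. It is not the paper's hardware architecture. The sigmoid activation, the layer sizes, the toy data, the small `ridge` term, and all function names are assumptions made for illustration only.

```python
import numpy as np


def hidden(X, W, b):
    """Hidden-layer output H = g(XW + b); W and b are random and never trained."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def oselm_init(H0, T0, ridge=1e-2):
    """Initial phase: P0 = (H0^T H0 + ridge*I)^-1 and beta0 = P0 H0^T T0.

    The ridge term is an assumption added here for numerical stability.
    """
    P = np.linalg.inv(H0.T @ H0 + ridge * np.eye(H0.shape[1]))
    return P, P @ H0.T @ T0


def oselm_update(P, beta, H, T):
    """Sequential phase: recursive least-squares update for one new chunk."""
    K = np.linalg.inv(np.eye(H.shape[0]) + H @ P @ H.T)
    P = P - P @ H.T @ K @ H @ P
    beta = beta + P @ H.T @ (T - H @ beta)
    return P, beta


def batch_elm(H, T, ridge=1e-2):
    """Batch ELM output weights using the same ridge term, for comparison."""
    return np.linalg.solve(H.T @ H + ridge * np.eye(H.shape[1]), H.T @ T)


def demo(seed=0):
    """Train on toy data sequentially and in one batch; return both weight sets."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(200, 3))                  # hypothetical input features
    T = np.sin(X.sum(axis=1, keepdims=True))       # hypothetical regression target
    W = rng.normal(size=(3, 10))                   # random input weights
    b = rng.normal(size=10)                        # random hidden biases
    H = hidden(X, W, b)

    P, beta = oselm_init(H[:50], T[:50])           # initial chunk of 50 samples
    for start in range(50, 200, 25):               # then chunks of 25 samples
        stop = start + 25
        P, beta = oselm_update(P, beta, H[start:stop], T[start:stop])
    return beta, batch_elm(H, T)
```

In this sketch the sequential weights agree with the batch solution up to floating-point error, which is the property that lets the online variant process data as it arrives without retraining on the full dataset.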