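Field-programmable gate array
Computer science
Speedup
High-level synthesis
Pipeline (software)
Motion estimation
Data flow diagram
Optical flow
Reconfigurable computing
Design space exploration
Computer hardware
Parallel computing
Computer engineering
Embedded system
Algorithm
Computer vision
Database
Image (mathematics)
Programming language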
Authors
Chia-Wei Chang,Zi-Qi Zhong,Jing-Jia Liou
Identifier
DOI:10.1145/3289602.3294005
Abstract
Optical flow algorithms, which estimate motion between consecutive video frames, are widely used in surveillance systems, Advanced Driver Assistance Systems (ADAS), and object movement estimation in scene analysis. Among the different optical flow algorithms, the Farneback algorithm provides better accuracy and displacements that are robust to brightness changes, because it estimates the flow in the polynomial domain rather than from intensity maps. However, its high computational complexity and inconsistent data access patterns make it difficult to implement on a hardware platform. In this work, we present a micro-architecture design of Farneback optical flow that is flexible enough to be optimized with high-level synthesis (HLS) tools. The original software-based implementation was decomposed into functional blocks to balance the latency of the different stages, and the data flows were rearranged to give better memory access patterns. The data flow arrangement is based on a proposed backtrace mechanism, in which DRAM accesses to the polynomial coefficients of the current frame produce consistent traffic patterns and therefore make it possible to integrate more functional blocks into a deeper pipeline. Across several versions of the micro-architecture design, we demonstrate fixed-point and floating-point options, as well as optimization techniques such as multiple DMAs and different levels of pipeline integration. We implemented our design on a Zedboard Mini-ITX 7045. The results show a 17x end-to-end speedup over a naive HLS version for an image size of 160x120. Considering only the hardware-accelerated part, our FPGA implementation is 40x faster than the naive HLS version while using only 50% of the FPGA hardware resources.