Authors
Dianlong You,Houlin Wang,Bingxin Liu,Yu Yang,Zhiming Li
Identifier
DOI:10.1109/icassp49357.2023.10096729
Abstract
Temporal Action Detection (TAD) is a challenging task in video understanding. Current methods mainly use global features for boundary matching or predefine all possible proposals, ignoring long-range contextual information and local action boundary features, which degrades detection accuracy. To fill this gap, we propose a Dilation Location Network (DL-Net) that generates more precise action boundaries by enhancing the boundary features of actions and aggregating long-range contextual information. Specifically, we design a boundary feature enhancement (BFE) block, which strengthens action boundary features and fuses similar features across different channels via pooling and channel squeezing. Meanwhile, for action localization, we design multiple dilated convolutional structures to aggregate long-range contextual information around each time point/interval. Extensive experiments on ActivityNet-1.3 and THUMOS14 show that DL-Net effectively enhances action boundary features and aggregates long-range contextual information.
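The dilated-convolution idea behind the abstract's "aggregating long-range contextual information" can be sketched generically: stacking 1D convolutions with growing dilation rates enlarges the temporal receptive field without adding parameters. This is a minimal illustration under assumed kernel sizes and dilation rates, not the authors' DL-Net implementation; `dilated_conv1d` and `receptive_field` are hypothetical helper names.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """'Same'-padded single-channel 1D dilated convolution.
    Hypothetical helper for illustration, not DL-Net code."""
    k = len(w)
    pad = (k - 1) * dilation // 2
    xp = np.pad(x.astype(float), pad)
    out = np.zeros(len(x))
    for t in range(len(x)):
        for i in range(k):
            # Sample the input at dilated offsets around position t.
            out[t] += w[i] * xp[t + i * dilation]
    return out

def receptive_field(kernel_size, dilations):
    """Temporal receptive field of a stack of dilated convolutions:
    each layer adds (kernel_size - 1) * dilation positions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Three stacked kernel-3 layers with dilations 1, 2, 4 already
# cover 15 time steps, versus 7 for three undilated layers.
print(receptive_field(3, [1, 2, 4]))  # 15
print(receptive_field(3, [1, 1, 1]))  # 7
```

The exponential dilation schedule (1, 2, 4, ...) is a common choice (as in WaveNet-style temporal models) because coverage grows exponentially with depth while each layer stays cheap.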