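Computer science
Artificial intelligence
Inference
Scene graph
Relation (database)
Graph
Classifier (UML)
Pairwise comparison
Segmentation
Machine learning
Pattern recognition (psychology)
Data mining
Theoretical computer science
Rendering (computer graphics)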
Authors
Nanhao Liang,Yong Liu,Wenfang Sun,Yingwei Xia,Fan Wang
Identifier
DOI:10.1109/icassp48485.2024.10446810
Abstract
Panoptic Scene Graph (PSG) generation aims to generate a scene graph representing pairwise relationships between objects within an image. Its use of pixel-wise segmentation masks and its inclusion of background regions in relationship inference have quickly made it a popular approach. However, it faces an intrinsic challenge: because typical datasets follow a long-tail distribution, the trained relationship predictors are either of low value or of low quality. Inspired by how humans use prior knowledge to greatly simplify this problem, we introduce two novel designs: using a pre-trained vision-language model to correct the data skewness, and using a conditional prior distribution over contexts to further refine prediction quality. Specifically, the approach, named CKT-RCM, first exploits relation-associated visual features from the image encoder and constructs a relation classifier by extracting text embeddings for all relationships from the text encoder of the vision-language model. It also uses rich relational context from subject-object pairs to facilitate informative relation predictions via a cross-attention mechanism. We conduct comprehensive experiments on the OpenPSG dataset and achieve state-of-the-art performance.
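The two mechanisms named in the abstract, a relation classifier built from text embeddings and context refinement via cross-attention, can be illustrated with a minimal NumPy sketch. This is not the CKT-RCM implementation. The random arrays stand in for encoder outputs, and the function names, the temperature value, the residual combination, and all sizes are assumptions chosen only for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def relation_logits(pair_features, text_embeddings, temperature=0.07):
    """Score each subject-object pair (P, D) against each relation text embedding (R, D).

    The text embeddings act as the classifier weights; temperature is a
    hypothetical scaling constant, not a value taken from the paper.
    """
    v = l2_normalize(pair_features)
    t = l2_normalize(text_embeddings)
    return (v @ t.T) / temperature

def cross_attention(queries, context):
    """Single-head scaled dot-product cross-attention without learned projections."""
    d = queries.shape[-1]
    weights = softmax(queries @ context.T / np.sqrt(d))
    return weights @ context

# Random stand-ins for encoder outputs (all sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
num_pairs, num_relations, num_context, dim = 4, 5, 10, 32
pair_features = rng.normal(size=(num_pairs, dim))        # visual features per pair
text_embeddings = rng.normal(size=(num_relations, dim))  # one embedding per relation name
context_tokens = rng.normal(size=(num_context, dim))     # relational context features

# Refine pair features with context (residual add is an assumption), then classify.
refined = pair_features + cross_attention(pair_features, context_tokens)
probs = softmax(relation_logits(refined, text_embeddings))
predicted = probs.argmax(axis=-1)
```

Because the classifier weights come from text embeddings rather than being learned from relation counts alone, rare relations receive a weight vector of the same kind as frequent ones, which is one way such a design could counter long-tail skew.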