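Computer science
Consistency (knowledge bases)
Artificial intelligence
Benchmark (surveying)
Embedding
Generative grammar
Quality (philosophy)
Image quality
Image (mathematics)
Baseline (sea)
Feature (linguistics)
Perception
Encoding (set theory)
Machine learning
Generative model
Pattern recognition (psychology)
Quality score
Data mining
Feature extraction
Quality assessment
Source code
Computer vision
Deep learning
Contextual image classification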
Authors
Wen Sun, Chaofeng Chen, Liang Liao, W. N. Lin
Identifier
DOI:10.1109/tmm.2026.3668530
Abstract
Generative models such as Generative Adversarial Networks (GANs) and diffusion-based models have significantly improved the generation of photorealistic images. Despite this progress in AI-Generated Images (AIGIs), the evaluation of AIGI quality remains underexplored in several aspects, including AI artifacts, unnatural content, and the availability of input text prompts. Existing methods do not sufficiently capture both the feature representations and the image-text correlations needed to assess AIGIs in terms of perceptual image quality and text-to-image alignment. To address this, we propose a novel Prompt-Image-Caption Consistency (PICC) framework that adapts a pre-trained vision-language model for AIGI quality assessment, considering perceptual image quality and text-to-image alignment jointly. The framework exploits the prompt-image-caption triplet in two ways: it adapts the image embedding and the multimodal embedding via Quality-Aware Attention to capture quality-aware features, and it calculates consistency scores to model the correlations among the prompt, image, and caption. Additionally, we propose a multilevel strategy that integrates local and global information from multiple prompt-image-caption triplets, further enhancing prediction performance. Extensive experiments on benchmark datasets, including AIGIQA-20K and AGIQA-3K, demonstrate that the proposed PICC achieves state-of-the-art performance compared to baseline methods. The code will be made publicly available.
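The abstract does not say how the consistency scores among prompt, image, and caption are computed. As a rough illustration only, the sketch below assumes each of the three items has already been mapped to an embedding vector in a shared space and scores every pair with cosine similarity. The function names `cosine` and `triplet_consistency`, the choice of cosine similarity, and the toy vectors are all assumptions made for illustration; they are not taken from the paper or its code.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_consistency(prompt_emb, image_emb, caption_emb):
    """Pairwise scores for one prompt-image-caption triplet.

    Hypothetical helper: assumes all three embeddings live in the same
    vector space and uses cosine similarity as the consistency measure.
    """
    embs = {"prompt": prompt_emb, "image": image_emb, "caption": caption_emb}
    return {
        f"{a}-{b}": cosine(embs[a], embs[b])
        for a, b in combinations(embs, 2)
    }


# Toy 2-D embeddings, made up for illustration: the image matches the
# prompt exactly, while the caption is orthogonal to both.
scores = triplet_consistency([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
for pair, value in scores.items():
    print(f"{pair}: {value:.2f}")
```

In this toy case the prompt-image pair scores high while both pairs involving the caption score low, which is the kind of disagreement a triplet-based check could expose. In practice the embeddings would come from a vision-language model, and the scores would be combined with learned quality-aware features rather than used on their own.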