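Adapter (computing)
Computer science
Encoder
Artificial intelligence
Computer vision
Fidelity
Pattern recognition (psychology)
Computer hardware
Telecommunications
Operating system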
Authors
Xing Peng,Ning Wang,Jianbo Ouyang,Zechao Li
Identifier
DOI:10.1109/tpami.2025.3590321
Abstract
The remarkable advancement in text-to-image generation models has significantly boosted research on ID customization generation. However, existing personalization methods cannot simultaneously satisfy high-fidelity and low-cost requirements. Their main bottleneck lies in the additional prompt image encoder (i.e., the CLIP vision encoder), which produces signals that are only weakly aligned with the text-to-image model, may lose face information, and are not well 'absorbed' by the text-to-image model. To this end, we propose Inv-Adapter, which introduces a more reasonable and efficient token representation of ID image features together with a lightweight adapter to inject those features. Specifically, Inv-Adapter extracts diffusion-domain representations of ID images with a pre-trained text-to-image model via DDIM image inversion, without an additional image encoder. Because the extracted ID prompt features are highly aligned with the intermediate features of the text-to-image model, we then introduce a lightweight attention adapter to embed them efficiently into the base text-to-image model. We conduct extensive experiments on different text-to-image models to assess ID fidelity, generation loyalty, speed, training cost, model scale, and generalization to general-object scenarios, all of which show that the proposed Inv-Adapter is highly competitive in ID customization generation and model scale.
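The DDIM image inversion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the latent is a plain list of floats, `toy_eps_model` is a hypothetical stand-in for the pre-trained text-to-image model's noise predictor, and `ALPHAS` is an assumed cumulative noise schedule. The sketch only shows the deterministic (eta = 0) DDIM update run toward higher noise, and how the same update run backward recovers the input; how Inv-Adapter turns the inverted latents into ID tokens is not described in the abstract and is not modeled here.

```python
import math

def ddim_step(x, eps, alpha_from, alpha_to):
    """Deterministic DDIM update (eta = 0) between two cumulative-alpha levels."""
    # Predict the clean latent implied by x and the noise estimate eps.
    x0_pred = [(xi - math.sqrt(1.0 - alpha_from) * ei) / math.sqrt(alpha_from)
               for xi, ei in zip(x, eps)]
    # Move that prediction to the target noise level without adding fresh noise.
    return [math.sqrt(alpha_to) * x0i + math.sqrt(1.0 - alpha_to) * ei
            for x0i, ei in zip(x0_pred, eps)]

def ddim_invert(x0, alphas, eps_model):
    """Walk from the clean latent toward noise, keeping every intermediate latent."""
    x = list(x0)
    trajectory = [x]
    for t in range(len(alphas) - 1):
        eps = eps_model(x, t)
        x = ddim_step(x, eps, alphas[t], alphas[t + 1])
        trajectory.append(x)
    return trajectory

def ddim_sample(x_noisy, alphas, eps_model):
    """Run the same update in the denoising direction."""
    x = list(x_noisy)
    for t in range(len(alphas) - 1, 0, -1):
        eps = eps_model(x, t)
        x = ddim_step(x, eps, alphas[t], alphas[t - 1])
    return x

def toy_eps_model(x, t):
    """Hypothetical noise predictor; a real one would be the diffusion network."""
    return [0.5] * len(x)

# Assumed schedule: cumulative alphas decreasing from nearly clean to very noisy.
ALPHAS = [0.999, 0.9, 0.6, 0.3, 0.05]
latent = [1.0, -2.0, 0.25]

trajectory = ddim_invert(latent, ALPHAS, toy_eps_model)
recovered = ddim_sample(trajectory[-1], ALPHAS, toy_eps_model)
```

With this constant toy predictor the round trip is exact up to floating-point error. With a real, input-dependent noise predictor, DDIM inversion is only approximate, because the noise estimated at one step is reused as an approximation for the next.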