PixelFace+: Towards Controllable Face Generation and Manipulation with Text Descriptions and Segmentation Masks

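Computer science; Face (sociological concept); Segmentation; Artificial intelligence; Computer vision; Controllability; Process (computing); Pixel; Set (abstract data type); Applied mathematics; Mathematics; Social science; Operating system; Sociology; Programming language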

Authors

Xiaoxiong Du, Jun Ping Peng, Yiyi Zhou, Jinlu Zhang, S. Y. Chen, Guannan Jiang, Xiaoshuai Sun, Rongrong Ji

Links

acm.org, doi.org

Identifiers

DOI: 10.1145/3581783.3612067

Abstract

Synthesizing vivid human portraits is an active research topic in image generation with a wide range of applications. Beyond fidelity, generation controllability is another key factor that has long hindered its development. To address this issue, existing solutions usually condition the target face synthesis on either textual or visual inputs, e.g., descriptions or segmentation masks, neither of which can fully control the generation because of the intrinsic limitations of each condition. In this paper, we propose to use both types of prior information to facilitate controllable face generation. In particular, the segmentation mask supplies coarse-grained information about the face, such as shape and pose, while the text description is used to render detailed face attributes, e.g., face color, makeup, and gender. We also want the generation to be easily controlled by interactively editing both types of information, making face generation more applicable to real-world scenarios. To this end, we propose a novel face generation model termed PixelFace+. In PixelFace+, both the text and the mask are encoded as pixel-wise priors, on which the pixel synthesis process is conducted to produce the expected portraits. The loss objectives are also carefully designed to ensure that the generated faces are semantically aligned with both the text and the mask inputs. To validate PixelFace+, we conduct a comprehensive set of experiments on the widely recognized MMCelebA benchmark. We quantitatively compare PixelFace+ with a number of recently proposed Text-to-Face (T2F) generation methods and also provide extensive qualitative analyses. The experimental results demonstrate that PixelFace+ not only outperforms existing generation methods in both image quality and conditional matching but also offers substantially better controllability of face generation. Finally, PixelFace+ provides a convenient and interactive way to generate and manipulate faces by editing the text and mask inputs. Our source code and demo are provided in the supplementary materials.
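
To make the idea of encoding text and mask as pixel-wise priors more concrete, here is a minimal sketch. It is not the PixelFace+ architecture, which the abstract does not specify. It assumes the simplest possible fusion: each pixel's segmentation label is one-hot encoded and concatenated with a single text embedding broadcast to every pixel. The function names, the class labels, and the two-dimensional embedding are all hypothetical stand-ins.

```python
from typing import List


def one_hot(label: int, num_classes: int) -> List[float]:
    """Return a one-hot vector for a segmentation class label."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} outside [0, {num_classes})")
    return [1.0 if c == label else 0.0 for c in range(num_classes)]


def pixelwise_prior(
    mask: List[List[int]],
    num_classes: int,
    text_embedding: List[float],
) -> List[List[List[float]]]:
    """Build an H x W grid of per-pixel feature vectors.

    Each vector is the pixel's one-hot mask class followed by the same
    text embedding (assumed fusion rule: plain concatenation).
    """
    return [
        [one_hot(label, num_classes) + list(text_embedding) for label in row]
        for row in mask
    ]


# Hypothetical 2x3 mask: 0 = background, 1 = skin, 2 = hair.
mask = [
    [0, 2, 2],
    [0, 1, 1],
]
# Stand-in for the output of a sentence encoder (assumed, not from the paper).
text_embedding = [0.5, -0.25]

prior = pixelwise_prior(mask, num_classes=3, text_embedding=text_embedding)
```

Under this assumption, editing the mask changes only the first `num_classes` entries of the affected pixels, while editing the text changes the trailing entries of every pixel, which loosely mirrors the coarse-versus-detailed split the abstract describes.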
