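Computer science
Robustness (evolution)
Artificial intelligence
Bridging (networking)
Embedding
Diffusion
Similarity (geometry)
Cosine similarity
Pattern recognition (psychology)
Image (mathematics)
Computer network
Biochemistry
Thermodynamics
Gene
Physics
Chemistry
Identifier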
DOI:10.1145/3653946.3653952
Abstract
Stable Diffusion has recently emerged as a groundbreaking technique in generative artificial intelligence. From concise text prompts, it can produce images that are strikingly realistic or visually stunning, bridging the gap between textual descriptions and visual representations. A notable challenge is the inverse task: retrieving the prompt that corresponds to a generated image. To tackle this challenge, we used the ConvNeXt model within the OpenCLIP framework. We trained the model on a dataset of nearly 2.1 million Stable Diffusion generated images and evaluated our approach in the Kaggle Stable Diffusion - Image to Prompts competition. It achieved a mean cosine similarity score of 0.63032 between the predicted and actual prompt embedding vectors on the private test set, securing a silver medal position. This result attests to the robustness of our approach and indicates its potential in sectors where retrieving text from images is important.
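The evaluation metric named in the abstract, mean cosine similarity between predicted and actual prompt embedding vectors, can be illustrated with a minimal sketch. This is not the authors' code or the competition's scoring script: the function names `cosine_similarity` and `mean_cosine_similarity` are hypothetical, the toy two-dimensional vectors stand in for real prompt embeddings, and treating a zero-length vector as similarity 0.0 is an assumption made here for convenience.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption for this sketch: a zero vector has no direction,
    # so its similarity to anything is reported as 0.0.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_cosine_similarity(
    predicted: Sequence[Sequence[float]],
    actual: Sequence[Sequence[float]],
) -> float:
    """Average the per-pair cosine similarity over all (predicted, actual) pairs."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("need the same, non-zero number of predicted and actual vectors")
    total = sum(cosine_similarity(p, a) for p, a in zip(predicted, actual))
    return total / len(predicted)


if __name__ == "__main__":
    # Toy embeddings: the first pair matches exactly, the second is orthogonal.
    predicted = [[1.0, 0.0], [1.0, 0.0]]
    actual = [[1.0, 0.0], [0.0, 1.0]]
    print(mean_cosine_similarity(predicted, actual))
```

Because cosine similarity depends only on direction, scaling a predicted embedding by any positive constant leaves its score unchanged; only the angle to the target embedding matters.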