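Computer science
Information retrieval
Embedding
Image retrieval
Feature learning
Feature vector
Encryption
Modal
Overhead (engineering)
Feature extraction
Data mining
Artificial intelligence
Image (mathematics)
Chemistry
Polymer chemistry
Operating system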
Authors
Mingyue Li, Yuting Zhu, Ruizhong Du, Chunfu Jia
Identifier
DOI: 10.1109/jiot.2025.3526939
Abstract
As a pivotal link between vision and language, image-text cross-modal retrieval has received widespread attention. However, existing studies focus primarily on intricate machine learning models to improve retrieval accuracy and overlook privacy preservation for images and texts. These shortcomings make them unsuitable for lightweight IoT environments. To tackle these challenges, we propose LPCR-IoT, a lightweight and privacy-preserving cross-modal retrieval scheme tailored to IoT environments. LPCR-IoT employs knowledge distillation to train lightweight student models that extract feature vectors from images and texts and embed them into a unified semantic space. Notably, we propose a new training objective, the Intra-modal Consistent Contrast Loss, which improves retrieval accuracy by increasing the semantic consistency of images and texts in the common embedding space. Additionally, we design a novel quadtree index structure based on hybrid representation vectors to reduce retrieval overhead. The feature vectors of images and texts, together with the representation vectors, are encrypted using an LWE-based secure kNN algorithm, enabling image-text matching over large-scale ciphertexts. Finally, we provide a detailed formal analysis of the security of LPCR-IoT and validate its practicality through extensive experiments on three real-world datasets: COCO, Flickr30k, and NUS-WIDE.
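The abstract states that encrypted feature vectors can still be matched by nearest-neighbour search. The sketch below illustrates that general idea with a swapped-in technique: the classic matrix-based secure kNN (asymmetric scalar-product-preserving encryption) introduced by Wong et al., not the LWE-based variant used by LPCR-IoT, whose construction is not described here. It assumes image and text feature vectors already live in the shared embedding space. All names (SecureKNNKey, encrypt_data, encrypt_query, knn_encrypted, knn_plain) are hypothetical, and the sketch only demonstrates distance comparison on ciphertexts; it makes no security claim of its own.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan elimination with partial pivoting on the augmented matrix [M | I]."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


class SecureKNNKey:
    """Hypothetical key holder: a secret invertible (dim+1) x (dim+1) matrix M."""

    def __init__(self, dim: int, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        n = dim + 1
        while True:  # resample in the (unlikely) event the random matrix is singular
            m = [[self.rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
            try:
                self.m_inv = invert(m)
                break
            except ValueError:
                continue
        self.m_t = transpose(m)

    def encrypt_data(self, p: Vector) -> Vector:
        # Extend the data vector with -0.5 * ||p||^2, then multiply by M^T.
        p_ext = list(p) + [-0.5 * dot(p, p)]
        return matvec(self.m_t, p_ext)

    def encrypt_query(self, q: Vector) -> Vector:
        # Extend the query with 1, scale by a random r > 0, then multiply by M^{-1}.
        r = self.rng.uniform(0.5, 2.0)
        q_ext = [r * x for x in q] + [r]
        return matvec(self.m_inv, q_ext)


def knn_encrypted(enc_db: List[Vector], enc_q: Vector, k: int) -> List[int]:
    """Larger ciphertext inner product means smaller plaintext Euclidean distance."""
    scored = sorted(((dot(c, enc_q), i) for i, c in enumerate(enc_db)), reverse=True)
    return [i for _, i in scored[:k]]


def knn_plain(db: List[Vector], q: Vector, k: int) -> List[int]:
    """Plaintext reference: indices of the k vectors closest to q."""
    return sorted(range(len(db)), key=lambda i: sum((a - b) ** 2 for a, b in zip(db[i], q)))[:k]


if __name__ == "__main__":
    rng = random.Random(42)
    dim = 8
    image_vecs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(50)]  # hypothetical image embeddings
    text_query = [rng.gauss(0, 1) for _ in range(dim)]  # hypothetical text embedding
    key = SecureKNNKey(dim, seed=7)
    enc_db = [key.encrypt_data(v) for v in image_vecs]
    enc_q = key.encrypt_query(text_query)
    print(knn_encrypted(enc_db, enc_q, 5))
    print(knn_plain(image_vecs, text_query, 5))
```

The two printed index lists should agree. The extension is chosen so that p̂·q̂ = r(½‖q‖² − ½‖p − q‖²), which decreases as the distance grows, and the secret matrix cancels because (Mᵀp̂)·(M⁻¹q̂) = p̂ᵀMM⁻¹q̂ = p̂·q̂. The term r·½‖q‖² is the same for every candidate of a given query, so it does not affect the ranking.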