Computer science
Word2vec
Natural language processing
Artificial intelligence
Transformer
Sentence
Embedding
Benchmark
Language model
Semantic similarity
Deep learning
Information retrieval
Authors
Wen‐Qiang Cao,Qing Li,Siying Zhang,Rixin Xu,Youqi Li
Identifier
DOI:10.1109/cbd63341.2023.00040
Abstract
In recent years, semantic text embeddings have played an increasingly important role in natural language processing (NLP) and have shown great potential in real-life applications such as search and recommendation systems. Models for generating semantic text embeddings have therefore received extensive study. State-of-the-art solutions have evolved from traditional methods (e.g., Word2Vec and GloVe) to deep neural network based approaches (e.g., LSTM, Transformer, and pre-trained models such as BERT and RoBERTa); in addition, frameworks like Sentence Transformers have lowered the bar for training semantic text representation models with customized architectures and datasets. In this paper, we investigate several well-trained models from the Massive Text Embedding Benchmark (MTEB) hosted on Hugging Face. Inspired by the extensive use of prompt engineering in large language models such as Llama and GPT-3, we propose STEP, a novel method that uses prompts to improve the performance of text embeddings on downstream tasks. STEP is applicable to almost any pre-trained language model for text embeddings and requires no modification to the base model's structure. In our experiments, we applied STEP to five pre-trained models chosen from MTEB and trained and evaluated our approach on two separate datasets; the results indicate that STEP improves performance on tasks related to semantic text similarity.
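Since only the abstract is reproduced here, the following is a minimal sketch of the general idea behind prompt-augmented text embeddings: prepend a task prompt to each input before encoding it with a frozen pre-trained embedding model, leaving the base model untouched. The prompt string, the `all-MiniLM-L6-v2` model choice, and simple string prepending are illustrative assumptions, not the paper's actual STEP recipe.

```python
# Minimal sketch of prompt-augmented text embeddings (not STEP's exact method).
from sentence_transformers import SentenceTransformer, util

# Any pre-trained embedding model can serve as the frozen base model;
# this particular checkpoint is just one example listed on MTEB.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical task prompt prepended to every input; the paper's actual
# prompt construction and training procedure may differ.
PROMPT = "Represent this sentence for semantic similarity: "

def embed(texts, use_prompt=True):
    """Encode texts, optionally prefixing each with the task prompt."""
    inputs = [PROMPT + t for t in texts] if use_prompt else texts
    return model.encode(inputs, convert_to_tensor=True, normalize_embeddings=True)

pairs = [
    ("A man is playing a guitar.", "Someone plays an instrument."),
    ("A man is playing a guitar.", "The stock market fell today."),
]

for a, b in pairs:
    ea, eb = embed([a, b])
    print(f"{util.cos_sim(ea, eb).item():.3f}  |  {a} <-> {b}")
```

Because the prompt only modifies the model's input, the same wrapper applies to any of the pre-trained embedding models evaluated in the paper without changing their architecture.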