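Embedding
Similarity (geometry)
Computer science
Dual (grammatical number)
Set (abstract data type)
Polymer
Language model
Glass transition
Artificial intelligence
Data set
Theoretical computer science
Reduction (mathematics)
Support vector machine
Algorithm
Dimensional reduction
Vector space
Machine learning
Statistical physics
Biological system
Representation (politics)
Scheme (mathematics)
Dimensionality reduction
Natural language processing
Materials science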
Authors
Aymar Tchagoue, Véronique Églin, Jean-Marc Petit, Sébastien Pruvost, Jannick Duchet‐Rumeau, Jean‐François Gérard
Identifier
DOI: 10.1021/acs.jcim.5c02469
Abstract
Recent years have witnessed major advances in polymer informatics, yet accurately predicting polymer properties, such as the glass transition temperature (Tg), remains a challenge. Language models like BERT have been leveraged to derive embeddings from polymer representations (e.g., SMILES). However, similarity between embedding vectors in these latent spaces primarily reflects chemical structural similarity, with limited alignment to physicochemical properties. Here, we introduce a dual-embedding framework that enhances Tg prediction by combining a conventional BERT-based embedding with a fine-tuned counterpart explicitly trained so that vector similarity reflects proximity in Tg values. We evaluate our approach across four benchmarks: a heterogeneous data set compared against 25 machine learning baselines, along with three additional data sets focused on homopolymers and polyimides. The dual embedding outperforms standard BERT-based embeddings, achieving up to a 20% reduction in RMSE and surpassing alternative models such as graph-based and descriptor-based approaches. These results demonstrate that embedding molecular properties directly into representations can advance polymer informatics beyond structure-centric paradigms.
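The dual-embedding idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the two embeddings are combined or which training objective is used, so concatenation, the squared-error alignment loss, the linear Tg similarity target, and every function name and toy vector below are assumptions made for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def tg_similarity_target(tg_a, tg_b, tg_range):
    """Hypothetical target: 1.0 for identical Tg values, falling linearly
    to 0.0 once the two values differ by tg_range or more."""
    return 1.0 - min(abs(tg_a - tg_b) / tg_range, 1.0)


def alignment_loss(emb_a, emb_b, tg_a, tg_b, tg_range):
    """Assumed fine-tuning objective: squared gap between the embedding
    similarity of two polymers and their Tg-based similarity target."""
    gap = cosine_similarity(emb_a, emb_b) - tg_similarity_target(tg_a, tg_b, tg_range)
    return gap ** 2


def dual_embedding(structural, property_aligned):
    """Assumed combination step: concatenate the conventional embedding
    with the property-aligned one into a single feature vector."""
    return list(structural) + list(property_aligned)


# Toy placeholder vectors, not outputs of any real model.
structural = [0.2, 0.7, 0.1]
property_aligned = [0.9, 0.1]
features = dual_embedding(structural, property_aligned)
```

In this sketch, `features` would be passed to any downstream regressor for Tg, while `alignment_loss` would be minimized over polymer pairs during fine-tuning so that embedding similarity tracks Tg proximity.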