Keywords
Feature engineering, Interpretability, Computer science, Artificial intelligence, Machine learning, Classifier, Baseline, Support vector machine, Logistic regression, Decision tree, Deep learning
Authors
Aditya Kashyap, Delip Rao, Mary Regina Boland, Li Shen, Chris Callison-Burch
Identifier
DOI:10.1093/bioinformatics/btaf156
Abstract
Motivation: The integration of Machine Learning (ML) and Artificial Intelligence (AI) into healthcare has immense potential due to the rapidly growing volume of clinical data. However, existing AI models, particularly Large Language Models (LLMs) like GPT-4, face significant challenges in terms of explainability and reliability, particularly in high-stakes domains like healthcare.
Results: This paper proposes a novel LLM-aided feature engineering approach that enhances interpretability by extracting clinically relevant features from the Oxford Textbook of Medicine. By converting clinical notes into concept vector representations and employing a linear classifier, our method achieved an accuracy of 0.72, outperforming a traditional n-gram Logistic Regression baseline (0.64) and the GPT-4 baseline (0.48), while focusing on high-level clinical features. We also explore using Text Embeddings to reduce the overall time and cost of our approach by 97%.
Availability: All code relevant to this paper is available at: https://github.com/AdityaKashyap423/Dementia_LLM_Feature_Engineering/tree/main
Supplementary Information: Supplementary PDF and other data files can be found at https://drive.google.com/drive/folders/1UqdpsKFnvGjUJgp58k3RYcJ8zN8zPmWR?usp=share_link .
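The abstract describes converting clinical notes into concept vector representations and classifying them with a linear model. The following is a minimal illustrative sketch of that idea, not the paper's implementation: the concept names, the example note, and the classifier weights are all hypothetical, and simple keyword matching stands in for the paper's LLM-based concept extraction.

```python
# Hypothetical concept vocabulary; the paper derives its concepts from the
# Oxford Textbook of Medicine via an LLM.
CONCEPTS = ["memory loss", "disorientation", "aphasia"]

def concept_vector(note: str) -> list[int]:
    """Map a clinical note to a binary concept-presence vector.

    Keyword matching is a stand-in for LLM-based concept extraction.
    """
    text = note.lower()
    return [1 if concept in text else 0 for concept in CONCEPTS]

def predict(weights: list[float], bias: float, vec: list[int]) -> int:
    """Minimal linear classifier: label 1 if w . x + b > 0, else 0."""
    score = sum(w * x for w, x in zip(weights, vec)) + bias
    return 1 if score > 0 else 0

# Example with made-up weights; in the paper these would be learned.
note = "Patient reports progressive memory loss and disorientation."
vec = concept_vector(note)
print(vec)                                   # [1, 1, 0]
print(predict([1.0, 0.8, 0.5], -1.2, vec))   # 1  (1.0 + 0.8 - 1.2 > 0)
```

Because each dimension of the vector corresponds to a named clinical concept, the learned weights of the linear classifier can be read directly as per-concept contributions, which is the source of the interpretability the abstract claims.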