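Beat (acoustics)
Gene
Simple (philosophy)
Foundation (evidence)
Biology
Computational biology
Genetics
Physics
Geography
Archaeology
Acoustics
Philosophy
Epistemology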
Source
Journal: CERN European Organization for Nuclear Research - Zenodo
Date: 2023-10-21
Identifier
DOI:10.5281/zenodo.10030425
Abstract
This dataset contains NCBI (and, when applicable, UniProt) summaries of genes, together with the OpenAI text embeddings (text-embedding-ada-002 and text-embedding-3-large) computed on those summaries. See Chen and Zou (2024+) for methodological details. The unzipped folder contains four files:
1. NCBI_summary_of_genes.json: NCBI gene card summaries of human genes.
2. NCBI_UniProt_summary_of_genes.json: NCBI gene card summaries and, when applicable, UniProt protein summaries of human genes.
3. GenePT_gene_embedding_ada_text.pickle: a dictionary of NumPy arrays in which the keys are upper-case gene names and the values are the text-embedding-ada-002 embeddings of the summaries in 1.
4. GenePT_gene_protein_embedding_model_3_text.pickle: a dictionary of NumPy arrays in which the keys are upper-case gene names and the values are the text-embedding-3-large embeddings of the summaries in 1.
Reference: Chen YT, Zou J. (2024+) GenePT: A Simple But Effective Foundation Model for Genes and Cells Built From ChatGPT. bioRxiv preprint: https://www.biorxiv.org/content/10.1101/2023.10.16.562533v1.
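The files described above could be read along the following lines. This is a minimal Python sketch and not the authors' code. It assumes that each JSON file is a mapping from gene name to summary text, which the abstract does not state, and that the pickle files sit in the working directory under the names listed. The helper names (load_summaries, load_embeddings, get_embedding, cosine_similarity) and the gene symbols in the usage comments are hypothetical, chosen for illustration only.

```python
import json
import math
import pickle
from pathlib import Path


def load_summaries(path):
    """Load a gene-summary JSON file.

    Assumption: the file is a mapping of gene name -> summary text.
    """
    with Path(path).open("r", encoding="utf-8") as handle:
        return json.load(handle)


def load_embeddings(path):
    """Load a pickled dictionary of gene name -> embedding vector.

    Unpickling can execute arbitrary code, so only open files obtained
    from a source you trust. NumPy must be installed for the arrays
    stored in the real files to unpickle.
    """
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


def get_embedding(embeddings, gene):
    """Look up a gene by name; keys are described as upper case."""
    return embeddings[gene.upper()]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    a = [float(x) for x in a]
    b = [float(x) for x in b]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical usage, assuming the files are in the working directory
# and that the example gene symbols are present as keys:
#
# summaries = load_summaries("NCBI_summary_of_genes.json")
# embeddings = load_embeddings("GenePT_gene_embedding_ada_text.pickle")
# similarity = cosine_similarity(
#     get_embedding(embeddings, "tp53"),
#     get_embedding(embeddings, "mdm2"),
# )
```

The cosine similarity is written with the standard library so the sketch runs without extra dependencies; for bulk comparisons across many genes, a vectorized NumPy version would be the more natural choice.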