Authors
Ali Askari, Sumedha Kota, Hailey Ferrell, Shriya Swamy, Kayla S. Goodman, Christine C. Okoro, Isaiah C. Spruell Crenshaw, Daniela K. Hernandez, Taylor E. Oliphant, Akshata A. Badrayani, Andrew D. Ellington, Gwendolyn M. Stovall
Abstract
The deposited dataset is a snapshot of the active and growing UTexas Aptamer Database, https://sites.utexas.edu/aptamerdatabase/. It collects aptamer data extracted from the literature year by year since the inception of aptamer selection. It includes multiple aptamer sequences from a given paper, not just the sequences with the tightest binding. In all, the collection includes 1,415 aptamer sequences from 489 papers published between 1990 and 2022. Because the dataset includes multiple sequences that emerged from a given selection experiment, it necessarily includes sequences that may not have been individually tested for binding activity. This is similar to how a metagenomic analysis of an environmental sample includes all of its rRNA sequences. Taking this metagenomic approach gives informaticians a much wider range of sequences for subsequent analysis, while the dataset still provides tools to find high-affinity aptamers for future use. For each aptamer sequence, the dataset records information about the publication: year of publication, DOI, full citation, and corresponding author(s). It also records the aptamer target. In addition, it records the following information about the specific aptamer: nucleic acid composition, name assigned in the original publication, sequence, GC percentage, sequence length, binding affinity (Kd), binding/selection buffer, application as quoted in the referenced paper (e.g., drug delivery, biosensor), original nucleic acid pool used in the selection, post-selection modifications (if any), additional information, and our internally assigned serial number. We used simple Excel formulas for each aptamer record to calculate the GC content and length of its sequence.

Version 1.1.0: Added 25+ aptamer sequences. Added the "Parent sequence serial number" data field/column. Fixed a data formatting/alignment error in the "Application as quoted in the referenced paper" field.
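The abstract does not give the exact Excel formulas used for the GC percentage and sequence length columns. The following is a minimal Python sketch of the same two calculations, offered only as an illustration. It makes two assumptions. First, each sequence is stored as a plain string of nucleotide letters (A, C, G, T or U). Second, GC percentage is the number of G plus C bases divided by the sequence length, times 100. The function names and the whitespace/case normalization step are hypothetical and are not taken from the dataset. Sequences written with modified-nucleotide notation would need separate handling.

```python
def clean(seq: str) -> str:
    """Hypothetical normalization: drop all whitespace and uppercase the letters."""
    return "".join(seq.split()).upper()


def sequence_length(seq: str) -> int:
    """Length of the sequence, counted as characters after normalization."""
    return len(clean(seq))


def gc_percent(seq: str) -> float:
    """Percentage of G and C characters in the normalized sequence.

    Assumes a plain A/C/G/T/U string; any other characters still count
    toward the length, so modified-base notation would skew the result.
    """
    s = clean(seq)
    if not s:
        raise ValueError("sequence is empty")
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


if __name__ == "__main__":
    example = "GGTTACGC"  # made-up sequence, not a record from the dataset
    print(sequence_length(example), gc_percent(example))  # prints: 8 62.5
```

Counting characters after stripping whitespace mirrors what a spreadsheet `LEN`-style length would report for a clean sequence cell. Whether the original spreadsheet normalized case or whitespace is not stated in the source.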