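Promoter
Biology
Computational biology
Gene
Enhancer
Epigenetics
Genetics
Transcription factor
Transcription (linguistics)
Regulatory sequence
Transcriptional regulation
Gene expression
Linguistics
Philosophy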
Authors
Tanvir Alam, Mohammad Tariqul Islam, Sebastian Schmeier, Mowafa Househ, Dena Al-Thani
Identifier
DOI:10.1109/bibm47256.2019.8983262
Abstract
Promoter regions of long non-coding RNA (lncRNA) genes are crucial to understanding their transcriptional regulatory patterns. Because lncRNA genes are more cryptic than protein-coding genes in their functionality and divergent biogenesis, fewer studies have examined the roles of their promoters than those of protein-coding genes. Based on the overlap between epigenetic marks and transcription start sites, human lncRNAs have been categorized into two broad groups, enhancer-originated lncRNAs (e-lncRNAs) and promoter-originated lncRNAs (p-lncRNAs), and these two groups are therefore subject to distinct transcriptional regulatory programs. To understand how the transcriptional regulatory mechanisms that govern p- and e-lncRNAs differ, we studied the promoter sequences of the two groups, including the transcription factor (TF) proteins that favor p-lncRNAs over e-lncRNAs (and vice versa). In addition, we developed DeePEL (deep p-, e-lncRNA promoter recognizer), a convolutional neural network (CNN) based deep learning (DL) framework, to classify the promoters of p- and e-lncRNAs. To the best of our knowledge, this is the first attempt to classify these two groups of lncRNA promoters with a DL framework using sequence and TF information. We report several sequence-specific signatures in the promoter regions, as well as several TFs specific to each group of lncRNAs, that will help in understanding the promoter-proximal transcriptional regulation of p-lncRNAs and e-lncRNAs.
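
To make the idea of a CNN reading promoter sequence concrete, the sketch below shows the basic building blocks such a classifier typically starts from: one-hot encoding of DNA, a single 1D convolution filter acting as a motif detector, global max pooling, and a logistic output. It is not the DeePEL architecture, which the abstract does not describe. The function names, the hand-set `TATA_KERNEL`, and the `weight` and `bias` values are all hypothetical and chosen only for illustration; a real model would learn many filters from labeled p- and e-lncRNA promoters.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T).

    Characters outside ACGT (for example N) become all-zero rows.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d(encoded, kernel, bias=0.0):
    """Slide a k x 4 kernel along an L x 4 input (no padding), then apply ReLU."""
    k = len(kernel)
    activations = []
    for start in range(len(encoded) - k + 1):
        total = bias
        for i in range(k):
            for j in range(4):
                total += encoded[start + i][j] * kernel[i][j]
        activations.append(max(0.0, total))
    return activations


def predict(seq, kernel, weight, bias):
    """Global max pooling over the filter activations, then a logistic unit."""
    activations = conv1d(one_hot(seq), kernel)
    pooled = max(activations) if activations else 0.0
    return 1.0 / (1.0 + math.exp(-(weight * pooled + bias)))


# Hypothetical hand-set filter that responds most strongly to "TATA".
# In a trained CNN these weights would be learned, not written by hand.
TATA_KERNEL = [
    [0.0, 0.0, 0.0, 1.0],  # T
    [1.0, 0.0, 0.0, 0.0],  # A
    [0.0, 0.0, 0.0, 1.0],  # T
    [1.0, 0.0, 0.0, 0.0],  # A
]

if __name__ == "__main__":
    # Illustrative sequences only; they are not taken from the paper.
    for seq in ("GGTATAGG", "GGGCCCGG"):
        score = predict(seq, TATA_KERNEL, weight=1.0, bias=-2.0)
        print(seq, round(score, 3))
```

The score is higher for a sequence containing the filter's motif than for one without it. Which class such a score would correspond to, and which motifs actually separate p-lncRNA from e-lncRNA promoters, are questions the trained model and the paper's analysis answer, not this sketch.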