Multiscale footprints reveal the organization of cis-regulatory elements
Computational biology
Biology
Regulatory science
Chemistry
Ecology
Authors
Hu Yan,Max A. Horlbeck,Ruochi Zhang,Sai Ma,Rojesh Shrestha,Vinay K. Kartha,Fabiana M. Duarte,Conrad Hock,Rachel E. Savage,Ajay Labade,Heidi Kletzien,Alia Meliki,Andrew Castillo,Neva C. Durand,Eugenio Mattei,Lauren J. Anderson,Tristan Tay,Andrew Earl,Noam Shoresh,Charles B. Epstein
Cis-regulatory elements (CREs) control gene expression and are dynamic in their structure and function, reflecting changes in the composition of diverse effector proteins over time [1]. However, methods for measuring the organization of effector proteins at CREs across the genome are limited, hampering efforts to connect CRE structure to their function in cell fate and disease. Here we developed PRINT, a computational method that identifies footprints of DNA-protein interactions from bulk and single-cell chromatin accessibility data across multiple scales of protein size. Using these multiscale footprints, we created the seq2PRINT framework, which uses deep learning to allow precise inference of transcription factor and nucleosome binding and interprets regulatory logic at CREs. Applying seq2PRINT to single-cell chromatin accessibility data from human bone marrow, we observe sequential establishment and widening of CREs centred on pioneer factors across haematopoiesis. We further discover age-associated alterations in the structure of CREs in murine haematopoietic stem cells, including widespread reduction of nucleosome footprints and gain of de novo identified Ets composite motifs. Collectively, we establish a method for obtaining rich insights into DNA-binding protein dynamics from chromatin accessibility data, and reveal the architecture of regulatory elements across differentiation and ageing.