Catherine Baranowski, Héctor García Martín, Diego A. Oyarzún, Hansen Spinner, B. Desai, Christopher J. Petzold, Evangelos-Marios Nikolados, S.H.W. Kraatz, Aljaž Gaber, Robert J. Chalkley, Devin R. Scannell, Rachel Sevey, Michael C. Jewett, Peter J. Kelly, Erika A. DeBenedictis
Recombinant protein expression is central to applications of biotechnology. However, not all proteins can be expressed in all organisms, and, given the vast experimental space, it can be challenging to identify the conditions that will yield successful protein expression. The field lacks a predictive model of soluble protein expression that could replace laborious experimental trial and error. Here, we discuss the state of the field and identify the lack of large, high-fidelity datasets as the primary bottleneck to progress. We propose a path toward an extensible experimental platform for collecting soluble overexpression data across organisms. We suggest that the resulting data be used to train predictive models of protein expression, toward answering the question: can protein expression be solved?