Authors
Eve Shalom,Hyung Min Kim,Rianne A. van der Heijden,Zaki Ahmed,Reyna Patel,David A. Hormuth,Julie DiCarlo,Thomas E. Yankeelov,Nicholas J. Sisco,Richard D. Dortch,Ashley M. Stokes,Marianna Inglese,Matthew Grech‐Sollars,Nicola Toschi,Prativa Sahoo,Anup Singh,Sanjay Kumar Verma,Divya Rathore,Anum S. Kazerouni,Savannah C. Partridge,Eve LoCastro,Ramesh Paudyal,Ivan A. Wolansky,Amita Shukla‐Dave,Pepijn Schouten,Oliver J. Gurney‐Champion,Radovan Jiřík,Ondřej Macíček,Michal Bartoš,Jana Vitous,Ayesha Bharadwaj Das,Sungheon Kim,Louisa Bokacheva,Artem Mikheev,Henry Rusinek,Michael Berks,Penny L. Hubbard Cristinacce,Ross A. Little,Susan Cheung,James P.B. O’Connor,Geoffrey Parker,Brendan Moloney,Peter S. LaViolette,Samuel Bobholz,Savannah Duenweg,John Virostko,Hendrik Laue,Kyunghyun Sung,Ali Nabavizadeh,Hamidreza Saligheh Rad,Leland Hu,Steven Sourbron,Laura C. Bell,Anahita Fathi Kazerooni
Abstract
Purpose: Ktrans has often been proposed as a quantitative imaging biomarker for diagnosis, prognosis, and treatment response assessment in various tumors, yet none of the many software tools available for Ktrans quantification are standardized. The ISMRM Open Science Initiative for Perfusion Imaging–Dynamic Contrast‐Enhanced (OSIPI‐DCE) challenge was designed to benchmark these methods and thereby support efforts to standardize Ktrans measurement.

Methods: A framework was created to evaluate the Ktrans values produced by DCE‐MRI analysis pipelines and so enable benchmarking. The perfusion MRI community was invited to apply their pipelines for Ktrans quantification in glioblastoma using data from clinical and synthetic patients. Submissions were required to include the entrants' Ktrans values, the software applied, and a standard operating procedure. Submissions were evaluated using the proposed score, defined with accuracy, repeatability, and reproducibility components.

Results: Across the 10 received submissions, the score ranged from 28% to 78% with a median of 59%. The accuracy, repeatability, and reproducibility scores ranged from 0.54 to 0.92, 0.64 to 0.86, and 0.65 to 1.00, respectively (0–1 = lowest–highest). Manual arterial input function selection markedly affected reproducibility and showed greater analysis variability than automated methods. Furthermore, provision of a detailed standard operating procedure was critical for achieving higher reproducibility.

Conclusions: This study reports the results of the OSIPI‐DCE challenge and highlights the high inter-software variability in Ktrans estimation, providing a framework for ongoing benchmarking against the scores presented here. Through this challenge, the participating teams were ranked by the performance of their software tools in the particular setting of the challenge; in a real-world clinical setting, and under a different benchmarking methodology, many of these tools may perform differently.
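As a rough illustration of how the three component scores reported above could roll up into a single challenge score, the sketch below combines accuracy, repeatability, and reproducibility (each on a 0–1 scale) into a percentage. The equal weighting and the function name `composite_score` are assumptions for illustration only; the actual OSIPI‐DCE aggregation rule and weights are defined in the challenge methodology, not in this abstract.

```python
import numpy as np

def composite_score(accuracy, repeatability, reproducibility,
                    weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine per-submission component scores (0 = lowest, 1 = highest)
    into a single percentage using an assumed weighted average."""
    components = np.array([accuracy, repeatability, reproducibility])
    return 100.0 * float(np.dot(weights, components))

# Hypothetical submission near the middle of the reported component ranges.
print(f"{composite_score(0.75, 0.75, 0.80):.0f}%")  # ~77%
```

Under this assumed scheme, a submission scoring near the bottom of each reported component range (0.54, 0.64, 0.65) would land close to the low end of the published 28%–78% score range, while one near the top would approach the high end.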