Assessing the robustness of an artificial intelligence segmentation model for quantitative cardiovascular magnetic resonance imaging across cardiac phenotypes

Authors

  • Hadil Saad
  • Clemens Ammann
  • Thomas Hadler
  • Yashraj Bhoyroo
  • Philine Reisdorf
  • Jana Veit
  • Teodora Chitiboi
  • Jens Wetzl
  • Christian Geppert
  • Jeanette Schulz-Menger

Journal

  • International Journal of Cardiovascular Imaging

Citation

  • Int J Cardiovasc Imaging 42 (2): 343-357

Abstract

  • PURPOSE: To introduce an artificial intelligence-based cardiovascular magnetic resonance segmentation algorithm (Nick) for automated quantification of function and parametric mapping across cardiac phenotypes reflecting clinical routine. METHODS: Nick was compared to manual gold standard (GS) segmentations in 359 multi-centre cases at 1.5T and 3T, consisting of 104 healthy individuals and 255 diseased patients with various cardiac phenotypes. Left and right ventricular (LV, RV) volumes and LV mass (LVM) were derived from short-axis segmentations. For parametric mapping, the LV myocardium was segmented to quantify T1 and T2 relaxation times. Statistical analysis comprised mean differences, correlation coefficients (R²), Bland-Altman analysis, tolerance range assessments, and paired boxplots. The number of slices and contours requiring manual correction was estimated based on slice-level differences. RESULTS: Nick demonstrated high agreement with the GS for LV and RV volume estimations (R²≥0.93) and LVM quantification (R²=0.86). For the ejection fractions, correlations were slightly lower (R²=0.85/0.72 for LV/RV) with small mean differences (+1.14%/-2.48% for LV/RV). T1 and T2 mapping values showed excellent agreement with manual reference values (R²≥0.92) and minimal biases (-1.64/0.14 ms for T1/T2). Nick underestimated LV volumes at end-diastole (-4.48 ml) and end-systole (-3.28 ml) as well as the RV end-diastolic volume (-5.14 ml) and stroke volume (-6.75 ml). Nonetheless, tolerance testing for mean deviations revealed clinically acceptable biases for all comparisons, and fewer than two slices per case required correction on average. CONCLUSION: Comparison to expert segmentations revealed robust performance of Nick in routine clinical cases with variable pathology, supporting its future integration into clinical workflows.
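
The agreement statistics named in the abstract (mean difference, R², Bland-Altman analysis) can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The function names are hypothetical, the limits of agreement use the conventional bias ± 1.96 SD, R² is assumed to be the squared Pearson correlation, and the ejection fraction uses the standard definition (EDV - ESV) / EDV. The abstract does not confirm any of these choices.

```python
import math


def bland_altman(auto, manual):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    Bias is the mean of (auto - manual). The limits of agreement are
    assumed to be bias +/- 1.96 sample standard deviations.
    """
    diffs = [a - m for a, m in zip(auto, manual)]
    n = len(diffs)
    bias = sum(diffs) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def r_squared(x, y):
    """Squared Pearson correlation between two paired series (assumed definition)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction in percent from end-diastolic and end-systolic volumes."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml


if __name__ == "__main__":
    # Made-up illustrative volumes in ml, NOT data from the study.
    auto_edv = [148.0, 162.0, 131.0, 175.0]
    manual_edv = [152.0, 165.0, 137.0, 178.0]
    bias, lower, upper = bland_altman(auto_edv, manual_edv)
    print(f"bias={bias:.2f} ml, limits of agreement=({lower:.2f}, {upper:.2f}) ml")
    print(f"R^2={r_squared(auto_edv, manual_edv):.3f}")
```

A negative bias in this convention means the automated method reads lower than the manual reference, which matches the sign convention of the underestimations reported above.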


DOI

doi:10.1007/s10554-025-03596-3