Improving machine-learning development in allergology: bridging the gap between open-access and cohort-based databases
Authors
- Aleix Arnau-Soler
- Jeremy Corriger
- Yannick Chantran
- Julien Goret
Journal
- Current Opinion in Allergy and Clinical Immunology
Citation
- Curr Opin Allergy Clin Immunol 26 (2): 138-148
Abstract
PURPOSE OF REVIEW: The advent of high-throughput data generation and artificial intelligence has transformed allergy research. Open-access databases (OAD) and cohort-based databases (CBD) provide essential resources for machine learning (ML)-driven algorithms for risk stratification and decision support. It is crucial for allergologists to understand their construction, strengths, and limitations. We review recently published databases with a focus on how these datasets can be combined to enhance research.
RECENT FINDINGS: OAD, including environmental monitoring resources, omics repositories, and electronic health records, offer scale, diversity, and opportunities for new hypotheses, but are often limited by sparse clinical annotation, heterogeneous data generation, and incomplete linkage to patient-level outcomes. CBD provide well-phenotyped patients, longitudinal follow-up, and high-quality clinical and immunological measurements, yet face constraints in sample size, population diversity, and data sharing. Studies integrating OAD breadth with CBD label fidelity report improved predictive performance when paired with disciplined evaluation. Federated learning and portable feature specifications are emerging to enable privacy-preserving collaborations.
SUMMARY: Allergologists play a central role in building ML-ready resources. By ensuring rigorous clinical annotation, standardization of data, transparent methods, and independent validation, they can maximize the utility of OAD and CBD and their combination to accelerate progress toward precision allergy medicine.