April, 2026

Improving machine-learning development in allergology: bridging the gap between open-access and cohort-based databases

View in MDC Repository

View in PubMed

Authors

Aleix Arnau-Soler
Jeremy Corriger
Yannick Chantran
Julien Goret

Journal

Current Opinion in Allergy and Clinical Immunology

Citation

Curr Opin Allergy Clin Immunol 26 (2): 138-148

Abstract

PURPOSE OF REVIEW: The advent of high-throughput data generation and artificial intelligence has transformed allergy research. Open-access databases (OAD) and cohort-based databases (CBD) provide essential resources for machine learning (ML)-driven algorithms for risk stratification and decision support. It is crucial for allergologists to understand their construction, strengths, and limitations. We review recently published databases with a focus on how these datasets can be combined to enhance research.

RECENT FINDINGS: OAD, including environmental monitoring resources, omics repositories, and electronic health records, offer scale, diversity, and opportunities for new hypotheses, but are often limited by sparse clinical annotation, heterogeneous data generation, and incomplete linkage to patient-level outcomes. CBD provide well-phenotyped patients, longitudinal follow-up, and high-quality clinical and immunological measurements, yet face constraints in sample size, population diversity, and data sharing. Studies integrating OAD breadth with CBD label fidelity report improved predictive performance when paired with disciplined evaluation. Federated learning and portable feature specifications are emerging to enable privacy-preserving collaborations.

SUMMARY: Allergologists play a central role in building ML-ready resources. By ensuring rigorous clinical annotation, standardization of data, transparent methods, and independent validation, they can maximize the utility of OAD and CBD and their combination to accelerate progress toward precision allergy medicine.

DOI

doi:10.1097/aci.0000000000001143
