Mitigation and detection of putative microbial contaminant reads from long-read metagenomic datasets
Authors
- Stefany Ayala-Montaño
- Ayorinde O. Afolayan
- Raisa Kociurzynski
- Ulrike Löber
- Sandra Reuter
Journal
- Microbial Genomics
Citation
- Microb Genom 12 (1): 1609
Abstract
Metagenomic sequencing of clinical samples has significantly enhanced our understanding of microbial communities. However, microbial contamination and host-derived DNA remain major obstacles to accurate data interpretation. Here, we present a methodology called 'Stop-Check-Go' for detecting and mitigating contaminants in metagenomic datasets obtained from neonatal patient samples (nasal and rectal swabs). The method integrates laboratory and bioinformatic steps, combining a prevalence method, coverage estimation and microbiological reports. We compared the 'Stop-Check-Go' decontamination system with other published decontamination tools and found that these tools commonly performed poorly when decontaminating samples from microbiologically negative patients (false positives). Host DNA decreased by an average of 76% per sample with a lysis method and was further reduced during post-sequencing analysis. Microbial species were classified as putative contaminants and assigned to 'Stop' in nearly 60% of the dataset. The 'Stop-Check-Go' system was developed to address the specific need to decontaminate low-biomass samples, where existing tools, primarily designed for short-read metagenomic data, showed limited performance.
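The abstract names three evidence sources (a prevalence method, coverage estimation and microbiological reports) but does not state the decision rules. As a rough illustration only, the Python sketch below shows one way such evidence could be combined into a three-way 'Stop', 'Check' or 'Go' label. The field names, the `classify` function, the 0.1 coverage threshold and the rule logic are all hypothetical assumptions, not the authors' published procedure.

```python
from dataclasses import dataclass


@dataclass
class TaxonEvidence:
    """Hypothetical per-taxon evidence record (all fields are assumptions)."""
    name: str
    prevalence_in_controls: float  # fraction of negative controls with the taxon
    prevalence_in_samples: float   # fraction of patient samples with the taxon
    genome_coverage: float         # fraction of the reference genome covered
    culture_confirmed: bool        # taxon reported by routine microbiology


def classify(ev: TaxonEvidence, min_coverage: float = 0.1) -> str:
    """Return 'Stop', 'Check' or 'Go' under illustrative, assumed rules."""
    # Assumed rule: a microbiology report confirming the taxon overrides
    # the sequencing-based flags.
    if ev.culture_confirmed:
        return "Go"

    # Flag 1 (prevalence): taxon is more common in controls than in samples.
    more_in_controls = ev.prevalence_in_controls > ev.prevalence_in_samples
    # Flag 2 (coverage): reads cover only a small part of the genome.
    low_coverage = ev.genome_coverage < min_coverage

    flags = int(more_in_controls) + int(low_coverage)
    if flags == 2:
        return "Stop"   # both flags raised: putative contaminant
    if flags == 1:
        return "Check"  # conflicting evidence: manual review
    return "Go"         # no flag raised: keep the taxon
```

In this sketch, a taxon that is frequent in negative controls and poorly covered is assigned to 'Stop', while one with mixed evidence is routed to 'Check' for review rather than being removed automatically.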