Negativeome characterization and decontamination in early-life virome studies.
Contaminant sequences of external origin complicate the study of host-associated viromes, particularly in low-biomass samples obtained through viral-like particle (VLP) enrichment. However, the prevalence and impact of external contaminants in low-biomass samples are understudied. Here, we analyze 1321 gut virome samples and 55 negative controls (NCs) from four early-life virome studies. Virus sequences identified in NCs, termed the negativeome, are used as a proxy for contamination assessment. We show that 61% of samples share at least one identical strain with the negativeome, likely representing external contamination. Although the median abundance of contaminant strains in these samples is only 1%, it ranges from 0 to 99% and exceeds 10% in 11% of infant samples. We further demonstrate that contamination is largely study-specific and has a greater impact on infant samples than on maternal samples. Based on these results, we propose a contamination assessment method that uses a publicly available database of sequences detected in NCs, together with a strain-level decontamination strategy.
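
To make the quantities above concrete, the following is a minimal sketch of how a per-sample contaminant abundance and a strain-level removal step could be computed. It is not the pipeline used in the study. It assumes that strains have already been matched between samples and NCs and are represented by shared identifiers. The function names, the toy abundance values, and the 10% threshold argument are illustrative assumptions.

```python
from statistics import median


def contaminant_fraction(profile, negativeome):
    """Fraction of a sample's total abundance carried by strains also seen in NCs.

    profile: dict mapping strain ID -> abundance (any non-negative unit).
    negativeome: set of strain IDs detected in negative controls (assumed pre-matched).
    """
    total = sum(profile.values())
    if total == 0:
        return 0.0
    shared = sum(abund for strain, abund in profile.items() if strain in negativeome)
    return shared / total


def decontaminate(profile, negativeome):
    """Strain-level decontamination: drop only the strains shared with NCs."""
    return {strain: abund for strain, abund in profile.items() if strain not in negativeome}


def summarize(samples, negativeome, threshold=0.10):
    """Summarize contamination across samples (threshold is an illustrative choice)."""
    fractions = {name: contaminant_fraction(p, negativeome) for name, p in samples.items()}
    affected = [f for f in fractions.values() if f > 0]
    return {
        "share_affected": len(affected) / len(samples),
        "median_fraction_in_affected": median(affected) if affected else 0.0,
        "share_above_threshold": sum(f > threshold for f in fractions.values()) / len(samples),
    }


# Hypothetical toy data, not taken from the study.
samples = {
    "infant_1": {"strainA": 80.0, "strainB": 20.0},
    "infant_2": {"strainC": 50.0, "strainD": 50.0},
    "mother_1": {"strainA": 1.0, "strainE": 99.0},
    "mother_2": {"strainE": 100.0},
}
negativeome = {"strainA", "strainD"}

summary = summarize(samples, negativeome)
cleaned = {name: decontaminate(p, negativeome) for name, p in samples.items()}
```

Removing only the shared strains, rather than every sequence of a contaminated taxon, keeps closely related strains that were never observed in an NC. Whether to renormalize abundances after removal is left open in this sketch.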