Fixing the bioinformatics bottleneck to track the COVID-19 pandemic faster: a suggestion from researchers

July 20, 2021 By admin

A global consortium of researchers is advocating for increased integration of viral genetics, bioinformatics, and public health in order to enable more effective pandemic response today and in the future.

An international collaboration of experts in viral and genetic analysis has laid out the ‘bioinformatics bottlenecks’ that are hampering the pandemic response in a comment piece published in the journal Nature. The group was led by Swiss scientists Dr. Emma Hodcroft of the University of Bern and Prof. Christophe Dessimoz of the University of Lausanne, both of the SIB Swiss Institute of Bioinformatics, and by Dr. Nick Goldman of EMBL-EBI in the United Kingdom. The following are the critical takeaways and viewpoints from a Swiss perspective.

The threat that rapidly spreading SARS-CoV-2 mutations could compromise vaccine efficacy has sparked a global rush to strengthen coronavirus genomic surveillance. Such surveillance is critical for detecting and tracking emerging strains promptly. It can also provide a more definitive picture of how transmission occurs between individuals than conventional contact tracing can. At the time of publication, laboratories worldwide had sequenced more than 610,000 SARS-CoV-2 samples; this figure might easily approach one million by the end of the pandemic. In theory, these genomes could aid our understanding of how the virus travels within communities and across the globe, potentially allowing us to halt infections. In practice, such studies uncover far fewer details than they might.

The vast bulk of research on these genetic sequences is not carried out by public-health organisations. It is driven by academic researchers, many of them early in their careers, who build software and analytical tools on their own time to get vital answers. Nextstrain, a Swiss and American open-source project, is helping to coordinate these efforts. However, as new data comes in, it is becoming increasingly difficult to keep the phylogenetic trees up to date. Nextstrain was previously used to track influenza and Ebola outbreaks, but only retrospectively or with small updates every week or month, not to track thousands of new sequences each day at the peak of a worldwide epidemic. Researchers must now update their analyses daily.

Phylogenetic analyses have so far been carried out largely independently of wet laboratories. For instance, Datamonkey is a collection of modelling and bioinformatics tools developed by researchers at Temple University in Philadelphia, Pennsylvania. They built a web application that searches SARS-CoV-2 phylogenies every day for signs of natural selection (http://covid19.datamonkey.org). Similarly, Lucy van Dorp, a computational biologist at University College London, and her colleagues examine genomic databases in search of variants associated with increased viral spread. However, such computational studies are rarely validated empirically, and phylogenies are rarely linked to laboratory studies that examine these variants and their responses to vaccination in cell cultures. We need to develop methods for combining data from wet biology and sequence analysis in order to provide a fuller picture of how mutations arise and spread.

Combining epidemiological and phylogenetic data revealed that a lockdown in New Zealand lowered the effective reproduction number (Re) in one cluster of cases from 7 at the start of the outbreak to 0.2 by the end of March 2020. Despite their effectiveness, these techniques are rarely used. They are difficult to carry out and require a high level of skill that is in short supply. Expanding phylogenetic epidemiology will require more training and more user-friendly software. To retrace the virus’s route, we can employ complex algorithms that incorporate phylogenetic uncertainty, transmission models, and patient and sequencing data. These approaches, however, are currently far too computationally demanding to be applied to every sample collected.
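The size of that drop in Re is easier to appreciate with a little arithmetic. The sketch below assumes the simplest possible model, in which each generation of infections is Re times the size of the previous one. The function name and the starting figure of 10 cases are illustrative assumptions, not values from the study.

```python
def expected_cases(initial_cases: float, r_e: float, generations: int) -> float:
    """Expected size of the latest generation of infections under a
    deliberately simplified model: every case infects r_e others on
    average, with no saturation, delays, or randomness."""
    return initial_cases * r_e ** generations


if __name__ == "__main__":
    # Hypothetical starting point of 10 cases, followed for 3 generations.
    for r_e in (7, 0.2):
        print(f"Re = {r_e}: {expected_cases(10, r_e, 3):.2f} cases in generation 3")
```

With Re above 1 each generation is larger than the last; with Re below 1 the cluster shrinks toward zero, which is why a fall from 7 to 0.2 marks the difference between explosive growth and an outbreak that dies out.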

Because of the haste to disseminate data and the (necessary) entry of less experienced labs into sequencing, data can contain subtle but hazardous errors, both in the sequences themselves and in the location and timing ‘metadata’ associated with them. Errors in the sequences themselves can be considerably more subtle and consequential. Contamination, low-quality samples, and mistakes introduced along the processing pipeline can add erroneous mutations or remove real ones. These inaccuracies then propagate to downstream analyses, redrawing relationships in ways that may confuse outbreak investigations or lead to changes being wrongly attributed to the virus’s biology. Individual researchers are attempting to flag the most problematic parts of the data. However, what scientists truly require are stable and open infrastructures that enable the entire community to update sequences and metadata throughout the pandemic.
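As a rough illustration of the kind of screening this implies, the sketch below flags a few obvious problems in a sequence and its metadata. It is a minimal example under stated assumptions, not the consortium's method: the function name, the thresholds, and the earliest plausible date are all hypothetical, and only the reference genome length (29,903 nucleotides for Wuhan-Hu-1) is a known value. The subtle errors described above, such as contamination, would need far more than this.

```python
from datetime import date

REFERENCE_LENGTH = 29_903                    # length of the Wuhan-Hu-1 reference genome
MAX_AMBIGUOUS_FRACTION = 0.05                # hypothetical threshold
MAX_LENGTH_DEVIATION = 0.05                  # hypothetical threshold
EARLIEST_PLAUSIBLE_DATE = date(2019, 12, 1)  # hypothetical cut-off


def qc_flags(sequence: str, collected: date, submitted: date) -> list[str]:
    """Return human-readable warnings for one sequence record."""
    seq = sequence.upper()
    if not seq:
        return ["empty sequence"]

    flags = []
    # Anything other than A, C, G, T (N, gaps, IUPAC codes) counts as ambiguous.
    ambiguous = sum(1 for base in seq if base not in "ACGT")
    if ambiguous / len(seq) > MAX_AMBIGUOUS_FRACTION:
        flags.append("too many ambiguous bases")
    if abs(len(seq) - REFERENCE_LENGTH) / REFERENCE_LENGTH > MAX_LENGTH_DEVIATION:
        flags.append("unexpected genome length")
    # Metadata checks: dates that cannot be right.
    if collected < EARLIEST_PLAUSIBLE_DATE:
        flags.append("collection date implausibly early")
    if collected > submitted:
        flags.append("collection date after submission date")
    return flags
```

Returning a list of flags rather than rejecting a record outright keeps the decision with the curator, which fits the goal of letting the community correct sequences and metadata over time.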

The approaches proposed so far to account for sampling biases are inefficient when applied to large data sets. Worse, many analyses presume that the viral population is steady, although it is not. In an ideal world, computational methods would be robust to sampling bias and would be used in conjunction with databases that let scientists track why each sample was sequenced. This could help to determine the growth and geographic origin of novel variants, and could help public health officials to overcome sampling challenges. Researchers and public health authorities must collaborate to make these technologies simpler to use and to train others. Governments should finance secondments that allow researchers to step away from their academic duties during public health emergencies. With sufficient funding to develop and deploy the necessary technologies, phylogenetics researchers can promptly detect emerging SARS-CoV-2 variants and reconstruct the transmission history of an outbreak. We encourage academics, funders, and public-health institutions to establish the resources, incentives, and requirements necessary to foster cooperation between phylogenetics and public health for the benefit of all.

Reference:
Hodcroft, E. B., et al. (2021) Want to track pandemic variants faster? Fix the bioinformatics bottleneck. Nature. doi.org/10.1038/d41586-021-00525
