Researchers develop a robust technique for analysing massive patient datasets

July 29, 2021

Immunology and bioinformatics researchers at The University of Queensland have identified a powerful technique for analysing massive patient datasets. Their research could lead to more accurate patient stratification and to faster, more precise use of targeted medicines.

The researchers, led by Professor Di Yu of the UQ Diamantina Institute and Dr Yang Yang of The Translational Research Institute, compared four popular techniques for analysing patients’ blood profiles based on gene expression. The approaches were tested on 71 clinical datasets, each of which had over 100 patient samples.

“When you consider that we are looking at massive datasets of patients, each with more than 10,000 genes, we need a really effective way to minimise the complexity of this big data for better interpretation,” Professor Yu explains.

“Of the four tools we evaluated, UMAP stood out as the most powerful. It performed substantially better than PCA, which is now used by many clinicians to try to stratify patients,” he says.

UMAP was the most effective of the four at revealing patient clusters. Using it, the researchers were able to distinguish between healthy and lupus samples, as well as divide the lupus patients into disease subgroups. They could also show which patients were improving and which were deteriorating.

The UMAP technique is still in its early stages and is currently used only in biomedical research; however, Professor Yu believes that the publication of his team’s findings in the journal Cell Reports may lead to its eventual clinical application.

“UMAP’s technique is more machine learning-based, which makes it considerably more powerful than the popular PCA tool, which uses a linear approach,” Professor Yu explains.
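
To make the "linear approach" of PCA concrete, here is a minimal sketch in Python using NumPy. It is not the study's code or data: the expression matrix is synthetic, the two sample groups are hypothetical, and the function name pca_project is chosen only for illustration. It shows that PCA places each sample by taking fixed weighted sums of its gene values, which is the restriction Professor Yu contrasts with UMAP.

```python
import numpy as np


def pca_project(expression, n_components=2):
    """Project samples onto the top principal components (a linear map)."""
    # Centre each gene so the components describe variation around the mean.
    centered = expression - expression.mean(axis=0)
    # Rows of vt are the principal axes in gene space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt[:n_components]
    # Every coordinate is a fixed weighted sum of the centred gene values.
    return centered @ axes.T, axes


# Hypothetical data: 120 samples x 10,000 genes, in two groups of 60
# that differ only in the first 500 genes.
rng = np.random.default_rng(0)
n_per_group, n_genes = 60, 10_000
group_a = rng.normal(0.0, 1.0, size=(n_per_group, n_genes))
group_b = rng.normal(0.0, 1.0, size=(n_per_group, n_genes))
group_b[:, :500] += 3.0
expression = np.vstack([group_a, group_b])

embedding, axes = pca_project(expression)
print(embedding.shape)  # (120, 2)
```

Each row of the result is a sample reduced from 10,000 gene values to two coordinates that can be plotted and inspected for clusters. Non-linear methods such as UMAP are not limited to one set of gene weights per axis; in Python, UMAP is available through the umap-learn package, which could be applied to the same matrix in place of pca_project.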

Reference

Yang, Y., Sun, H., Zhang, Y., Zhang, T., Gong, J., Wei, Y., … & Yu, D. (2021). Dimensionality reduction by UMAP reinforces sample heterogeneity analysis in bulk transcriptomic data. Cell Reports. DOI: https://doi.org/10.1016/j.celrep.2021.109442
