
Using machine learning to find mutations in similar genome sequences of cancer samples

July 21, 2021 By admin

A research team at the Francis Crick Institute has developed a strategy for detecting mutations in highly similar genomic regions in cancer samples. In a paper published in the journal Nature Biotechnology, the researchers describe how they used a machine-learning algorithm to identify somatic mutations in non-unique regions of the genome.

Over the course of evolution, sections of the human genome have been rearranged and, in some cases, duplicated. These duplications make mutation detection difficult. Standard analysis pipelines discard short sequencing reads that cannot be assigned unambiguously to a single location in the genome, so regions that closely resemble one another are left out of the results, and any mutations in them are missed. In this study, the researchers developed a method for detecting mutations in these non-unique regions of the genome.
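To make the ambiguity problem concrete, here is a minimal Python sketch under simplifying assumptions: a toy genome containing one duplicated segment, exact-match "alignment" through a k-mer index, and a filter that keeps only reads with exactly one hit. The function names are hypothetical and this is not the method from the paper; real aligners such as BWA instead report a mapping-quality (MAPQ) score, and pipelines typically drop reads whose score is low because they map equally well to several places.

```python
from collections import defaultdict


def build_kmer_index(genome: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the genome to the 0-based positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return dict(index)


def map_read(read: str, index: dict[str, list[int]]) -> list[int]:
    """Toy exact-match alignment: every position where the read occurs.

    Assumes the read length equals the k used to build the index.
    """
    return index.get(read, [])


def unique_only(reads: list[str], index: dict[str, list[int]]):
    """Mimic a uniqueness filter: keep reads with exactly one hit.

    Reads with several hits (ambiguous) or zero hits (unmapped) are dropped.
    """
    kept, dropped = [], []
    for read in reads:
        hits = map_read(read, index)
        (kept if len(hits) == 1 else dropped).append(read)
    return kept, dropped


# Hypothetical toy genome: the segment "ACGTTGCA" occurs twice (a duplication).
segment = "ACGTTGCA"
genome = "GGGCCC" + segment + "TATATA" + segment + "CCGG"
index = build_kmer_index(genome, k=8)

print(map_read("ACGTTGCA", index))  # [6, 20]: two equally good locations
print(map_read("GGGCCCAC", index))  # [0]: a unique location
print(unique_only(["ACGTTGCA", "GGGCCCAC"], index))
# (['GGGCCCAC'], ['ACGTTGCA'])
```

The read from the duplicated segment is discarded even though it carries real sequence information, so a mutation in either copy would never be examined; that is the blind spot the study set out to close.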

The team first compiled a catalogue of known highly similar genomic regions and used it to train a machine-learning model to recognise mutations in those regions. They then applied the approach to detect mutations across a range of tissues, including 2,658 cancer samples from the Pan-Cancer Analysis of Whole Genomes (PCAWG) dataset. The scientists found 1,744 mutations in coding sequences and many more in non-coding sequences. They also estimated that their algorithm has a false discovery rate of about 7% and a validation rate of more than 80%.
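The two accuracy figures answer different questions. The false discovery rate (FDR) is the share of reported mutations that are wrong, FP / (FP + TP), while a validation rate is the share of calls checked with an independent assay that were confirmed. The short Python sketch below shows only the arithmetic; the counts are hypothetical, chosen for illustration, and are not taken from the paper.

```python
def false_discovery_rate(true_pos: int, false_pos: int) -> float:
    """FDR = FP / (FP + TP): the fraction of reported calls that are false."""
    calls = true_pos + false_pos
    if calls == 0:
        raise ValueError("no calls were made")
    return false_pos / calls


def validation_rate(validated: int, tested: int) -> float:
    """Fraction of calls checked by an independent assay that were confirmed."""
    if tested == 0:
        raise ValueError("no calls were tested")
    return validated / tested


# Hypothetical counts, used only to illustrate the formulas.
fdr = false_discovery_rate(true_pos=930, false_pos=70)
val = validation_rate(validated=82, tested=100)
print(f"FDR: {fdr:.1%}")         # FDR: 7.0%
print(f"Validation: {val:.1%}")  # Validation: 82.0%
```

Because the two metrics are usually measured on different sets of calls, a low FDR and a high validation rate are complementary evidence rather than two views of the same number.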

Some of the coding-sequence mutations altered protein sequences, including in genes that have been linked to specific types of cancer. For example, they detected recurrent mutations in the KMT2C and PIK3CA genes, as well as mutations linked to breast cancer. They also detected changes in regulatory regions, including several in the immunoglobulin gene family.

The researchers suggest that other groups can use their method to avoid missing mutations in highly similar regions of the genome.

Reference:
Maxime Tarabichi et al., A pan-cancer landscape of somatic mutations in non-unique regions of the human genome, Nature Biotechnology (2021). DOI: 10.1038/s41587-021-00971-y
