Transcriptomics: Past, Present and Future

July 5, 2021 By admin

Every cell carries out a range of biological processes, and one of the most significant is transcription, the first step in the expression of both coding and non-coding genes. The products of transcription are transcripts (for example mRNA, rRNA, tRNA and miRNA). The transcriptome is the total content of transcripts (all RNA molecules) in a cell or population of cells at a given time, by analogy with the genome. By definition, all RNAs are part of the transcriptome, but usage of the term varies with the experiment, and some studies count only mRNA. Unlike the genome, which is largely static, the transcriptome is dynamic: it differs from cell to cell and changes within a cell under different conditions. Because each transcript mirrors the sequence of the DNA (gene) from which it was transcribed, researchers can determine when and where each gene is turned on or off in a specific cell or tissue by analysing the entire collection of RNA sequences in it.

Transcriptomics is now a well-established field with a variety of approaches for studying the transcriptome. cDNA microarrays and NGS-based RNA-Seq are the most reliable high-throughput transcriptomics methods. Their high throughput allows gene expression to be measured genome-wide, and those measurements can then be used in functional genomics research.

PAST

Studies of individual transcripts were being carried out decades before transcriptomics emerged as a field. In the late 1970s, libraries of silkmoth mRNAs were assembled and converted to complementary DNA (cDNA) for storage using reverse transcriptase. In the 1980s, low-throughput Sanger sequencing was used to sequence random transcripts drawn from these libraries, referred to as expressed sequence tags (ESTs). The Sanger method remained the dominant sequencing method until the development of high-throughput technologies such as sequencing by synthesis (Solexa/Illumina, San Diego, CA). ESTs came to prominence in the 1990s as a cost-effective way of determining an organism's gene content without sequencing the full genome. Individual transcripts were also commonly quantified using northern blotting, nylon membrane arrays, and later reverse transcriptase quantitative PCR (RT-qPCR). However, these methods are time-consuming and capture only a small subset of a transcriptome. As a result, how a transcriptome as a whole is expressed and regulated remained unknown until high-throughput approaches were developed.

The term “transcriptome” first appeared in the 1990s. Serial analysis of gene expression (SAGE), one of the earliest sequencing-based transcriptomic approaches, was developed in 1995. It relied on Sanger sequencing of concatenated random transcript fragments, and transcripts were quantified by matching the fragments to known genes. A variant of SAGE that used high-throughput sequencing, dubbed digital gene expression analysis, was also used briefly. These methods, however, were largely supplanted by high-throughput sequencing of complete transcripts, which revealed additional information about transcript structure, such as splice variants.

[Figure: Evolution of DNA Sequencing Tools]

PRESENT

Transcriptomics has been shaped by new approaches that, roughly once a decade, have redefined what is feasible and rendered earlier technologies outdated. The first attempt at capturing a partial human transcriptome was published in 1991 and identified 609 mRNA sequences from the human brain. Two human transcriptomes were published in 2008, each comprising millions of transcript-derived sequences spanning 16,000 genes, and by 2015 the transcriptomes of hundreds of individuals had been published. Transcriptomes of various disease states, tissues, and even individual cells are now routinely generated. This growth has been driven by the rapid development of new technologies with greater sensitivity and lower cost.

MICROARRAYS

Microarrays and RNA-Seq, the two leading modern approaches, were developed in the mid-1990s and the 2000s respectively. Microarrays, which quantify the abundance of a defined set of transcripts by their hybridisation to an array of complementary probes, were first published in 1995. Microarray technology made it possible to analyse thousands of transcripts simultaneously, at a much lower cost per gene and with less labour. Until the late 2000s, spotted oligonucleotide arrays and Affymetrix (Santa Clara, California) high-density arrays were the gold standard for transcriptional profiling. During this period, a variety of microarrays were developed to cover the known genes of model or commercially significant species. Advances in array design and fabrication increased probe specificity and allowed more genes to be tested on a single array. Advances in fluorescence detection increased the sensitivity and accuracy of measurements for low-abundance transcripts.

RNA-SEQ

RNA-Seq refers to the sequencing of transcript cDNAs, with the abundance of each transcript estimated from the number of reads counted for it. The approach has therefore been shaped by the development of high-throughput sequencing technologies. Massively parallel signature sequencing (MPSS) was an early example, relying on a complex series of hybridisations to generate 16–20 bp sequences, and was used in 2004 to validate the expression of about 10,000 (10^4) genes in Arabidopsis thaliana. The first RNA-Seq work was reported in 2006 and used 454 technology to sequence about 100,000 (10^5) transcripts, which provided enough coverage to calculate the relative abundance of transcripts. RNA-Seq gained popularity after 2008, when new Solexa/Illumina technologies (San Diego, CA) enabled about one billion (10^9) transcript sequences to be recorded. This yield is now sufficient for quantitative analysis of complete human transcriptomes.
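
The idea that abundance is derived from per-transcript read counts can be illustrated with a minimal Python sketch. It scales raw counts to counts per million (CPM), one common way of making samples with different sequencing depths comparable. The gene names and counts are made up for illustration, and real pipelines usually apply further corrections (for example for transcript length).

```python
def counts_per_million(read_counts):
    """Scale raw read counts so that they sum to one million.

    read_counts: dict mapping a transcript/gene name to its raw read count.
    Returns a dict with the same keys and CPM values.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads were counted")
    return {name: count * 1_000_000 / total for name, count in read_counts.items()}


# Hypothetical counts for three genes in one sample (illustrative only).
counts = {"geneA": 500, "geneB": 1500, "geneC": 8000}
cpm = counts_per_million(counts)

for name, value in cpm.items():
    print(f"{name}: {value:.0f} CPM")
```

Because every value is divided by the same total, CPM preserves the relative abundance of transcripts within a sample while removing the effect of how deeply that sample was sequenced.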

RNA-seq builds on next-generation sequencing (NGS) and can provide both qualitative and quantitative analysis of the RNA in a given biological sample. Over the last decade it has emerged as a powerful method for transcriptome analysis, and it is expected to eventually make microarrays obsolete for gene expression analysis.

In addition to NGS, there is third-generation sequencing, which allows long-read sequencing of individual RNA molecules. Single-molecule RNA sequencing can generate full-length cDNA transcripts without clonal amplification or transcript assembly. Third-generation sequencing therefore avoids the shortcomings introduced by PCR amplification and read mapping. It can greatly reduce the false-positive rate for splice sites and capture the diversity of transcript isoforms. Single-molecule sequencing platforms include Pacific Biosciences (PacBio) single-molecule real-time (SMRT) sequencing, Helicos single-molecule fluorescent sequencing and Oxford Nanopore Technologies (ONT) nanopore sequencing.

Bulk RNA Sequencing

Early efforts to understand gene expression focused on analyzing the total mRNA of a tissue sample, an approach known as “bulk RNA sequencing.” In this technique, a tissue sample is homogenized (i.e., blended) and its total mRNA is analyzed to give the average gene expression level across the entire sample. However, this provides no detail about the transcriptome at the single-cell level. In many diseases, such as cancer, the genetic profile differs between the individual cells of a tumor. Without data for each cell, researchers cannot understand the full genetic profile of a tumor or other diseased tissue.

Several single-cell methods were developed to address this limitation. Drop-Seq and InDrop were first reported in 2015 in analyses of mouse retinal cell and embryonic stem cell transcriptomes, which identified novel cell types. Sci-RNA-seq (single-cell combinatorial indexing RNA sequencing) was developed in 2017, and SPLiT-seq (split-pool ligation-based transcriptome sequencing) was first reported in 2018. Both of these approaches use a combinatorial indexing strategy in which RNAs are labeled with barcodes that indicate their cell of origin.
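
The combinatorial indexing strategy mentioned above can be sketched in a few lines of Python. This is a simplified illustration, not the published protocol of either method. It assumes that each read carries one barcode per labeling round, and that reads sharing the same combination of barcodes came from the same cell. The barcode names, gene names and the figure of 96 barcodes per round are hypothetical.

```python
from collections import Counter, defaultdict
from math import prod


def barcode_space(barcodes_per_round):
    """Number of distinct barcode combinations across all labeling rounds."""
    return prod(barcodes_per_round)


def group_reads_by_cell(reads):
    """Group reads by their barcode combination and count genes per group.

    reads: iterable of (round_barcodes, gene) pairs, where round_barcodes
    holds one barcode per labeling round.
    """
    cells = defaultdict(Counter)
    for round_barcodes, gene in reads:
        cells[tuple(round_barcodes)][gene] += 1
    return cells


# Hypothetical reads: three labeling rounds, so three barcodes per read.
reads = [
    (("A1", "B7", "C3"), "Rho"),
    (("A1", "B7", "C3"), "Rho"),
    (("A1", "B2", "C3"), "Pax6"),
    (("A4", "B7", "C9"), "Rho"),
]
cells = group_reads_by_cell(reads)

# Assumed 96 barcodes in each of three rounds (illustrative numbers).
n_combinations = barcode_space([96, 96, 96])
print(len(cells), "cells recovered;", n_combinations, "possible combinations")
```

The number of possible combinations grows multiplicatively with each round, which is why a small set of barcodes per round can label a large number of cells without isolating each cell physically.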

Single-cell RNA sequencing

The next-generation technology in this field, single-cell RNA sequencing (scRNA-seq), enabled researchers to understand the transcriptome of a single cell in a dissociated tissue—a tissue broken apart into its cellular components. This was a pioneering approach, but it did not provide spatial information, which is important for three primary reasons. 

First, the way one cell transcribes its genes (copies DNA to RNA) affects the way it signals to its neighbor. As this process repeats, each cell’s signal creates a chain of information in a tissue. Second, complex tissues, such as the brain, liver, kidney, and other major organs, are not homogeneous. There are differences between the genes transcribed by the various groups of cells in these tissues. These differences create the distinctions in cellular function needed to maintain a working organ. Third, in some diseases RNA is localized differently than it is in healthy cells.

The ability to see where and when genes express could provide a crucial understanding of why and how diseases arise. It could also create a stronger foundation for tissue engineering, from skin grafts to synthetic hearts and kidneys. 

FUTURE & CHALLENGES

Spatialomics is considered by many to be the next frontier in understanding gene expression. Earlier this year, Nature Methods named spatially resolved transcriptomics (also known as spatialomics) its Method of the Year 2020. This burgeoning technology combines tissue imaging with single-cell transcriptomic analysis, that is, the full expression profile of messenger RNA (mRNA). The technique creates an entirely new type of scientific information. Spatialomics data can be used to understand disease more deeply, potentially enabling the development of novel treatments.

Spatial transcriptomics is a groundbreaking molecular profiling method that allows scientists to measure all the gene activity in a tissue sample and map where that activity is occurring. It uses spotted arrays of specialized mRNA-capturing probes on the surface of glass slides. Each spot contains capture probes with a spatial barcode unique to that spot. When tissue is attached to the slide, the capture probes bind RNA from the adjacent point in the tissue. Spatially resolved transcriptomics was originally invented in 2016 by Lundeberg, Frisen, and Stahl at KTH, Sweden. The technology provides gene expression data for large numbers of cells while adding another dimension to the data: positional information.
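
The role of the spot-specific barcode can be illustrated with a short Python sketch. It assumes that each sequenced read has already been reduced to a (spatial barcode, gene) pair and that the slide layout is available as a lookup table from barcode to spot coordinates. The barcodes, coordinates and gene names are hypothetical, and real analysis pipelines involve many more steps.

```python
from collections import defaultdict

# Hypothetical slide layout: spatial barcode -> (x, y) position of the spot.
SPOT_POSITIONS = {
    "AAAC": (0, 0),
    "AAAG": (0, 1),
    "AACA": (1, 0),
}


def build_spatial_counts(reads, spot_positions):
    """Count reads per gene at each spot position.

    reads: iterable of (spatial_barcode, gene) pairs.
    Reads whose barcode is not on the slide are skipped.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for barcode, gene in reads:
        position = spot_positions.get(barcode)
        if position is None:
            continue  # unknown barcode, for example a sequencing error
        counts[position][gene] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {pos: dict(genes) for pos, genes in counts.items()}


# Hypothetical reads; the last one carries a barcode that is not on the slide.
reads = [
    ("AAAC", "GFAP"),
    ("AAAC", "GFAP"),
    ("AAAG", "SNAP25"),
    ("AACA", "GFAP"),
    ("TTTT", "GFAP"),
]
spatial_counts = build_spatial_counts(reads, SPOT_POSITIONS)

for position, genes in sorted(spatial_counts.items()):
    print(position, genes)
```

The result is an expression table indexed by position rather than by sample or cell, which is what allows gene activity to be drawn back onto an image of the tissue.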

Spatialomics tools could be considered the third generation of transcriptome analysis technologies. In this context, the transcriptome is the total messenger RNA (mRNA) in a sample, whether at the whole-tissue or single-cell level, and it reveals all of the genes being expressed at the moment of analysis.

Understanding the transcriptome is important because, while the human genome has 3 billion base pairs, the majority of the genes those base pairs encode are either not expressed or are expressed only under certain conditions. Transcriptome data can show scientists how the genes expressed by, say, a cancer cell differ from those of a normal cell. Understanding these differences can yield new insight into diseases and pave the way for novel treatments.

Spatialomics builds on the advances of two earlier gene expression research technologies, bulk RNA sequencing and single-cell RNA sequencing. Both methods, described above, illustrate just how far spatialomics has come.

Spatial single-cell transcriptomics resolves the drawbacks of older bulk RNA and scRNA-seq technologies. By combining imaging and single-cell RNA sequencing, researchers can map where particular transcripts are expressed within a tissue. This can not only reveal the “where” of gene expression but also indicate the context of how individual cells function within that tissue [1].

Spatial transcriptomics provides the resolution we need to make connections and refine our understanding of how cells interact within a tissue. Earlier methods give us a picture of tissue heterogeneity, but cellular and sub-cellular resolution is needed not just for a better understanding of biology, but also for a better understanding of disease.

Spatial transcriptomics is expected to be applied in multiple fields. It is already entering synthetic biology, and it may provide new ideas for engineering problems in areas such as agriculture and biofuels. The future of spatial transcriptomics is multiomics, which combines imaging, transcriptomics, proteomics, and other types of analysis. Multiomics can link gene expression, protein expression, chromatin state, epigenomic and metabolomic data, among others. Instead of focusing only on gene expression, it can create highly complex images.

Spatial reconstruction of tissues has seen varied applications, from neuroscience and developmental biology to tumor heterogeneity in oncology. A major challenge, though, is managing the vast amount of data produced by each experiment. The high dimensionality of the data also slows data analysis considerably. Working through these challenges in the coming years could lead to scalable techniques that harness the potential of integrating imaging and transcriptomics methods.

Slide-seq, reported in 2019, uses DNA-barcoded beads that carry specific positional information. Geo-seq, introduced in 2017, integrates scRNA-seq with laser capture microdissection (LCM), which can isolate individual cells. In situ sequencing refers to targeted sequencing of RNA fragments in morphologically preserved tissues or cells without RNA extraction, including in situ cDNA synthesis by padlock probes or stably cross-linked cDNA amplicons in fluorescent in situ RNA sequencing (FISSEQ) and in situ amplification by rolling-circle amplification (RCA). Various other technologies based on RNA-seq have been developed for specific applications. For example, CaptureSeq, a type of targeted RNA sequencing, uses biotinylated oligonucleotide probes to enrich certain transcripts in order to identify gene fusions.

References

  1. https://www.nature.com/articles/s41592-020-01033-y
  2. https://www.nature.com/articles/s41598-020-75708-z