Current high-throughput scRNA-seq technologies, including spatial transcriptomics, rely on 3’ short-read sequencing to identify and quantify the captured mRNA transcripts, usually from the last 300 (non-coding) nucleotides. While highly efficient, this approach prevents the identification of potentially functionally important isoforms that differ elsewhere in the coding sequence as a result of alternative splicing. But a newly published method, Spatial Isoform Transcriptomics (SiT), uses full-length sequencing to detect and spatially map isoforms transcriptome-wide.
Spatial Isoform Transcriptomics aims for wide isoform coverage
The methodology originated at the Institute of Molecular and Cellular Pharmacology (IPMC) in Nice, France. It was first developed for regular scRNA-seq studies but proved compatible with emerging spatial technologies.
Kevin Lebrigand, first author on the Spatial Isoform Transcriptomics paper, explains: “We were already previously interested in isoforms, but we realized that spatial transcriptomics technologies relied on processes similar to classic scRNA-seq from the library preparation step onwards. So we decided to adapt our methodology with the added spatial dimension to create SiT.”
While SiT is bound by the efficacy and limits of the spatial transcriptomics technology in use (in this specific case, the 10x Genomics Visium platform), it enables the detection, quantification and mapping of isoforms in a much more exploratory fashion than before by covering the entire transcriptome, rather than focusing on a few transcripts of known architecture. As a demonstration, the team at IPMC identified 33,097 unique Gencode isoforms corresponding to 16,899 genes, of which 7,846 expressed more than one isoform, across a coronal section of a mouse brain.
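To make the three figures above concrete, here is a minimal sketch of how such counts could be derived from a list of detected isoforms. The function name, the input layout and the gene and isoform labels are hypothetical and are not taken from the SiT pipeline.

```python
from collections import defaultdict

def summarize_isoform_detection(detected_isoforms):
    """Summarize (gene_id, isoform_id) pairs.

    Returns a tuple: (unique isoforms, genes, genes with more than one isoform).
    The input layout is an assumption made for this illustration.
    """
    isoforms_per_gene = defaultdict(set)
    for gene_id, isoform_id in detected_isoforms:
        # A set removes repeated detections of the same isoform.
        isoforms_per_gene[gene_id].add(isoform_id)

    n_isoforms = sum(len(isoforms) for isoforms in isoforms_per_gene.values())
    n_genes = len(isoforms_per_gene)
    n_multi = sum(1 for isoforms in isoforms_per_gene.values() if len(isoforms) > 1)
    return n_isoforms, n_genes, n_multi

# Toy input with made-up identifiers.
toy_detections = [
    ("geneA", "geneA-1"),
    ("geneA", "geneA-2"),
    ("geneB", "geneB-1"),
    ("geneB", "geneB-1"),  # same isoform seen again, counted once
]
print(summarize_isoform_detection(toy_detections))  # (3, 2, 1)
```

Applied to the mouse brain section, the same three quantities would correspond to the 33,097 isoforms, 16,899 genes and 7,846 multi-isoform genes reported by the team.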
Combining 3’ short-read and full-length sequencing for accurate mapping
Spatial Isoform Transcriptomics relies on sequencing the same spatially resolved sample in parallel with classic 3’ short-read sequencing (Illumina) and full-length sequencing (Nanopore).
Once the mRNA from a tissue sample has been captured, tagged with a spatial barcode, reverse-transcribed and amplified, the cDNAs are split into two pools. One pool is fragmented and prepared as a standard library for 3’ short-read sequencing, while the other goes to Nanopore sequencing, which reveals the exon sequence of each molecule. Full-length transcripts are then matched to their short-read counterparts through their spatial barcodes and Unique Molecular Identifiers (UMIs), and compared to Gencode reference isoforms to produce isoform-level profiles.
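The matching step described above can be sketched as follows. This is a simplified illustration and not the SiT implementation: it assumes that barcodes and UMIs have already been extracted from both libraries, that each long read already carries an isoform assignment, and that matching is exact. All names and the record layout are hypothetical.

```python
from collections import defaultdict

def count_isoforms_per_spot(short_read_molecules, long_reads):
    """Count distinct molecules per (spatial barcode, isoform).

    short_read_molecules: set of (barcode, umi) pairs seen in the short-read library.
    long_reads: iterable of dicts with keys "barcode", "umi" and "isoform".
    Only long reads whose (barcode, umi) pair is supported by short reads are kept.
    """
    umis_seen = defaultdict(set)
    for read in long_reads:
        molecule = (read["barcode"], read["umi"])
        if molecule not in short_read_molecules:
            continue  # no short-read counterpart, so the read is discarded
        # Reads sharing a UMI are PCR copies of one molecule; the set counts them once.
        umis_seen[(read["barcode"], read["isoform"])].add(read["umi"])
    return {key: len(umis) for key, umis in umis_seen.items()}

# Toy data with made-up barcodes, UMIs and isoform labels.
short_molecules = {("AAAC", "TTGA"), ("AAAC", "CCGT"), ("GGTA", "ACGT")}
long_reads = [
    {"barcode": "AAAC", "umi": "TTGA", "isoform": "geneA-1"},
    {"barcode": "AAAC", "umi": "TTGA", "isoform": "geneA-1"},  # PCR duplicate
    {"barcode": "AAAC", "umi": "CCGT", "isoform": "geneA-2"},
    {"barcode": "GGTA", "umi": "ACGT", "isoform": "geneA-1"},
    {"barcode": "GGTA", "umi": "TTTT", "isoform": "geneA-2"},  # unsupported UMI
]
isoform_counts = count_isoforms_per_spot(short_molecules, long_reads)
```

The result is a spot-by-isoform count table, the isoform-level analogue of the spot-by-gene matrix produced by a standard spatial experiment. A real pipeline would also need to tolerate sequencing errors in the long-read barcodes and UMIs, which exact matching does not.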
“Keeping the 3’ short-read sequencing is valuable because of the accuracy of the detection of the UMIs, the Unique Molecular Identifiers, to correct possible amplification biases from the PCR step,” Lebrigand adds. “We have developed the full bioinformatic pipeline to accurately transfer the barcodes from short to long reads, detect differential gene splicing patterns and map them to the spatial tissue architecture. Isoforms are easy to detect at the moment due to their broad changes in the sequence. Smaller variations down to the single nucleotide, such as A-to-I RNA editing or somatic mutations, require deeper sequencing so that a more accurate consensus sequence can be generated for each molecule from the pool of reads derived from it.”
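The idea of a per-molecule consensus can be illustrated with a deliberately simple majority vote. This sketch assumes that the reads sharing one barcode and UMI are already aligned to the same coordinates and have equal length; real long-read consensus tools also handle insertions and deletions, which this toy function does not. The function name is hypothetical.

```python
from collections import Counter

def majority_consensus(aligned_reads):
    """Return the per-position majority base across pre-aligned reads of one molecule."""
    if not aligned_reads:
        raise ValueError("at least one read is required")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be pre-aligned to the same length")

    consensus = []
    for column in zip(*aligned_reads):
        # most_common(1) returns the most frequent base in this column.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)

# Three toy reads of the same molecule, each with at most one sequencing error.
toy_reads = ["ACGTAC", "ACGTTC", "ACCTAC"]
print(majority_consensus(toy_reads))  # ACGTAC
```

With a single read per molecule there is nothing to vote against, which is why calling single-nucleotide differences needs more reads per molecule than detecting a missing or added exon.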
Although the team has developed the bioinformatics solution required to detect new isoforms in single-cell experiments, doing so was not a goal of SiT, because the resolution of current spatial transcriptomics still needs to improve (a barcoded spot on the Visium platform is 55 μm wide and thus covers several cells at once).
A strength of SiT is its ability to provide information about differential isoform expression patterns for external spatial datasets. Once the spatial map of full-length isoforms in a tissue has been established, it can be applied to other existing spatial or single-cell RNA-seq datasets that initially included only 3’ short-read sequencing, thus adding valuable information about the tissue.
Detection of isoforms relies on the progress of full-length sequencing technologies
The robustness of the technique depends on the accuracy of the full-length sequencing methodology. This was long a limitation, but significant progress has been made over the past few years.
“PacBio used to have the lead for full-length sequencing,” Lebrigand continues. “But recently Nanopore increased their sequencing accuracy from 85% to 95%. Nowadays, PacBio can process up to 4 million full-length sequences at very high accuracy, but Nanopore reaches up to 120 million at 95% accuracy for the same price. Looking at the size of the datasets generated by modern spatial and single-cell RNA-seq experiments, Nanopore was a more obvious choice and the way SiT is coded can compensate for the lower accuracy.”
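One generic way software can tolerate a lower per-base accuracy is to assign an error-containing barcode to the closest known barcode, provided that match is unambiguous. The sketch below illustrates that general idea with an edit distance; it is an assumption-based example, not a description of how SiT itself handles errors, and the function names and the distance threshold are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions, deletions) between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion from a
                current[j - 1] + 1,                    # insertion into a
                previous[j - 1] + (char_a != char_b),  # match or substitution
            ))
        previous = current
    return previous[-1]

def rescue_barcode(observed, known_barcodes, max_distance=1):
    """Return the unique known barcode within max_distance edits, else None."""
    best, best_distance, ambiguous = None, max_distance + 1, False
    for candidate in known_barcodes:
        distance = edit_distance(observed, candidate)
        if distance < best_distance:
            best, best_distance, ambiguous = candidate, distance, False
        elif distance == best_distance and distance <= max_distance:
            ambiguous = True  # two equally close barcodes: refuse to guess
    return None if ambiguous else best

# Toy barcodes, e.g. those confirmed by the short-read library.
known = ["AAACCC", "GGGTTT"]
print(rescue_barcode("AAACGC", known))  # AAACCC (one substitution)
print(rescue_barcode("AACCC", known))   # AAACCC (one deletion)
print(rescue_barcode("TTTTTT", known))  # None (too far from both)
```

Restricting the candidates to barcodes already observed in the more accurate short-read data keeps the search space small and reduces the chance of a wrong assignment.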
Future prospects for Spatial Isoform Transcriptomics
“For now we have looked into inbred mice without mutations to demonstrate the efficiency of our methodology,” states Lebrigand. “Our next step will be to look for somatic mutations in disease models, hoping to map spatially differentiated isoforms that might be involved in the development of illnesses. Our ultimate goal would be to use the method in a clinical setting to enable ever more detailed diagnostics.”
Such a development would prove particularly useful in cancer research, for instance to determine whether individual cancerous cells express the same genes but different isoforms, which could trigger different cellular processes or even code for a different protein altogether.
“On the technical side, my colleague Rainer Waldmann [also at the IPMC platform] is working on updating the methodology so we might bypass 3’ short-read sequencing altogether and rely only on full-length sequencing to achieve similar outcomes. This would decrease sequencing costs significantly.”
What happens to those spatially detected isoforms?
The sequences of all isoforms detected with SiT are publicly available, and the team at IPMC is looking forward to collaborating with existing cell atlases to complement their databases with spatially resolved isoform distributions.
The Spatial Isoform Transcriptomics method is described in “The spatial landscape of gene expression isoforms in tissue sections”, Lebrigand et al., currently in pre-print. DOI: https://doi.org/10.1101/2020.08.24.252296