scRNA-seq artifacts from sample preparation

Updated on April 27th, 2021.

Whichever scRNA-seq methodology you plan to use, artifacts are bound to arise from the sample preparation that precedes the mRNA capture step. Being aware of the stimuli that trigger spurious gene expression and alter the transcriptomic profile is necessary for optimal experimental design.

Typically, the tissue dissociation and cell isolation processes can take up to several hours – in some cases days – during which the cells are removed from their normal environment. mRNAs are typically stabilized only once they are captured following cell lysis, so the entire vulnerable period runs from the initial sample collection to cell lysis. During this time, cellular stress-related responses can lead to changes in the behavior and morphology of the sample cells, and even to cell death.

Limiting technical scRNA-seq artifacts during your sample preparation substantially improves the quality of your bioinformatic analysis. It helps avoid the dreaded “garbage in, garbage out”, as Denise Gay puts it in her journey into single-cell bioinformatic analysis.


Effect of Sampling Time

The time elapsed between sample collection and processing directly affects the resulting transcriptomic profile of the different cell subtypes. In their investigation, Massoni-Badosa et al. collected blood samples from 5 patients and waited 0, 2, 8, 24, or 48 hours before processing the samples with various scRNA-seq technologies (Massoni-Badosa et al., Genome Biology, 2020). Their principal component analysis (PCA) showed significant shifts across all cell subtypes, correlating with the increase in sampling time. Digging further using differential expression analysis, they identified a time-dependent decrease in the number of detected genes in all their datasets and a global downregulation of gene expression. Overall, they identified between 1,000 and 2,000 differentially expressed genes (depending on the sample type) over the course of 48 hours.

The gene expression signature of such a time-dependent bias could – and should – be identified and corrected during the bioinformatic analysis of the resulting scRNA-seq data. Nevertheless, the overall quality of the sample seems to decrease as the sampling time increases. It might therefore be worth investigating ways of processing freshly harvested scRNA-seq samples more quickly, although this can be arduous in specific cases, for example when the collection occurs outside the normal hours of the sample processing facilities.
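
As a first check for this kind of time-dependent bias, the number of detected genes per cell can be compared across sampling delays. The following is a minimal Python sketch under stated assumptions: the function names, the input layout (pairs of delay in hours and per-gene counts for one cell), and the toy numbers are all hypothetical and are not taken from Massoni-Badosa et al.

```python
from collections import defaultdict
from statistics import median


def detected_genes(counts):
    """Number of genes with at least one count in a single cell."""
    return sum(1 for c in counts if c > 0)


def median_detected_by_time(cells):
    """Median number of detected genes per sampling delay.

    `cells` is an iterable of (delay_hours, counts) pairs, where
    `counts` is a list of per-gene counts for one cell
    (hypothetical layout chosen for this illustration).
    """
    by_time = defaultdict(list)
    for delay_hours, counts in cells:
        by_time[delay_hours].append(detected_genes(counts))
    return {t: median(v) for t, v in sorted(by_time.items())}


# Toy data, made up for illustration only (not from any study).
toy_cells = [
    (0, [3, 1, 0, 2, 5]),
    (0, [1, 1, 1, 0, 2]),
    (24, [2, 0, 0, 1, 0]),
    (24, [0, 0, 4, 1, 0]),
    (48, [1, 0, 0, 0, 0]),
    (48, [0, 0, 2, 0, 1]),
]

summary = median_detected_by_time(toy_cells)
```

On this toy input, the median falls from 4 detected genes at 0 h to 1.5 at 48 h. A real analysis would use a dedicated single-cell toolkit and many more cells, but the same per-delay summary is a quick way to spot a downward trend before attempting any correction.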


Effect of Digestion Time and Temperature

Standard sample preparation methods for solid tissues require enzymatic and/or mechanical dissociation and, depending on the tissue origin, density, disease state, elastin, or collagen content, this may require long enzymatic digestion and/or vigorous mechanical disruption. Transcriptional machinery remains active at 37 °C, and extended incubation at high temperatures may introduce gene expression artifacts, unrelated to the biological state at the time of harvest. Moreover, extended incubation at higher temperatures in the absence of nutrients or anchorage, or harsh dissociation, may induce apoptosis or anoikis, polluting the viable cell population or generating low-quality suspensions.

After discovering a set of 507 genes – some of them related to cell stress pathways – strongly affected by the digestion temperature, O’Flanagan et al. performed a time-course scRNA-seq experiment on breast cancer xenograft tissues. They sampled cells regularly over two hours of total digestion time, using either collagenase at 37 °C or a cold protease at 6 °C (O’Flanagan et al., Genome Biology, 2019). They found that digestion with collagenase substantially upregulated the expression of this core set of stress-related genes, with a subset expressed even more strongly as the digestion time increased. Applying differential expression analysis to their entire dataset, they found that 43% of the 18,734 retained genes (roughly 8,000 genes) were differentially expressed after 2 hours of digestion compared to only 30 minutes.

This example highlights the importance of refining the digestion process from the outset so that it is as short, mild, and efficient as possible for your sample type, as it can affect the expression of a significant number of the detected genes.
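
One simple way to monitor digestion-induced stress in a dataset is to compute, for each cell, the fraction of counts that comes from a set of stress-related genes. The sketch below is an illustration under stated assumptions, not the method used by O’Flanagan et al.: the function name, the dictionary-based input, and the 0.25 threshold are hypothetical, and `FOS` and `JUN` are placeholder gene symbols that do not stand for the 507-gene core set.

```python
def stress_fraction(cell_counts, stress_genes):
    """Fraction of a cell's total counts that falls in `stress_genes`.

    `cell_counts` maps gene symbol -> count for one cell
    (hypothetical layout chosen for this illustration).
    """
    total = sum(cell_counts.values())
    if total == 0:
        return 0.0  # empty cell: avoid division by zero
    stress = sum(n for gene, n in cell_counts.items() if gene in stress_genes)
    return stress / total


# Placeholder gene set, NOT the published core signature.
ILLUSTRATIVE_STRESS_GENES = {"FOS", "JUN"}

# Toy cells, made up for illustration only.
cold_cell = {"FOS": 1, "JUN": 1, "ACTB": 18}
warm_cell = {"FOS": 6, "JUN": 4, "ACTB": 10}

# Arbitrary cut-off for flagging cells; tune on real data.
THRESHOLD = 0.25

flags = {
    name: stress_fraction(cell, ILLUSTRATIVE_STRESS_GENES) > THRESHOLD
    for name, cell in {"cold": cold_cell, "warm": warm_cell}.items()
}
```

Here only the toy "warm" cell is flagged. Comparing the distribution of such a score between digestion protocols or time points is one way to judge whether a protocol change actually reduces the stress response.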


Unusual Sample Types

Specific cell subtypes can be absent from a final scRNA-seq dataset because of their particular characteristics. Some cell types are particularly hard to dissociate from their tissue (e.g. cardiomyocytes), meaning they might be underrepresented in the sample sent for scRNA-seq processing (Ackers-Johnson et al., Nature Communications, 2018). Unusual cell subtypes might have unusual shapes or sizes, preventing them from being successfully processed with some scRNA-seq technologies or instruments. Another example is cells undergoing anoikis after losing their anchorage to the extracellular matrix during tissue dissociation – a process believed to start as early as 3 hours after removal – which lowers the cell recovery rate and decreases the sample quality. An alternative that preserves both the native physiological distribution of cell subtypes and the genetic material of fragile cells is to extract the nuclei and sequence nuclear mRNA.


Conclusion

A successful scRNA-seq experiment starts at sample harvest. Studies suggest that the quality and relevance of the data begin to deteriorate immediately, and that this deterioration depends on the length of the entire sample preparation phase, well before the single-cell mRNA capture step.

Even if the sampling and digestion times are well optimized, another factor that can extend the sample preparation time and affect the sample quality is the waiting time to access an scRNA-seq platform or instrument where the mRNA will finally be extracted. Until a new solution is developed to accelerate this process, you can check updated methods to freeze or fix single cells to preserve your samples and the integrity of your future dataset.