Our guest writer Louk Timmer is a Ph.D. candidate in the Van Rooij group at the Hubrecht Institute in the Netherlands. His research combines bioinformatics and wet-lab technologies to study cellular communication and remodeling in heart disease.
With the rise of single-cell sequencing data, we generate an increasing amount of high-throughput data in which several cell types can be studied simultaneously. One of the possible applications of single-cell sequencing data is to map potential communication between cell (sub-)types, which can be depicted as intercellular communication networks. These networks can yield detailed insight into the role of different cell types and processes within the complex system of cellular communication during various biological states such as development or disease.
The context of our study: the healing process in cardiac tissue
The van Rooij lab previously developed a protocol that enabled single-cell sequencing of the adult mammalian heart, including the large and fragile cardiomyocytes [1]. The next step for us was to apply this method during different phases after ischemia-reperfusion injury (a model that resembles the clinical situation in which a myocardial infarction is treated by reperfusing the occluded artery) to capture snapshots of the different temporal phases of the wound healing process. Therefore, we performed single-cell sequencing on the adult mouse heart 1, 3 and 14 days (acute, intermediate and chronic phases) post ischemia-reperfusion injury [2].
As intra- and intercellular communication is essential for a cell’s function and is severely affected by biological settings such as pathology, aging or development, one of our aims was to identify communication networks for the different phases of cardiac repair in our study. When generating and interpreting communication networks based on single-cell transcriptomic data, several decisions and aspects should be considered.
Revealing communication networks: considerations and interpretations in experimental design
Communication networks based on single-cell transcriptomic studies are a rich source of potential therapeutic targets and follow-up studies. However, if one wants to interpret a given communication network in relation to true biology, it is essential to understand how the network was built and the assumptions underlying it. In essence, building a communication network is nothing more than combining your single-cell transcriptomic data with information that states whether a gene codes for a ligand or a receptor, and then mapping the ligand-receptor interactions present between different cell types within your dataset.
For the cardiac field, it is of note that almost all published intercellular communication networks so far are based on the same network of human ligand-receptor pairs [3]. This network contains a total of 2422 ligand-receptor pairs, of which 1894 (about 78%) were literature-supported at the time of publishing. These 2422 ligand-receptor pairs contain 708 unique ligands, while the original input for creating this database included 2132 putative ligands, so only about a third of the putative ligands are paired with a receptor. This suggests that the communication networks, although extensive, are incomplete simply because of the lack of scientific knowledge regarding cognate receptors. In addition, because the database includes putative interactions and many interactions are likely to be context dependent, it is reasonable to assume that the generated ligand-receptor interaction networks also contain false positives. Validation experiments are therefore always required when specific ligands or cellular interactions are being studied based on such a communication network.
Recently, a novel ligand-receptor database (SingleCellSignalR) was published consisting of 3251 ligand-receptor pairs [4], all of which are literature supported. This decreases the chance of false positives but, as a trade-off, increases the chance of false negatives. SingleCellSignalR is a publicly available R package and has additional features, such as a ligand-receptor score and receptors that are directly linked to signaling pathways from Reactome and KEGG, providing researchers with additional insights. In contrast to a plain database of ligand-receptor pairs, SingleCellSignalR is already implemented as an R package, which can be convenient for applicability and for consistency between studies. However, if one chooses to use such an approach, the researcher should know the assumptions underlying the R package to ensure correct interpretation of the produced communication network.
Identifying cell-cell communications from the scRNA-seq data
After we determined the different cell types present in our dataset, we mapped potential intercellular communication between all cell types present in our single-cell transcriptomic data for each phase of the wound healing response. We considered a ligand or receptor to be expressed by a certain cell type when at least 20% of the cells of that type had at least 1 read count for it. We subsequently used a previously published network of ligand-receptor interactions to map all potential ligand-receptor interactions between and within the different cell types [3]. This was then depicted in a network in which cell types are shown as circles, connected by lines whose thickness is proportional to the number of potential interactions.
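The mapping described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the gene names (LigA, RecA, LigB, RecB), the read counts and the helper names `expressed_genes` and `build_network` are all hypothetical, and the cell type labels are only placeholders for toy data.

```python
from collections import defaultdict


def expressed_genes(cells, min_fraction=0.20):
    """Return the genes detected (>= 1 read) in at least `min_fraction` of cells.

    `cells` is a list with one dict per cell, mapping gene name -> read count.
    """
    if not cells:
        return set()
    n_detected = defaultdict(int)
    for cell in cells:
        for gene, count in cell.items():
            if count >= 1:
                n_detected[gene] += 1
    return {g for g, n in n_detected.items() if n / len(cells) >= min_fraction}


def build_network(cells_by_type, lr_pairs, min_fraction=0.20):
    """Count potential ligand-receptor interactions for every sender/receiver pair.

    Sender and receiver may be the same cell type (interactions within a type).
    The returned counts correspond to the line thickness in the network drawing.
    """
    expressed = {
        cell_type: expressed_genes(cells, min_fraction)
        for cell_type, cells in cells_by_type.items()
    }
    edges = {}
    for sender, sender_genes in expressed.items():
        for receiver, receiver_genes in expressed.items():
            n = sum(
                1
                for ligand, receptor in lr_pairs
                if ligand in sender_genes and receptor in receiver_genes
            )
            if n > 0:
                edges[(sender, receiver)] = n
    return edges


# Hypothetical ligand-receptor pairs and toy read counts (invented for illustration).
lr_pairs = [("LigA", "RecA"), ("LigB", "RecB")]
cells_by_type = {
    # 5 cells: LigA detected in 3/5, RecB in 2/5
    "fibroblast": [{"LigA": 2, "RecB": 1}] * 2 + [{"LigA": 1}] + [{}] * 2,
    # 10 cells: RecA detected in 4/10, LigB in 1/10 (below the 20% threshold)
    "macrophage": [{"RecA": 3}] * 4 + [{"LigB": 1}] + [{}] * 5,
}

edges = build_network(cells_by_type, lr_pairs)
print(edges)  # {('fibroblast', 'macrophage'): 1}
```

In this toy example the only edge runs from the fibroblasts to the macrophages, because LigB is detected in too few macrophages to pass the 20% threshold.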
Potential biases and pitfalls when interpreting the scRNA-seq data
One aspect to be aware of when interpreting communication networks is that cellular composition is usually not taken into account. A relatively low expression from a highly abundant cell type could contribute more to the total expression of a ligand than a relatively high expression from a cell type of low abundance. Such effects can be masked or skewed in the communication network, depending on the assumptions that underlie the generation of the communication network. This is of particular importance when focusing on (sub-)cell types that are of low abundance.
Another important aspect is the lack of spatial information. As an example: following ischemic cardiac injury, cardiomyocytes surrounding the infarcted area are suggested to show a different gene expression profile compared to cardiomyocytes in the remote zone [2]. It is conceivable that these cardiomyocytes have a distinct role in the communication network, which may have very local effects. If one aims to focus on a specific anatomical area, complementary methods such as imaging analysis are required to confirm or add spatial information.
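A small worked example makes the composition effect concrete. The cell numbers and mean expression values below are invented purely for illustration, and `total_ligand_output` is a hypothetical helper that makes the simplifying assumption that a population's total contribution is the number of cells multiplied by the mean expression per cell.

```python
def total_ligand_output(n_cells, mean_expression):
    """Approximate a population's total ligand expression (cells x mean per cell)."""
    return n_cells * mean_expression


# Invented numbers: (number of cells, mean ligand expression per cell).
populations = {
    "abundant_type": (1000, 0.5),  # many cells, low expression per cell
    "rare_type": (50, 4.0),        # few cells, high expression per cell
}

totals = {
    name: total_ligand_output(n_cells, mean_expr)
    for name, (n_cells, mean_expr) in populations.items()
}
grand_total = sum(totals.values())
shares = {name: total / grand_total for name, total in totals.items()}

for name in populations:
    print(f"{name}: total = {totals[name]:.0f}, share = {shares[name]:.0%}")
```

Under these assumptions, the abundant cell type accounts for 500 of the 700 expression units even though each of its cells expresses eight times less of the ligand than a cell of the rare type.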
Benchmarking intercellular communication networks
Different intercellular communication networks are often hard to compare with each other because they are frequently built using different thresholds for accepting a ligand or receptor as being actively expressed by a certain cell type (e.g. 20% of cells within a cell type having non-zero reads).
Some networks are generated as hypergraphs (aiming to represent all ligand-receptor pairs within the dataset), whereas others are built as a more hierarchical network (aiming to represent the most important ligand-receptor pairs within the dataset). Both hierarchical networks and hypergraphs can be used to identify communication candidates for future studies. However, if the network is intended to be used by other researchers, a hypergraph may be preferred, as other researchers can apply a selection pipeline based on their specific research question. Of course, one can always re-analyze the provided raw data and generate a new network based on a specific research question, but in this setting the structure of the predetermined networks is irrelevant (besides the cumbersome character of the approach).
Even though the generation of intercellular communication networks relies on several assumptions, these networks are an excellent way to provide a detailed view of intercellular communication in a variety of biological settings. In addition, the number of interactions of potential interest for future research that a single network yields is impressive. Therefore, we expect that this will be a commonly applied analysis in many future single-cell transcriptomic studies. When interpreting these networks, it is important to consider the assumptions underlying the network and to remember that not everything that can be counted counts, and not everything that counts can be counted.
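To illustrate why the threshold matters, the sketch below counts how many ligand-receptor pairs survive between one sender and one receiver cell type at different cut-offs. The gene names, the detection fractions and the helper `n_interactions` are hypothetical and serve only to show the effect.

```python
# Fraction of cells with non-zero reads, per gene (invented values).
sender_ligand_fraction = {"LigA": 0.62, "LigB": 0.18, "LigC": 0.07}
receiver_receptor_fraction = {"RecA": 0.35, "RecB": 0.12}

# Hypothetical ligand-receptor pairs.
lr_pairs = [("LigA", "RecA"), ("LigB", "RecB"), ("LigC", "RecA")]


def n_interactions(threshold):
    """Count pairs whose ligand and receptor both pass the detection threshold."""
    return sum(
        1
        for ligand, receptor in lr_pairs
        if sender_ligand_fraction[ligand] >= threshold
        and receiver_receptor_fraction[receptor] >= threshold
    )


interactions_by_threshold = {t: n_interactions(t) for t in (0.05, 0.10, 0.20)}
print(interactions_by_threshold)  # {0.05: 3, 0.1: 2, 0.2: 1}
```

The same toy data yields three, two or one potential interactions depending on the cut-off, so the edge weights of two networks built with different thresholds cannot be compared directly.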
1. Gladka MM, Molenaar B, de Ruiter H, van der Elst S, Tsui H, Versteeg D, Lacraz GPA, Huibers MMH, van Oudenaarden A and van Rooij E. Single-Cell Sequencing of the Healthy and Diseased Heart Reveals Cytoskeleton-Associated Protein 4 as a New Modulator of Fibroblasts Activation. Circulation. 2018;138:166-180.
2. Molenaar B, Timmer LT, Droog M, Perini I, Versteeg D, Kooijman L, Monshouwer-Kloots J, de Ruiter H, Gladka MM and van Rooij E. Single-cell transcriptomics following ischemic injury identifies a role for B2M in cardiac repair. Commun Biol. 2021;4:146.
3. Ramilowski JA, Goldberg T, Harshbarger J, Kloppmann E, Lizio M, Satagopam VP, Itoh M, Kawaji H, Carninci P, Rost B and Forrest AR. A draft network of ligand-receptor-mediated multicellular signalling in human. Nat Commun. 2015;6:7866.
4. Cabello-Aguilar S, Alame M, Kon-Sun-Tack F, Fau C, Lacroix M and Colinge J. SingleCellSignalR: inference of intercellular networks from single-cell transcriptomics. Nucleic Acids Res. 2020;48:e55.