Updated on April 27th, 2021.
Dr. Marc Beyer agreed to meet us to discuss his journey and the best collaborations he witnessed between biology and bioinformatic analysis regarding single-cell RNA sequencing (scRNA-seq).
Scipio: Good afternoon, Dr. Beyer. Thank you for sparing some time today to share your experience of the bioinformatic side of scRNA-seq projects. It can be a daunting prospect when scientists come from a pure biology background and must face such a new, complex, and fast-paced scientific field to complete their scRNA-seq studies. I understand you have been involved in single-cell bioinformatics almost since the very beginning, is that correct?
Dr. Marc Beyer: That’s right! When we started [bioinformatic analysis of scRNA-seq] back in 2013, there was nothing available. We had to gather the packages [modules to run specific computational programs] ourselves and try to assemble and run them the best we could. It is pretty impressive what we have been able to achieve in just the past few years: we are starting to have common pipeline standards in the field now, for primary data processing as well as downstream analysis.
=> If you’d like to read about the journey to learn single-cell bioinformatics, head over to the testimonial from our guest writer Denise Gay!
The field of single-cell bioinformatics is in constant evolution
S: I can imagine it must have been tedious work, if there was no user-friendly software or even interfaces at the time. Are there any lessons from those early days that still apply today, now that pipelines have started to emerge?
MB: Indeed, we learned early on that you have to “encapsulate” your analysis in a container, knowing the exact versions of the software and packages you used, so that if you need to run it again two years later, you can. Otherwise, you might be re-running your analysis two months later, but you will have different results because the packages have changed in the meantime. This still applies today with software versions from single-cell technology companies.
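Editor's note: the point about knowing exact versions can be illustrated with a minimal Python sketch. It is not Dr. Beyer's setup and it does not replace a container; it only records which package versions were installed when an analysis ran, so that a later run can be compared against it. The function names `snapshot_environment` and `changed_packages` and the file name `environment_snapshot.json` are hypothetical, chosen for this example.

```python
import json
import platform
from importlib import metadata


def snapshot_environment():
    """Return the Python version and the version of every installed package."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip installs whose metadata has no name
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "packages": dict(sorted(packages.items())),
    }


def changed_packages(old, new):
    """Map each package whose version differs to (old_version, new_version).

    A package missing from one snapshot is reported with None on that side.
    """
    old_pkgs, new_pkgs = old["packages"], new["packages"]
    changes = {}
    for name in sorted(set(old_pkgs) | set(new_pkgs)):
        before, after = old_pkgs.get(name), new_pkgs.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


if __name__ == "__main__":
    # Hypothetical file name: store the snapshot next to the analysis results.
    with open("environment_snapshot.json", "w") as handle:
        json.dump(snapshot_environment(), handle, indent=2)
```

Saving such a snapshot alongside the results, and running `changed_packages` on the saved and current snapshots before re-running an analysis, shows which packages have moved in the meantime. A container image goes further by freezing the software itself rather than just listing it.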
“Depending on what biological questions you would like to answer, the technology is completely different.”
S: Looking at the ever-growing list of tools available for single-cell analysis, it seems the pace is not slowing down. How do you keep up with these constant updates, the new computational methods and their use?
MB: Over time I have acquired some basic knowledge of analyzing transcriptome data. Not so much that I can program everything now by myself, simply because that is too huge a task, but I know how to interpret programs that people are writing. And I can explain to a bioinformatician how to analyze and how to interpret the data. To keep up to date, people use Twitter, bioRxiv, and bioinformatics communities (HCA, LifeTime, etc.). GitHub is of tremendous help, as newer and older versions of packages are available there.
Analyzing biological data properly needs a commitment to understanding bioinformatics
S: Coming from biology, it sounds pretty intimidating and quite a huge workload to commit to if you have to understand the concepts behind different programs and algorithms, even if you do not have to learn how to program directly! I might be tempted to just run my experiment and sequencing but leave the analysis to an experienced bioinformatician altogether.
MB: If you want somebody else to analyze your data, it’s just tough. I think we learned this the hard way. In principle, if people want to do something with us, we need to have an initial discussion about experimental design. What is their biological question? What do they want to address? To what level of detail do they want to go? Because depending on what biological questions you would like to answer, the technology is completely different. And the way to analyze the data will also be completely different.
“Our most successful collaborations were when a PhD student or a postdoc who is willing to spend time learning comes to our lab for three months or so.”
S: So how do those discussions take place? How much of the biology behind the sample do you need to know before running an analysis?
MB: Well, when people approach us coming with cell types that we have no clue about, we simply say that it would be a very difficult project, because we have no idea about the biology. Therefore, I cannot tell you anything about how to set up the experiment, or how good the data will be because I do not know what to expect. On the other hand, I can use the algorithms that I normally run, which should give me valid results. But for the interpretation of the data, I have absolutely no idea what makes sense or not.
Meeting half-way for best collaborative results
S: I see. When this is the case, how do you deal with such a gap in knowledge? Is there a good approach to solve this?
MB: I think this is a critical question for the whole field! I am in the luxurious position that I know what happens there. I see how people can struggle, and how they try to bridge this gap of knowledge. And I think there is probably no perfect solution. One of them is to say, well, I invest in this and I want to learn at least the basics myself, to be able to understand what the other side tells me. And that can be both sides, right? If you think about a bioinformatician having little knowledge about the biology, it’s the same thing if you want to understand what you’re actually doing with the data. You can do a lot of things with data, but in principle it is for the other side where you are coming with biological knowledge. You might think you have found the perfect way to analyze the data, but unfortunately find no significant gene expression in there, because that’s it, that’s your result. And the other side might say, “well, we do have a biological effect”. Now, how do we come together?
S: So which solutions have you come up with for those issues? How did your most fruitful collaborations take place?
MB: Ideally, you have to meet half-way between the biology and the bioinformatics. On the one hand, if you can, you get somebody in your team who wants to commit and learn some bioinformatics to start bridging the gap in knowledge. On the other hand, we can provide somebody at this interface with some knowledge in biology to bridge the rest of the way. Our most successful collaborations were when a PhD student or a postdoc who is willing to spend time learning comes to our lab for three months or so. Then they go back, they analyze the data with what they have learned, they ask questions, and we have a productive back-and-forth to set up the best analysis and interpretation we can.
S: Having a team member in a lab motivated and committed to put in the time to learn the basics in bioinformatics for the whole group sounds like an ideal solution indeed. Would you have any advice for the cases where this is unfortunately not a possible solution?
MB: I would recommend to everybody that you find partners to collaborate with who are experienced in your domain, even if that means collaborating with people somewhere else on the globe. It is nice to have a lab nearby that might do single-cell technologies, but if they have a completely different domain of knowledge, then often it’s not helping a lot. I mean, you can learn the technology there, yes. But for your biological question, then I think it’s better to actually ask people who are more experienced in your field.
S: That sounds like a sensible plan indeed! Thank you very much Dr. Beyer.
Dr. Marc Beyer is a research group leader at the German Center for Neurodegenerative Diseases (DZNE) in Bonn, Germany. After initially studying medicine, he pursued a post-graduate degree in bioinformatics in 2002, when the field was just barely taking off. A decade later, he participated in the very early days of single-cell transcriptomics, leading him to work with the Platform foR SinglE Cell GenomIcS and Epigenomics (PRECISE) today.
Interview of Dr. Marc Beyer recorded on Jan 8th, 2021, by Wilko Duprez.