0:00
Yes.
0:00
So hi, good morning everybody.
0:03
I'm indeed here with Single Cell Discoveries, a company we founded seven years ago, and we're a single cell, bulk and spatial transcriptomics service provider, or CRO.
0:12
And I'm giving you a slightly different talk than normally today where I'm trying to distil basically 10 years of helping people, very different people at different institutes and companies with single cell sequencing down to a few practical questions that people can ask themselves before starting a single cell experiment.
0:28
So just out of curiosity for myself is who here has done more than three single cell experiments themselves?
0:38
All right, spatial transcriptomics, same question.
0:42
Great.
0:43
Who here is planning to do single cell or spatial or plans to outsource it anytime soon?
0:47
All right, pretty much the other half of the room, right.
0:52
OK.
0:53
And who here works for a single cell company?
0:57
All right.
0:58
Just to know how much I need to zoom in or out at certain areas.
1:02
So, yeah.
1:03
So before we get to the practical considerations and questions, just a little bit about our company.
1:09
So we're a service provider for transcriptomics assays, single cell, bulk and spatial, and we've been developing single cell methodology since 2012, basically back in the early days of single cell sequencing.
1:21
We're 35 people located in Utrecht in the Netherlands where we have a lab that really was built from the ground up to be a single cell spatial and bulk service lab.
1:30
And what sets us apart from most service providers is that we don't just do commercially available assays, but we also do a lot of method development in house, either at the request of our clients or just for technologies we think will be useful for the field.
1:43
We designed those and offer those and that means that we can offer technologies that are quite unique to our company because we developed them in house.
1:49
And with that we also invest a lot in automation, which means we can be very fast for both small custom projects, but also for very large screening sized projects.
2:01
So this all allows us to offer end-to-end, very customised services, because we have a lot of expertise in our team.
2:07
That means that we can help people basically start from project design onwards.
2:12
So really think about what's your biological question and what do you need to achieve with it?
2:17
Then run the samples in our lab, maybe clean them up if they're very difficult samples, with the different technologies we offer, sequence in house and offer very advanced data analysis.
2:25
It really gets to the final question that people want to answer, and it's not just a raw data dump, basically.
2:31
And because we've been a CRO for seven years, and before that a core facility for another three, we've seen a lot of different tissues from the human body.
2:39
So pretty much the whole human body. We've sequenced more than 40 different organisms at the single cell level, and that includes material that is very tricky to deal with.
2:47
So with all that sort of pattern recognition we've developed, we see a few things that often are either ignored or left out when planning a single cell experiment.
2:57
So that's what I'm trying to distil down today.
3:00
And just a quick overview of the technologies we offer.
3:04
The key message is here not to go into detail about every technology, but that different biological questions require different technological solutions.
3:12
So it's not one-size-fits-all for single cell sequencing or spatial.
3:17
So what we offer on the single cell front, in the left bucket, is what we call plate-based single cell sequencing.
3:22
So that's not combinatorial-barcoding plate based, but real single cells in single wells: SORT-seq and VASA-seq.
3:29
Those are flow cytometry plate based.
3:31
And these are great when you need to analyse a rare cell population in very high amounts of detail.
3:37
So VASA-seq for example, gives up to 12,000 genes per single cell and it also gives total RNA and full length.
3:44
So these are great for very sensitive readouts on limited numbers of cells, and it can be many samples if you want.
3:51
Then in what we call the high throughput bucket, we have commercially available platforms, 10X, Parse Bio and Scale Bio, that are great when you try to analyse thousands to even millions of cells.
4:03
So these are great for general large screening experiments or just to figure out what a sample looks like even to the rarest population of cells.
4:11
And upstream of single cell,
4:12
what we often end up doing is bulk RNA sequencing, and we offer that in two slightly different formats.
4:19
On the one end, we have RNA sequencing that takes very low amounts of input, so down to 12 cells per sample, because it was originally a single cell protocol.
4:29
So it's very sensitive and on the other end of the spectrum we can do very high throughput.
4:33
So we can screen tens of thousands of bulk RNA sequencing samples at a very low cost, because we automated it heavily in a DRUG-seq-like fashion, and we call that discovery seq.
4:44
So that's great usually upstream of single cell.
4:46
And because we have a lot of sequencing capacity in our lab, we also offer that to the community, for people who make their own libraries, as a fast-turnaround sequencing service, usually five to 10 days.
4:58
Now those are the more standardised technologies that we run sort of the same way every time, but some questions really require tailored assays.
5:05
So for that we have custom services.
5:07
So for example, for the cell and gene therapy space, we can do targeted enrichment of specific, you know, AAV barcodes.
5:15
So to really figure out what cell is expressing what viral barcode.
5:19
So bio distribution assays.
5:22
Similarly for data consulting, we offer support, sometimes for months on end, for someone to really figure out what they're trying to figure out and get to the core of their biological question.
5:31
And that's a very tailored workflow, but on the bioinformatics side. We have multiomics, so multiple readouts, multiple omics from the same cell, DNA/RNA vac-seq, methylation, and spatial transcriptomics, which is the other big, you know, topic at this conference, and which we do for now in the form of 10X Genomics Visium HD, but we're planning to expand that as well in the future.
5:52
Yeah.
5:52
So that is an overview of the technologies.
5:54
And again, the key take home message here is that there's not one-size-fits-all.
5:57
So it's really important to consider before starting an experiment which technology you want to pick.
6:03
And this is where we get to the sort of four key considerations before starting an experiment.
6:09
The first one is quite obvious and quite to the point: think deeply about the sample type and sample quality that you have, because that really ripples into the rest of the project.
6:19
So for example, how are the samples stored?
6:22
Are they fresh, are they cryo preserved?
6:23
If they are FFPE blocks, that basically rules out 90% of single cell assays; you're down to a handful.
6:31
Maybe you have methanol fixed protocols running.
6:34
Those work great for some cell types, don't really work for others.
6:37
So there's a lot of things to think about before even storing the sample if you're planning to do single cell later.
6:44
How many cells do you have?
6:44
It's good if you want to analyse millions of cells, but that means, yeah, you need to have multiple millions of cells or large chunks of tissue as input as well.
6:52
And the key one, which is more of a qualitative thing, is what is the quality of the sample?
6:57
Sometimes you don't know, but you can estimate it, more on that later, and you can measure it by, you know, doing a simple viability count, for example.
7:05
And also think about how many samples do I need to analyse at the same time.
7:09
So maybe you want to analyse 100 samples.
7:11
So you think, I need, you know, a high throughput technology, but those are really designed for large one-batch experiments.
7:18
If you need to process two samples at a time over the course of two years, then maybe a different storage condition or a different technology is much better.
7:25
So first question to ask yourself is how difficult is my sample?
7:29
So if you’re working with fatty tissue, you're in for a rough time usually.
7:33
So you have to really think about how to store it, how to process it.
7:36
Do I need nuclei extraction?
7:37
And do I validate this before starting?
7:41
That's question number 1: how difficult is my sample type, and how is it stored or how should I store it?
7:47
So the second one is a little bit more involved on picking the right technology.
7:51
A question we get asked very often is how many cells do I need to analyse?
7:56
And this is by no means a generally accepted framework of how the single cell world is divided, but it's what I find useful.
8:01
These are four general buckets, starting with the sort of mechanical or manual, very instrument-free single cell methodologies.
8:10
So you have Illumina 3 prime or [unclear] seq.
8:11
They're basically the more old-school manual single cell protocols.
8:15
And those are great when you have lower numbers of samples at the time, maybe very few cells that you want to maximally capture.
8:22
So you want to capture almost all cells you have in your assay.
8:25
So that's where these methods really come in, or when you have, yeah, like I said, low numbers of samples that you need to run at low cost every time because you don't want to buy a huge kit once. Then, moving up from that in cell numbers, we have the plate-based assays.
8:40
So SORT-seq and VASA-seq are what we do.
8:41
These are 384-well plate assays.
8:45
Smart-seq is another one.
8:46
And these typically give the highest sensitivity, so the highest number of genes per cell.
8:51
But you need to have a flow sorter or another way to put cells into plates.
8:54
And typically these max out at a couple of thousand cells per sample.
9:00
Moving up from that, we have microfluidics, where I put 10X and BD Rhapsody, which is not really a microfluidics machine, but it tends to sit in the same range of questions and cell numbers and sample numbers.
9:10
So I put them side by side there.
9:12
And they also have similar assays, ATAC-seq, TCR, etcetera.
9:16
So these are kind of the most widely used category of cell numbers because they really cater to, you know, 80% of biological questions.
9:25
And they're great for when you want to do thousands to tens of thousands, maybe hundreds of thousands of cells.
9:30
But beyond that, you really get into the split pool or combinatorial barcoding range where Parse and Scale are the two methods.
9:36
Those are the more widely used ones.
9:39
Both have recently been acquired: Parse by QIAGEN and Scale by 10X.
9:44
So that's kind of the range of yeah, technologies that we have.
9:49
But how to pick one and how to know how many cells to aim for?
9:52
There are many ways of deciding this, but the most useful question that we often like to ask very early on is what is the rarest cell population that you really care about that you cannot enrich.
10:04
If you can enrich it, you can put it in a plate, and then you're in a different experiment.
10:08
Let's say it's 10% of your total population.
10:11
That means you basically can pick any assay here because if you assay 1000 cells, you have 100 cells from that cell population. That's enough to do most analysis you want to do.
10:20
If it's 1%, that means you're kind of more on the right-hand side of this schematic here: microfluidics, split pool. If it's 0.1% and you cannot enrich it, but you really want to capture that cell population,
10:31
you need to analyse hundreds of thousands of cells to get a good representation.
10:35
So it's a very useful question to avoid being on the completely wrong end of the spectrum, technology wise.
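The rarest-population rule of thumb above can be sketched as simple arithmetic. This is a minimal illustration, not a sizing tool: the function name `cells_to_profile` is hypothetical, the default target of 100 cells per population is an assumption taken from the 10% example, and the calculation ignores capture efficiency and sampling noise.

```python
import math

def cells_to_profile(rare_fraction, min_rare_cells=100):
    """Total cells to analyse so that, on average, `min_rare_cells` of them
    come from a population that makes up `rare_fraction` of the sample.

    Hypothetical helper: expectation only, no capture losses, no sampling noise.
    """
    if not 0 < rare_fraction <= 1:
        raise ValueError("rare_fraction must be in (0, 1]")
    # round() guards against floating point noise before taking the ceiling
    return math.ceil(round(min_rare_cells / rare_fraction, 6))

# The three cases from the talk: 10%, 1% and 0.1% of the total population.
for fraction in (0.10, 0.01, 0.001):
    print(f"{fraction:.1%} population -> {cells_to_profile(fraction):,} cells")
```

In practice you would probably aim above these numbers, since not every loaded cell is recovered and a rare population can be under-sampled by chance.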
10:41
And if you don't know, a useful proxy question is how complex is the tissue?
10:45
If it's a cell line, it's unlikely that you have tens or dozens of cell types or populations there.
10:52
If it's brain, now you know it's probably going to be very complex and you need to analyse more and more cells to really get to the rare cell populations.
10:59
So that's question number 2 is how many cells do I need?
11:02
And it's very hard to know this a priori.
11:04
And a very related question is how many reads per cell do I need to aim for?
11:08
And the simple answer is: it depends on your budget, because the more cells you analyse, the more you need to sequence, and the sequencing costs kind of scale linearly with the number of reads, right? Number of cells times number of reads per cell, that's your sequencing budget.
11:22
However, generally what we see is that for the low end, so 384 cells per sample, we tend to be at 10,000 to even 1 million reads per cell.
11:33
At the higher end, so if you're looking at tens of thousands of cells or millions of cells, we tend to be at 20,000 to 50,000 reads per cell.
11:39
That's sort of the general ballpark.
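As a rough sketch of that budget calculation: total reads are samples times cells per sample times reads per cell. The function name `total_reads` and both example designs are hypothetical; the per-cell read depths are picked from within the ballpark ranges mentioned above, and no prices are assumed.

```python
def total_reads(n_samples, cells_per_sample, reads_per_cell):
    """Sequencing budget in reads: cells x reads per cell, summed over samples."""
    return n_samples * cells_per_sample * reads_per_cell

# Hypothetical designs (assumptions, not recommendations).
plate_design = total_reads(n_samples=10, cells_per_sample=384, reads_per_cell=100_000)
high_throughput = total_reads(n_samples=10, cells_per_sample=10_000, reads_per_cell=20_000)

print(f"plate-based:     {plate_design:,} reads")
print(f"high-throughput: {high_throughput:,} reads")
```

Because the cost scales roughly linearly with total reads, doubling either the cell number or the read depth roughly doubles the sequencing bill.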
11:42
And if you want to know sort of in those orders of magnitude where you need to be, a good question to ask is, are we sort of generally cell typing?
11:51
So counting the numbers of cells per cell type? That you can do with relatively low numbers of reads per cell, because you can call a cell type with 20K reads; that's usually not a problem.
12:00
Or are you trying to do some complicated gene co-expression analysis where you don't want to miss any, you know, events basically?
12:07
So then you need to stay on the side of caution and have higher numbers of reads per cell.
12:11
So that is sort of how many cells, how many reads. Those are a couple of questions that are very useful to ask.
12:17
However, it's very difficult to really know a priori whether you picked the right combination of variables.
12:24
And that leads to the sort of general question 2: have I done, or should I do, a pilot experiment?
12:31
Pilot experiments are often overlooked or ignored because it's very tempting to run a massive experiment immediately, get the data that we want, hit the milestone before, you know, December 31st, whatever it is.
12:42
However, a good pilot will often end up saving a lot of time and money because it will tell you, is my sample actually viable for single cell sequencing?
12:52
Did I pick the right number of cells?
12:53
Did I pick the right number of reads per cell?
12:56
If you do a good pilot, now you know where the sweet spot is and you can design a much better, larger experiment.
13:02
So that's question number 2 is especially if you've never done single cell sequencing before on this sample type, pilot first is almost always a good idea.
13:11
So 3 is all about data.
13:13
So we're going in sort of rapid discussion here.
13:15
They're not necessarily related, but we're trying to, again, trying to distil the areas that are often overlooked.
13:21
So data analysis is a whole topic in and of itself.
13:25
The main important thing to realise is that technical and biological artefacts are expected in single cell data, spatial data, NGS data in general.
13:33
And these can be of different types.
13:35
So we have stress response, ambient RNA, batch effects, and all of these have different ways of dealing with them.
13:42
And even after you do that, so let's say your data looks like the graph on the left here where you see 2 different experiments, blue and brown and they separate.
13:50
And then you do batch effect correction and it looks great.
13:53
Even after you do that, there are many paths to analysis.
13:56
So you can do cell typing, you can do pseudotime analysis, cell-cell interaction.
14:00
And each of these has different ways of handling this.
14:04
And there's not one correct way.
14:05
There's not one generally accepted way to do single cell data analysis.
14:09
Maybe in five years, but not right now.
14:12
One thing to mention for people who are not experts in bioinformatics is that it's very easy to get carried away with all these different methods.
14:20
But a good thing to always remember is that strong biology is easier to deal with in the sense that you will see it very quickly.
14:26
If you clean up your data, you get rid of your batch effects, and you don't see any difference in clustering or in heat maps between your two experimental conditions that are supposed to be different,
14:35
again, you're in for a bad time. You can do a lot of bioinformatics to try and solve that, but if you don't see it quickly in a very simple heat map, it doesn't matter how many statistical tricks you apply to it.
14:45
It's going to be difficult to really distil biology out there.
14:48
So just remember that if you don't see anything early on, you can get it.
14:52
You can get to the point, but it's very tempting to over-process or overfit the data.
14:58
So there are many ways that lead to Rome, but they might not lead you, you know, to the correct Rome.
15:03
You might end up with something completely different that you aren't necessarily aiming for if you really over processed it.
15:10
So massaging the data is one thing, sometimes necessary, but don't torture it until it confesses, because that's not a trustworthy answer.
15:18
And remember what your hypothesis is.
15:20
Remember what you're trying to figure out.
15:21
Like single cell data is very complex.
15:23
A 10 sample experiment, 20 sample experiment is 4.6 billion data points.
15:27
It's very easy to get lost in the weeds.
15:29
But remember what you're trying to do.
15:31
So question number 3, if you're not very familiar with single cell sequencing, and I stole this from Ming Trang’s LinkedIn, with his permission of course, is: how will I analyse the data, or who will analyse the data?
15:42
And make sure that you give yourself or that person time to analyse data.
15:45
It's not something that can be done in a day.
15:48
A good data analysis, especially for a good data set, takes time.
15:52
So give yourself time, plan it in and figure out who's actually going to do the analysis.
15:58
All right, so 4 is all about validation.
16:02
And because we're at a single cell and spatial conference, I want to zoom in on spatial transcriptomics.
16:08
And it's maybe the most overlooked thing in planning a single cell experiment is how to validate it.
16:12
So let's say everything's amazing, the sample type works out, you pick the right cell numbers, data analysis was smooth.
16:18
Now you have a list of genes.
16:20
How do you know which ones are meaningful?
16:23
How do you know how to really sort of verify that biology from there?
16:27
So the most important thing is to validate with some other assay.
16:30
And spatial is a very obvious one that people choose, we'll get into the reasons later.
16:36
And it's an equally messy field, or maybe even more so, than single cell sequencing, because there are many ways of doing spatial, and there are, you know, patent wars, and there's a lot of noise in the spatial field that makes it tricky to choose the right platform.
16:49
And to oversimplify things a little bit, the way we like to think about it is that in spatial transcriptomics you have, generally speaking, sequencing-based platforms that tend to give you all genes.
17:01
So they're great for discovery, unbiased research and they tend to sacrifice a little bit in resolution.
17:06
So they're not exactly single cell resolution.
17:08
So 10X Genomics Visium, even the HD version is a good example of that.
17:12
It's great for discovery work.
17:13
It's great for getting high resolution into your tissue.
17:16
So you see mouse brain here with 10X Visium, but it's not exactly single cell.
17:20
And then there are microscopy-based methods that give slightly better resolution, single cell, but usually you have to pick which genes you want to look at.
17:28
So that's the trade-off: sequencing based is unbiased; microscopy based is biased, with better resolution. NanoString’s CosMx is an example.
17:36
So you see the same tissue here with two different technologies.
17:38
And I'm sure you can appreciate that the cell types you see are more detailed in CosMx.
17:43
But there are other bottlenecks and other benefits for sequencing-based technologies.
17:46
For example, not all of them are compatible with FFPE, and not all of them are easy to process quickly.
17:52
So there are very long machine times, especially for microscopy-based ones like Xenium or CosMx.
17:57
So those are all things to consider when planning a spatial experiment.
18:00
But again, the key question and message here is how will I validate my findings?
18:04
A list of genes is not a cell type, it's not a functional cell state.
18:08
So you have to validate.
18:09
Just don't assume that the clusters you see in single cell data equal biology.
18:14
Compare them with known marker genes, validate them with other methods, and really try to validate what you're looking at.
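A first-pass check of clusters against marker genes could look like the sketch below. It is only an illustration: the helper `best_matching_cell_type`, the cluster names and the per-cluster gene lists are hypothetical, and the marker sets are a small, commonly used selection for immune cells rather than a curated reference.

```python
# Commonly used marker genes for a few immune cell types (small illustrative set).
KNOWN_MARKERS = {
    "T cell": {"CD3D", "CD3E", "CD2"},
    "B cell": {"MS4A1", "CD79A", "CD79B"},
    "Monocyte": {"CD14", "LYZ", "S100A8"},
}

def best_matching_cell_type(cluster_genes, known_markers=KNOWN_MARKERS):
    """Return (cell_type, fraction_of_markers_found) for the best overlap,
    or (None, 0.0) when none of the known markers is in the cluster's genes."""
    best_type, best_score = None, 0.0
    for cell_type, markers in known_markers.items():
        score = len(markers & set(cluster_genes)) / len(markers)
        if score > best_score:
            best_type, best_score = cell_type, score
    return best_type, best_score

# Hypothetical top genes per cluster, as they might come out of a clustering step.
clusters = {
    "cluster_0": ["CD3E", "CD3D", "IL7R", "CD2"],
    "cluster_1": ["MS4A1", "CD79A", "HLA-DRA"],
    "cluster_2": ["GENE_X", "GENE_Y"],  # placeholder names: matches no known marker
}
for name, genes in clusters.items():
    print(name, best_matching_cell_type(genes))
```

A cluster that matches nothing, like the placeholder `cluster_2`, is exactly the kind of result that needs an orthogonal assay before it is trusted as biology.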
18:20
And to give a very practical example, spatial data in the immuno-oncology field, following single cell, can take you from just the ingredients in your sample to what part of the tumour, or what part of the [unclear], these targets are in. Where are they?
18:35
And are those immunosuppressive or, like, immune-infiltrated tumours?
18:39
And that gives you a lot of additional knowledge to put high or low trust in specific genes that you get out of your single cells.
18:49
So that's question number 4. And sort of to link some of these together: often it's very good to do multiomics, and not necessarily multiomics for the sake of doing multiomics from the same cell.
19:01
But what I mean is clicking different omics together in a smart way, in a gradual way.
19:07
So not everything at once.
19:09
So for example, what we often end up doing, let's say someone wants to profile a drug response across a lot of different treatments.
19:15
We usually start with bulk.
19:16
So we can do our discovery seq, or DRUG-seq-like bulk, where we could screen hundreds of thousands, tens of thousands of conditions to find, let's say, the hundred that look most interesting at the bulk level.
19:27
And then you come in with single cell sequencing and analyse, let's say a couple dozen samples at 100,000 cells per sample to really understand why is this tissue or patient or whatever it is responding at the cell type level.
19:39
So which cell types actually do something?
19:42
So with data analysis, you can figure all that out and maybe you find a rare cell population that's different in a disease phenotype that kind of is responsible for this response or non response.
19:52
Then we can come in with target enrichment, FACS-based methods, for example, and really figure out what is happening in that rare cell population and get sort of the highest quality of targets out of that assay.
20:04
And then follow that up with spatial transcriptomics.
20:06
Because usually once you find the interesting cell population, the next question is usually where is it?
20:13
So to wrap up, because it's a single cell and spatial conference.
20:17
Just a very basic idea of how single cell and spatial work together and what they're useful for. And if you already know this, it's a good way to explain what you're doing at your aunt's birthday party.
20:29
So let's say we're analysing the city of Rio de Janeiro and we're sequencing all the single cells, all the buildings, and we see that there's two clusters in a very simplified example, there's large concrete buildings that have a pool on the roof.
20:42
So these are the genes, the variables that we measure.
20:44
And there's small brick like buildings that have a water tank on the roof and there's a weird outlier cluster somewhere that we don't fully understand.
20:54
So that's single cell, right?
20:56
That's the transcriptomics.
20:57
It's: what is there?
20:58
What ingredients do I have?
21:00
Now spatial gives you the Google Maps of where these things are.
21:03
So you now figure out that these large buildings are in the centre of town.
21:07
They don't actually mingle with the smaller buildings.
21:10
I won't comment on the socio-economic side of this. And the small brick-like buildings are in the hills of the town, somewhere else.
21:18
So they're in very different areas of town and you can start generating hypotheses about what these are.
21:23
So that's spatial.
21:24
Where are they?
21:25
But more importantly, sometimes it's not just where are they, but what are they next to?
21:29
So we can figure out that the large buildings are in the centre of town and they're usually next to tennis fields.
21:34
And the small brick like buildings are in the hills and they're usually next to football fields.
21:39
So you can start inferring a lot of extra biology with spatial transcriptomics.
21:45
It's not just the where, but where next to.
21:47
And often you can only do this analysis properly if you have single cell data.
21:51
If you have the list of buildings and individual building blocks, so they play very well together.
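The "what are they next to" idea can be sketched with a toy neighbour count. Everything here is made up for illustration: the function `neighbour_counts`, the coordinates, the radius and the building labels standing in for cell types. Real spatial analyses use dedicated tooling and proper statistics; this only shows the principle.

```python
import math
from collections import Counter

def neighbour_counts(points, radius):
    """Count, for every labelled point, the labels found within `radius` of it.

    `points` is a list of (x, y, label) tuples; the result is a Counter keyed
    by (label, neighbour_label). Each ordered pair of points is counted once.
    """
    counts = Counter()
    for i, (x1, y1, label1) in enumerate(points):
        for j, (x2, y2, label2) in enumerate(points):
            if i != j and math.dist((x1, y1), (x2, y2)) <= radius:
                counts[(label1, label2)] += 1
    return counts

# Toy "city": two separate neighbourhoods, coordinates invented for the example.
city = [
    (0, 0, "large building"), (1, 0, "tennis field"), (0, 1, "large building"),
    (10, 10, "small building"), (11, 10, "football field"), (10, 11, "small building"),
]
for (label, neighbour), n in sorted(neighbour_counts(city, radius=1.5).items()):
    print(f"{label} next to {neighbour}: {n}")
```

The labels are the part that comes from single cell data: without knowing what each building is, the map only tells you that something sits next to something else.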
21:56
So with that, I'm out of time.
21:58
So I want to end by saying thank you for your time.
22:02
And if you have any questions, or if you want to just brainstorm a project you're thinking about, feel free to send me an e-mail, visit our booth at number 21, where Thomas is sitting, and myself, or just book a meeting directly with one of my colleagues. And I'm happy to take any questions.
