0:00
OK.
0:00
Next, we have Claire Williams, Data Scientist for Bruker Spatial Biology.
0:05
I'll let Claire take it from here.
0:08
Thank you.
0:09
As I said, my name is Claire Williams.
0:11
I'm a Data Scientist at Bruker Spatial Biology in the R&D department.
0:15
And today I want to tell you about some of the analyses we're doing to better understand whole-transcriptome spatial transcriptomics data.
0:24
So to start off with, the data that I'll be presenting are from the CosMx Spatial Molecular Imager.
0:29
So this platform has a robust and scalable chemistry and it was designed and validated for FFPE.
0:35
Today I'll be focusing on data from our whole transcriptome panel, which covers over 99% of the protein-coding transcriptome. Using our cell segmentation algorithms, we can then map all of those transcripts back to single cells and determine where in the tissue each of these transcripts is being expressed.
0:57
That's all I'm going to say about the chemistry and sample prep, but we did post a preprint on bioRxiv in December describing the modifications we had to make to get the whole transcriptome panel working on our platform.
1:12
One thing this preprint highlights is that we're now generating really massive datasets.
1:19
For example, the preprint describes six different tissues, and across those six tissues we have over a million cells and billions of transcripts.
1:28
So one of the challenges we're facing is what to do with such large data.
1:36
So from the beginning, alongside CosMx, we developed a cloud computing platform, AtoMx, which ingests and decodes CosMx data and allows those data to be processed without any coding.
1:51
So within AtoMx you can segment your cells and we have a tunable machine learning approach there.
1:55
So you can adjust that segmentation to fit your tissue.
2:01
We also have multi-sample and interactive analysis available, so you can do cell typing, mapping, neighbourhood analysis, all the kinds of standard analyses you might want to do with your transcriptomic data.
2:14
We also support exporting data into open-source formats, so that you can move those data into Squidpy or other common open-source tools you might be used to.
2:26
And our bioinformatics team is constantly developing new spatial algorithms to apply to these single cell data.
2:34
And so today I want to take you on a tour through various algorithms that we're both developing and employing to analyse these whole transcriptome data.
2:45
As a common thread, I'm going to focus on a single sample and describe each of these analyses in the context of that one sample.
2:53
So the sample that we'll focus on is a breast cancer biopsy.
2:57
This is from an invasive ductal carcinoma.
3:00
The tissue I'm focusing on here is about 100 square millimetres.
3:06
It's about 650,000 total cells.
3:09
We've stained it with four proteins as well as DAPI, which allows us to do that single cell segmentation.
3:15
We queried 18,935 unique RNA transcripts, and across all of the cells, that's about 1.27 billion cellular transcripts that we're talking about here.
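For a sense of scale, the rounded figures above imply roughly the following per-cell numbers (back-of-the-envelope arithmetic from the stated totals, not separately reported values):

$$\frac{1.27\times 10^{9}\ \text{transcripts}}{6.5\times 10^{5}\ \text{cells}} \approx 1{,}950\ \text{transcripts per cell}, \qquad \frac{100\ \text{mm}^2}{6.5\times 10^{5}\ \text{cells}} = \frac{10^{8}\ \mu\text{m}^2}{6.5\times 10^{5}\ \text{cells}} \approx 154\ \mu\text{m}^2\ \text{per cell}.$$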
3:27
Altogether, that's about a terabyte of raw decoded data.
3:31
I know a dataset this big can feel overwhelming.
3:35
So how do we think about going after it?
3:38
And what are the first analyses that we often do?
3:43
And so I'll frame those around four different questions.
3:47
So the first question, when I see a tissue like this, I want to know broadly what survival strategies the tumour is using, what's broadly going on across this tissue.
3:57
Next, I'll ask how the tumour behaviour changes as it spreads from the primary lobe.
4:01
And what I mean by that is that based on pathological review, we'll see that there are two lobes in here.
4:06
There's a primary lobe down at the bottom and then another lobe up top.
4:10
So what's different between these two?
4:11
How is the tumour changing as it moves between those two regions? Third, I'll assess how the tumour microenvironment affects cell behaviour.
4:20
So for a given cell type, how does it change what it's expressing?
4:26
And then finally, I'll talk about something a little more exploratory, which the last talk really teed up: how do alterations drive tumour evolution?
4:34
OK, so let's get started with the survival strategies.
4:38
And what I mean by this is broadly, what pathways are active?
4:41
How can I quickly understand what's going on in this tissue?
4:46
To do this, we use the comprehensive Python package LIANA+ together with the PROGENy database, which contains gene signatures for a number of different pathways, to look at the activity of these pathways downstream of transcription factors and other signalling molecules broadly across the tissue.
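To make "pathway activity from a gene signature" concrete, here is a minimal sketch assuming a normalized cells × genes matrix and a genes × pathways weight matrix in the spirit of PROGENy. The weighted sum of z-scored expression is a simplified stand-in, not the estimator LIANA+ actually applies; `pathway_scores` and the toy data are hypothetical.

```python
import numpy as np


def pathway_scores(expr, weights):
    """Per-cell pathway activity as a weighted sum of z-scored expression.

    expr:    (n_cells, n_genes) normalized expression
    weights: (n_genes, n_pathways) signed footprint weights, 0 where a gene
             is not in a pathway's signature (hypothetical PROGENy-like input,
             assumed already aligned to the columns of expr)
    Returns (n_cells, n_pathways); negative values = below-average activity.
    """
    sd = expr.std(axis=0)
    sd[sd == 0] = 1.0  # constant genes get z = 0 instead of dividing by zero
    z = (expr - expr.mean(axis=0)) / sd
    # Divide by total absolute weight so large signatures are not inflated.
    norm = np.abs(weights).sum(axis=0)
    norm[norm == 0] = 1.0
    return (z @ weights) / norm


# Toy example: the first 50 cells over-express the pathway's main target gene.
rng = np.random.default_rng(0)
expr = rng.poisson(2.0, size=(100, 5)).astype(float)
expr[:50, 0] += 10
W = np.zeros((5, 1))
W[0, 0], W[1, 0] = 1.0, 0.5
scores = pathway_scores(expr, W)
```

Plotting each cell's score at its tissue coordinates gives a map in which low (blue) areas mark reduced pathway activity, without any cell typing.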
5:05
The first two I'll show here are the TRAIL pathway and TGF-β.
5:09
We see that both of these pathways are broadly downregulated across the tissue.
5:14
All of the blue areas across both of these images are where we see reduced activity of these two pathways.
5:23
This suggests to us that across the entire tumour, we've got a broad global evasion of apoptosis and growth inhibition.
5:32
That's probably what we would expect: if this tumour is growing very quickly, there will be reduced apoptosis across it.
5:37
But we also found other pathways that were not homogeneous across the whole tissue.
5:44
For example, we found that the JAK-STAT pathway is broadly depressed in the upper lobe of the tumour but much more active in the bottom lobe.
5:55
So again, the lower lobe shows high activity and the upper lobe shows really low activity, suggesting that the tumour has both an immune-hot and an immune-cold region.
6:06
We also found other pathways that showed this dichotomy in activity between the upper lobe and the lower lobe.
6:13
One example of that was the oestrogen pathway.
6:15
Signalling downstream of oestrogen is low in the bottom lobe but high in the upper lobe, suggesting that the tumour has a hormone-dependent lobe and a hormone-independent lobe.
6:28
Keep that in mind, because it comes back up at the end of the talk: we're seeing a potential difference in the hormone dependence of the two lobes.
6:38
So this is how we would very quickly get to start understanding what's going on in this tissue.
6:42
This is based on pathways alone, with no cell typing needed, to get a broad overview of what's going on.
6:51
For the next section: we've now started to see this bottom lobe and this upper lobe.
6:57
And we want to understand what's different.
6:58
How is the tumour evolving as it's moving from that lower lobe, that primary lobe into the top lobe?
7:06
To address this question, we use an internally developed algorithm called InSituDiff, which is publicly available on GitHub and allows us to compare one region of the tissue to other regions.
7:20
This tool can also be used in other cases.
7:22
For example, if you have a healthy sample and a disease sample, you can compare across them.
7:27
Or if you had an injury case, you could compare injury adjacent regions of the tissue to injury distal regions of the tissue.
7:34
Broadly, this algorithm looks at every cellular neighbourhood in the upper region and identifies the most similar cellular neighbourhood in the baseline region.
7:46
So we're trying to match neighbourhoods between the upper region and the lower region
7:51
in this case.
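A rough sketch of the matching step, assuming neighbourhoods are defined as the mean expression over each cell's k nearest cells and similarity is plain Euclidean distance between those profiles. This is an illustrative assumption, not InSituDiff's actual implementation (which may use other neighbourhood definitions, embeddings or similarity measures); the function names and the choice of `k` are hypothetical.

```python
import numpy as np


def neighbourhood_profiles(xy, expr, k=10):
    """Mean expression over each cell's k nearest cells (the cell itself included).

    Brute-force pairwise distances keep the sketch short; at the scale of
    hundreds of thousands of cells a spatial index (e.g. a KD-tree) is needed.
    """
    d2 = ((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=-1)
    nn = np.argsort(d2, axis=1)[:, :k]
    return expr[nn].mean(axis=1)


def match_to_baseline(test_prof, base_prof):
    """Index of the most similar baseline neighbourhood for each test neighbourhood."""
    d2 = ((test_prof[:, None, :] - base_prof[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


rng = np.random.default_rng(1)
xy_test = rng.uniform(0, 100, size=(40, 2))   # e.g. upper-lobe cells
xy_base = rng.uniform(0, 100, size=(60, 2))   # e.g. lower-lobe (baseline) cells
expr_test = rng.poisson(3.0, size=(40, 8)).astype(float)
expr_base = rng.poisson(3.0, size=(60, 8)).astype(float)
match = match_to_baseline(neighbourhood_profiles(xy_test, expr_test),
                          neighbourhood_profiles(xy_base, expr_base))
```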
7:52
Once we've found those closest neighbourhoods, the most similar transcriptional neighbourhoods, we can record perturbations: for a single gene, the difference in expression between each upper-lobe neighbourhood and its matched lower-lobe neighbourhood.
8:08
For example, the gene I'm showing here is expressed at broadly similar levels in neighbourhoods around here as in the matched neighbourhoods in the lower lobe, but much more highly around here in the centre region.
8:20
If we do this across every single gene, we can generate a matrix of cell neighbourhoods in rows by genes in columns.
8:27
So we get this large matrix of about 600,000 neighbourhoods by 19,000 genes, where the values represent the perturbation from baseline.
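Given matched neighbourhoods, one simple way to build the perturbation matrix is to subtract each matched baseline profile from its test profile, gene by gene. The sketch assumes profile arrays and match indices from a matching step like the one just described, with hypothetical names. The second helper shows why this is a big-data problem: about 600,000 × 19,000 values stored densely as 32-bit floats need roughly 46 GB.

```python
import numpy as np


def perturbation_matrix(test_prof, base_prof, match):
    """Neighbourhoods (rows) x genes (columns): test minus matched baseline."""
    return test_prof - base_prof[match]


def dense_size_gb(n_rows, n_cols, bytes_per_value=4):
    """Memory needed to hold an n_rows x n_cols matrix densely (float32 by default)."""
    return n_rows * n_cols * bytes_per_value / 1e9


test_prof = np.array([[1.0, 2.0], [3.0, 4.0]])
base_prof = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
P = perturbation_matrix(test_prof, base_prof, np.array([2, 0]))
size_gb = dense_size_gb(600_000, 19_000)  # roughly 45.6
```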
8:38
And now once we have a matrix like this, we can start to employ all of our standard computational biology approaches to understand what's going on with this matrix.
8:47
So we can identify highly perturbed genes, those genes that are either showing the largest difference or the broadest difference between our test area, our top lobe, and our baseline area.
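One hedged reading of "largest" versus "broadest" difference is to score each gene twice: by its mean absolute perturbation, and by the fraction of neighbourhoods where its perturbation exceeds a threshold. The function name and the threshold below are hypothetical choices for illustration.

```python
import numpy as np


def rank_perturbed_genes(P, genes, threshold=1.0, top=5):
    """Rank genes by the largest and by the broadest perturbation.

    magnitude: mean |perturbation| across neighbourhoods (largest difference)
    breadth:   fraction of neighbourhoods with |perturbation| > threshold
               (broadest difference)
    """
    absP = np.abs(P)
    magnitude = absP.mean(axis=0)
    breadth = (absP > threshold).mean(axis=0)
    by_magnitude = [genes[i] for i in np.argsort(-magnitude)[:top]]
    by_breadth = [genes[i] for i in np.argsort(-breadth)[:top]]
    return by_magnitude, by_breadth


# "A" is hugely perturbed in one neighbourhood; "B" moderately in all of them.
P = np.zeros((10, 3))
P[0, 0] = 20.0
P[:, 1] = 1.5
largest, broadest = rank_perturbed_genes(P, ["A", "B", "C"], top=1)
```

The two rankings can disagree, which is the point: a gene with a strong local hotspot and a gene with a mild tissue-wide shift are different kinds of hits.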
9:01
We can cluster the columns of this matrix, the genes, to identify modules of genes that show similar differences in expression across the whole tissue.
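The talk doesn't specify the clustering method for gene modules, so as a stand-in, here is a sketch that links genes whose perturbation profiles are strongly correlated across neighbourhoods and takes connected components of that graph as modules; a module's score is then the summed perturbation of its genes in each neighbourhood. The correlation cutoff and function names are hypothetical.

```python
import numpy as np


def gene_modules(P, min_corr=0.8):
    """Group genes (columns of P) with strongly correlated perturbation profiles.

    Links genes with Pearson r > min_corr and returns connected components
    (single linkage) that contain more than one gene.
    """
    C = np.corrcoef(P, rowvar=False)
    n = C.shape[0]
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if C[i, j] > min_corr:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) > 1]


def module_score(P, module):
    """Summed perturbation of a module's genes in each neighbourhood."""
    return P[:, module].sum(axis=1)


rng = np.random.default_rng(2)
base = rng.normal(size=200)
P = np.column_stack([
    base,
    2 * base + 0.1 * rng.normal(size=200),
    base + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
    rng.normal(size=200),
])
modules = gene_modules(P)
score = module_score(P, modules[0])
```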
9:13
Or we can cluster the rows, the cellular neighbourhoods, to identify domains of cells that show similar perturbations across many genes.
9:23
And we can look at these different clusterings to understand more broadly what's going on in the tissue.
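For the row-wise clustering into spatial domains, the talk again doesn't name a method; plain k-means on the perturbation rows is used below as an assumed, swapped-in technique. Each neighbourhood gets a domain label that can be coloured at its tissue location. The function name, `k` and the toy data are hypothetical.

```python
import numpy as np


def kmeans_domains(P, k, n_iter=100, seed=0):
    """Lloyd's k-means on perturbation rows; returns a domain label per neighbourhood."""
    rng = np.random.default_rng(seed)
    centers = P[rng.choice(len(P), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each neighbourhood to its nearest centre.
        d2 = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # Move each centre to the mean of its members (keep it if it has none).
        new_centers = np.array([
            P[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


rng = np.random.default_rng(3)
P = np.vstack([rng.normal(0.0, 0.1, size=(30, 4)),
               rng.normal(5.0, 0.1, size=(30, 4))])
labels = kmeans_domains(P, k=2)
```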
9:28
So, I'll show an example both of the gene modules and the spatial domains.
9:35
So first, here's an example of one of the gene modules that we identified.
9:38
So this is a group of three genes that show similar perturbations across the tissue.
9:44
In this case, I'm showing the summed perturbation of those three genes, and we can see that they're broadly upregulated in the upper part of the tumour.
9:53
When we looked into what these genes do, they are related to invasion, migration and cell metabolism, suggesting that the upper part of the tumour shows elevated proliferation and invasion.
10:05
And in fact in some of the later slides I'll refer to this top part of the tumour as the invasive region.
10:12
So this is the area that's really growing much more quickly.
10:16
We can also look at domains of cells that show similar perturbations in genes.
10:22
In this case, I'm colouring each domain in a different colour.
10:25
So you can see we've got about five different domains identified here that show similar sets of gene perturbations.
10:31
If we look at one of these domains, for example the tan domain at the front of the tumour, we see that it's enriched for pathways related to proliferation and invasion.
10:40
This is really the leading front of the cancer moving forward.
10:45
If we look just behind that, at the light blue domain, we see pathways related to stress response, nutrient deprivation and cellular starvation, suggesting that this is the wreckage behind the leading front, where the cells are really stressed.
11:04
So with this, we're now really gaining an understanding of what's going on in this tissue and we haven't even done any cell typing yet.
11:10
We've just looked at the tissue and asked which pathways are active.
11:13
Which areas are we most interested in, based on pathological review?
11:17
For this next section, I am going to now go ahead and do cell typing and look at how a single cell type changes based on the microenvironment that it finds itself in.
11:28
And this is something that I think is a real strength of spatial data.
11:33
In the single-cell RNA-seq era, we could ask a differential expression question: how does one cluster differ from another?
11:40
How do two different cell types differ? In the spatial era,
11:43
we can now ask how a single cell type changes in response to its spatial context.
11:48
For example, how does a B cell located inside the tumour differ from a B cell in the stroma at the tumour-stroma boundary?
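Comparing B cells "inside the tumour" with B cells "in the stroma" first requires labelling each cell's spatial context. A hypothetical way to do this is the fraction of tumour cells among each cell's nearest neighbours, with an arbitrary cutoff; the talk itself uses spatial domains, so treat `k`, the cutoff and the function names as assumptions.

```python
import numpy as np


def tumour_fraction(xy, is_tumour, k=20):
    """Fraction of tumour cells among each cell's k nearest neighbours (self excluded)."""
    d2 = ((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a cell is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]
    return is_tumour[nn].mean(axis=1)


def label_context(frac, cutoff=0.5):
    """Hypothetical rule: mostly-tumour surroundings = 'interior', else 'stroma'."""
    return np.where(frac > cutoff, "interior", "stroma")


# 20 tumour cells in one cluster, 20 stromal cells far away.
xy = np.array([[float(i), 0.0] for i in range(20)] +
              [[100.0 + i, 0.0] for i in range(20)])
is_tumour = np.array([True] * 20 + [False] * 20)
context = label_context(tumour_fraction(xy, is_tumour, k=5))
```

Restricting to one cell type (say, B cells) and splitting by `context` gives the two groups for spatially aware differential expression.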
11:58
I think of this in two steps: first, we need to assign cell types, and then we perform that spatially aware differential expression.
12:07
To assign cell types, we use the algorithm InSituType, which is also available on GitHub.
12:12
And this algorithm compares the expression profile of every single cell to a reference profile.
12:19
In this case, we ran it in semi-supervised mode.
12:21
So each cell was assigned either to a cell type in the reference profile, which was derived from healthy breast tissue, or to a novel cell type.
12:30
Because we're working with a tumour, the tumour cells won't be found in that healthy reference.
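To illustrate the idea of semi-supervised typing against a reference, here is a simplified Poisson stand-in: each cell goes to the reference profile with the highest likelihood, and cells poorly explained by every profile are flagged as "novel". InSituType's actual model and its handling of novel clusters are more involved (here unexplained cells are simply flagged rather than clustered); the cutoff, names and profiles are hypothetical.

```python
import numpy as np


def assign_cell_types(counts, ref, names, novel_cutoff=-1.0):
    """Assign cells to reference profiles by Poisson likelihood, or flag as novel.

    counts: (n_cells, n_genes) raw transcript counts
    ref:    (n_types, n_genes) positive reference expression profiles
    With mu = total_counts * proportion, the Poisson log-likelihood differs
    between types only through counts @ log(proportion), so that term decides
    the assignment. Cells whose best average log-proportion per transcript is
    below novel_cutoff are poorly explained by every reference type.
    """
    log_p = np.log(ref / ref.sum(axis=1, keepdims=True))
    ll = counts @ log_p.T                      # (n_cells, n_types)
    best = ll.argmax(axis=1)
    per_transcript = ll.max(axis=1) / np.maximum(counts.sum(axis=1), 1)
    return [names[b] if s >= novel_cutoff else "novel"
            for b, s in zip(best, per_transcript)]


ref = np.array([[10.0, 1.0, 1.0],     # hypothetical "A" profile
                [1.0, 10.0, 1.0]])    # hypothetical "B" profile
counts = np.array([[20, 1, 1],        # looks like A
                   [1, 20, 1],        # looks like B
                   [0, 0, 30]])       # matches neither profile well
labels = assign_cell_types(counts, ref, ["A", "B"])
```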
12:37
With this, we identified 44 different cell types, including all of the standard cell types that you would expect to see in the tissue.
12:45
So we have tumour cell types as well as a number of immune cell types.
12:50
With these cell types, we can then move on to differential expression.
12:54
And one thing to keep in mind when we're doing differential expression in the spatial context in our platform and in other imaging platforms is that segmentation can play a big role here.
13:04
We typically perform segmentation in two dimensions, but the tissue is three-dimensional, so some transcripts at cell edges can be misassigned because cells overlap in 3D space.
13:18
We have algorithms to help with this, such as FastReseg, again available on GitHub, which can reassign some of those transcripts to their correct origin cell, but we can also take this into account when we perform differential expression.
13:32
There are two main approaches that we use to do this.
13:35
First, we filter out likely misassigned genes: genes that are unlikely to be expressed in the cell type of interest.
13:41
So for example, we won't consider keratins when we're comparing T cells.
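The talk doesn't state the exact filtering criterion, so here is one hypothetical rule: keep a gene only if the target cell type's reference level is at least some fraction of the gene's highest level in any cell type. Under this rule, keratins (high in tumour cells, near zero in T cells) drop out of a T-cell comparison. The profiles, `min_ratio` and the function name are illustrative assumptions.

```python
import numpy as np


def plausible_genes(ref, type_idx, genes, min_ratio=0.1):
    """Drop genes the target cell type is unlikely to express.

    ref:      (n_types, n_genes) reference mean expression
    type_idx: row of ref for the cell type being compared (e.g. T cells)
    A gene is kept only if its level in the target type is at least
    min_ratio times its maximum level across all types.
    """
    ratio = ref[type_idx] / np.maximum(ref.max(axis=0), 1e-12)
    return [g for g, r in zip(genes, ratio) if r >= min_ratio]


genes = ["CD3E", "KRT8", "ACTB"]
ref = np.array([[50.0, 0.1, 100.0],    # hypothetical T cell profile
                [0.5, 80.0, 100.0]])   # hypothetical tumour profile
kept = plausible_genes(ref, 0, genes)
```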
13:47
Second, we explicitly model the overlap.
13:50
So for every cell, we measure gene expression in the neighbouring cells and include that as a term in our model.
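A minimal sketch of why a neighbour-expression term helps, assuming an ordinary least-squares model on log-scale expression for one gene: `y ~ intercept + group + neighbour_expr`. The actual model is likely a count model; this only illustrates how the neighbour term absorbs apparent differences caused by transcripts spilling over from adjacent cells. Names and data are hypothetical.

```python
import numpy as np


def spatial_de(y, group, neighbour_expr):
    """Per-gene OLS: y ~ intercept + group + neighbour_expr.

    y:              (n_cells,) expression of one gene in the cell type of interest
    group:          (n_cells,) 0/1 indicator (e.g. 1 = tumour interior)
    neighbour_expr: (n_cells,) the same gene's expression in neighbouring cells
    Returns the group coefficient, i.e. the difference between contexts after
    accounting for expression in the surrounding cells.
    """
    X = np.column_stack([np.ones_like(y), group, neighbour_expr])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


group = np.array([0, 0, 0, 1, 1, 1], dtype=float)
neigh = np.array([0.0, 1.0, 2.0, 4.0, 5.0, 6.0])
y = 1.0 + 0.5 * neigh                                # no true group effect, only spillover
naive = y[group == 1].mean() - y[group == 0].mean()  # 2.0, driven entirely by spillover
adjusted = spatial_de(y, group, neigh)               # close to 0
```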
13:56
When we apply this to our tissue, we can, for example, compare activated CD8 T-cells in the external stroma domain to activated CD8 T-cells in the interior stroma domain.
14:07
We identified 20 high confidence genes depleted in the tumour interior.
14:12
These genes are related to proliferation, metabolism and translation, suggesting that those T-cells inside the tumour are exhausted.
14:22
OK.
14:22
The final section looks at which alterations are driving tumour evolution.
14:27
And this is really relevant to that last talk that we just saw looking at CNVs.
14:31
We know that as tumours evolve, they accumulate more copy number variations.
14:38
Here I applied inferCNV, which looks at the expression of chromosomally adjacent genes to identify large regions of the genome with elevated or decreased expression, from which we infer a potential copy number variation.
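The core idea can be sketched in a few lines: centre expression on a set of normal reference cells, then average over a sliding window of chromosomally adjacent genes within each chromosome. This is a stripped-down illustration, not the inferCNV package (which adds further normalization and denoising steps); the function name, window size and toy data are assumptions.

```python
import numpy as np


def infer_cnv_signal(expr, ref_mask, chrom, window=51):
    """Stripped-down inferCNV-style signal.

    expr:     (n_cells, n_genes) log-normalized expression, columns already
              sorted by chromosome and genomic position
    ref_mask: boolean (n_cells,), True for normal reference cells
    chrom:    (n_genes,) chromosome label of each column
    Runs of positive values suggest gains; runs of negative values, losses.
    """
    x = expr - expr[ref_mask].mean(axis=0)  # centre on the normal reference
    out = np.empty_like(x)
    for c in np.unique(chrom):
        cols = np.flatnonzero(chrom == c)
        w = min(window, len(cols))          # keep the window inside short chromosomes
        kernel = np.ones(w) / w
        # mode="same" zero-pads, so values near chromosome ends are damped.
        out[:, cols] = np.apply_along_axis(np.convolve, 1, x[:, cols], kernel, "same")
    return out


chrom = np.array(["1"] * 6 + ["12"] * 6)
expr = np.ones((20, 12))
expr[10:, 6:] = 2.0              # last 10 cells: higher expression across chr12
ref_mask = np.arange(20) < 10    # first 10 cells serve as the normal reference
signal = infer_cnv_signal(expr, ref_mask, chrom, window=3)
```

Smoothing within each chromosome matters: averaging across a chromosome boundary would blur a gain on one chromosome into its neighbour.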
14:53
So for example, in the red regions, we'd see an increase in the genes here, and in the blue region we'd see a decrease.
15:02
We also have spatial data.
15:03
So we can then take these CNVs and map them back to space.
15:06
For example, if we look at one of these CNVs, a gain on chromosome 12, and map where it is in space (this is a little hard to see), we see more chromosomal gains in the upper, invasive lobe.
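Mapping a CNV back to space can be as simple as averaging each cell's smoothed CNV signal over one chromosome and then summarising by region. The sketch below assumes a cells × genes CNV signal matrix with a chromosome label per column and a region label per cell (for example upper vs lower lobe); all names are hypothetical.

```python
import numpy as np


def region_cnv_summary(signal, chrom, target, regions):
    """Mean CNV signal over one chromosome per cell, then averaged per region.

    signal:  (n_cells, n_genes) smoothed CNV signal
    chrom:   (n_genes,) chromosome label per column
    regions: (n_cells,) region label per cell, e.g. "upper" / "lower" lobe
    Returns the per-cell score (for plotting at cell coordinates) and a
    dict of region means.
    """
    per_cell = signal[:, chrom == target].mean(axis=1)
    means = {str(r): float(per_cell[regions == r].mean()) for r in np.unique(regions)}
    return per_cell, means


signal = np.zeros((4, 4))
signal[:2, 2:] = 1.0                       # first two cells carry a chr12 gain
chrom = np.array(["1", "1", "12", "12"])
regions = np.array(["upper", "upper", "lower", "lower"])
per_cell, means = region_cnv_summary(signal, chrom, "12", regions)
```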
15:22
When we looked back at this CNV, we saw that it spans HER3 (ERBB3), a protein known to be upregulated in breast cancers that have evaded hormonal treatments.
15:34
So potentially we're seeing the mechanism by which this top region of the tumour has been able to evade hormonal treatment, and that's why we're seeing elevated oestrogen signalling up there.
15:48
So today I took you through four different analysis approaches that we use to work on whole transcriptome spatial data.
15:56
The first two approaches didn't require any cell typing; then we performed cell typing to do differential expression, and finally we looked at CNV analysis.
16:05
As you can imagine, these are only a small set of the analyses we're working on.
16:11
Our bioinformatics team is also maintaining a blog called the CosMx Analysis Scratch Space Blog.
16:17
I'd encourage you to check it out.
16:19
This is a forum for our bioinformatics team to quickly share analyses, best practices and methods that we're prototyping, and get them out to the public so you can use them right away.
16:31
With that, I'd like to thank everybody who contributed to this presentation and particularly those who are listed who contributed either analysis or plots that were shown here.
16:42
With that, I believe I'm at my time and I would like to thank you all for listening.
16:46
I'm happy to take any questions.
16:49
Round of applause for Claire.
16:50
Thank you so much.
16:51
Anybody have any questions for Claire at this time?
17:00
Very nice talk.
17:02
I have a slightly more technical question. I'm very interested in your CNV analysis using inferCNV.
17:08
Have you tried other algorithms?
17:10
And does it only work when you have the whole transcriptome panel, or have you tried your earlier panels?
17:19
Yeah.
17:20
So I think there were two questions in there. First, whether we've tried other algorithms:
17:25
I'm really excited to try what the last speaker showed, looking at just the chromosomal plotting that he tried.
17:32
So I'm looking forward to giving that a try.
17:34
inferCNV is the first one that we've gotten working so far.
17:37
We'd certainly love to try others.
17:40
At this point,
17:40
we've had success on a number of whole transcriptome datasets.
17:45
We haven't yet tried it on the 6K or the 1K datasets; I imagine the 1K is going to have too few genes.
17:52
But again, we haven't tried it there.
17:54
We can talk, because we tried it on 6K and it sort of works, but not quite.
17:58
So yeah, I'd love to hear more about that.
18:02
Thank you.
18:03
Anyone else?
18:03
I'm sorry.
18:11
Hi, for which part?
18:33
So for the.
19:05
Yeah.
19:05
So in this case, we haven't run the sample on other platforms.
19:10
I think that would be a great thing to do.
19:12
We are looking into doing some genomic sequencing of it to try to validate those CNV results that we have.
19:20
At the moment,
19:21
we don't have any plans to do other single-cell RNA sequencing or anything like that.
19:26
All right, anybody else?
19:28
All right.
19:35
Hi, really nice talk. I had a question about the InSituDiff section, and maybe I missed it, but do you also need some manual labelling to run this type of analysis?
19:48
The underlying question is how stable it is when you're looking at neighbourhoods and domains and comparing across them: do you need histopathology or annotation to do the type of comparison you were describing?
20:05
Yeah.
20:06
So the question is whether we need some histopathology to decide what our baseline region should be?
20:13
I think that depends on what your question is.
20:14
So in this case, we really thought of it as an image-first analysis: based on the H&E, we saw the region that we wanted to compare to. But we've also applied it in other cases.
20:27
For example, we had some samples from a West Nile virus study, with mice with and without West Nile infection, and we could compare their brains.
20:35
In that case we simply knew which brain was healthy and were able to compare it to an infected brain.
20:40
And so we didn't need to know any pathology beforehand.
20:43
This approach is really good at finding similar structures.
20:47
So in that case, we could compare the hippocampus in one brain to the hippocampus in the other brain without drawing out where the hippocampus was or anything like that, and then transfer to multiple samples.
20:57
It's not just one slide.
20:59
Yeah, yeah.
20:59
So it can work across multiple samples.

