0:44 

Hello everyone. 

 
0:50 
I'm very happy to be here and present Scailyte Science to you and I'm always happy to be at Next Genomics. 

 
0:57 
This is one of the places where people really understand what we are doing and appreciate it. 

 
1:02 
And as I was introduced, I'm an immunologist. 

 
1:04 
So please have mercy with the computational questions. 

 
1:10 
So what we do at Scailyte started before big data was that big: our founders founded Scailyte on the basis of an AI or machine learning algorithm that was able to extract clinically relevant information from single cell data, which was not that common at the time. 

 
1:30 
So this happened in 2017. 

 
1:33 
Since then we've tried to work with different single cell data types and projects in different applications. 

 
1:39 
And I'm happy to walk you through our machine learning algorithm and give you a use case in my favourite topic, cell therapies. 

 
1:53 
So let me start by defining the problem that we are solving. 

 
1:59 
You might all know about the complexity of human biology. 

 
2:02 
So there are many cells, many cell types. 

 
2:06 
This on the one side doesn't make the life of the drug developers very easy. 

 
2:10 
So clinical trials often fail because of lack of efficacy. 

 
2:14 
So this motivates us to apply single cell analysis for biomarker discovery in order to find the right patient for the right treatment. Single cell data, and omics data in general, is so rich that it actually offers the information needed to extract these predictive biomarkers. 

 
2:35 
On the other side, computational biologists might be familiar with the maze of omics data: what do you do with it, and how do you extract clinically relevant information? 

 
2:45 
And this is where we apply our machine learning algorithm, which uses supervised representation learning to address a particular question, such as diagnosis, prediction of therapy response or prognosis, in order to make clinical impact. 

 
3:03 
So how do we do that? 

 
3:05 
We combine patient cohorts. 

 
3:07 
So in order to make clinical impact, we are really fixed on working with patient data. 

 
3:13 
We combine it with high quality clinical grade single cell data. 

 
3:16 
You're also familiar with the problems around generating high quality data if you want to make clinical impact. 

 
3:23 
And as we already heard as well, to be approved at the regulatory stage, it has to be high quality. 

 
3:30 
So we really invest in generating high quality single cell omics data and apply our proprietary AI algorithm. 

 
3:38 
It's important to say that it's an explainable machine learning model. 

 
3:41 
It's not a magic trained algorithm. We combine these three features to extract clinically relevant biomarkers. 

 
3:53 
Now let me walk you through a very general workflow of how our projects work. 

 
3:59 
We start with a discovery cohort of patients, at least 10 responders and 10 non responders or 10 per condition, let's say. 

 
4:08 
If we're looking for new targets, we might compare disease versus healthy samples. 

 
4:14 
We are really dedicated to also advancing precision medicine. 

 
4:19 
So we are also involved in predictive biomarker discovery. 

 
4:24 
So we start with data and a known endpoint, and we feed 70% of the data to train our machine learning algorithm on what is a responder versus a non-responder, and keep 30% for validation. 

 
4:36 
This is a standard computational method to train models, and what's important is that once we've trained models, we already know there are differences between the responder and non-responder groups. 

 
4:48 
We can open up these models and highlight the cellular and molecular characteristics that informed these predictions. 

 
4:55 
So we come up with ranked genes and ranked cells that can be used as the basis of an assay prototype for a predictive biomarker, or for additional targets and biological interpretation as well. 

 
5:11 
So maybe those who have already worked with single cell data can appreciate that we can go from data to a discovery within a few weeks. 

 
5:20 
So we don't need to dig into that data for a very long time to know if there are any differences and what are the differences. 

 
5:26 
And here we have drawn a picture of what that looks like. 

 
5:31 
I will become a little bit more computational, not very computational. 

 
5:35 
So we start with single cell data in a UMAP. The standard approach would usually require clustering and annotation of known cell types, and then you would start doing differential expression analysis between clusters, hoping to find differences for each cluster. 

 
5:52 
So there are a few problems with that. 

 
5:54 
So obviously this method has led to many discoveries. 

 
5:57 
I don't want to diminish its added value, but it depends on clustering, which already introduces bias. 

 
6:03 
It depends on the annotation which requires previous knowledge on what cells you're working with. 

 
6:09 
And you're paying so much money to generate single cell data and you're comparing clusters. 

 
6:14 
So we're actually reducing the resolution from single cell to cluster level. 

 
6:19 
In addition, if we talk about complex biology, you might expect to have a signature which requires more than one cell type or cluster. 

 
6:29 
If you have a very rare cell population within a cluster, you might miss it with this approach. 

 
6:33 
Or if you have a complex signature that involves different cell types and clusters, you might also miss it. 

 
6:39 
Here is what ScaiVision is doing; that's what we call our machine learning algorithm. 

 
6:45 
It uses supervised representation learning. 

 
6:46 
It takes the data as it is, doesn't care about clusters, doesn't care about what cells these are initially, and starts learning what the given endpoint is. 

 
6:57 
So it goes through several iterations of learning and gets a pattern that tells a responder from a non-responder, disease from healthy, or treated from untreated. 
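
The idea of scoring every cell and then summarising a sample by its strongest cells can be sketched as follows. This is only an illustration in the spirit of multiple-instance learning, not ScaiVision's implementation: the data is synthetic, the function names are hypothetical, and the marker weights are set by hand where a trained model would learn them from the endpoint labels.

```python
import random

def cell_scores(cells, weights):
    """Score each cell as a weighted sum of its marker values."""
    return [sum(w * x for w, x in zip(weights, cell)) for cell in cells]

def top_k_pool(scores, k):
    """Average the k highest cell scores, so rare but strong cells dominate."""
    best = sorted(scores, reverse=True)[:k]
    return sum(best) / len(best)

def mean_pool(scores):
    """Average over all cells, comparable to a cluster-level summary."""
    return sum(scores) / len(scores)

def make_sample(rng, n_cells, n_rare):
    """Synthetic sample with two markers; rare cells are high on marker 0."""
    cells = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
             for _ in range(n_cells - n_rare)]
    cells += [[rng.gauss(4.0, 0.5), rng.gauss(0.0, 1.0)]
              for _ in range(n_rare)]
    return cells

rng = random.Random(0)
weights = [1.0, 0.0]  # assumption: hand-set; a trained model would learn these

samples = {
    "with_rare": make_sample(rng, n_cells=2000, n_rare=10),
    "without_rare": make_sample(rng, n_cells=2000, n_rare=0),
}

summary = {}
for name, cells in samples.items():
    scores = cell_scores(cells, weights)
    summary[name] = (mean_pool(scores), top_k_pool(scores, k=10))
    print(name, "mean:", round(summary[name][0], 2),
          "top-10:", round(summary[name][1], 2))
```

With 10 rare cells among 2000, the all-cell average barely moves between the two samples, while the top-10 summary separates them clearly. That is the intuition behind keeping single cell resolution instead of comparing cluster averages.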

 
7:10 
And this has a few advantages. 

 
7:12 
On the one side, it is very unbiased. 

 
7:14 
It doesn't care about the previous knowledge about cells or clusters, and it also is very sensitive in finding rare cell populations and is able to find complex signatures, which is often the case in complex biology, for example in predicting therapy response. 

 
7:32 
Now let me give you a more real-world example. 

 
7:35 
Here we used real data and we compared what the authors discovered by the standard methods and what ScaiVision discovered. 

 
7:43 
In this case, you're seeing here single-cell RNA sequencing data from a CAR T infusion product. 

 
7:49 
This data was published 4 years ago in Nature Medicine. 

 
7:55 
It's one of the first big data sets, so we were lucky to have enough data to train our AI algorithm. 

 
8:03 
And the authors found a rare cell population. 

 
8:09 
You can see here they call them IAC cells. 

 
8:13 
So using the standard approach of comparing clusters of cells, these cells were found to be enriched in patients who developed high-grade neurotoxicity. 

 
8:23 
However, you see that if you only use these cells in order to predict whether these patients would develop neurotoxicity, you will not be very successful. 

 
8:33 
You’ll see some enrichment, that's about it. 

 
8:36 
And the authors also stated that they looked for differences within the CD8 and CD4 populations between high-grade and low-grade toxicity patients, and they didn't find any differences with the standard approach. 

 
8:47 
So what ScaiVision did is this: it didn't care about which cells are which and started looking for the differences between high-grade and low-grade toxicity patients. It found not only the IAC cells, the ICANS-associated cells that were described by the authors, but in addition found other cells which together, as a mixture, managed to separate the population and predict the development of high-grade toxicities. 

 
9:13 
Even if you don't care about CAR T therapy, you can appreciate how this approach can actually maximise the differences and insights from such data. 

 
9:25 
I hope you care about CAR T cells. 

 
9:29 
Here I have an example of the pipeline of our discovery projects, or some of them at least. 

 
9:34 
We started with CyTOF data from CTCL. 

 
9:37 
We discovered a diagnostic biomarker in CTCL, and it's important to give these examples because you see here that we start with CyTOF data for discovery, but we end up with a flow cytometry panel that finds the rare population that diagnoses CTCL, which is used as an assay prototype. 

 
9:57 
We did the same with endometriosis after learning our lesson that CTCL is a very rare disease and nobody cares about developing a biomarker to detect it, especially not with flow cytometry. 

 
10:07 
So we moved to endometriosis, which is a very common disease. 10% of women have the disease and there is no diagnostic biomarker for it. 

 
10:14 
So we used single cell RNA-seq data. 

 
10:16 
That was five years ago. 

 
10:18 
It was a very progressive project at that time, and we managed to reduce this to a 7-gene signature that we measure with a qPCR assay or in bulk RNA sequencing. 

 
10:28 
Now this prototype has been sold to Hera Biotech and is being developed clinically here in the States. 

 
10:32 

We have other projects which also start with some complex omics data. 

 
10:39 
We apply our algorithm and end up with a short panel that can be measured with a different method which is more clinically relevant. 

 
10:47 
Last year we presented the endometriosis project. 

 
10:52 
So this year I would like to present my hobby, CAR T cells. 

 
10:58 
And this is new data. 

 
10:59 
This is not the data that I showed you before. 

 
11:01 
This is new data from last year, which came out in November, by Bai et al. in Nature. 

 
11:09 
These guys treated more than 80 children with leukaemia and, by the way, generated CITE-seq data already eight years ago. 

 
11:19 
And some of these children survived eight years on CAR T therapy. 

 
11:24 
However, CAR T works in only about 40-50% of the patients. 

 
11:28 
And so some of these children relapsed in less than a year. 

 
11:31 
So we took the data only from the clear responders and non-responders. 

 
11:37 
That means the children that survived more than five years and the ones that relapsed within a year. We characterised the CAR T product to find the differences and see whether we can predict long-term survival from these products. 

 
11:51 
So we did the same as I presented before. 

 
11:54 
We split the data into 70% for training our machine learning algorithm and 30% to validate our initial discoveries. 

 
12:03 
Another step that we introduced is the use of different CV splits in order to make the discoveries more robust. 

 
12:10 
So we always took another set of the samples for training and a separate one for validation, and continued training our machine learning algorithm. 
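
A minimal sketch of such repeated 70/30 splits is shown below. It assumes the split is made at the patient level and keeps the responder to non-responder ratio in both parts; the talk does not state either detail, and the cohort, IDs and function name are hypothetical. In practice a library routine such as scikit-learn's `StratifiedShuffleSplit` covers the same ground.

```python
import random

def repeated_stratified_splits(labels, n_splits=5, train_fraction=0.7, seed=0):
    """Return n_splits (train_ids, valid_ids) pairs that keep the class ratio.

    labels: dict mapping patient ID -> class label.
    """
    rng = random.Random(seed)
    by_class = {}
    for patient_id, cls in sorted(labels.items()):
        by_class.setdefault(cls, []).append(patient_id)

    splits = []
    for _ in range(n_splits):
        train, valid = [], []
        for cls in sorted(by_class):
            ids = list(by_class[cls])
            rng.shuffle(ids)  # a fresh shuffle draws a new split each round
            cut = round(train_fraction * len(ids))
            train.extend(ids[:cut])
            valid.extend(ids[cut:])
        splits.append((sorted(train), sorted(valid)))
    return splits

# Hypothetical cohort: 10 long-term responders and 10 early relapsers.
cohort = {f"R{i:02d}": "responder" for i in range(10)}
cohort.update({f"N{i:02d}": "non_responder" for i in range(10)})

splits = repeated_stratified_splits(cohort)
for train_ids, valid_ids in splits:
    print(len(train_ids), "train /", len(valid_ids), "validation:", valid_ids)
```

Splitting by patient rather than by cell keeps all cells of one patient on the same side, so the validation patients are never seen during training.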

 
12:20 
So we came up with models, many models. 

 
12:22 
You can see them here. 

 
12:25 
We evaluate all of these models using AUC, the area under the ROC curve, so a measure of how well they predict on the training or the validation cohort. 

 
12:32 
And you see that most of these models have really good predictive power. 

 
12:37 
So on the training data set, they're all close to 1, and we have a cut-off on the validation data set of above 0.8. 

 
12:45 
So we take some of these models, fuse them together and extract the molecular and cellular characteristics. 
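
The selection step can be sketched like this: compute a validation AUC per model, keep the models above the 0.8 cut-off, and combine them. The scores below are made-up numbers, the model names are hypothetical, and averaging the predictions is only one assumed way to "fuse" models; the talk does not say how the fusion is done.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count half).

    labels: 1 for responder, 0 for non-responder.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical validation labels and per-model predicted scores.
valid_labels = [1, 1, 1, 0, 0, 0]
model_scores = {
    "model_a": [0.9, 0.8, 0.7, 0.3, 0.2, 0.1],
    "model_b": [0.9, 0.4, 0.8, 0.5, 0.2, 0.1],
    "model_c": [0.2, 0.6, 0.4, 0.5, 0.7, 0.3],
}

AUC_CUTOFF = 0.8  # validation cut-off mentioned in the talk
aucs = {name: auc(s, valid_labels) for name, s in model_scores.items()}
kept = [name for name, value in aucs.items() if value > AUC_CUTOFF]

# Assumed fusion rule: average the predictions of the kept models.
fused = [
    sum(model_scores[name][i] for name in kept) / len(kept)
    for i in range(len(valid_labels))
]

print({name: round(value, 3) for name, value in aucs.items()})
print("kept:", kept)
print("fused AUC:", auc(fused, valid_labels))
```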

 
12:51 
We can do that in several different ways. 

 
12:55 
On the one side we use ranked cells to see which cells are informing this prediction, and on the other side we use ranked genes or proteins to see which features are informing this prediction. 

 
13:07 
So you can see here again, highlighted in orange, the cells that inform prediction of therapy response or long-term survival of these patients. 

 
13:18 
And you see that this is not one cluster. 

 
13:20 
Again, it's a combination of many different cell types. 

 
13:23 
I've listed them here again, and you see that the combination of these cell types is able to really segregate responders from non-responders perfectly. 

 
13:31 
Obviously, it's not one cell type, which is also to be expected. 

 
13:34 
So here we also went back to the data and annotated it in order to make it more understandable for biologists like me. 

 
13:44 
And you can see what the data looks like in responders and non-responders: there are some obvious differences, but not everything is very different. 

 
13:52 
And here you have the rest of the cells versus the cells that were selected by our algorithm as being predictive. 

 
13:58 
And we have some very obvious differences, like the enrichment of central memory CD4 cells, for example, which is expected to correlate with long-term survival, and you have far fewer T-regs, which you can see here in pink. 

 
14:12 
So you don't have any T-regs in the products that respond well. 

 
14:19 
But there is also new biology that we found out. 

 
14:22 
We found gamma delta T cells being heavily enriched in the cells that predict therapy response. 

 
14:28 
And this was not something that we would have expected, and maybe we would not even have looked for it specifically if we had used the standard approach. And our algorithm had even less of an idea about CAR T cells than I do. 

 
14:40 
Our data scientists are also a little bit ignorant of CAR T cells, so that's one thing, and we can continue describing these cells and validating in the lab what their role might be. 

 
14:52 
But this is not very practical if you want to develop an assay in order to apply it in the clinic and predict which patient might respond to this treatment. 

 
15:01 
So we also extracted a gene set using Captum scoring. 

 
15:05 
I don't want to explain more about that, but it assigns a ranking to each gene within the trained models, which we can cut in order to develop a qPCR assay that measures a certain number of genes that are able to predict the therapy response of these patients. 
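
The "rank, then cut" step can be illustrated with a few lines of plain Python. This sketch does not call Captum; it assumes per-gene attribution scores have already been computed for each trained model and simply aggregates them. The gene names, the numbers, the mean-absolute-value aggregation and the panel size are all hypothetical.

```python
def rank_genes(attributions_per_model):
    """Rank genes by mean absolute attribution across models."""
    genes = sorted({g for attr in attributions_per_model for g in attr})
    n_models = len(attributions_per_model)
    mean_abs = {
        g: sum(abs(attr.get(g, 0.0)) for attr in attributions_per_model) / n_models
        for g in genes
    }
    return sorted(genes, key=lambda g: mean_abs[g], reverse=True)

# Hypothetical attribution scores from three trained models.
attributions = [
    {"GENE_A": 0.50, "GENE_B": -0.30, "GENE_C": 0.05, "GENE_D": 0.01},
    {"GENE_A": 0.40, "GENE_B": -0.35, "GENE_C": 0.10, "GENE_D": 0.02},
    {"GENE_A": 0.45, "GENE_B": -0.25, "GENE_C": 0.02, "GENE_D": 0.20},
]

PANEL_SIZE = 2  # assumed cut; a real assay panel size is a design choice
ranked = rank_genes(attributions)
panel = ranked[:PANEL_SIZE]
print("ranking:", ranked)
print("panel:", panel)
```

Taking absolute values keeps genes whose high expression argues against response as well as those that argue for it; the sign would still be needed when the assay score is built.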

 
15:28 
Now since we are patenting that, I can't show you each gene separately, but generally they're involved in cell migration, chemotaxis, actin rearrangement and so on. 

 
15:40 
So these are also not genes that you would immediately start looking for. You would look for memory response and maybe some cytotoxicity. 

 
15:47 
But in our case, these were genes that popped up. 

 
15:51 
So I can just summarise from this project that ScaiVision highlights central memory cells and, interestingly, gamma delta T cells, as well as specific gene features that predict response to CAR T therapy. 

 
16:07 
And these can serve as an assay prototype or a predictive biomarker. 

 
16:14 
Now here I would like to summarise some characteristics of ScaiVision, both the ones I mentioned and the ones I have not mentioned yet. 

 
16:21 
It is indication agnostic. 

 
16:22 
You might have noticed that already from our portfolio of projects. It can integrate different types of omics data. 

 
16:31 
So we don't work only with single cell RNA-seq data, but we can integrate clinical and other types of data. We can discover biomarkers from a relatively small number of samples. 

 
16:43 
And many of you would be very sceptical about going as low as 10 + 10 patients. 

 
16:48 
But because of the richness of the single cell data, our data scientists have proven that to work really well. 

 
16:54 
Retaining the single cell resolution is something very important to find sensitive and complex signatures. 

 
17:00 
And we are very fast and translatable. 

 
17:02 
So that's an important characteristic for us because we really want to make clinical impact and are able to translate these discoveries into measurable assays. 

 
17:11 
And this comes out with many different outcomes here. 

 
17:15 
Before I finish, I would like to mention something that everyone is very interested in: data. 

 
17:20 
So we just won a grant, and we're very happy about it, to generate omics data from IBD patients in order to find predictive biomarkers for standard treatments. 

 
17:30 
And this would include multi omic data, integration of different measurements within these patients and hopefully also new targets for the patients who do not respond to the current treatments. 

 
17:42 
We're also happy to collaborate on this initiative and generate even more data or find even further insights. 

 
17:49 
And with this, I thank you for your attention and I'm open for questions. 

 
17:56 
Thank you, Diana.