Interview with Joshua Atkins, Cancer Epidemiology Unit, University of Oxford
Joshua Atkins
Senior Genomic Epidemiologist
Cancer Epidemiology Unit, University of Oxford
So good afternoon and a very warm welcome to Joshua Atkins. Joshua, thank you very much for joining us for today's interview with Oxford Global. Today we'll be discussing your work within the Cancer Epidemiology Unit at the University of Oxford, which is showcasing how multi-omics integration can contribute to a shift from reactive treatment to precision prevention in oncology.
So if we get started with the first question, could you please begin by outlining the key scientific question your team set out to address with this large-scale proteogenomic study?
Thank you for the opportunity to be interviewed. I'm a senior genomic epidemiologist here at the CEU, and my interest lies in germline genetics and how we can use genetics to further understand cancer etiology. So for the study that we published in Nature Communications, our fundamental goal was to see whether or not we could actually find protein signatures that relate to cancer risk. And if there were, how far out could we detect these signatures before a formal diagnosis from a doctor? And the beauty of the UK Biobank is that they released data on 50,000 individuals who had Olink proteomic measurements. And this has really changed the field in terms of what we can do and what is actually possible. Because going back just a few years, we were only able to measure one or two proteins at a time. And now we have this tremendous technology where we can just take a snapshot of the blood proteome and see what's going on, which is beautiful. And when we overlap that with other omics technologies, we can start to triangulate why these signals exist. So fundamentally, to answer that question: one, we wanted to look at whether or not there were any signatures. And secondly, could we use the genetics to try and triangulate what these signals actually meant?
Fantastic. Thank you very much. What motivated the focus on circulating protein biomarkers, and how do genetically predicted protein levels offer an advantage over the traditional protein measurement approaches in the context of cancer risk?
OK. So here within the Cancer Epidemiology Unit, we've been very interested in blood proteins for quite a long time. In particular, we're super interested in a prostate-specific protein called MSMB, because we're very heavily involved in prostate cancer research. MSMB is a very interesting protein. Not too much is known about it, and really it's only been in the last few years that we've been able to dive in and understand what the protein does. But in particular, there is a variant that lies about 50 base pairs upstream of the transcriptional start site of this gene. And this single nucleotide variant is the strongest pQTL that we know of across all proteins. So it is by far the strongest effect on protein or blood protein measurements. But what's more fascinating about this particular SNP is that it's one of the top associated loci in prostate cancer risk. So what we know is that men who carry this SNP produce a lot more MSMB and are protected against developing prostate cancer. So we've had a lot of interest in proteins since way before these technologies evolved into what they are today.
Now the advantage of using predicted protein levels is that we can go and look at cohorts that have had sequencing done on them but might not have had proteins measured yet, because this is a new, evolving technology and it's quite expensive. So it's going to take some time for all these cohorts, but what these cohorts do have is very rich phenotypic data. So we can project potential protein levels and then look at the various risks. So not just cancer risks, we can look at other exposures too. An example of this from our paper is that we used the 50,000 people who had protein measurements and made the genetic instruments. And then we projected that out into the full cohort, and we could look at the risk associated with these predicted proteins. So it gave us better statistical power.
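The projection step described above can be sketched roughly as follows. This is a minimal illustration on simulated data, assuming a simple least-squares instrument; the function names, cohort sizes, and effect sizes are hypothetical and are not taken from the study's pipeline.

```python
import numpy as np

def fit_instrument(genotypes, protein):
    """Fit least-squares weights (intercept first) predicting protein from allele dosages."""
    X = np.column_stack([np.ones(len(genotypes)), genotypes])
    weights, *_ = np.linalg.lstsq(X, protein, rcond=None)
    return weights

def predict_protein(genotypes, weights):
    """Project genetically predicted protein levels onto any genotyped individuals."""
    X = np.column_stack([np.ones(len(genotypes)), genotypes])
    return X @ weights

# Simulated cohort (all numbers are assumptions chosen for illustration only).
rng = np.random.default_rng(0)
n_full, n_measured, n_variants = 4000, 500, 3
genotypes = rng.binomial(2, 0.3, size=(n_full, n_variants)).astype(float)
true_effects = np.array([0.8, -0.3, 0.1])
protein = genotypes @ true_effects + rng.normal(0.0, 1.0, n_full)

# Only a subset has measured protein: train the instrument there...
measured = np.arange(n_measured)
weights = fit_instrument(genotypes[measured], protein[measured])

# ...then project predicted levels onto the full genotyped cohort.
predicted = predict_protein(genotypes, weights)
```

The predicted values can then be tested against outcomes in everyone with genotype data, which is where the gain in statistical power comes from.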
Yeah, fantastic. Thank you very much. And I think you've touched on my next question, which is about the dual-instrument model. So in terms of employing that dual-instrument model using cis-pQTLs and the exome-wide genetic scores, could you explain the rationale for that approach and what each instrument contributes to biomarker discovery?
Yeah, yeah, really nice question. So, cis-pQTLs. These are variants that are nearby or within the cognate gene and that affect the protein levels, right. And what we've found over the past few years concerns the genes that we call essential genes. They're fundamental to survival, to keeping us alive. They are very intolerant to getting what we call nasty variation. So these types of genes are less likely to have cis variation. And if they do have cis variation, the effect is not that extreme. What that means is that if we just stick to cis variation, we can't sample all genes. We're biased towards the genes that are essentially not essential. So we're really restricting what we can assess. But the beauty of cis-pQTLs is that you're less likely to be confounded by pleiotropy. Pleiotropy is where a single variant affects multiple unrelated pathways, and that's really hard to distinguish. So cis variation is quite good in that sense. And this is why cis-pQTLs are fundamental for Mendelian randomization studies, because we get around some of the assumptions there. But really, we want to sample these essential genes. So this is one of the fundamental reasons why we developed the exome score. The exome score takes into account trans variation. So it can look at essentially other mechanisms, all the pathways involved with those proteins, and at that type of disruption. We can then look at various triangulations as well. So we can look at the individual variants that make up those scores and see whether or not they overlap with risk.
So the second part of that question, how do they contribute to biomarker discovery? So, for example, for the trans variation, right? These essential genes, we can't really target these genes, right? We know that if we knock these genes out, it's bad. It's very bad. Essentially, it's a knockout, game-over type of scenario. So targeting those genes is not really therapeutically advantageous. But if we're looking at the other things that trans-regulate those genes, we might be able to target those and then have some sort of effect on that target. But yeah, drug discovery is not my speciality.
Joshua, thank you very much. And can you talk us through the main challenges you faced in integrating and harmonising proteomic, genomic and clinical data at such a scale? Particularly across the 400,000 participants and the 19 different cancer types?
Yeah. So, it was a massive challenge, as you can understand. And now that we have moved over to whole genome sequencing, this was sort of like the baby step in the transition to whole genome sequencing. So there were a lot of lessons learned, but fortunately we work in a very, very highly skilled team, and having that multi-skilled team has really set the foundation for this study. Using high performance computing and the like is sort of my background. So one of the main challenges was then running this in the cloud technology that the UK Biobank provides, because obviously we can't download 28 terabytes or petabytes of data. It's just unrealistic. So a lot of the big data analysis, a lot of the heavy compute, was already done by pharma, which we were very thankful for, because otherwise it would have just been impossible to do. In terms of defining the phenotype data and all the exclusions, etcetera, like I said, this unit is very heavily skilled in that area, and they know the data inside out. So that was great.
Fantastic. Thank you very much, Joshua. So in terms of the data, can you elaborate on the statistical or the computational methods you used to ensure the robustness or reproducibility of the results? And maybe you could talk us through if there were any specific innovations in terms of how you generated or applied the exGS?
Yeah, yeah. So we have validated the exGS across multiple cohorts. One of the things that hopefully will be out very soon is that we're able to look at proteins that just didn't have an observational association. So we were just looking for genetically predicted proteins associated with cancer risk. And we found that for prostate cancer and Part 1, there's an increased risk in men that have a genetically predicted higher level of Part 1. And that has been replicated across multiple cohorts and in the first cohorts as well. So that sort of gives us confidence that the score's right. But the thing that we had to do to develop the scores was some trickery, reorienting some of the alleles so that the effect was always protein abundance increasing. And what this does is allow us to get around some of the limitations of bringing rare variant analysis into polygenic risk models. So that was pretty tricky to begin with because, as you can imagine, with a lot of data and a lot of people, there's a lot of back-end work there to make sure those instruments are independent, etc. In terms of the protein associations, we just used the standard Cox model. For the association with the polygenic risk, we just used traditional logistic regressions, adjusting for the standard things that you would account for, such as the biological age of the person when they donated the sample, sex and population structure. So yeah. And we've also replicated a lot of our findings in other cohorts. So that gives us a bit more confidence in the robustness of the study.
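The allele reorientation mentioned above can be illustrated with a short sketch. It assumes a plain weighted-sum score over allele dosages (0, 1 or 2); the interview does not spell out how the exGS is actually computed, so the function names and toy numbers here are hypothetical.

```python
import numpy as np

def orient_to_increasing(dosages, betas):
    """Flip variants with negative effects so every counted allele raises protein abundance."""
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    flip = betas < 0
    oriented = dosages.copy()
    # Counting the other allele instead turns a dosage d into 2 - d.
    oriented[:, flip] = 2.0 - oriented[:, flip]
    return oriented, np.abs(betas)

def genetic_score(dosages, betas):
    """Weighted sum of protein-increasing allele dosages (illustrative, not the published exGS)."""
    oriented, weights = orient_to_increasing(dosages, betas)
    return oriented @ weights

# Toy data: three individuals, three variants; the second variant lowers the protein.
dosages = np.array([[0, 2, 1],
                    [1, 0, 2],
                    [2, 1, 0]])
betas = np.array([0.5, -0.2, 0.1])
scores = genetic_score(dosages, betas)
```

After flipping, every weight is non-negative, so a higher score always means a higher predicted protein level. The ranking of individuals is unchanged, because the flipped score differs from the unflipped one only by a constant.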
Fantastic, thank you very much, Joshua. And then leaning into the findings. So your study identified over 100 protein-cancer associations, some detectable years before diagnosis. Are you able to highlight one or two that are particularly promising from a mechanistic or translational perspective?
Yeah, of course. So, sticking with the theme of prostate cancer, because that's our lead focus, I'll give you two examples of where the genetics is quite powerful. The first is where we highlighted a gene called GP2, or protein, sorry. And from our spatial transcriptomics and proteomics work, we know that GP2 is very, very highly expressed within the lethal clone. So this is the clone that moves out of the prostate into the lymph node. And what it's doing is spewing out GP2 everywhere. So this is why we're picking it up in the blood. It is mostly prostate specific. So this was a really, really cool finding, but we have zero genetic support for it. So not just with the exome scores, but also looking at whole genome sequencing and so on, there's very, very little genetic support. So we don't feel like GP2 is actually etiological. It's just a bystander of the process that's happened.
So the second finding that I'd like to talk about is what we call FLT3L. Now this is one of the essential genes that I mentioned before. It's very, very important in recruiting dendritic cells to the site of tumours, etc. And it also helps differentiate the precursor cells into dendritic cells. Now there's no evidence of nasty variation within that gene, of course, because it's essential, but the trans variants that make up that score are also risk variants for prostate cancer. And they're in quite notorious genes, like CHEK2, ATM and TERT, which is a good sign, right? So we've replicated this finding in the European Prospective Investigation into Cancer and Nutrition, the EPIC cohort. And within that, we were able to dive into it a bit more and look at the downstream effects: not necessarily the loss, but the impairment of FLT3L affects another molecule called IL-15, and IL-15 is super important in killer cell activation, etcetera. So through a bit of trickery, we found that it looks like the actual effect is in IL-15 itself. So it looks like men quite far out from developing prostate cancer have a drop in this FLT3L/IL-15 pathway, so in the immunosurveillance pathways, which is allowing the initiation of prostate cancer. Now we've moved into animal models, trying to understand what the actual role of IL-15 is. And we're in the very, very early stages of investigating how IL-15 in particular shapes the environment that lets the tumours develop. Yeah, sorry, that was two, right? So, to summarise: with the triangulation of the genetics, we're starting to home in on the underlying biology. Without genetic support, we're looking at the chicken, not the egg.
Fantastic. Thank you. What insights did the study offer into the underlying biology of cancer development, particularly regarding immune signaling, inflammation, or the tumour microenvironment?
Yeah, yeah, yeah. So the study itself was just an eye opener. It pointed to things that we kind of knew about but hadn't really paid that much attention to. And now, from that paper, we put an emphasis especially on immunosurveillance pathways. And in particular for MSMB, the protein that I mentioned before that we're really, really into: now, with the whole genome sequencing data available, what we can look at is not just your standard SNPs and indel variation. We've looked at copy number variation, structural variation and non-coding burden, and we've overlapped all of this with not just prostate cancer risk, but also the protein expression itself. So what we've found from that is new pathways for, say, MSMB that we've never been able to establish before. We found that it's involved in the immune pathway. We found that it's involved in viral APOBEC pathways. We've also found a role for it in tumour suppression, specifically in the prostate. So we're starting to see new biology, or new clues into the biology, that we just can't see with genetics or transcriptomics alone. So it's a really promising time moving forward and, you know, it's just getting better. And we've done a study now in EPIC as well where we're looking at driver genes. So we're able to pick up the remnants of very early driver mutations within the blood using the proteomics. That just opens up a whole new area of research. So it is very exciting times. And like I said, I'm probably not answering the question, but we're starting to see new emerging themes, not just in prostate cancer, but across all cancers.
Thank you very much. And I think you've actually almost already answered the last question, which was: looking ahead, how might your proteogenomic framework be extended into other diseases or adapted for broader population screening initiatives? Is there anything else you'd like to add on that?
Oh yeah, yeah, yeah. So, fundamentally it's about understanding the disease better. I know there's a lot of work. GP2, for instance, could be a really good early biomarker for prostate cancer risk. So, you know, people could make markers for that, essentially, and there's nothing wrong with that. But we fundamentally want to understand why the cancer develops, because with that understanding we can then work on prevention measures. And I'm really big on prevention. In our department, we have quite a famous person, Richard Peto. And he did a lot of very interesting work back in the day, where he noticed that when they changed the manufacturing process of cigarettes here in the UK and dropped the tar content, lung cancer incidence went slightly down. So he could see this effect. And he went and spoke to the Russian government and the Chinese government. And through those policies, they've saved millions of people's lives, you know, just by changing the manufacturing process. I know this is a very poor example, but trying to find a drug that stops a cancer that's well progressed is, in my opinion, unfeasible. It's not going to happen. We need to find ways to just not let it happen at all. And I believe a good start to that is the immune system, from everything that we've found through the protein experiments so far. So watch this space. It's going to be very, very big in a couple of years, when we can actually see what's really going on and hopefully stop cancer in its tracks. And I've probably got a personal stake in that, because I lost my sister last year to breast cancer at the age of 42. So, you know, people shouldn't have to go through this in this day and age. We should have the technology to stop this. We should know what's causing this.
That's fantastic. Thank you so much. It’s really fantastic to have those insights that you've shared and also to hear the personal cause there as well. So thank you very much. And with that, yes, we will close today's interview. A really big thank you again to Joshua for your time today and for sharing insights into such significant work that you're doing within the Cancer Epidemiology Unit. And we're really delighted to be able to share that with our precision medicine community. So thank you so much again, Joshua, I wish you all the best of luck with your ongoing work and thank you very much.
Thank you for having me.
Thank you.