The spatial biology revolution is upon us. With more tools than ever being developed for proteomic and transcriptomic biomarker discovery, it is essential to use them effectively and to integrate different data types, giving scientists a holistic understanding of the biological processes underlying complex diseases.

Scott Hoffmann, Associate Director at AstraZeneca, explained that existing multiomics technologies enable researchers to gather all types of information at both spatial and non-spatial levels across different omics layers. Data collection is not the issue; rather, interpreting the data is the main difficulty. With multiple transcripts interacting with multiple proteins, and multiple proteins interacting with the same metabolite, how can one make sense of it all?

Hoffmann proposed that there are two key integration types: horizontal data integration and vertical data integration. Horizontal integration examines the same key features across different sample sets, different patients, and different cells. Vertical integration looks at different features in the same sample sets that are analysed across different omics platforms to understand the interactions between genes, proteins, and metabolites. 
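The distinction can be made concrete with a small sketch. It is only an illustration of the two ideas, not Hoffmann's pipeline: the sample IDs, feature names, values, and the helper functions `integrate_horizontal` and `integrate_vertical` are all hypothetical.

```python
# Toy data: all identifiers and values are invented for illustration.
# Each table maps sample ID -> {feature name: measured value}.
rna_cohort_a = {"s1": {"GENE_A": 5.1, "GENE_B": 2.0}}
rna_cohort_b = {"s2": {"GENE_A": 4.7, "GENE_B": 2.4}}
protein_cohort_a = {"s1": {"PROT_A": 0.8}}


def integrate_horizontal(*tables):
    """Stack tables that share features but cover different samples."""
    merged = {}
    for table in tables:
        for sample_id, features in table.items():
            if sample_id in merged:
                raise ValueError(f"duplicate sample: {sample_id}")
            merged[sample_id] = dict(features)
    return merged


def integrate_vertical(**layers):
    """Join omics layers measured on the same samples, keyed by sample ID."""
    # Keep only the samples that were profiled in every layer.
    shared = set.intersection(*(set(table) for table in layers.values()))
    return {
        sample_id: {
            f"{layer}:{feature}": value
            for layer, table in layers.items()
            for feature, value in table[sample_id].items()
        }
        for sample_id in sorted(shared)
    }


# Horizontal: same transcript features, more samples.
horizontal = integrate_horizontal(rna_cohort_a, rna_cohort_b)
# Vertical: same sample, features from two omics layers side by side.
vertical = integrate_vertical(rna=rna_cohort_a, protein=protein_cohort_a)
```

In this sketch, horizontal integration grows the number of samples while the feature set stays fixed, whereas vertical integration grows the number of features per sample.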

Hoffmann focused on the vertical approach. His group at AstraZeneca works with spatial single-omics platforms and translates them to multiplexed and multimodal imaging platforms. He uses spatial biology and advanced imaging technologies to contextualize biological data within tissues, working closely with pathologists to validate findings. He explained that his main aim is to achieve multimodal imaging by connecting data from two or more imaging modalities to detect new biomarkers and contextualize the whole output spatially.

Hoffmann presented a case study on integrating transcriptomics and proteomics data from treated macrophage cells. The main task was to assess changes at the transcriptomic level in order to understand how macrophages respond to treatment and to investigate the mechanism of action. The macrophages were incubated for six days with three different treatments: a vehicle, a non-targeting control, and an on-target treatment.

To gather the proteomic data, mass spectrometry (MS) identified differentially expressed proteins across the three treatments. The GeoMx DSP system performed whole transcriptome analysis on cell pellets. Ten regions of interest per cell pellet were analysed to generate transcriptomic data. Then, principal component analysis (PCA) and clustering analysis showed a clear separation between the on-target treatment and the controls. 

Of the 8,500 molecules analysed, 4,270 had matching proteins and transcripts. Around 30% of molecules were differentially expressed in the proteomic and transcriptomic data. This was complemented by Fisher’s method, which helped identify a consensus between the two data sets.
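For readers unfamiliar with Fisher's method, here is a minimal sketch of how it combines p-values. The talk summary does not say exactly which values were combined, so this assumes one p-value per molecule from each omics layer and treats the two tests as independent; the function name `fisher_combined_pvalue` and the example inputs are hypothetical.

```python
import math


def fisher_combined_pvalue(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For an even
    number of degrees of freedom the survival function has a closed form:
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    """
    k = len(p_values)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("need at least one p-value in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    term, total = 1.0, 1.0  # i = 0 term of the series
    for i in range(1, k):
        term *= half_stat / i  # builds (X/2)^i / i! incrementally
        total += term
    return math.exp(-half_stat) * total


# Assumed example: p = 0.05 for the same molecule in each layer.
combined = fisher_combined_pvalue([0.05, 0.05])  # roughly 0.0175
```

Two individually borderline results reinforce each other here, which is the sense in which the method can surface a consensus between data sets. It does not check that the direction of change agrees, so that needs a separate comparison.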

Across the proteomic and transcriptomic data sets, CD28 was downregulated. However, there were some conflicting findings: for instance, complement factor C3 was upregulated in the transcriptomic data but downregulated in the proteomic data. Further investigation is therefore needed to uncover why this is the case. In summary, the study aims to add spatial complexity to the data integration process by incorporating spatially resolved omics data.
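Agreement and disagreement of this kind can be flagged automatically once each layer has a per-molecule log2 fold change. The sketch below is an assumption about how such a check might look, not the method used in the study: the functions `direction` and `compare_layers` are hypothetical, and the fold-change magnitudes are placeholders in which only the signs follow the findings reported above.

```python
def direction(log2_fold_change):
    """Reduce a log2 fold change to its sign: 'up', 'down', or 'flat'."""
    if log2_fold_change > 0:
        return "up"
    if log2_fold_change < 0:
        return "down"
    return "flat"


def compare_layers(rna_lfc, protein_lfc):
    """Label every molecule measured in both layers by direction agreement."""
    labels = {}
    # Only molecules with a matched transcript and protein can be compared.
    for molecule in sorted(rna_lfc.keys() & protein_lfc.keys()):
        rna_dir = direction(rna_lfc[molecule])
        protein_dir = direction(protein_lfc[molecule])
        labels[molecule] = "concordant" if rna_dir == protein_dir else "discordant"
    return labels


# Placeholder magnitudes: only the signs reflect the reported directions.
# "RNA_ONLY" is an invented entry showing that unmatched molecules are skipped.
rna_lfc = {"CD28": -1.0, "C3": 1.0, "RNA_ONLY": 0.5}
protein_lfc = {"CD28": -1.0, "C3": -1.0}
labels = compare_layers(rna_lfc, protein_lfc)
```

A real analysis would likely add significance and effect-size thresholds before calling a molecule discordant; this sketch compares signs only.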