Market Insight | Multi-Omics Data Analysis & Integration & AI

Beyond Black Boxes With Pathway-Driven Models For Biomedical Data

Sophia Tsoka

Reader in Bioinformatics

King’s College London

1 July, 2025

Watch time: 6 Minutes

Takeaways

Sophia Tsoka, Reader in Bioinformatics at King’s College London, explained that methodologies need to be adapted to the specific problems posed by specific data sets. Machine learning can deliver good predictive performance, but it is essential that those predictions are interpretable, so that scientists can trace how decisions are reached, and that the methods are flexible enough to model and represent both the data and the problem at hand.

Tsoka argued that the problem is not the volume of data but its heterogeneity. Different data types, such as numerical and text data, each have distinct inherent properties, so many factors must be weighed when developing an appropriate data science methodology for a specific problem.

The ‘large p, small n’ problem, where the number of features (p) far exceeds the number of samples (n), is a recurring challenge in computational science. To tackle it, the team turned to feature selection methods, but most of these failed to capture relationships between features. Genes and proteins do not act in isolation, so methods must be adapted to account for these interactions.

Computational scientists are therefore seeking to identify latent patterns in the data. Tsoka uses mathematical optimisation to predict phenotypes accurately while producing models that scientists can interpret and explain. She suggested that optimisation models can infer gene weights that both separate samples by phenotype and capture pathway activities, with the aim of minimising misclassifications and improving prediction performance.
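
To make the idea of optimisation-derived gene weights concrete, here is a minimal Python sketch. It is not Tsoka's formulation, which the talk does not spell out; it assumes a simple linear model whose per-gene weights are found by subgradient descent on a regularised hinge loss, a standard convex stand-in for counting misclassified samples. The function names, toy data and hyperparameters are all hypothetical.

```python
from typing import List, Tuple

def fit_gene_weights(
    X: List[List[float]],  # samples x genes (e.g. z-scored expression)
    y: List[int],          # phenotype labels coded as +1 / -1
    lam: float = 0.01,     # L2 penalty that keeps the weights small
    lr: float = 0.1,       # subgradient step size
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Learn one weight per gene plus a bias by minimising the regularised
    average hinge loss, a convex surrogate for the misclassification count."""
    n, n_genes = len(X), len(X[0])
    w, b = [0.0] * n_genes, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # misclassified, or correct but inside the margin
                for j in range(n_genes):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w: List[float], b: float, x: List[float]) -> int:
    """Assign the phenotype on whichever side of the hyperplane x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy data: gene 0 separates the two phenotypes, gene 1 is uninformative.
X = [[3.0, 0.5], [2.5, 0.4], [-2.5, 0.6], [-3.0, 0.5]]
y = [1, 1, -1, -1]
w, b = fit_gene_weights(X, y)
preds = [predict(w, b, xi) for xi in X]
```

Genes with large absolute weights are the ones driving the separation, which is what makes a linear, optimisation-based model traceable in a way a black box is not.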

For pathway activity inference, the methodology decomposes the large data matrix into pathway-specific matrices and applies optimisation procedures to derive pathway activities, which are then used as features for classification. This approach reduces noise and makes the data representation more robust.
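
The decomposition step can be illustrated with the sketch below. It assumes expression is stored as a gene-to-values mapping and that pathway membership comes from some gene-set collection (the pathway and gene names are hypothetical). Because the talk does not specify the optimisation used to derive activities, the sketch substitutes a common unsupervised choice: each sample's activity is its score on the leading principal component of the pathway submatrix, found by power iteration.

```python
import math
from typing import Dict, List

Matrix = Dict[str, List[float]]  # gene -> expression values across samples

def split_by_pathway(expr: Matrix, pathways: Dict[str, List[str]]) -> Dict[str, Matrix]:
    """Decompose the full gene x sample matrix into one submatrix per pathway,
    keeping only member genes that were actually measured."""
    return {
        name: {g: expr[g] for g in genes if g in expr}
        for name, genes in pathways.items()
    }

def pathway_activity(sub: Matrix, iters: int = 100) -> List[float]:
    """Score each sample on the leading principal component of a pathway
    submatrix (a stand-in for the optimisation step, not the talk's method)."""
    genes = list(sub)
    n = len(sub[genes[0]])
    # Centre each gene across samples.
    centred = []
    for g in genes:
        mu = sum(sub[g]) / n
        centred.append([v - mu for v in sub[g]])
    # Gene x gene scatter matrix C = A A^T.
    k = len(genes)
    cov = [[sum(a * c for a, c in zip(centred[i], centred[j])) for j in range(k)]
           for i in range(k)]
    # Power iteration converges to the dominant eigenvector (gene loadings).
    v = [1.0] * k
    for _ in range(iters):
        nxt = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in nxt)) or 1.0
        v = [x / norm for x in nxt]
    # Project every sample onto the loadings to get its pathway activity.
    return [sum(v[i] * centred[i][s] for i in range(k)) for s in range(n)]

expr = {
    "G1": [2.0, 1.8, -1.9, -2.1],
    "G2": [1.5, 1.7, -1.6, -1.4],
    "G3": [0.3, -0.2, 0.1, -0.2],
    "G4": [-0.1, 0.2, 0.0, 0.1],
}
pathways = {"pathway_A": ["G1", "G2"], "pathway_B": ["G3", "G4", "G9"]}  # G9 unmeasured
subs = split_by_pathway(expr, pathways)
activities = {name: pathway_activity(sub) for name, sub in subs.items()}
```

The resulting activities form a compact pathways x samples matrix that could be passed to any classifier in place of the original gene-level features.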

Tsoka tested the model on cardiovascular, breast cancer and colorectal cancer datasets, where it performed strongly and outperformed other methods in accuracy and other standard performance metrics. The evaluation covered three distinct criteria: multiclass classification accuracy, robustness to noise, and survival analysis.
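
The first two evaluation criteria can be sketched as follows; survival analysis is not covered here. This assumes a hypothetical nearest-centroid classifier on toy three-class data and measures robustness as the accuracy obtained after adding Gaussian noise of increasing standard deviation to the inputs. The talk does not describe its exact noise protocol, so treat this as one plausible setup.

```python
import random
from typing import Callable, Dict, List, Sequence

Classifier = Callable[[List[float]], str]

def accuracy(clf: Classifier, X: List[List[float]], y: List[str]) -> float:
    """Multiclass accuracy: fraction of samples assigned their true label."""
    return sum(1 for xi, yi in zip(X, y) if clf(xi) == yi) / len(y)

def noise_robustness(clf: Classifier, X: List[List[float]], y: List[str],
                     sigmas: Sequence[float], seed: int = 0) -> Dict[float, float]:
    """Accuracy after adding zero-mean Gaussian noise at each standard deviation."""
    rng = random.Random(seed)  # seeded so the curve is reproducible
    curve = {}
    for s in sigmas:
        noisy = [[v + rng.gauss(0.0, s) for v in xi] for xi in X]
        curve[s] = accuracy(clf, noisy, y)
    return curve

def nearest_centroid(X: List[List[float]], y: List[str]) -> Classifier:
    """Hypothetical baseline: assign each sample to the closest class mean."""
    centroids = {}
    for label in set(y):
        rows = [xi for xi, yi in zip(X, y) if yi == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    def clf(x: List[float]) -> str:
        return min(centroids,
                   key=lambda c: sum((a - m) ** 2 for a, m in zip(x, centroids[c])))
    return clf

X = [[0.0, 0.1], [0.2, -0.1], [5.0, 5.1], [4.9, 5.2], [-5.0, 5.0], [-5.1, 4.8]]
y = ["A", "A", "B", "B", "C", "C"]
clf = nearest_centroid(X, y)
clean = accuracy(clf, X, y)
curve = noise_robustness(clf, X, y, sigmas=[0.0, 0.5, 5.0])
```

A method whose accuracy falls off slowly as the noise level rises is the more robust one, which is the property a pathway-level representation is meant to improve.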

She stressed that the model is interpretable, which allowed her to trace the entire modelling process and understand how decisions are made; as a result, she could adjust the analysis to the extent the model allowed. Visualisations show improved sample separability when samples are represented by pathway activities.

Finally, Tsoka touched on ongoing work to integrate these concepts into neural network architectures, using pathway-based autoencoders to improve accuracy while reducing the number of parameters. The team is also extending its methods to single-cell RNA-seq data for cell-type annotation.
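
The parameter saving of a pathway-based autoencoder comes from sparse connectivity: each node in the pathway layer connects only to the genes in that pathway instead of to every gene. The sketch below assumes that design and shows only the encoder side with a hypothetical gene-to-pathway mask; a decoder would typically mirror the same mask. It illustrates the general idea and is not the team's architecture.

```python
from typing import Dict, List

def pathway_mask(genes: List[str], pathways: Dict[str, List[str]]) -> List[List[int]]:
    """Binary mask M[p][g] = 1 if gene g belongs to pathway p, else 0."""
    index = {g: i for i, g in enumerate(genes)}
    mask = []
    for members in pathways.values():
        row = [0] * len(genes)
        for g in members:
            if g in index:
                row[index[g]] = 1
        mask.append(row)
    return mask

def count_params(mask: List[List[int]]) -> Dict[str, int]:
    """Compare a dense gene -> pathway layer with a pathway-masked one."""
    n_pathways, n_genes = len(mask), len(mask[0])
    dense = n_pathways * n_genes + n_pathways    # full weight matrix + biases
    masked = sum(map(sum, mask)) + n_pathways    # member edges only + biases
    return {"dense": dense, "pathway_masked": masked}

def encode(x: List[float], weights: List[List[float]], biases: List[float],
           mask: List[List[int]]) -> List[float]:
    """Encoder forward pass with ReLU; weights outside the mask are ignored."""
    return [
        max(0.0, sum(w * m * xi for w, m, xi in zip(weights[p], mask[p], x)) + biases[p])
        for p in range(len(mask))
    ]

genes = ["G1", "G2", "G3", "G4", "G5", "G6"]
pathways = {"P1": ["G1", "G2", "G3"], "P2": ["G3", "G4"], "P3": ["G5", "G6"]}
mask = pathway_mask(genes, pathways)
params = count_params(mask)
weights = [[1.0] * len(genes) for _ in pathways]
hidden = encode([1.0] * len(genes), weights, [0.0] * len(pathways), mask)
```

In a deep learning framework, the same effect is usually achieved by multiplying the weight matrix element-wise by a fixed mask, so that connections between a pathway node and non-member genes have no effect. Each hidden unit then corresponds to a named pathway, which keeps the network's internal representation interpretable.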
