Market Insight | Multi-Omics Data Analysis & Integration & AI

Mapping Gene Regulation with Graph Embedding of Single Cell Data

Luca Pinello


Associate Professor of Pathology

Harvard Medical School and Massachusetts General Hospital

1 July 2025
Watch time: 6 minutes

Highlights

Single-cell technologies have unlocked the ability to profile molecular features, such as gene expression, chromatin accessibility, methylation, and protein surface markers, at the level of individual cells. The challenge lies in making the most of the rich datasets that these technologies produce.

In this presentation, Luca Pinello outlines SIMBA, a method for building gene regulatory maps. Starting from cell-by-feature data matrices, SIMBA organises the cells into a two-dimensional map in which cells with shared features are positioned closer together. Crucially, SIMBA also places the features themselves on the map, labelling each region with the features that characterise it, much like place labels on Google Maps. The aim is to shed light on the principles that govern gene regulation.

To analyse such data and construct these maps, SIMBA borrows techniques from natural language processing (NLP), the field that enables computers to process and understand text and that has broad applicability. In NLP, words are encoded as vectors in a latent space, where words with similar meanings are positioned closer together, so the meaning of a word can be inferred from its position relative to other words. For example, king + (woman − man) ≈ queen.
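The analogy above can be reproduced with plain vector arithmetic and cosine similarity. The minimal sketch below uses hand-picked, hypothetical three-dimensional vectors (real word embeddings are learned from large text corpora and have hundreds of dimensions), so it illustrates only the mechanics of the lookup, not a trained model.

```python
import math

# Hypothetical, hand-picked 3-D vectors for illustration only; a trained model
# learns vectors with hundreds of dimensions from a text corpus.
WORD_VECTORS = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.1, 0.8],
    "man":    [0.1, 0.9, 0.1],
    "woman":  [0.1, 0.2, 0.8],
    "prince": [0.8, 0.7, 0.2],
    "apple":  [0.5, 0.5, 0.0],
}


def cosine(u, v):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) + (vec(c) - vec(b)), excluding a, b and c."""
    target = [x + (z - y) for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# king + (woman - man)
print(analogy("king", "man", "woman", WORD_VECTORS))  # prints "queen"
```

Excluding the three input words from the candidates is the usual convention, since the target vector often lies close to one of them.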

Embeddings can also capture hierarchical structures. For example, Pinello described how words within a tweet can be embedded in relation to higher-level elements such as hashtags, forming a hierarchical representation. Subsequent research has expanded the capabilities of NLP tools: they can embed not only individual words and sentences but also entire hierarchical graphs. This enables the analysis of relationships across multiple levels of language, from words to sentences to full articles.
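A minimal sketch of the tweet example, assuming that words and hashtags share one embedding space and that a tweet (the higher level) is represented by the mean of its word vectors (the lower level). The words, hashtags and vectors are hypothetical; in a real model they would be learned jointly so that each tweet lands near the hashtags it carries.

```python
import math

# Hypothetical 2-D vectors: dimension 0 loosely "sport", dimension 1 loosely "politics".
# In a trained model, word and hashtag vectors would be learned jointly.
WORD_VECTORS = {
    "goal":    [0.9, 0.1],
    "striker": [0.8, 0.2],
    "vote":    [0.1, 0.9],
    "ballot":  [0.2, 0.8],
}
HASHTAG_VECTORS = {
    "#football": [1.0, 0.0],
    "#election": [0.0, 1.0],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def embed_tweet(tweet):
    """Lower level to higher level: a tweet is the mean of its known word vectors."""
    vectors = [WORD_VECTORS[w] for w in tweet.lower().split() if w in WORD_VECTORS]
    if not vectors:
        raise ValueError("no known words in tweet")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def rank_hashtags(tweet):
    """Rank hashtags by cosine similarity to the tweet embedding, best first."""
    tweet_vec = embed_tweet(tweet)
    return sorted(HASHTAG_VECTORS,
                  key=lambda h: cosine(tweet_vec, HASHTAG_VECTORS[h]),
                  reverse=True)


print(rank_hashtags("What a goal from the striker"))  # ['#football', '#election']
```

Averaging is the simplest way to lift word vectors to a tweet-level vector; it ignores word order, which is often acceptable for short texts such as tweets.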

Applied to biology, SIMBA constructs a hierarchical graph of the many factors that influence gene expression. The graph connects cells and genes to secondary features such as ATAC-seq peaks, motifs, and k-mers. After embedding the graph, researchers can explore the latent space to locate the cells of interest together with the features that characterise them.
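The graph-building and embedding steps can be sketched on a toy example. Assumptions: binarised cells × genes and cells × peaks matrices with hypothetical names, one edge per non-zero entry, and node vectors trained with a logistic loss and negative sampling so that linked nodes end up with high dot products. This is a generic graph-embedding loop written for illustration, not SIMBA's implementation; the published tool builds on PyTorch-BigGraph and also links peaks to motifs and k-mers.

```python
import math
import random

# Hypothetical toy data: binarised cells x genes (expression) and cells x peaks
# (ATAC-seq accessibility). Cells 1-2 and cells 3-4 form two "cell types".
CELLS = ["cell1", "cell2", "cell3", "cell4"]
GENES = ["GATA1", "PAX5"]
PEAKS = ["peak_A", "peak_B"]
EXPRESSION = [[1, 0], [1, 0], [0, 1], [0, 1]]     # rows: cells, columns: genes
ACCESSIBILITY = [[1, 0], [1, 0], [0, 1], [0, 1]]  # rows: cells, columns: peaks


def build_edges(rows, columns, matrix):
    """One edge (row entity, column entity) for every non-zero matrix entry."""
    return [(r, c) for r, values in zip(rows, matrix)
            for c, value in zip(columns, values) if value]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def train_embeddings(edges, nodes, dim=8, epochs=300, lr=0.05, seed=0):
    """Fit node vectors so linked nodes get high dot products and unlinked ones low.

    Logistic loss with one negative sample per edge; the negative is drawn from
    target nodes that are not linked to the source node.
    """
    rng = random.Random(seed)
    emb = {n: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for n in nodes}
    linked = {}
    for src, dst in edges:
        linked.setdefault(src, set()).add(dst)
    targets = sorted({dst for _, dst in edges})

    for _ in range(epochs):
        for src, dst in edges:
            pairs = [(dst, 1.0)]
            unlinked = [t for t in targets if t not in linked[src]]
            if unlinked:
                pairs.append((rng.choice(unlinked), 0.0))
            for other, label in pairs:
                u, v = emb[src], emb[other]
                score = sum(a * b for a, b in zip(u, v))
                grad = sigmoid(score) - label  # d(loss)/d(score)
                emb[src] = [a - lr * grad * b for a, b in zip(u, v)]
                emb[other] = [b - lr * grad * a for a, b in zip(u, v)]
    return emb


edges = (build_edges(CELLS, GENES, EXPRESSION)
         + build_edges(CELLS, PEAKS, ACCESSIBILITY))
embeddings = train_embeddings(edges, CELLS + GENES + PEAKS)
print(len(edges), "edges embedded into", len(embeddings), "node vectors")
```

Because cells, genes and peaks are trained in one space, a cell's vector ends up close to the features it is linked to. Secondary features could be added in the same way, for example by calling build_edges on a hypothetical peaks × motifs matrix.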

By leveraging proximity in the embedded space, SIMBA facilitates the identification of key genes, transcriptional regulators, and regulatory regions associated with specific cell types.
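Proximity-based lookup can be sketched as ranking features by cosine similarity to the centroid of a group of cells. The two-dimensional embeddings and labels below are hypothetical, chosen so that erythroid cells sit near GATA1 and a GATA motif and B cells near PAX5 and an EBF1 motif; with real data the vectors would come from the trained graph embedding. The centroid-plus-cosine scoring is an assumption for this sketch, not necessarily the scoring SIMBA uses.

```python
import math

# Hypothetical 2-D embeddings of cells and features in one shared space.
CELL_EMBEDDINGS = {
    "erythroid_1": [0.9, 0.1],
    "erythroid_2": [0.8, 0.2],
    "b_cell_1":    [0.1, 0.9],
    "b_cell_2":    [0.2, 0.8],
}
FEATURE_EMBEDDINGS = {
    "GATA1":      [0.95, 0.05],
    "PAX5":       [0.05, 0.95],
    "motif:GATA": [0.85, 0.15],
    "motif:EBF1": [0.15, 0.85],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def top_features(cell_names, k=2):
    """Return the k features closest (by cosine) to the centroid of the given cells."""
    vectors = [CELL_EMBEDDINGS[name] for name in cell_names]
    centroid = [sum(column) / len(vectors) for column in zip(*vectors)]
    ranked = sorted(FEATURE_EMBEDDINGS,
                    key=lambda f: cosine(centroid, FEATURE_EMBEDDINGS[f]),
                    reverse=True)
    return ranked[:k]


print(top_features(["erythroid_1", "erythroid_2"]))  # ['motif:GATA', 'GATA1']
print(top_features(["b_cell_1", "b_cell_2"]))        # ['motif:EBF1', 'PAX5']
```

Since genes, motifs and peaks are ranked on the same scale, one query can surface marker genes, candidate regulators and regulatory regions together.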
