Benevolent AI has built a drug discovery platform infused with machine learning that covers a broad range of biomedical data. Gabriel Rosser, Lead Bioinformatics Data Scientist at Benevolent AI, homed in on this broad data foundation. The platform is built on a large, harmonised data foundation, sometimes referred to as a knowledge graph, which integrates various data types at scale.

The data processing pipelines include a natural language processing sub-component. It can identify entities such as genes, tissues, and diseases, and its relationship extraction capabilities can uncover relationships between them. Benevolent AI’s data foundation integrates data types such as single-cell RNA-seq, GWAS, and other omics data and unifies them into a knowledge graph.

Rosser said that scientists understand disease states through mechanistic biology. The general hypothesis is that researchers can uncover indirect evidence linking targets to diseases and mechanisms in the context of tissues and cell types, because common mechanistic drivers are involved in many disease types. Coupling this information with machine learning makes it possible to pull out these indirect links.
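To make the idea of an indirect link concrete, here is a minimal sketch of a multi-hop search over a toy knowledge graph. It uses plain breadth-first search, not machine learning, and it is not Benevolent AI's method. Every entity, relation, and function name in it (`GENE_A`, `PATHWAY_X`, `find_indirect_paths`, and so on) is a hypothetical placeholder chosen for illustration.

```python
from collections import deque

# Hypothetical toy knowledge graph as (subject, relation, object) triples.
# All names are placeholders, not real biomedical data.
TRIPLES = [
    ("GENE_A", "participates_in", "PATHWAY_X"),
    ("PATHWAY_X", "active_in", "CELL_TYPE_M"),
    ("CELL_TYPE_M", "implicated_in", "DISEASE_1"),
    ("PATHWAY_X", "implicated_in", "DISEASE_2"),
    ("GENE_B", "participates_in", "PATHWAY_Y"),
]


def build_adjacency(triples):
    """Map each subject to its outgoing (relation, object) edges."""
    adjacency = {}
    for subj, rel, obj in triples:
        adjacency.setdefault(subj, []).append((rel, obj))
    return adjacency


def find_indirect_paths(triples, source, target, max_hops=3):
    """Return all paths of 2..max_hops edges from source to target.

    A path is a list alternating nodes and relations:
    [node, relation, node, relation, node, ...].
    """
    adjacency = build_adjacency(triples)
    paths = []
    queue = deque([(source, [source])])
    while queue:
        node, path = queue.popleft()
        hops = len(path) // 2  # number of edges walked so far
        if node == target:
            if hops >= 2:  # keep only indirect (multi-hop) links
                paths.append(path)
            continue
        if hops == max_hops:
            continue
        for rel, nxt in adjacency.get(node, []):
            if nxt in path[::2]:  # skip nodes already on this path
                continue
            queue.append((nxt, path + [rel, nxt]))
    return paths


if __name__ == "__main__":
    for p in find_indirect_paths(TRIPLES, "GENE_A", "DISEASE_1"):
        print(" -> ".join(p))
```

In this toy graph the placeholder target `GENE_A` has no direct edge to `DISEASE_1`, but it is reachable through a shared pathway and cell type. A real system would score or rank such paths rather than simply enumerate them.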

Bulk and single-cell data were combined into a molecular signature database and enriched with context such as GWAS hits and mechanistic pathways, to reveal mechanistic biology and stratify patient subtypes. Rosser added that during COVID-19, baricitinib, a rheumatoid arthritis drug, was identified as a treatment for a seemingly unrelated disease, an identification made through stack data.

To show this in action, an internal project on systemic sclerosis found a collagen biosynthesis–enriched module linked to activated myofibroblasts. Rosser commented that the leading genes of this co-expression module overlapped with GWAS hits. Patients were stratified into E1 and E2 subtypes that represented disease severity. Then, UMAPs were used to plot bulk and single-cell data.
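The talk summary does not say how the overlap between module genes and GWAS hits was assessed. One common way to ask whether such an overlap is larger than chance is a one-sided hypergeometric test, sketched below as an assumption, not as the project's actual analysis. The gene identifiers, the background size, and the function name `overlap_p_value` are all hypothetical.

```python
from math import comb


def overlap_p_value(module_genes, gwas_genes, background_size):
    """One-sided hypergeometric test for gene-set overlap.

    Returns (overlap_count, p_value), where p_value is the probability of
    seeing at least this many shared genes if the module were a random
    draw from a background of `background_size` genes.
    Assumes both gene sets are subsets of that background.
    """
    module = set(module_genes)
    gwas = set(gwas_genes)
    k = len(module & gwas)   # observed overlap
    n = len(module)          # module size (number of draws)
    K = len(gwas)            # GWAS hits in the background
    N = background_size      # total background genes

    total = comb(N, n)
    # Sum P(X = i) for i = k .. min(n, K); comb returns 0 for impossible terms.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return k, tail / total


if __name__ == "__main__":
    # Placeholder gene identifiers, for illustration only.
    leading_genes = ["G1", "G2", "G3", "G4"]
    gwas_hits = ["G3", "G4", "G9"]
    shared, p = overlap_p_value(leading_genes, gwas_hits, background_size=20)
    print(f"shared genes: {shared}, p-value: {p:.4f}")
```

The choice of background matters: using all genes measured in the experiment, as opposed to the whole genome, usually gives a more conservative p-value.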