Zifo Technologies acts as a guiding partner during drug discovery, helping researchers navigate the challenges of handling and analysing large datasets. By providing support akin to a Sherpa, they streamline biological exploration and data management.
Keeping Results Alive: Storing, Managing, and Visualising Data
With the advancement of single-cell and spatial techniques, not only is the technology becoming more complex, but so are the datasets. Whereas bulk RNA-seq produces its data in simple CSV or TSV formats, single-cell transcriptomic data has more layers and is output as multidimensional matrices. Furthermore, spatial information can include imagery, making the data multimodal.
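To make the contrast between the two data shapes concrete, here is a minimal sketch. It assumes a toy dictionary-of-keys sparse layout rather than the on-disk format of any particular tool (for example AnnData or TileDB-SOMA). All names (`BULK_TSV`, `read_bulk`, `SingleCellMatrix`, `cells_where`) and all values are hypothetical and purely illustrative. Bulk counts fit in one flat gene × sample table. Single-cell counts form a mostly empty cell × gene matrix that needs separate per-cell and per-gene metadata, such as barcodes and spatial coordinates.

```python
import csv
import io
from dataclasses import dataclass, field

# Bulk RNA-seq (toy values): one row per gene, one column per sample, so a flat TSV is enough.
BULK_TSV = "gene\tsample_A\tsample_B\nGAPDH\t1520\t1488\nCD3E\t12\t340\n"


def read_bulk(tsv_text: str) -> dict:
    """Parse a gene x sample count table into {gene: {sample: count}}."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {
        row["gene"]: {name: int(v) for name, v in row.items() if name != "gene"}
        for row in reader
    }


@dataclass
class SingleCellMatrix:
    """Hypothetical minimal container: sparse cell x gene counts plus metadata tables."""
    obs: list  # one metadata dict per cell (barcode, spatial x/y, ...)
    var: list  # one metadata dict per gene (symbol, ...)
    counts: dict = field(default_factory=dict)  # (cell, gene) -> count; zeros are not stored

    def set(self, cell: int, gene: int, value: int) -> None:
        if value:
            self.counts[(cell, gene)] = value
        else:
            self.counts.pop((cell, gene), None)  # keep the matrix sparse

    def get(self, cell: int, gene: int) -> int:
        return self.counts.get((cell, gene), 0)

    def density(self) -> float:
        """Fraction of cell x gene entries that are non-zero."""
        return len(self.counts) / (len(self.obs) * len(self.var))

    def cells_where(self, predicate) -> list:
        """Indices of cells whose metadata satisfies predicate (e.g. a spatial window)."""
        return [i for i, meta in enumerate(self.obs) if predicate(meta)]


bulk = read_bulk(BULK_TSV)
sc = SingleCellMatrix(
    obs=[{"barcode": "AAAC", "x": 10.5, "y": 3.2}, {"barcode": "AAAG", "x": 41.0, "y": 7.9}],
    var=[{"symbol": "GAPDH"}, {"symbol": "CD3E"}],
)
sc.set(0, 0, 5)
sc.set(1, 1, 2)
print(bulk["CD3E"]["sample_B"])              # 340
print(sc.density())                           # 0.5
print(sc.cells_where(lambda m: m["x"] < 20))  # [0]
```

The `cells_where` call shows why the metadata must survive storage. Without the `obs` table, the counts alone cannot answer a question such as "which cells lie in this region?".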
Therefore, in search of the optimal platform for spatial data storage, Zifo Technologies partnered with multiple vendors and evaluated several storage options, including TileDB and the Apache Parquet file format. The platforms were benchmarked on hundreds of datasets ranging from 70 MB to 20 GB in size and containing up to around seven million cells. A key requirement was that all necessary metadata remained intact so that the data could still be queried.
Each system had its own advantages and disadvantages: TileDB took longer to ingest larger datasets, but it had a lower peak memory allocation. Overall, their findings suggested that TileDB was more effective for managing single-cell transcriptomic data.
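An ingestion benchmark of this kind reports two numbers per run: wall-clock time and peak memory. The sketch below shows one way such a harness might look. It is a minimal example under stated assumptions and does not reproduce Zifo's actual setup. The two ingest functions are hypothetical stand-ins, not TileDB or Parquet. One builds every record in memory before a single "write". The other builds and "writes" fixed-size chunks. This mimics the kind of speed-versus-peak-memory trade-off described above, without claiming that either real tool works this way. One caveat applies: `tracemalloc` only tracks allocations made through Python's allocator. Libraries implemented largely in C++, such as TileDB and Arrow's Parquet reader/writer, would need a process-level measure, for example `ru_maxrss` from `resource.getrusage` on Unix.

```python
import time
import tracemalloc
from typing import Callable, Dict

N_CELLS = 200_000  # toy size; the datasets described above were far larger


def benchmark_ingestion(ingest: Callable[[], int], repeats: int = 3) -> Dict[str, float]:
    """Return best wall-clock seconds and largest peak traced Python memory (MB)."""
    best_seconds = float("inf")
    peak_bytes = 0
    for _ in range(repeats):
        tracemalloc.start()  # begins tracing from zero, so the peak is per-run
        start = time.perf_counter()
        ingest()
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()  # clears traces (and the peak) before the next repeat
        best_seconds = min(best_seconds, elapsed)
        peak_bytes = max(peak_bytes, peak)
    return {"seconds": best_seconds, "peak_mb": peak_bytes / 1e6}


def ingest_all_at_once(n_cells: int = N_CELLS) -> int:
    """Hypothetical backend A: materialise every (cell, gene, count) record before writing."""
    records = [(cell, cell % 2000, 1) for cell in range(n_cells)]
    return len(records)  # stand-in for a single bulk write


def ingest_in_chunks(n_cells: int = N_CELLS, chunk: int = 10_000) -> int:
    """Hypothetical backend B: build and write fixed-size chunks, one at a time."""
    written = 0
    for offset in range(0, n_cells, chunk):
        records = [(cell, cell % 2000, 1) for cell in range(offset, min(offset + chunk, n_cells))]
        written += len(records)  # stand-in for writing one chunk
    return written


results = {
    "all_at_once": benchmark_ingestion(ingest_all_at_once),
    "chunked": benchmark_ingestion(ingest_in_chunks),
}
for name, stats in results.items():
    print(f"{name}: {stats['seconds']:.3f} s, peak {stats['peak_mb']:.1f} MB")
```

The chunked stand-in keeps only about one or two chunks alive at a time, so its traced peak is far lower. Which variant is faster at this toy scale depends on the machine. Reporting the best of several repeats reduces noise from one-off effects such as cold caches.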
Effective data visualisation is also crucial for scientists to gain insights, which has prompted the development of interactive platforms for different user needs. Shanmugam outlined the development of a data analysis and visualisation dashboard that would let their client view bulk and single-cell data in useful, digestible breakdowns.
Making Expression Meaningful: Multi-Omics Integration Using scGPT
Next, Agni Sinha introduced methods for integrating multi-omics data using large language models, highlighting their potential to deepen understanding and reveal relationships in complex datasets. He walked us through how machine learning, and large language models in particular, can accelerate data integration and analysis in bioinformatics.
Sharing insights from case studies involving public datasets, he demonstrated the effectiveness of their approaches in real-world scenarios. The presentation concluded on an optimistic note about the future of bioinformatics, while highlighting the need for scalable architectures and innovative models to manage and analyse vast amounts of data.