Foundational Models for Heterogeneous Biological Image Data

Euro-BioImaging

This use case develops a foundation model for heterogeneous biological imaging data. The model will enable more effective categorisation, search, and reuse of the large, diverse datasets stored in bioimaging archives, making the data more accessible and more valuable to users and research infrastructure (RI) operators.

Challenge

The volume of biological imaging data, spanning many experimental conditions, organisms, and imaging modalities, is growing rapidly. Organising, categorising, and providing access to datasets this diverse requires computational models. Foundation models trained on such mixed data can generalise across it and support a range of downstream applications, from similarity search to image-based measurements. This use case undertakes the large-scale training of such a foundation model.

Target

Develop and train a foundation model that generates high-quality embeddings from diverse biological image data to support categorisation, search, and other downstream tasks.
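To illustrate how embeddings can support search, the following minimal sketch ranks archived images by cosine similarity to a query embedding. It is not the project's implementation: the identifiers (`img_a`, `img_b`, `img_c`), the helper names, and the three-dimensional toy vectors are assumptions made for illustration, and real embeddings would come from the trained model and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention used here: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def most_similar(query, archive, k=3):
    """Return the k (image_id, score) pairs most similar to `query`.

    `archive` maps a hypothetical image identifier to its embedding.
    """
    scored = [
        (image_id, cosine_similarity(query, embedding))
        for image_id, embedding in archive.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for model output (illustrative values only).
archive = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 1.0, 0.0],
}
results = most_similar([1.0, 0.0, 0.1], archive, k=2)
for image_id, score in results:
    print(image_id, round(score, 3))
```

At archive scale, a brute-force scan like this would typically be replaced by an approximate nearest-neighbour index; the ranking principle stays the same.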

Development Steps

Select, curate, and standardise a large dataset covering multiple biological imaging modalities and experimental variables from the EMBL-EBI archive.
Create task-specific evaluation datasets through a combination of selective labelling and use of existing annotated data.
Fine-tune existing segmentation models that were pre-trained on natural images on this biological data to create relevant benchmarks.
Train a new, large-scale biological imaging foundation model optimised for data discoverability, automated organisation, and reusable outputs for other scientific analyses.
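The steps above do not specify an evaluation protocol. One common way to check embedding quality against a labelled evaluation set is nearest-neighbour classification, sketched below as leave-one-out 1-NN accuracy. The function names, the labels (`nucleus`, `membrane`), and the two-dimensional toy embeddings are assumptions for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leave_one_out_1nn_accuracy(embeddings, labels):
    """Fraction of items whose nearest other item shares their label."""
    if len(embeddings) != len(labels) or len(embeddings) < 2:
        raise ValueError("need at least two embeddings with matching labels")
    correct = 0
    for i, query in enumerate(embeddings):
        # Nearest neighbour among all items except the query itself.
        nearest = min(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: euclidean(query, embeddings[j]),
        )
        if labels[nearest] == labels[i]:
            correct += 1
    return correct / len(embeddings)


# Toy evaluation set: hypothetical labels and embeddings.
embeddings = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
labels = ["nucleus", "nucleus", "membrane", "membrane"]
accuracy = leave_one_out_1nn_accuracy(embeddings, labels)
print(accuracy)
```

Running the same check on embeddings from a fine-tuned baseline and from the new model gives a simple, label-efficient comparison, because no additional classifier has to be trained.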

Relevance / Target Stakeholders

Operators of Research Infrastructures (RIs) managing biological imaging archives
Researchers and users of imaging data requiring enhanced data discovery and analysis tools

Impact

The resulting model is intended to improve how RIs manage, organise, and provide access to their biological image archives. This will:

Enhance data discoverability and reuse
Increase the scientific value of existing archives
Reduce manual curation effort and support scalable data services in life sciences research

Involved Research Infrastructure + Partners

Euro-BioImaging ERIC

EMBL

KTH
