Synthetic Data for Computational Pathology
This use case develops diffusion models to generate synthetic whole slide images (WSIs) for colorectal cancer (CRC), enabling privacy-compliant data sharing and foundational representation learning for digital pathology. It will evaluate the use of synthetic WSIs for AI training and diagnostic support.

Challenge
Privacy constraints and missing patient consent hinder the sharing of real histopathological whole slide images (WSIs). AI-generated synthetic WSIs can overcome these limitations, allowing datasets to be distributed and used to train AI models. Generating such images, especially at extremely high resolutions (up to 100,000 × 100,000 pixels), is computationally demanding, yet essential for developing semantically rich representations for digital pathology, particularly for colorectal cancer (CRC).
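To give a sense of scale for the resolution quoted above, the following is a minimal arithmetic sketch. It assumes uncompressed 8-bit RGB pixels and a tile size of 512 × 512; neither value is stated in this use case, and the function name `wsi_footprint` is purely illustrative.

```python
import math

def wsi_footprint(width_px, height_px, channels=3, bytes_per_channel=1, tile_px=512):
    """Return (uncompressed size in bytes, number of square tiles covering the slide).

    channels, bytes_per_channel and tile_px are illustrative assumptions,
    not parameters taken from the use case description.
    """
    raw_bytes = width_px * height_px * channels * bytes_per_channel
    tiles_x = math.ceil(width_px / tile_px)   # partial tiles at the edge still count
    tiles_y = math.ceil(height_px / tile_px)
    return raw_bytes, tiles_x * tiles_y

raw_bytes, n_tiles = wsi_footprint(100_000, 100_000)
print(f"{raw_bytes / 1e9:.0f} GB uncompressed, {n_tiles} tiles of 512 x 512")
# prints "30 GB uncompressed, 38416 tiles of 512 x 512"
```

Under these assumptions a single slide at the maximum resolution holds tens of gigabytes of raw pixel data, which is why generation at this scale is typically approached tile by tile or at multiple magnification levels rather than in one pass.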
Target
Create and evaluate diffusion-based generation of synthetic WSIs for colorectal cancer pathology, enabling data sharing, privacy compliance, and model generalisation.
Development Steps
- Train diffusion models to generate high-resolution WSIs for CRC and related organs
- Condition the generation on specific cases and diagnostic metadata
- Perform expert evaluation of the realism and diagnostic utility of the synthetic images
- Compare the performance of models trained on real versus synthetic data
- Investigate privacy leakage risks in the generated synthetic datasets
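As background for the first step, the sketch below shows the forward (noising) process on which standard denoising diffusion training is built: a clean image x_0 is mixed with Gaussian noise according to a variance schedule, and a network is then trained to predict that noise. This is a generic illustration, not the project's actual pipeline. The linear schedule, the 1000 steps, the toy 16 × 16 tile, and all function names are assumptions chosen for the example.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Evenly spaced noise variances beta_1..beta_T (an assumed, commonly used schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alpha(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def noise_tile(pixels, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    eps = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [signal * p + sigma * e for p, e in zip(pixels, eps)]
    return noisy, eps  # eps is the regression target for the denoising network

rng = random.Random(0)
alpha_bars = cumulative_alpha(linear_beta_schedule(1000))

# Toy stand-in for a flattened RGB tile scaled to [-1, 1]; real tiles would come from a WSI.
tile = [rng.uniform(-1.0, 1.0) for _ in range(16 * 16 * 3)]
noisy, eps = noise_tile(tile, t=999, alpha_bars=alpha_bars, rng=rng)
```

At the last step of this schedule almost no signal remains, so generation can start from pure noise and denoise step by step. Conditioning on case or diagnostic metadata, as in the second step, would normally be done by passing that information to the denoising network as an extra input, which this sketch does not cover.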
Target Stakeholders
- RI operators and data custodians
- AI researchers in medical imaging
- Developers of privacy-preserving data sharing solutions
Impact
- Enables open access to high-quality synthetic CRC datasets for research
- Offers a reproducible, privacy-preserving method to share histopathological data
- Lays the groundwork for future foundation models in digital pathology
- Helps reduce dependency on sensitive clinical datasets while supporting AI innovation