Motivated by liver research, this project examines the largely unexplored implications of relying on synthetic data in healthcare. The use of synthetic data is motivated by the promise of enhanced patient privacy. While synthetic data has been widely assessed in other domains, its application to medical studies, and the implications thereof, remain uncertain. Without a thorough understanding of how synthetic data affects the validity of derived information, the reliability of knowledge gained from such studies is unknown, which makes it highly questionable and potentially even dangerous.
This project addresses this research gap by combining medical expertise and computer science. The interdisciplinary cooperation focuses on three primary research questions: (i) How does synthetic medical data differ from real-world data? (ii) Do different generation approaches differ in the quality or utility of the data they produce? (iii) Can researchers independently draw the same conclusions from both data sources (synthetic and real-world)? We are curious to see whether our research reveals any limitations in current practices related to using synthetic data in healthcare.
Collaboration and Funding: This research is an exploratory effort and part of broader efforts at RWTH Aachen to enhance data-driven medicine. Supported by an interdisciplinary team and data from international biobanks, the project embodies the intersection of computational life sciences and clinical application.
SFFAIR003. Funded by the Excellence Strategy of the German federal and state governments.