News feed
Synthetic data in secure health research
16 December 2025. Researchers from BME participated in the SECURED project, in which scientists from 18 institutions in 9 European countries investigated how health data can be converted into synthetic data in a way that ensures its safe use in research.
The SECURED project, a European collaboration focused on the protection of health data and its ethical use through the application of synthetic data, is coming to an end. Researchers from 18 partners in 9 European countries, funded by the European Union, have been looking at how health data can be transformed so that it can be used safely for research. In Hungary, experts from the Budapest University of Technology and Economics and the Semmelweis University Health Management Training Center took part in the project.
| SECURED is an acronym for Scaling Up Secure Processing, Anonymization and Generation of Health Data for EU Cross-Border Collaborative Research and Innovation. Its main objective was to help improve healthcare services and enable secure research collaborations through data anonymization and the creation of synthetic data, thereby contributing to better patient care in the European Union. In addition to research institutes and hospitals in Western and Southern Europe, participants included the University of Amsterdam, the University of Cork, the University of Sassari, and the Catholic University of Leuven. The project was supported by the European Union's Horizon Europe research and innovation program under grant number 101095717. |
Synthetic data is generated artificially by computer systems to resemble real data, but it contains no personal information such as a patient's name or other identifying details. Such data, which can be image, text, or video-based, allows healthcare models to be developed and tested without exposing sensitive patient data. While digital tools such as electronic health records and telemedicine systems are transforming patient care, they also create new risks of data breaches and privacy violations.
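As a minimal illustration of the idea (a hypothetical sketch, not a method used in the project), the toy code below fits a simple distribution to a "real" patient-age column and samples a synthetic column that mimics its statistics without any one-to-one link to a real patient:

```python
import random
import statistics

# Hypothetical "real" patient ages -- no names or identifiers are needed
# to capture the column's statistical shape.
real_ages = [34, 51, 29, 62, 47, 55, 41, 68, 33, 58]

# Fit a simple Gaussian to the real column...
mu = statistics.mean(real_ages)
sigma = statistics.stdev(real_ages)

# ...and sample a synthetic column with similar statistics but no
# direct correspondence to any real record.
random.seed(42)
synthetic_ages = [round(random.gauss(mu, sigma)) for _ in range(10)]

print(f"real mean/stdev: {mu:.1f}/{sigma:.1f}")
print("synthetic sample:", synthetic_ages)
```

Real generators are far more sophisticated (deep generative models over many correlated columns), but the trade-off sketched in the article already appears here: the closer the synthetic column tracks the real one, the more it can reveal about it.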
"In the age of data-driven healthcare and artificial intelligence, synthetic data plays a key role in enabling development to take place in a realistic environment without data protection risks, thus bridging the gap between clinical application and research innovation. Synthetic data allows us to conduct research in a realistic but data protection-secure environment during the development of healthcare artificial intelligence, thereby accelerating the testing of algorithms and the path to clinical applicability," said Péter Pollner, a colleague at the Semmelweis University Health Management Training Center.
"Secure data sharing is not only a technological issue, but also a matter of building trust among healthcare stakeholders. In the SECURED project, we sought ways to build this trust. Building trust starts where misunderstandings are most common: synthetic data alone does not guarantee complete data protection," warned Gergely Ács, a researcher at the Cryptography and System Security Laboratory at the Budapest University of Technology and Economics.
"When synthetic data is mentioned, many people think: 'It's not real, so it's not sensitive data.' This assumption, however, is very dangerous. The challenge is that development must balance two goals: on the one hand, the data must be accurate and useful, retaining the statistical properties of the original; on the other hand, it must be anonymous and GDPR-compliant, that is, not traceable to a natural person. Unfortunately, it is not possible to satisfy both goals without compromises," explained the researcher.
The crux of the problem is that generative models that produce synthetic data retain certain properties of the original data, which in turn can reveal sensitive information. "Synthetic data is actually a collection of aggregated information. If we can combine this aggregated information, we can reveal unique secrets," added Balázs Pejó, who is participating in the project as a postdoctoral researcher.
The source of the danger lies in the very principle of machine learning: "Generally, a model behaves differently on data it has already 'seen', as its predictions are more accurate on that data. By exploiting this memorization, simple statistical tests can determine which records the model predicts more accurately, making it likely that it has already seen them. Such a test indicates data leakage about individuals, highlighting that synthetic data alone does not guarantee confidentiality," said Gergely Ács.
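The kind of statistical test described above can be sketched with a toy example. The code below is a hypothetical illustration, not the project's tooling: it "trains" a model that memorizes its training data (a 1-nearest-neighbour predictor, an extreme stand-in for an overfitted generative model) and then uses per-record prediction error as the test, so records the model has already seen stand out by their unusually low error:

```python
import random

random.seed(0)

# Hypothetical records: (feature, label) pairs with a noisy linear relation.
def make_records(n):
    records = []
    for _ in range(n):
        x = random.uniform(0, 10)
        records.append((x, 2.0 * x + random.gauss(0, 1.0)))
    return records

train = make_records(20)   # data the model "sees"
unseen = make_records(20)  # fresh data from the same population

# A 1-nearest-neighbour model memorizes the training set outright.
def predict(x):
    nearest = min(train, key=lambda r: abs(r[0] - x))
    return nearest[1]

def mean_error(records):
    return sum(abs(predict(x) - y) for x, y in records) / len(records)

err_members = mean_error(train)      # error on "seen" records
err_nonmembers = mean_error(unseen)  # error on fresh records

# The gap is the leakage signal: a suspiciously low per-record error
# suggests that the record was part of the training data.
print("error on members:", err_members)
print("error on non-members:", err_nonmembers)
```

In the literature this is known as a membership inference attack; real attacks against generative models are subtler, but the underlying signal, a measurable difference in behaviour on seen versus unseen data, is the same one described in the quote.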
The methodology developed in the SECURED project allows multiple organizations, such as different healthcare institutions, to securely share the results of their analyses with each other over closed channels, without raw, sensitive data ever leaving their systems. Rather than relying on synthetic data alone, the project sought a solution in the technology surrounding it: secure application requires additional technological safeguards, such as homomorphic encryption. This approach opens new opportunities both for ethical healthcare research and for the practical training of medical students.
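As a rough illustration of why homomorphic encryption fits this setting, the sketch below implements the additively homomorphic Paillier scheme with deliberately tiny, insecure demo parameters; it is not the project's actual implementation. Two hypothetical hospitals encrypt their local counts, and an aggregator can multiply the ciphertexts so that only the sum is revealed on decryption, never the individual contributions:

```python
from math import gcd

# Toy Paillier key generation (demo primes only; real keys are ~2048-bit).
p, q = 101, 113
n = p * q
n2 = n * n
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm(p-1, q-1)
g = n + 1
mu = pow(lam, -1, n)  # with g = n+1, L(g^lam mod n^2) = lam mod n

def encrypt(m, r):
    # Enc(m) = g^m * r^n mod n^2, with r coprime to n (randomly chosen in practice)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    u = pow(c, lam, n2)
    return ((u - 1) // n) * mu % n

# Two hospitals encrypt their local counts; the aggregator multiplies the
# ciphertexts and the key holder decrypts only the SUM.
c1 = encrypt(42, 7)   # hospital A: 42 matching patients
c2 = encrypt(58, 11)  # hospital B: 58 matching patients
c_sum = (c1 * c2) % n2
print(decrypt(c_sum))  # -> 100, without revealing 42 or 58 to the aggregator
```

The design point this illustrates is that the aggregator never holds raw data, only ciphertexts it cannot open, which matches the article's description of sharing analysis results without sensitive data leaving each institution's systems.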
The results and methodology of the SECURED project will be openly available so that healthcare institutions and researchers can apply them more widely to ensure the secure and ethical handling of healthcare data.
Rector's Office, Communications Directorate
