Published on in Vol 27 (2025)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/66616, first published .
Data Interoperability in Context: The Importance of Open-Source Implementations When Choosing Open Standards

Data Interoperability in Context: The Importance of Open-Source Implementations When Choosing Open Standards

Data Interoperability in Context: The Importance of Open-Source Implementations When Choosing Open Standards

Viewpoint

1Eindhoven AI Systems Institute (EAISI), Eindhoven University of Technology, Eindhoven, The Netherlands

2PharmAccess Foundation, Amsterdam, The Netherlands

3Dutch Hospital Data, Utrecht, The Netherlands

4MAASTRO Clinic, Maastricht University Medical Centre, Maastricht University, Maastricht, The Netherlands

5Netherlands Comprehensive Cancer Organisation, Utrecht, The Netherlands

6Expertisecentrum Zorgalgoritmen, Utrecht, The Netherlands

7Ona, Burlington, VT, United States

Corresponding Author:

Daniel Kapitan, DPhil

Eindhoven AI Systems Institute (EAISI)

Eindhoven University of Technology

PO Box 513

Eindhoven, 5600 MB

The Netherlands

Phone: 31 624097295

Email: daniel@kapitan.net


Following the proposal by Tsafnat et al (2024) to converge on three open health data standards, this viewpoint offers a critical reflection on their proposed alignment of openEHR, Fast Health Interoperability Resources (FHIR), and Observational Medical Outcomes Partnership (OMOP) as default data standards for clinical care and administration, data exchange, and longitudinal analysis, respectively. We argue that open standards are a necessary but not sufficient condition to achieve health data interoperability. The ecosystem of open-source software needs to be considered when choosing an appropriate standard for a given context. We discuss two specific contexts, namely standardization of (1) health data for federated learning, and (2) health data sharing in low- and middle-income countries. Specific design principles, practical considerations, and implementation choices for these two contexts are described, based on ongoing work in both areas. In the case of federated learning, we observe convergence toward OMOP and FHIR, where the two standards can effectively be used side-by-side given the availability of mediators between the two. In the case of health information exchanges in low and middle-income countries, we see a strong convergence toward FHIR as the primary standard. We propose practical guidelines for context-specific adaptation of open health data standards.

J Med Internet Res 2025;27:e66616

doi:10.2196/66616

Keywords



“A paradox of health care interoperability is the existence of a large number of standards with significant overlap among them,” says Tsafnat et al [Tsafnat G, Dunscombe R, Gabriel D, Grieve G, Reich C. Converge or collide? Making sense of a plethora of open data standards in health care. J Med Internet Res. 2024;26:e55779. [FREE Full text] [CrossRef] [Medline]1] followed by a call to action toward the health informatics community to put effort into establishing convergence and preventing collision. To do so, they propose to converge on three open standards, namely (1) openEHR for clinical care and administration, (2) Fast Health Interoperability Resources (FHIR) for data exchange, and (3) Observational Medical Outcomes Partnership Common Data Model (OMOP) for longitudinal analysis. They argue that open data standards, backed by engaged communities, hold an advantage over proprietary ones, and therefore, should be chosen as the stepping stones toward achieving true interoperability.

While we support their high-level rationale and intention, we feel their proposed trichotomy does not do justice to details that are crucial in real-world implementations. This viewpoint provides a critical reflection on their proposed framework in three parts. First, we reflect on salient differences between the three open standards from the perspective of the notion of openness of digital platforms [de Reuver M, Sørensen C, Basole RC. The digital platform: a research agenda. J Inf Technol. 2018;33(2):124-135. [CrossRef]2], the paradox of open [Keller P, Tarkowski A. The paradox of open. Open Future. 2021. URL: https://openfuture.pubpub.org/pub/paradox-of-open/release/1 [accessed 2024-03-25] 3], and the hourglass model of open architectures [Estrin D, Sim I. Health care delivery. Open mHealth architecture: an engine for health care innovation. Science. 2010;330(6005):759-760. [CrossRef] [Medline]4,Beck M. On the hourglass model. Commun ACM. 2019;62(7):48-57. [CrossRef]5]. Subsequently, we outline the importance of open-source software (OSS) by reflecting on our considerations in designing and implementing health data platforms in two specific contexts, namely (1) platforms for federated learning (FL) on shared health data in high-income countries and (2) health data platforms for low- and middle-income countries (LMICs). These case studies illustrate the limitations of the trichotomy proposed by Tsafnat et al [Tsafnat G, Dunscombe R, Gabriel D, Grieve G, Reich C. Converge or collide? Making sense of a plethora of open data standards in health care. J Med Internet Res. 2024;26:e55779. [FREE Full text] [CrossRef] [Medline]1]. Particularly, we argue that of the 3 standards, FHIR stands out as being the most practical and adaptable which allows it to be used for longitudinal analysis and routine collection of clinical data, besides its original purpose as a health data exchange standard. We conclude this viewpoint with practical implications of these findings and directions for future research of open health data standards.


In their editorial, Tsafnat et al [Tsafnat G, Dunscombe R, Gabriel D, Grieve G, Reich C. Converge or collide? Making sense of a plethora of open data standards in health care. J Med Internet Res. 2024;26:e55779. [FREE Full text] [CrossRef] [Medline]1] argue that (1) the paradox of interoperability of having overlapping standards can be addressed by converging on just three standards; (2) practical and sociotechnical considerations are as important as, if not more important than, technical superiority and therefore balancing of customizability and rigidity is of the essence; and (3) open standards, backed by engaged communities, hold an advantage over proprietary ones. While we concur with these points, we argue that these are necessary, but not sufficient conditions for convergence of health data standards. Existing research on digital platforms underlines the importance of the platform’s openness, not only in terms of open standards but also in terms of the availability of executable pieces of software, extensibility of the code base, and availability of complements to the core technical platform (in this case, the health data standard is a critical, defining component of the core technical platform) [de Reuver M, Sørensen C, Basole RC. The digital platform: a research agenda. J Inf Technol. 2018;33(2):124-135. [CrossRef]2]. Openness in this context pertains to the software modules that constitute the digital platform (Textbox 1). Realizing openness can be achieved through open-sourcing the core components of the platform or defining standardized interfaces through which components can interact [de RM, Ofe H, Agahari W, Abbas A, Zuiderwijk A. The openness of data platforms: a research agenda. ACM; 2022. Presented at: CoNEXT '22: The 18th International Conference on emerging Networking EXperiments and Technologies; December 9, 2022:34-41; Rome Italy. [CrossRef]6]. Only when the majority of these aspects of digital platforms are met can we reasonably expect that the digital platform will indeed flourish and be long-lived.

If open digital platforms are what we want, the question is how to achieve that. In what they frame as “the paradox of open,” Keller and Tarkowski [Keller P, Tarkowski A. The paradox of open. Open Future. 2021. URL: https://openfuture.pubpub.org/pub/paradox-of-open/release/1 [accessed 2024-03-25] 3] argue that open platforms and their associated ecosystems can only flourish if two types of conditions are met. The first condition states that many people need to contribute to the creation of a common resource. “This is the story of Wikipedia, OpenStreetMap, Blender.org, and the countless free software projects that provide much of the internet’s infrastructure” [Keller P, Tarkowski A. The paradox of open. Open Future. 2021. URL: https://openfuture.pubpub.org/pub/paradox-of-open/release/1 [accessed 2024-03-25] 3]. Indeed, Tsafnat et al [Tsafnat G, Dunscombe R, Gabriel D, Grieve G, Reich C. Converge or collide? Making sense of a plethora of open data standards in health care. J Med Internet Res. 2024;26:e55779. [FREE Full text] [CrossRef] [Medline]1] have explicitly taken into account that “an engaged and vibrant community is a major advantage for the longevity of the data standards it uses,” which has informed their proposal to converge toward OMOP, FHIR, and openEHR over other existing health data standards. However, the importance of OSS is somewhat overlooked. This point is only mentioned in passing when Tsafnat et al [Tsafnat G, Dunscombe R, Gabriel D, Grieve G, Reich C. Converge or collide? Making sense of a plethora of open data standards in health care. J Med Internet Res. 2024;26:e55779. [FREE Full text] [CrossRef] [Medline]1] reference work done by Reynolds and Wyatt [Reynolds CJ, Wyatt JC. Open source, open standards, and health care information systems. J Med Internet Res. 2011;13(1):e24. [FREE Full text] [CrossRef] [Medline]7] who already argued in 2011 “… for the superiority of open-source licensing to promote safer, more effective health care information systems. We claim that open-source licensing in health care information systems is essential to rational procurement strategy.” Hence, we extend the line of reasoning of Tsafnat et al [Tsafnat G, Dunscombe R, Gabriel D, Grieve G, Reich C. Converge or collide? Making sense of a plethora of open data standards in health care. J Med Internet Res. 2024;26:e55779. [FREE Full text] [CrossRef] [Medline]1] by emphasizing that the availability of executable OSS components, which inherently makes it easier to extend the code base of the health data standard and thereby drive greater availability of complementary components, is an important criterion which needs to be explicitly taken into account when choosing which standard to adopt.

The second condition put forward by Keller and Tarkowski [Keller P, Tarkowski A. The paradox of open. Open Future. 2021. URL: https://openfuture.pubpub.org/pub/paradox-of-open/release/1 [accessed 2024-03-25] 3] is that open ecosystems have proven fruitful when “opening up” is the result of external incentives or requirements, rather than voluntary actions. Examples of such external incentives are “...publicly funded knowledge production like Open Access academic publications, cultural heritage collections in the Public Domain, Open Educational Resources, and Open Government data.” Another canonical example is the birth of the Global System for Mobile Communications standard, which was mandated by European legislation [GSM. Wikipedia. 2024. URL: https://en.wikipedia.org/w/index.php?title=GSM&oldid=1245675274 [accessed 2024-09-20] 8]. Reflecting on this condition in the context of open health data ecosystems, we observe a salient difference between FHIR versus openEHR and OMOP, namely that the former is the only one that has been mandated—or at least strongly recommended—in some jurisdictions. Survey results on the state of FHIR show that the FHIR standard has been mandated or advised in 20 countries [The state of FHIR 2024 survey results. HL7. URL: https://www.hl7.org/documentcenter/public/white-papers/2024%20StateofFHIRSurveyResults_final.pdf [accessed 2024-04-04] 9]. Notably, the European Electronic Health Record Exchange Format, introduced by the European Commission in 2019 with the aim to ensure secure, interoperable, cross-border access to electronic health data across the European Union, decided in 2022 to adopt Health Level 7 FHIR as the exchange format for future priority data categories [Carmo A, Martins H. D2.2—EHRxF in a Nutshell-WP2-ISCTE. 2024. URL: https:/​/ehr-exchange-format.​eu/​wp-content/​uploads/​2024/​10/​D2.​2-v20240704-EHRxF-in-a-Nutshell-WP2-ISCTE.​pdf [accessed 2025-03-28] 10]. In the United States, the Office of the National Coordinator for Health Information Technology and the Centers for Medicare and Medicaid Services have introduced a steady stream of new regulations, criteria, and deadlines in Health IT that has resulted in significant adoption of FHIR [FHIR in US healthcare regulations. Firely. 2023. URL: https://simplifier.net/organization/firely/news/153 [accessed 2024-05-30] 11]. In India, the open Health Claims Exchange protocol specification—which is based on FHIR—has been mandated by the Indian government as the standard for e-claims handling [National digital health mission. India National Health Authority. 2020. URL: https://www.niti.gov.in/sites/default/files/2023-02/ndhm_strategy_overview.pdf [accessed 2025-03-28] 12,HCX Protocol V0.9. 2023. URL: http://hcxprotocol.io/ [accessed 2024-09-18] 13]. The African Union recommends all new implementations and digital health system improvements use FHIR as the primary mechanism for data exchange [Tilahun B, Mamuye A, Yilma T, Shehata Y. African Union health information exchange guidelines and standards. 2023. URL: https://africacdc.org/download/african-union-health-information-exchange-guidelines-and-standards/ [accessed 2025-03-28] 14], but does not say anything about the use of, for example, openEHR for clinical point-of-service systems.

Our third critical reflection on choosing health data standards pertains to the notion of the hourglass model [Estrin D, Sim I. Health care delivery. Open mHealth architecture: an engine for health care innovation. Science. 2010;330(6005):759-760. [CrossRef] [Medline]4,Beck M. On the hourglass model. Commun ACM. 2019;62(7):48-57. [CrossRef]5] and the concept of open architectures [Mehl GL, Seneviratne MG, Berg ML. A full-STAC remedy for global digital health transformation: open standards, technologies, architectures and content. Oxford Open Digital Health. 2023;1:oqae048. [CrossRef]15]. The hourglass model is “...an approach to design that seeks to support a great diversity of applications (at the top of the hourglass) and allow implementation using a great diversity of supporting services (at the bottom)” [Beck M. On the hourglass model. Commun ACM. 2019;62(7):48-57. [CrossRef]5]. The center of the hourglass—the waist, also called the spanning layer in information systems parlance—is defined by a set of minimal standards that mediates all interactions between the higher and lower layers. In the case of the internet, the spanning layer is defined by the transmission control protocol/internet protocol, which is supported by a variety of underlying connectivity services (many different physical networks) on top of which many different applications can be built (email, videoconferencing, etc). We argue that FHIR has an added benefit over openEHR and OMOP because it can act as the spanning layer within an open health data platform. Because FHIR is inherently designed to function as a data exchange standard, it can function as a mediator between different components of the health data platform. The modularity of the various components that are part of the FHIR ecosystem allows it to be used effectively to implement subsystems, including data pipelines and data processing engines (Textbox 2).

Textbox 1. Conceptual background of the digital platform.

Digital platforms are software-based digital infrastructures that facilitate interactions and transactions between users. In the context of this paper, digital platforms serve as an interface used to interact with data systems. Data systems describe a set of technologies, tools, and processes that extract, manage, and deliver data. Where the data system describes the functional implementation, the data architecture specifies the design framework, outlining how the data flows in its collection, storage, processing, and governance. Its key components are data sources (original “raw” data that is collected before any processing), data repositories like databases, data warehouses, or lakes, and data processing engines and pipelines that transform raw data into a usable format for analysis.

All architectures include a core technical platform (the foundational infrastructure) that can be extended to facilitate the necessary digital services. Data architectures contain different levels of specifications for the technical components entailed in the system. These levels include a systems code base (machine-readable text describing how to extract and process certain data), software tools (programs and applications enabling digital operations), and stacks (layers of software systems working together).

Textbox 2. Conceptual background of data processing pipelines for analytics.

Data pipelines define a sequence or workflow of processes for data. Data processing engines are tools that process, transform, and analyze large-scale data and thereby provide the foundational infrastructure to implement data pipelines. Computing workloads are specific tasks executed across data systems, like data processing and analytics.

Data transformation entails all the processing pipelines that convert data into usable insights. Mappings are specific data transformations that aim to align data from different sources with a unified structure. Granular mappings transform data at the most detailed level, translating data elements across different schemas. Queries are built on top of transformed data, and retrieve data for insights generation, sometimes requiring further data processing.

We argue that (1) the external incentives that have mandated FHIR in certain jurisdictions and (2) the inherent modularity of the FHIR standard have resulted in a large boost in both commercial and OSS development activities in the FHIR ecosystem. Illustrative of this is the speed with which the Bulk FHIR API has been defined and implemented in almost all major implementations [Mandl KD, Gottlieb D, Mandel JC, Ignatov V, Sayeed R, Grieve G, et al. Push button population health: the SMART/HL7 FHIR bulk data access application programming interface. NPJ Digital Med. 2020;3(1):151. [FREE Full text] [CrossRef] [Medline]16,Jones J, Gottlieb D, Mandel J, Ignatov V, Ellis A, Kubick W, et al. A landscape survey of planned SMART/HL7 bulk FHIR data access API implementations and tools. J Am Med Inform Assoc. 2021;28(6):1284-1287. [FREE Full text] [CrossRef] [Medline]17], and the SQL-on-FHIR specification to make large-scale analysis of FHIR data both accessible to a larger audience, as well as portable between systems [SQL on FHIR V2.0.0-Pre. URL: https://build.fhir.org/ig/FHIR/sql-on-fhir-v2/ [accessed 2024-09-20] 18].

The external incentives have also led to more people voluntarily contributing to FHIR-related OSS projects, which has resulted in a wide offering of FHIR components across major technology stacks (Java, Python, .NET), thereby strengthening the first condition for establishing openness. By comparison, OMOP and openEHR have profited less from external incentives to spur the adoption and thereby grow the ecosystem beyond a certain critical mass. To illustrate this, a quick scan of the available OSS components listed on the website of the three governing bodies, Health Level 7 [FHIR Open Source Implementations. URL: https://confluence.hl7.org/display/FHIR/Open+Source+Implementations [accessed 2024-09-20] 19], Observational Health Data Sciences and Informatics [Software tools. OHDSI. URL: https://www.ohdsi.org/software-tools/ [accessed 2024-09-20] 20], and openEHR [Beale SHT. OpenEHR platform. URL: https://openehr.org/products_tools/platform/ [accessed 2024-09-20] 21], indicates that the ecosystem of FHIR and OMOP have a significantly larger offering of extensible and complementary OSS components than openEHR, although for the latter, a notable mature OSS implementation is available with EHRbase [EHRbase 2.0 Website. 2024. URL: https://www.ehrbase.org/ [accessed 2024-09-20] 22]. Taking GitHub as a proxy of worldwide development activities, Table 1 shows the number of contributors and repositories for three different search terms. Given that the FHIR standard has more application areas, one would expect more GitHub projects than openEHR and hence these numbers should only be taken as rough indicators.

In summary, we stress that beyond evaluating the intrinsic structure of an open standard and the community that supports the standard, we need to consider the wider ecosystem of OSS implementations and the availability of complementary components. From this wider perspective of the ecosystem surrounding the three standards, FHIR stands out as having the most diverse and rich ecosystem because it has been mandated in certain jurisdictions and because its technical foundations are inherently broader and more modular. This is relevant when comparing these standards in real-world implementations. We now turn to two specific use cases where these considerations are at play.

Table 1. Number of contributors and number of repositories on GitHub for the three health care data standards as of January 28, 2025.
Period and search termContributors, nRepositories, n
Last three months

“openEHR”8249

“OMOP” or “OHDSI”446221

“FHIR”1648756
All time

“openEHR”429450

“OMOP” or “OHDSI”1019113

“FHIR”84978617

The current fragmentation of health data is one of the major barriers to leveraging the potential medical data for machine learning (ML). Without access to sufficient data, ML will be limited in its application to health improvement efforts, and ultimately, from making the transition from research to clinical practice. High-quality health data, obtained from a research setting or a real-world clinical practice setting, is hard to obtain because health data is tightly regulated.

FL is a learning paradigm that aims to address these issues of data governance and privacy by training algorithms collaboratively without moving (copying) the data itself (Textbox 3) [Rieke N, Hancox J, Li W, Milletarì F, Roth HR, Albarqouni S, et al. The future of digital health with federated learning. NPJ Digital Med. 2020;3:119. [FREE Full text] [CrossRef] [Medline]23,Teo ZL, Jin L, Li S, Miao D, Zhang X, Ng WY, et al. Federated machine learning in healthcare: a systematic review on clinical applications and technical architecture. Cell Rep Med. 2024;5(2):101419. [FREE Full text] [CrossRef] [Medline]24]. Based on ongoing work with the PLUGIN health care consortium [Platform voor uitwisseling en hergebruik van klinische data nederland. PLUGIN. URL: https://plugin.healthcare/ [accessed 2024-03-14] 25], we have detailed an architecture for FL for the secondary use of health data for hospitals in the Netherlands. The starting point for this implementation is the National Health Data Infrastructure agreements for research, policy, and innovation for the Dutch health care sector, which were adopted at the beginning of 2024 [Agreements on the national health data infrastructure for research, policy and innovation - health-RI nationale gezondheidsdata-infrastructuur - confluence. Health-RI. 2024. URL: https:/​/health-ri.​atlassian.net/​wiki/​spaces/​HNG/​pages/​997466565/​Health+data+infrastructure+for+research+policy+and+innovation [accessed 2024-06-03] 26]. Figure 1 shows a high-level reference architecture of the infrastructure to be, comprising three areas (multiple use, applications, and generic functions and a total of 26 functional components [Agreements on the national health data infrastructure for research, policy and innovation - health-RI nationale gezondheidsdata-infrastructuur - confluence. Health-RI. 2024. URL: https:/​/health-ri.​atlassian.net/​wiki/​spaces/​HNG/​pages/​997466565/​Health+data+infrastructure+for+research+policy+and+innovation [accessed 2024-06-03] 26]. One of the prerequisites of this architecture is that organizations that participate in a federation of “data stations” use the same common data model to make the data Findable, Accessible, Interoperable, and Reusable (FAIR). These FAIR data stations comprise components 7, 8, and 9 in Figure 1, that is, the data, metadata, and APIs, respectively, through which the data station can be accessed and used.

Following the line of reasoning of Tsafnat et al [Tsafnat G, Dunscombe R, Gabriel D, Grieve G, Reich C. Converge or collide? Making sense of a plethora of open data standards in health care. J Med Internet Res. 2024;26:e55779. [FREE Full text] [CrossRef] [Medline]1], OMOP would be the go-to standard for storing the longitudinal data in each of the data stations, where data are transformed from the original source (component 6), stored using a common data model (component 7) and properly annotated with metadata (component 8). Indeed, by now, there are quite a real-world implementations of FL networks based on the Observational Health Data Sciences and Informatics-OMOP stack, including a global infrastructure with 22 centers for COVID-19 prediction models [Khalid S, Yang C, Blacketer C, Duarte-Salles T, Fernández-Bertolín S, Kim C, et al. A standardized analytics pipeline for reliable and rapid development and validation of prediction models using observational health data. Comput Methods Programs Biomed. 2021;211:106394. [FREE Full text] [CrossRef] [Medline]27], FeederNet in South Korea with 57 participating hospitals [Lee S, Kim C, Chang J, Park RW. FeederNet (Federated E-Health Big Data for Evidence Renovation Network) platform in Korea. OHSDI. URL: https://www.ohdsi.org/2022showcase-33/ [accessed 2024-06-04] 28], Dutch multicohort dementia research with 9 centers [Mateus P, Moonen J, Beran M, Jaarsma E, van der Landen SM, Heuvelink J, et al. Data harmonization and federated learning for multi-cohort dementia research using the OMOP common data model: a Netherlands consortium of dementia cohorts case study. J Biomed Inform. 2024;155:104661. [FREE Full text] [CrossRef] [Medline]29], the European severe heterogeneous asthma research collaboration [Kroes JA, Bansal AT, Berret E, Christian N, Kremer A, Alloni A, et al. Blueprint for harmonising unstandardised disease registries to allow federated data analysis: prepare for the future. ERJ Open Res. 2022;8(4):00168-02022. [FREE Full text] [CrossRef] [Medline]30], and the recently initiated Belgian Federated Health Innovation Network [Deltomme C, Denturck K, De JP. Federated Health Innovation Network (FHIN). URL: https:/​/www.​ohdsi-europe.org/​images/​symposium-2024/​Posters/​poster%20OHDSI%20FHIN%20Camille%20Deltomme%20-%20Camille%20Deltomme.​pdf [accessed 2024-09-20] 31].

For the PLUGIN project, however, we choose to adopt FHIR as a data model because it is more compatible with the data model of the clinical administration systems. As PLUGIN focuses on the secondary use of routine health data, we feel it is more suitable than OMOP, the latter being more suitable for clinical research data. OpenEHR might have been an option, too, if more implementations and complementary components had been available. Another reason for choosing FHIR is its practicality and extensibility to be used in a Python-based data science stack, provenance of RESTful APIs out-of-the-box to facilitate easy integration with the container-based vantage6 FL framework, and the support of many health care terminologies and flexibility through its profiling mechanism [Moncada-Torres A, Martin F, Sieswerda M, Van Soest J, Geleijnse G. VANTAGE6: an open source priVAcy preserviNg federaTed leArninG infrastructurE for secure insight eXchange. AMIA Annu Symp Proc. 2020;2020:870-877. [FREE Full text] [Medline]32-Smits D, Van Beusekom B, Martin F, Veen L, Geleijnse G, Moncada-Torres A. An improved infrastructure for privacy-preserving analysis of patient data. Stud Health Technol Inform. 2022;295:144-147. [CrossRef] [Medline]34]. Increasingly, other projects have reported the use of FHIR for persistent, longitudinal storage for FL. A scoping review on the use of FHIR for clinical research shows that it is increasingly being used for data preparation, cohort selection, and secondary data sharing [Duda SN, Kennedy N, Conway D, Cheng AC, Nguyen V, Zayas-Cabán T, et al. HL7 FHIR-based tools and initiatives to support clinical research: a scoping review. J Am Med Inform Assoc. 2022;29(9):1642-1653. [FREE Full text] [CrossRef] [Medline]35]. The CODA platform, which aims to implement an FL infrastructure in Canada similar to the PLUGIN project, compared OMOP and FHIR and chose the latter as it has been found to support more granular mappings required for analytics [Mullie L, Afilalo J, Archambault P, Bouchakri R, Brown K, Buckeridge DL, et al. CODA: an open-source platform for federated analysis and machine learning on distributed healthcare data. J Am Med Inform Assoc. 2024;31(3):651-665. [FREE Full text] [CrossRef] [Medline]36]. The fair4health project used FHIR as part of a FAIRification workflow to simplify the process of data extraction and preparation for clinical study analyses [Sinaci AA, Gencturk M, Alvarez-Romero C, Laleci Erturkmen GB, Martinez-Garcia A, Escalona-Cuaresma MJ, et al. Privacy-preserving federated machine learning on FAIR health data: a real-world application. Comput Struct Biotechnol J. 2024;24:136-145. [FREE Full text] [CrossRef] [Medline]37].

Given that OMOP can, conceptually, be viewed as a strict subset of FHIR, hybrid solutions using a combination of OMOP and FHIR have also been reported, such as the German KETOS platform [Gruendner J, Schwachhofer T, Sippl P, Wolf N, Erpenbeck M, Gulden C, et al. KETOS: Clinical decision support and machine learning as a service—a training and deployment platform based on docker, OMOP-CDM, and FHIR web services. PLoS One. 2019;14(10):e0223010. [FREE Full text] [CrossRef] [Medline]38], and the preliminary findings from the European GenoMed4All project, which aims to connect clinical and -omics data [Cremonesi F, Planat V, Kalokyri V, Kondylakis H, Sanavia T, Resinas VMM, et al. The need for multimodal health data modeling: a practical approach for a federated-learning healthcare platform. J Biomed Inform. 2023;141:104338. [FREE Full text] [CrossRef] [Medline]39]. A collaboration of 10 university hospitals in Germany has shown that standardized ETL-processing from FHIR into OMOP can achieve 99% conformance [Peng Y, Henke E, Reinecke I, Zoch M, Sedlmayr M, Bathelt F. An ETL-process design for data harmonization to participate in international research with German real-world data based on FHIR and OMOP CDM. Int J Med Inform. 2023;169:104925. [FREE Full text] [CrossRef] [Medline]40], which confirms the feasibility of the solution pattern where FHIR acts as an intermediate sharing standard through which data from (legacy) systems are extracted and made available for reuse in a common data model. One could argue that the distinction between FHIR and OMOP becomes less relevant if data can be effectively stored in either standard. We are hopeful that initiatives like OMOP-on-FHIR indeed will foster convergence rather than a collision between these two standards [OMOPonFHIR. URL: https://omoponfhir.org/ [accessed 2024-09-20] 41].

Textbox 3. Conceptual background of distributed data systems.

Data systems often have a centralized architecture, where data are collected in a single repository or location. However, data systems can also distribute the storage and processing of data across different nodes or locations such as servers and edge devices.

Servers act as the central processing units in data architecture, supporting computing workloads in data extraction, storage, and transformation of data. Edge devices mainly provide support for data extraction and preprocessing, generally located near the source of the data.

Federated learning is an approach where machine learning (ML) models are trained across a distributed data system. Data transformations and analyses are performed on locally held data across multiple nodes, typically using edge devices or local servers. In this setup, the server that hosts the ML model does not need direct access to the source data. Instead, it aggregates the outputs of the local nodes (the updated model parameters) to train a global model. This method ensures that sensitive data remains local, preserving privacy while still enabling collaborative model training across distributed systems.

Figure 1. Reference architecture for the Dutch health data infrastructure for research and innovation (reproduced from Health-RI [Agreements on the national health data infrastructure for research, policy and innovation - health-RI nationale gezondheidsdata-infrastructuur - confluence. Health-RI. 2024. URL: https:/​/health-ri.​atlassian.net/​wiki/​spaces/​HNG/​pages/​997466565/​Health+data+infrastructure+for+research+policy+and+innovation [accessed 2024-06-03] 26], with permission from Health-RI). API: application programming interface; DAC: data access controller; EDC: electronic data capture; EMR: electronic medical record.

In the case of PLUGIN, another important consideration for choosing FHIR over OMOP is, that from a data architecture perspective, the mechanism of FHIR Profiles can be tied to the principle of late binding commonly applied in data lake or warehouse architectures (Figure 2): allow ingest of widely different sources, and gradually add more constraints and validations as you move closer to a specific use case. If ML is the primary objective for secondary use, one wants to be able to cast a wider net of relevant data, rather than being too restrictive when ingesting the data at the start of the processing pipeline. Late binding in data warehousing is a design philosophy where data transformation and schema enforcement are deferred as late as possible in the data processing pipeline, sometimes even until query time. This approach contrasts with early binding, where data is transformed and structured as it is ingested into the data warehouse. The advantage of this design is that it allows for greater flexibility and allows us to leverage new standards and technologies using the lakehouse architecture and the composable data stack for the implementation of the data stations (Textbox 4). During the initial ingestion of the data, we only require the data to conform to the minimal syntactic standard defined by the base FHIR version (R4 in Figure 2). As the data is processed, more strict checks and constraints are applied, whereby ultimately different profiles can coexist next to one another (the two most inner rectangles in Figure 2), within a larger rectangle with fewer restrictions. Note that if any of the profiles includes a FHIR extension, such as adding a field to include a birth name, the profiles are no longer strictly concentric. Hence extra care needs to be taken when dealing with extensions when applying the principle of late binding.

One of the key challenges in using FHIR in this way pertains to the need for upgrading the whole extract-load-transform pipeline when upgrading to a new primary FHIR version, for example, R6. The potential technical debt of version upgrades in the future is not specific to FHIR, but being a younger standard changes are more frequent compared to OMOP and openEHR. However, we expect that the development time required to upgrade FHIR versions is significantly less than the initial migration to FHIR.

The considerations above also show the conceptual difference between FHIR as a health data exchange standard versus openEHR as a persistent storage of routine health care data and OMOP as a persistent storage of health research data. For health data exchange and FL, the recipient of the data determines, to a large extent, what subset of data in the source needs to be made available, that is, the target data model is known late and this favors late binding. In the case of routine collection of data, the holder of the source data determines what data needs to be stored—and typically everything—which favors early binding.

Figure 2. Principle of late binding with FHIR profiling mechanism, illustrated with FHIR profiles that are currently in use in the Netherlands. Each of the successive profiles are concentric, where the two most inner profiles are illustrated to cover different subsets of the parent profile. In practice, the two inner profiles will have an overlap as they, for example, will both include the Patient resource. FHIR: Fast Health Interoperability Resources.
Textbox 4. Lakehouse architecture and the composable data stack.

Data lakehouses typically have a zonal architecture that follows the extract-load-transform (ETL) pattern where data is ingested from the source systems in bulk (E), delivered to storage with aligned schemas (L), and transformed into a format ready for analysis and reuse (T) [Hai R, Koutras C, Quix C, Jarke M. Data Lakes: a survey of functions and systems. IEEE Trans Knowl Data Eng. 2023;35(12):12571-12590. [CrossRef]42-Harby A, Zulkernine F. Data lakehouse: a survey and experimental study. Inf Syst. 2025;127:102460. [CrossRef]44]. The discerning characteristic of the lakehouse architecture is its foundation on low-cost and directly accessible storage that also provides traditional database management and performance features such as ACID (Atomicity, Consistency, Isolation, and Durability) transactions, data versioning, auditing, indexing, caching, and query optimization [Armbrust M, Ghodsi A, Xin R, Zaharia M. Lakehouse: a new generation of open platforms that unify data warehousing and advanced analytics. 2021. URL: https://www.cidrdb.org/cidr2021/papers/cidr2021_paper17.pdf [accessed 2025-03-28] 45].

The composable data stack [Pedreira P, Erling O, Karanasos K, Schneider S, McKinney W, Valluri SR, et al. The composable data management system manifesto. Proc VLDB Endow. 2023;16(10):2679-2685. [CrossRef]46] is a new set of technologies and open standards for the fast processing of data using columnar data formats, including Apache Arrow as the standard columnar in-memory format with remote procedure call–based data movement [Apache Arrow. URL: https://arrow.apache.org/ [accessed 2024-09-20] 47]; Apache Parquet as the standard columnar on-disk format [Apache Parquet. 2024. URL: https://parquet.apache.org/ [accessed 2024-09-20] 48]; and Apache Iceberg as the open table format [Jain P, Kraft P, Power C, Das T, Stoica I, Zaharia M. Analyzing and comparing lakehouse storage systems. 2023. URL: https://www.cidrdb.org/cidr2023/papers/p92-jain.pdf [accessed 2025-03-28] 49,Apache Iceberg. URL: https://iceberg.apache.org/ [accessed 2024-09-20] 50]. This design also enables the use of new embedded, in-memory data processing engines. In turn, this opens up possibilities to bring computing workloads to edge devices, such as running DuckDB in the browser on top of WebAssembly [An in-process SQL OLAP database management system. URL: https://duckdb.org/ [accessed 2024-10-10] 51].

Using these technologies, full separation of storage and compute can be achieved which allows for cost-effective implementation of data stations.


It is a widely held belief that digital technologies have an important role to play in strengthening health systems in LMICs. Yet, also here, the current fragmentation of health data stands in the way of scaling up digital health programs beyond project-centric, vertical solutions into sustainable health information exchanges (HIEs) [Karamagi HC, Muneene D, Droti B, Jepchumba V, Okeibunor JC, Nabyonga J, et al. eHealth or e-Chaos: the use of digital health interventions for health systems strengthening in Sub-Saharan Africa over the last 10 years: A scoping review. J Global Health. 2022;12:04090. [FREE Full text] [CrossRef] [Medline]52]. In the context of global digital health developments, Mehl et al [Mehl GL, Seneviratne MG, Berg ML. A full-STAC remedy for global digital health transformation: open standards, technologies, architectures and content. Oxford Open Digital Health. 2023;1:oqae048. [CrossRef]15] have also called for convergence to open standards, similar to Tsafnat et al [Tsafnat G, Dunscombe R, Gabriel D, Grieve G, Reich C. Converge or collide? Making sense of a plethora of open data standards in health care. J Med Internet Res. 2024;26:e55779. [FREE Full text] [CrossRef] [Medline]1], but additionally stress the need for OSS (as our main argument in this paper), open content (representations of public health, health system or clinical knowledge to guide implementations), and open architectures (reusable enterprise architecture patterns for health systems). As for the open architecture, we see a convergence toward the OpenHIE framework [OpenHIE framework V5.2-En. URL: https://ohie.org/ [accessed 2024-08-27] 53], which has been adopted by many African countries as the architectural blueprint for implementing nationwide HIEs [Mamuye AL, Yilma TM, Abdulwahab A, Broomhead S, Zondo P, Kyeng M, et al. Health information exchange policy and standards for digital health systems in Africa: a systematic review. PLOS Digital Health. 2022;1(10):e0000118. [FREE Full text] [CrossRef] [Medline]54], including Nigeria [Dalhatu I, Aniekwe C, Bashorun A, Abdulkadir A, Dirlikov E, Ohakanu S, et al. From paper files to web-based application for data-driven monitoring of HIV programs: Nigeria's journey to a national data repository for decision-making and patient care. Methods Inf Med. 2023;62(3-04):130-139. [FREE Full text] [CrossRef] [Medline]55], Kenya [Thaiya M, Julia K, Joram M, Benard M, Nambiro D. Adoption of ICT to enhance access to healthcare in Kenya. IOSR-JCE. 2021;23(2):45-50. [FREE Full text]56], and Tanzania [Nsaghurwe A, Dwivedi V, Ndesanjo W, Bamsi H, Busiga M, Nyella E, et al. One country's journey to interoperability: Tanzania's experience developing and implementing a national health information exchange. BMC Med Inform Decis Mak. 2021;21(1):139. [FREE Full text] [CrossRef] [Medline]57]. Figure 3 shows an overview of the OpenHIE architecture.

While the OpenHIE specification is agnostic to which data standards should be used, in practice, the digital health community in LMICs has converged toward FHIR as the primary standard for HIE, in line with the proposal by Tsafnat et al [Tsafnat G, Dunscombe R, Gabriel D, Grieve G, Reich C. Converge or collide? Making sense of a plethora of open data standards in health care. J Med Internet Res. 2024;26:e55779. [FREE Full text] [CrossRef] [Medline]1]. To illustrate this point, consider the OpenHIM Platform architecture (Figure 4), which is currently the largest OSS implementation of the OpenHIE specification. In OpenHIM, clients (point-of-service systems) can initiate various workflows to submit or query patient data. The shared health record (SHR) acts as the core transactional system for the HIE, which in this case is realized with the HAPI FHIR server, being one of the most widely used open-source FHIR server implementations [HAPI FHIR—the open source FHIR API for Java. URL: https://hapifhir.io/ [accessed 2024-09-20] 58].

Figure 3. OpenHIE architecture showing the point of service systems (black), the interoperability Layer (green), and the component layer (blue).
Figure 4. OpenHIM Platform Architecture, illustrating the use of FHIR-based workflows between the components as specified in OpenHIE. API: application programming interface; ASYNC: asynchronous; CR: client registry; FHIR: Fast Health Interoperability Resources; IOL: interoperability layer; MPI: master patient index; SHR: shared health record; SYNC: synchronous.

Looking at the point-of-service systems, we see that as of today, openEHR is rarely used as the standard for routine collection of clinical data in LMICs. The largest OSS electronic health record (EHR) implementations for low-resource settings are based on nonstandardized data models, and it is unlikely this will change any time soon [Syzdykova A, Malta A, Zolfo M, Diro E, Oliveira JL. Open-source electronic health record systems for low-resource settings: systematic review. JMIR Med Inform. 2017;5(4):e44. [FREE Full text] [CrossRef] [Medline]59]. Instead, we see that FHIR-native software development frameworks such as OpenSRP [Mehl G. Open smart register platform (OpenSRP). OpenSRP. 2020. URL: https://lib.digitalsquare.io/handle/123456789/77592 [accessed 2023-01-21] 60] and the Open Health Stack [Open Health Stack. URL: https://developers.google.com/open-health-stack [accessed 2024-09-20] 61] are being used more and more. In this approach, health professionals use Android apps to register and collect routine health data (Figure 5). As an example, OpenSRP has been deployed in 14 countries targeting various patient populations, among which is the implementation of the WHO antenatal and neonatal care guidelines for midwives in Lombok, Indonesia [Development SI for BUNDA app. SUMMIT Institute for Development. 2023. URL: https://www.sid-indonesia.org/post/bunda-app [accessed 2024-01-18] 62,Kurniawan K, FitriaSyah I, Jayakusuma A, Armis RA, Lubis Y, Haryono MA, et al. Midwife service coverage, quality of work, and client health improved after deployment of an openSRP-driven client management application in Indonesia. Atlantis Press; 2019. Presented at: Proceedings of the 5th International Conference on Health Sciences (ICHS 2018); November 04, 2018:155-162; Yogyakarta, Indonesia. [CrossRef]63]. Beda EMR takes a similar approach and provides an FHIR native front-end that can be used in combination with any FHIR server as a backend [Beda EMR. URL: https://beda.software/emr [accessed 2024-12-30] 64]. Such a solution design is particularly useful for midsize and smaller health care facilities, which are often resource-constrained, and lacking the basic IT infrastructure to deploy a full-blown electronic medical record system. Hence, the FHIR-based SHR functions as both, the administrative system-of-record and as the hub for information exchange at the same time.

Figure 5. Overview of OpenSRP2 OSS framework for building clinical administration apps. API: application programming interface; FHIR: Fast Health Interoperability Resources; HIS: health information systems.

Finally, regarding longitudinal data analysis, we also see a convergence toward FHIR as the primary standard in LMICs. As in the case of FL, the choice for FHIR to implement data warehouse and analytic platforms is the preferred method due to the widespread availability of complementary OSS components. FHIR-specific technologies such as Bulk FHIR data access and SQL-on-FHIR mentioned earlier, allow the FHIR ecosystem to be used, complemented, and integrated with generic OSS data warehousing components such as Clickhouse [Clickhouse: fast open-source OLAP DBMS. ClickHouse. URL: https://clickhouse.com [accessed 2024-09-20] 65] and dbt [Dbt. URL: https://www.getdbt.com/index [accessed 2024-09-20] 66]. Recently, more studies have pointed to the potential that FHIR brings when it is used in conjunction with ML and artificial intelligence [Balch JA, Ruppert MM, Loftus TJ, Guan Z, Ren Y, Upchurch GR, et al. Machine learning-enabled clinical information systems using fast healthcare interoperability resources data standards: scoping review. JMIR Med Inform. 2023;11:e48297. [FREE Full text] [CrossRef] [Medline]67]. FHIR-based SHRs can act as systems of records for countries, thereby enabling reuse by health researchers, foundations, etc, to create public value with this data.

All in all, we see that in the context of LMICs, the standardization of the three domains put forward by Tsafnat et al [Tsafnat G, Dunscombe R, Gabriel D, Grieve G, Reich C. Converge or collide? Making sense of a plethora of open data standards in health care. J Med Internet Res. 2024;26:e55779. [FREE Full text] [CrossRef] [Medline]1] merge into one. The SHR, as the key component within the OpenHIE specification, serves as the back-end of the system of record and provides a transactional, persistent storage engine for information exchange. Downstream longitudinal data stores continue to use FHIR as the common data model for analytical purposes. One could argue that it is in fact advantageous to converge to just one standard, thereby reducing the complexity and cost of the total system. Such a perspective ties in with the notion of the hourglass model and open architectures: because FHIR is inherently designed to make optimal use of internet standards, such as the JSON file format and REST APIs, it is very modular and developer-friendly. The many components that make up the FHIR allow the standard to be used effectively to implement subsystems, such as a facility registry or a health worker registry. By comparison, OMOP and openEHR are designed with a smaller scope with fewer application areas and are thereby less suitable as a standard to implement the subsystems defined in the OpenHIE specification.


We agree with Tsafnat et al [Tsafnat G, Dunscombe R, Gabriel D, Grieve G, Reich C. Converge or collide? Making sense of a plethora of open data standards in health care. J Med Internet Res. 2024;26:e55779. [FREE Full text] [CrossRef] [Medline]1] that there is a dire need to converge to open data standards in health care and support their proposal to focus on openEHR, FHIR, and OMOP in health care informatics going forward. However, open standards are a necessary but not sufficient condition for the convergence of health data standardization. The availability of OSS implementations and complementary technologies is important when choosing which open standard to use. We find that the proposed trichotomy is too restrictive and therefore of limited use in guiding design choices to be made in real-world scenarios. Instead, we think that the full-STAC approach described by Mehl et al [Mehl GL, Seneviratne MG, Berg ML. A full-STAC remedy for global digital health transformation: open standards, technologies, architectures and content. Oxford Open Digital Health. 2023;1:oqae048. [CrossRef]15] is more comprehensive. Furthermore, we argue that FHIR has the potential to act as the spanning layer for health data interoperability, thereby enabling much wider standardization and adoption within the health data ecosystem at large. This is illustrated by the two cases considered in this paper, where FHIR is used beyond its original scope as a health data exchange standard.

In the case of FL, FHIR can be used interchangeably with OMOP for longitudinal analysis. In addition, due to its inherently modular design, FHIR can be used in conjunction with the principle of late binding, as opposed to early binding for OMOP and openEHR, which is a relevant design criterion for implementing federated data platforms for secondary use. In the case of LMICs, we see that FHIR is emerging as the standard for all three domains of routine health data collection at the clinical point-of-service, data exchange, and longitudinal analysis. We believe this is driven by the resource-constrained setting in LMICs, the modularity of FHIR, and the lower complexity and shallower learning curve of FHIR compared to openEHR. We expect that FHIR will play a major role in driving health data convergence in LMICs because the availability of OSS implementations and complementary components are important enablers in these resource-constrained environments. We strongly support ongoing developments to increase the availability of OSS implementations as digital public goods [Digital public goods alliance. 2024. URL: https://digitalpublicgoods.net/ [accessed 2024-02-05] 68] and integration projects such as Instant OpenHIE [Instant OpenHIE V2. 2024. URL: https://jembi.gitbook.io/instant-v2/ [accessed 2024-09-20] 69], which will improve health data interoperability in LMICs.

Although openEHR has not been chosen as the standard for the two use cases presented here, we want to stress that it is not our intention to argue for or against any of the three standards a priori nor do we intend to dismiss openEHR outright. Instead, our aim is to illustrate the kind of design choices and trade-offs that need to be made, particularly those related to the availability and complementarity of OSS components. Significant developments and uptake of openEHR as a clinical data repository have been reported, with currently 17 openEHR solution providers that have been implemented in thousands of clinics and research organizations worldwide [Delussu G, Frexia F, Mascia C, Sulis A, Meloni V, Del Rio M, et al. A survey of openEHR clinical data repositories. Int J Med Inform. 2024;191:105591. [CrossRef] [Medline]70]. Additionally, work is underway to integrate openEHR with FHIR for data exchange [FHIR Connect specfication. 2024. URL: https://github.com/better-care/fhir-connect-mapping-spec [accessed 2025-02-04] 71,Welcome to openFHIR’s documentation! openFHIR 0.9.3 documentation. URL: https://open-fhir.com/documentation/index.html [accessed 2025-02-04] 72]. Some experts agree that openEHR is the only specification that provides a comprehensive solution for building a standardized EHR [Pedrera-Jiménez M, García-Barrio N, Frid S, Moner D, Boscá-Tomás D, Lozano-Rubí R, et al. Can openEHR, ISO 13606, and HL7 FHIR work together? An agnostic approach for the selection and application of electronic health record standards to the next-generation health data spaces. J Med Internet Res. 2023;25:e48702. [FREE Full text] [CrossRef] [Medline]73]. Furthermore, openEHR is not only being deployed as a clinical system of record but also as a persistent clinical data repository for implementing national HIEs in European Nordic countries [Pohjonen H. Norway, Sweden, and Finland as forerunners in open ecosystems and openEHR. In: Roadmap to Successful Digital Health Ecosystems. United States. Academic Press; 2022:457-471.74] and Slovenia [Bajrić S. Building a sustainable ecosystem for eHealth in Slovenia: opportunities, challenges, and strategies. Digital Health. 2023;9:20552076231205743. [FREE Full text] [CrossRef] [Medline]75], which is very similar to the solution design of the SHR within the OpenHIE architecture presented here. An ongoing program in the south of the Netherlands has demonstrated a decentralized data sharing ecosystem using separate openEHR data stores, where federated queries are supported by the openEHR Archetype Query Language [Demonstratie proof of concept regionaal data-ecosysteem Zuid-Limburg. YouTube. 2025. URL: https://www.youtube.com/watch?v=jT5UTLRX5VQ [accessed 2025-02-26] 76].

The two cases allow us to reflect and revisit the key arguments of this paper, namely the importance of OSS implementations and the availability of complementary components for the wide-scale adoption of health data standards. There is an important and equally complex interplay between OSS development and standardization, where OSS implementation can occur before, after, or in parallel to standardization efforts [Wright SA, Druta D. Open source and standards: the role of open source in the dialogue between research and standardization. IEEE; 2014. Presented at: IEEE Globecom Workshops (GC Wkshps); December 08-12, 2014:650-655; Austin, TX, USA. [CrossRef]77,Blind K, Böhm M. The relationship between open source software and standard setting. Publications Office of the European Union. Publications Office of the European Union; 2019. URL: https:/​/op.​europa.eu/​en/​publication-detail/​-/​publication/​6521f427-01df-11ea-8c1f-01aa75ed71a1/​language-en [accessed 2025-04-04] 78]. Various studies have provided increasing evidence that OSS is a key success factor in driving software-related standardization [Blind K, Böhm M. The relationship between open source software and standard setting. Publications Office of the European Union. Publications Office of the European Union; 2019. URL: https:/​/op.​europa.eu/​en/​publication-detail/​-/​publication/​6521f427-01df-11ea-8c1f-01aa75ed71a1/​language-en [accessed 2025-04-04] 78], and by extension, we think it is critical when aiming to achieve data standardization. The history of how the Digital Imaging and Communications in Medicine imaging standard came to be is a good example of how OSS development was pivotal in achieving wide-scale adoption of this standard [Erickson BJ, Langer S, Nagy P. The role of open-source software in innovation and standardization in radiology. J Am Coll Radiol. 2005;2(11):927-931. [CrossRef] [Medline]79,Nagy P. Open source in imaging informatics. J Digital Imaging. 2007;20 Suppl 1(Suppl 1):1-10. [FREE Full text] [CrossRef] [Medline]80].

In contrast, the phenomena of forking, fragmentation, and splintering are known to hinder an industry from consolidating toward a set of open standards [Simcoe T, Watson J. Forking, fragmentation, and splintering. Strategy Sci. 2019;4(4):283-297. [CrossRef]81]. Given the specific characteristics of data as an artifact, fragmentation is arguably the most relevant of these phenomena. de Reuver et al [de RM, Ofe H, Agahari W, Abbas A, Zuiderwijk A. The openness of data platforms: a research agenda. ACM; 2022. Presented at: CoNEXT '22: The 18th International Conference on emerging Networking EXperiments and Technologies; December 9, 2022:34-41; Rome Italy. [CrossRef]6] expect fragmentation to persist for some time in the evolution of data platforms and associated ecosystems. The case of the Unix operating system is an interesting example where fragmentation hampered standardization, next to market dynamics and issues related to intellectual property rights [Simcoe T, Watson J. Forking, fragmentation, and splintering. Strategy Sci. 2019;4(4):283-297. [CrossRef]81].

But even when OSS has successfully contributed to “tip” the health care industry to a set of health data standards, issues remain regarding the sustainability of the OSS ecosystem itself. The market dynamics and economics of OSS ecosystems differ considerably from industry to industry: sustainability of OSS in the context of, say, the cultural and scientific heritage sector will be different from the challenges of OSS projects that are used as mission-critical components of open digital infrastructures worldwide. In the case of the latter, underfunding is a critical issue and initiatives such as the German Sovereign Tech Agency have been launched to alleviate this [Sovereign Tech Agency. 2025. URL: https://www.sovereign.tech/ [accessed 2025-02-28] 82]. In the context of open health data standards, we believe that risks related to underfunding are lower and more manageable. Within the digital health community, there is a range of commercial companies supporting the OSS projects and creating sustainable businesses from it using various business models like offering support contracts, split licensing, and complementary closed-source products [Chang V, Mills H, Newhouse S. From open source to long-term sustainability: review of business models and case studies. University of Southampton. 2007. URL: https://eprints.soton.ac.uk/263925/ [accessed 2025-02-08] 83]. Regarding the dynamics of forking and fragmentation mentioned earlier, we feel that code forking on balance has a net positive effect on the long-term sustainability of OSS at the level of the software itself, the community, and the ecosystem [Nyman L, Lindman J. Code forking, governance, and sustainability in open source software. Technology Innovation Management Review. 2013. URL: http://timreview.ca/article/644 [accessed 2025-03-28] 84].

Going forward, we suggest the following directions for future research. Given that health data standardization will continue to require mappings, we propose to explore the use of ML, particularly large-language models, as a means to reduce the development effort required to create transformations between various health data formats. New ML methods can also be developed to assess and improve data quality across the various stages of the data processing pipelines. In terms of data integration, we expect that health data will increasingly be used in conjunction with data from social services and the welfare domain, which requires new techniques to integrate different data domains, for example, using knowledge graphs and ontologies. Last, but certainly not least, future research should not only explore the technical but also the social implications of implementing OSS components for data standardization across the health care system, specifically in settings where governance or ethical considerations of data interoperability have not specifically been addressed at a regulatory level. In line with the embedding of open standards in the open-source ecosystem, we assert that the benefits of health data standardization will only be realized if they are coupled with collaborative, community-driven governance models. It remains essential to ensure that the development, adoption, and evolution of standards remain inclusive, transparent, and responsive to the diverse needs within the health system.

Acknowledgments

The authors thankfully acknowledge Joost Holslag for his feedback and discussion on openEHR. DK received funding from PharmAccess as a contractor to conduct the work on low and middle-income countries reported here.

Authors' Contributions

DK contributed to the concept and design of the manuscript and prepared the first draft. AD, MS, and BJV contributed to the section on federated learning. FH and MB contributed to the section on low and middle-income countries. DK and FH revised the manuscript based on the feedback from the peer reviewers. All authors contributed to the final revision and approved the final manuscript.

Conflicts of Interest

AD is the founder, an employee, and a stockholder of Medical Data Works B.V., which provides commercial services for federated data infrastructures. BJV is the co-founder, a stockholder and director of Expertisecentrum Zorgalgoritmen, an AI software company. MB is a co-founder, a stockholder and CEO of Ona, which provides commercial services for health information exchange infrastructures.

  1. Tsafnat G, Dunscombe R, Gabriel D, Grieve G, Reich C. Converge or collide? Making sense of a plethora of open data standards in health care. J Med Internet Res. 2024;26:e55779. [FREE Full text] [CrossRef] [Medline]
  2. de Reuver M, Sørensen C, Basole RC. The digital platform: a research agenda. J Inf Technol. 2018;33(2):124-135. [CrossRef]
  3. Keller P, Tarkowski A. The paradox of open. Open Future. 2021. URL: https://openfuture.pubpub.org/pub/paradox-of-open/release/1 [accessed 2024-03-25]
  4. Estrin D, Sim I. Health care delivery. Open mHealth architecture: an engine for health care innovation. Science. 2010;330(6005):759-760. [CrossRef] [Medline]
  5. Beck M. On the hourglass model. Commun ACM. 2019;62(7):48-57. [CrossRef]
  6. de RM, Ofe H, Agahari W, Abbas A, Zuiderwijk A. The openness of data platforms: a research agenda. ACM; 2022. Presented at: CoNEXT '22: The 18th International Conference on emerging Networking EXperiments and Technologies; December 9, 2022:34-41; Rome Italy. [CrossRef]
  7. Reynolds CJ, Wyatt JC. Open source, open standards, and health care information systems. J Med Internet Res. 2011;13(1):e24. [FREE Full text] [CrossRef] [Medline]
  8. GSM. Wikipedia. 2024. URL: https://en.wikipedia.org/w/index.php?title=GSM&oldid=1245675274 [accessed 2024-09-20]
  9. The state of FHIR 2024 survey results. HL7. URL: https://www.hl7.org/documentcenter/public/white-papers/2024%20StateofFHIRSurveyResults_final.pdf [accessed 2024-04-04]
  10. Carmo A, Martins H. D2.2—EHRxF in a Nutshell-WP2-ISCTE. 2024. URL: https:/​/ehr-exchange-format.​eu/​wp-content/​uploads/​2024/​10/​D2.​2-v20240704-EHRxF-in-a-Nutshell-WP2-ISCTE.​pdf [accessed 2025-03-28]
  11. FHIR in US healthcare regulations. Firely. 2023. URL: https://simplifier.net/organization/firely/news/153 [accessed 2024-05-30]
  12. National digital health mission. India National Health Authority. 2020. URL: https://www.niti.gov.in/sites/default/files/2023-02/ndhm_strategy_overview.pdf [accessed 2025-03-28]
  13. HCX Protocol V0.9. 2023. URL: http://hcxprotocol.io/ [accessed 2024-09-18]
  14. Tilahun B, Mamuye A, Yilma T, Shehata Y. African Union health information exchange guidelines and standards. 2023. URL: https://africacdc.org/download/african-union-health-information-exchange-guidelines-and-standards/ [accessed 2025-03-28]
  15. Mehl GL, Seneviratne MG, Berg ML. A full-STAC remedy for global digital health transformation: open standards, technologies, architectures and content. Oxford Open Digital Health. 2023;1:oqae048. [CrossRef]
  16. Mandl KD, Gottlieb D, Mandel JC, Ignatov V, Sayeed R, Grieve G, et al. Push button population health: the SMART/HL7 FHIR bulk data access application programming interface. NPJ Digital Med. 2020;3(1):151. [FREE Full text] [CrossRef] [Medline]
  17. Jones J, Gottlieb D, Mandel J, Ignatov V, Ellis A, Kubick W, et al. A landscape survey of planned SMART/HL7 bulk FHIR data access API implementations and tools. J Am Med Inform Assoc. 2021;28(6):1284-1287. [FREE Full text] [CrossRef] [Medline]
  18. SQL on FHIR V2.0.0-Pre. URL: https://build.fhir.org/ig/FHIR/sql-on-fhir-v2/ [accessed 2024-09-20]
  19. FHIR Open Source Implementations. URL: https://confluence.hl7.org/display/FHIR/Open+Source+Implementations [accessed 2024-09-20]
  20. Software tools. OHDSI. URL: https://www.ohdsi.org/software-tools/ [accessed 2024-09-20]
  21. Beale SHT. OpenEHR platform. URL: https://openehr.org/products_tools/platform/ [accessed 2024-09-20]
  22. EHRbase 2.0 Website. 2024. URL: https://www.ehrbase.org/ [accessed 2024-09-20]
  23. Rieke N, Hancox J, Li W, Milletarì F, Roth HR, Albarqouni S, et al. The future of digital health with federated learning. NPJ Digital Med. 2020;3:119. [FREE Full text] [CrossRef] [Medline]
  24. Teo ZL, Jin L, Li S, Miao D, Zhang X, Ng WY, et al. Federated machine learning in healthcare: a systematic review on clinical applications and technical architecture. Cell Rep Med. 2024;5(2):101419. [FREE Full text] [CrossRef] [Medline]
  25. Platform voor uitwisseling en hergebruik van klinische data nederland. PLUGIN. URL: https://plugin.healthcare/ [accessed 2024-03-14]
  26. Agreements on the national health data infrastructure for research, policy and innovation - health-RI nationale gezondheidsdata-infrastructuur - confluence. Health-RI. 2024. URL: https:/​/health-ri.​atlassian.net/​wiki/​spaces/​HNG/​pages/​997466565/​Health+data+infrastructure+for+research+policy+and+innovation [accessed 2024-06-03]
  27. Khalid S, Yang C, Blacketer C, Duarte-Salles T, Fernández-Bertolín S, Kim C, et al. A standardized analytics pipeline for reliable and rapid development and validation of prediction models using observational health data. Comput Methods Programs Biomed. 2021;211:106394. [FREE Full text] [CrossRef] [Medline]
  28. Lee S, Kim C, Chang J, Park RW. FeederNet (Federated E-Health Big Data for Evidence Renovation Network) platform in Korea. OHSDI. URL: https://www.ohdsi.org/2022showcase-33/ [accessed 2024-06-04]
  29. Mateus P, Moonen J, Beran M, Jaarsma E, van der Landen SM, Heuvelink J, et al. Data harmonization and federated learning for multi-cohort dementia research using the OMOP common data model: a Netherlands consortium of dementia cohorts case study. J Biomed Inform. 2024;155:104661. [FREE Full text] [CrossRef] [Medline]
  30. Kroes JA, Bansal AT, Berret E, Christian N, Kremer A, Alloni A, et al. Blueprint for harmonising unstandardised disease registries to allow federated data analysis: prepare for the future. ERJ Open Res. 2022;8(4):00168-02022. [FREE Full text] [CrossRef] [Medline]
  31. Deltomme C, Denturck K, De JP. Federated Health Innovation Network (FHIN). URL: https:/​/www.​ohdsi-europe.org/​images/​symposium-2024/​Posters/​poster%20OHDSI%20FHIN%20Camille%20Deltomme%20-%20Camille%20Deltomme.​pdf [accessed 2024-09-20]
  32. Moncada-Torres A, Martin F, Sieswerda M, Van Soest J, Geleijnse G. VANTAGE6: an open source priVAcy preserviNg federaTed leArninG infrastructurE for secure insight eXchange. AMIA Annu Symp Proc. 2020;2020:870-877. [FREE Full text] [Medline]
  33. Choudhury A, van SJ, Nayak S, Dekker A. Personal health train on FHIR: a privacy preserving federated approach for analyzing FAIR data in healthcare. Springer; 2020. Presented at: Machine Learning, Image Processing, Network Security and Data Sciences; July 30-31, 2020:85-95; Silchar, India. [CrossRef]
  34. Smits D, Van Beusekom B, Martin F, Veen L, Geleijnse G, Moncada-Torres A. An improved infrastructure for privacy-preserving analysis of patient data. Stud Health Technol Inform. 2022;295:144-147. [CrossRef] [Medline]
  35. Duda SN, Kennedy N, Conway D, Cheng AC, Nguyen V, Zayas-Cabán T, et al. HL7 FHIR-based tools and initiatives to support clinical research: a scoping review. J Am Med Inform Assoc. 2022;29(9):1642-1653. [FREE Full text] [CrossRef] [Medline]
  36. Mullie L, Afilalo J, Archambault P, Bouchakri R, Brown K, Buckeridge DL, et al. CODA: an open-source platform for federated analysis and machine learning on distributed healthcare data. J Am Med Inform Assoc. 2024;31(3):651-665. [FREE Full text] [CrossRef] [Medline]
  37. Sinaci AA, Gencturk M, Alvarez-Romero C, Laleci Erturkmen GB, Martinez-Garcia A, Escalona-Cuaresma MJ, et al. Privacy-preserving federated machine learning on FAIR health data: a real-world application. Comput Struct Biotechnol J. 2024;24:136-145. [FREE Full text] [CrossRef] [Medline]
  38. Gruendner J, Schwachhofer T, Sippl P, Wolf N, Erpenbeck M, Gulden C, et al. KETOS: Clinical decision support and machine learning as a service—a training and deployment platform based on docker, OMOP-CDM, and FHIR web services. PLoS One. 2019;14(10):e0223010. [FREE Full text] [CrossRef] [Medline]
  39. Cremonesi F, Planat V, Kalokyri V, Kondylakis H, Sanavia T, Resinas VMM, et al. The need for multimodal health data modeling: a practical approach for a federated-learning healthcare platform. J Biomed Inform. 2023;141:104338. [FREE Full text] [CrossRef] [Medline]
  40. Peng Y, Henke E, Reinecke I, Zoch M, Sedlmayr M, Bathelt F. An ETL-process design for data harmonization to participate in international research with German real-world data based on FHIR and OMOP CDM. Int J Med Inform. 2023;169:104925. [FREE Full text] [CrossRef] [Medline]
  41. OMOPonFHIR. URL: https://omoponfhir.org/ [accessed 2024-09-20]
  42. Hai R, Koutras C, Quix C, Jarke M. Data Lakes: a survey of functions and systems. IEEE Trans Knowl Data Eng. 2023;35(12):12571-12590. [CrossRef]
  43. Harby AA, Zulkernine F. From data warehouse to lakehouse: a comparative review. IEEE; 2022. Presented at: IEEE International Conference on Big Data (Big Data); December 17-20, 2022:389-395; Osaka, Japan. [CrossRef]
  44. Harby A, Zulkernine F. Data lakehouse: a survey and experimental study. Inf Syst. 2025;127:102460. [CrossRef]
  45. Armbrust M, Ghodsi A, Xin R, Zaharia M. Lakehouse: a new generation of open platforms that unify data warehousing and advanced analytics. 2021. URL: https://www.cidrdb.org/cidr2021/papers/cidr2021_paper17.pdf [accessed 2025-03-28]
  46. Pedreira P, Erling O, Karanasos K, Schneider S, McKinney W, Valluri SR, et al. The composable data management system manifesto. Proc VLDB Endow. 2023;16(10):2679-2685. [CrossRef]
  47. Apache Arrow. URL: https://arrow.apache.org/ [accessed 2024-09-20]
  48. Apache Parquet. 2024. URL: https://parquet.apache.org/ [accessed 2024-09-20]
  49. Jain P, Kraft P, Power C, Das T, Stoica I, Zaharia M. Analyzing and comparing lakehouse storage systems. 2023. URL: https://www.cidrdb.org/cidr2023/papers/p92-jain.pdf [accessed 2025-03-28]
  50. Apache Iceberg. URL: https://iceberg.apache.org/ [accessed 2024-09-20]
  51. An in-process SQL OLAP database management system. URL: https://duckdb.org/ [accessed 2024-10-10]
  52. Karamagi HC, Muneene D, Droti B, Jepchumba V, Okeibunor JC, Nabyonga J, et al. eHealth or e-Chaos: the use of digital health interventions for health systems strengthening in Sub-Saharan Africa over the last 10 years: A scoping review. J Global Health. 2022;12:04090. [FREE Full text] [CrossRef] [Medline]
  53. OpenHIE framework V5.2-En. URL: https://ohie.org/ [accessed 2024-08-27]
  54. Mamuye AL, Yilma TM, Abdulwahab A, Broomhead S, Zondo P, Kyeng M, et al. Health information exchange policy and standards for digital health systems in Africa: a systematic review. PLOS Digital Health. 2022;1(10):e0000118. [FREE Full text] [CrossRef] [Medline]
  55. Dalhatu I, Aniekwe C, Bashorun A, Abdulkadir A, Dirlikov E, Ohakanu S, et al. From paper files to web-based application for data-driven monitoring of HIV programs: Nigeria's journey to a national data repository for decision-making and patient care. Methods Inf Med. 2023;62(3-04):130-139. [FREE Full text] [CrossRef] [Medline]
  56. Thaiya M, Julia K, Joram M, Benard M, Nambiro D. Adoption of ICT to enhance access to healthcare in Kenya. IOSR-JCE. 2021;23(2):45-50. [FREE Full text]
  57. Nsaghurwe A, Dwivedi V, Ndesanjo W, Bamsi H, Busiga M, Nyella E, et al. One country's journey to interoperability: Tanzania's experience developing and implementing a national health information exchange. BMC Med Inform Decis Mak. 2021;21(1):139. [FREE Full text] [CrossRef] [Medline]
  58. HAPI FHIR—the open source FHIR API for Java. URL: https://hapifhir.io/ [accessed 2024-09-20]
  59. Syzdykova A, Malta A, Zolfo M, Diro E, Oliveira JL. Open-source electronic health record systems for low-resource settings: systematic review. JMIR Med Inform. 2017;5(4):e44. [FREE Full text] [CrossRef] [Medline]
  60. Mehl G. Open smart register platform (OpenSRP). OpenSRP. 2020. URL: https://lib.digitalsquare.io/handle/123456789/77592 [accessed 2023-01-21]
  61. Open Health Stack. URL: https://developers.google.com/open-health-stack [accessed 2024-09-20]
  62. Development SI for BUNDA app. SUMMIT Institute for Development. 2023. URL: https://www.sid-indonesia.org/post/bunda-app [accessed 2024-01-18]
  63. Kurniawan K, FitriaSyah I, Jayakusuma A, Armis RA, Lubis Y, Haryono MA, et al. Midwife service coverage, quality of work, and client health improved after deployment of an openSRP-driven client management application in Indonesia. Atlantis Press; 2019. Presented at: Proceedings of the 5th International Conference on Health Sciences (ICHS 2018); November 04, 2018:155-162; Yogyakarta, Indonesia. [CrossRef]
  64. Beda EMR. URL: https://beda.software/emr [accessed 2024-12-30]
  65. Clickhouse: fast open-source OLAP DBMS. ClickHouse. URL: https://clickhouse.com [accessed 2024-09-20]
  66. Dbt. URL: https://www.getdbt.com/index [accessed 2024-09-20]
  67. Balch JA, Ruppert MM, Loftus TJ, Guan Z, Ren Y, Upchurch GR, et al. Machine learning-enabled clinical information systems using fast healthcare interoperability resources data standards: scoping review. JMIR Med Inform. 2023;11:e48297. [FREE Full text] [CrossRef] [Medline]
  68. Digital public goods alliance. 2024. URL: https://digitalpublicgoods.net/ [accessed 2024-02-05]
  69. Instant OpenHIE V2. 2024. URL: https://jembi.gitbook.io/instant-v2/ [accessed 2024-09-20]
  70. Delussu G, Frexia F, Mascia C, Sulis A, Meloni V, Del Rio M, et al. A survey of openEHR clinical data repositories. Int J Med Inform. 2024;191:105591. [CrossRef] [Medline]
  71. FHIR Connect specfication. 2024. URL: https://github.com/better-care/fhir-connect-mapping-spec [accessed 2025-02-04]
  72. Welcome to openFHIR’s documentation! openFHIR 0.9.3 documentation. URL: https://open-fhir.com/documentation/index.html [accessed 2025-02-04]
  73. Pedrera-Jiménez M, García-Barrio N, Frid S, Moner D, Boscá-Tomás D, Lozano-Rubí R, et al. Can openEHR, ISO 13606, and HL7 FHIR work together? An agnostic approach for the selection and application of electronic health record standards to the next-generation health data spaces. J Med Internet Res. 2023;25:e48702. [FREE Full text] [CrossRef] [Medline]
  74. Pohjonen H. Norway, Sweden, and Finland as forerunners in open ecosystems and openEHR. In: Roadmap to Successful Digital Health Ecosystems. United States. Academic Press; 2022:457-471.
  75. Bajrić S. Building a sustainable ecosystem for eHealth in Slovenia: opportunities, challenges, and strategies. Digital Health. 2023;9:20552076231205743. [FREE Full text] [CrossRef] [Medline]
  76. Demonstratie proof of concept regionaal data-ecosysteem Zuid-Limburg. YouTube. 2025. URL: https://www.youtube.com/watch?v=jT5UTLRX5VQ [accessed 2025-02-26]
  77. Wright SA, Druta D. Open source and standards: the role of open source in the dialogue between research and standardization. IEEE; 2014. Presented at: IEEE Globecom Workshops (GC Wkshps); December 08-12, 2014:650-655; Austin, TX, USA. [CrossRef]
  78. Blind K, Böhm M. The relationship between open source software and standard setting. Publications Office of the European Union. Publications Office of the European Union; 2019. URL: https:/​/op.​europa.eu/​en/​publication-detail/​-/​publication/​6521f427-01df-11ea-8c1f-01aa75ed71a1/​language-en [accessed 2025-04-04]
  79. Erickson BJ, Langer S, Nagy P. The role of open-source software in innovation and standardization in radiology. J Am Coll Radiol. 2005;2(11):927-931. [CrossRef] [Medline]
  80. Nagy P. Open source in imaging informatics. J Digital Imaging. 2007;20 Suppl 1(Suppl 1):1-10. [FREE Full text] [CrossRef] [Medline]
  81. Simcoe T, Watson J. Forking, fragmentation, and splintering. Strategy Sci. 2019;4(4):283-297. [CrossRef]
  82. Sovereign Tech Agency. 2025. URL: https://www.sovereign.tech/ [accessed 2025-02-28]
  83. Chang V, Mills H, Newhouse S. From open source to long-term sustainability: review of business models and case studies. University of Southampton. 2007. URL: https://eprints.soton.ac.uk/263925/ [accessed 2025-02-08]
  84. Nyman L, Lindman J. Code forking, governance, and sustainability in open source software. Technology Innovation Management Review. 2013. URL: http://timreview.ca/article/644 [accessed 2025-03-28]


EHR: electronic health record
FAIR: Findable, Accessible, Interoperable, and Reusable
FHIR: Fast Health Interoperability Resources
FL: federated learning
HIE: health information exchange
LMIC: low and middle-income country
ML: machine learning
OMOP: Observational Medical Outcomes Partnership
OSS: open-source software
SHR: shared health record


Edited by J Sarvestan; submitted 10.10.24; peer-reviewed by B Kijsanayotin, J Klann, J Rivers, C Gulden, T Karen, A Adewuyi; comments to author 29.11.24; revised version received 05.02.25; accepted 16.03.25; published 15.04.25.

Copyright

©Daniel Kapitan, Femke Heddema, André Dekker, Melle Sieswerda, Bart-Jan Verhoeff, Matt Berg. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 15.04.2025.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.