Published on in Vol 25 (2023)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/47254, first published .
The BioRef Infrastructure, a Framework for Real-Time, Federated, Privacy-Preserving, and Personalized Reference Intervals: Design, Development, and Application

Original Paper

1University Institute of Clinical Chemistry, University Hospital Bern, Bern, Switzerland

2Graduate School for Health Sciences, University of Bern, Bern, Switzerland

3Biomedical Data Science Center, University Hospital Lausanne, Lausanne, Switzerland

4Laboratory of Biometry, University of Thessaly, Volos, Greece

5Center for Artificial Intelligence in Medicine, University of Bern, Bern, Switzerland

*these authors contributed equally

Corresponding Author:

Harald Witte, PhD

University Institute of Clinical Chemistry

University Hospital Bern

Freiburgstrasse 10

Bern, 3010

Switzerland

Phone: 41 31 632 83 30

Email: harald.witte@extern.insel.ch


Related Article: This is a corrected version. See correction statement in: https://www.jmir.org/2023/1/e54809

Background: Reference intervals (RIs) for patient test results are in standard use across many medical disciplines, allowing physicians to identify measurements indicating potentially pathological states with relative ease. The process of inferring cohort-specific RIs is, however, often ignored because of the high costs and cumbersome efforts associated with it. Sophisticated analysis tools are required to automatically infer relevant and locally specific RIs directly from routine laboratory data. These tools would effectively connect clinical laboratory databases to physicians and provide personalized target ranges for the respective cohort population.

Objective: This study aims to describe the BioRef infrastructure, a multicentric governance and IT framework for the estimation and assessment of patient group–specific RIs from routine clinical laboratory data using an innovative decentralized data-sharing approach and a sophisticated, clinically oriented graphical user interface for data analysis.

Methods: A common governance agreement and interoperability standards have been established, allowing the harmonization of multidimensional laboratory measurements from multiple clinical databases into a unified “big data” resource. International coding systems, such as the International Classification of Diseases, Tenth Revision (ICD-10); unique identifiers for medical devices from the Global Unique Device Identification Database; type identifiers from the Global Medical Device Nomenclature; and a universal transfer logic, such as the Resource Description Framework (RDF), are used to align the routine laboratory data of each data provider for use within the BioRef framework. With a decentralized data-sharing approach, the BioRef data can be evaluated by end users from each cohort site following a strict “no copy, no move” principle, that is, only data aggregates for the intercohort analysis of target ranges are exchanged.

Results: The TI4Health distributed and secure analytics system was used to implement the proposed federated and privacy-preserving approach and comply with the limitations applied to sensitive patient data. Under the BioRef interoperability consensus, clinical partners enable the computation of RIs via the TI4Health graphical user interface for query without exposing the underlying raw data. The interface was developed for use by physicians and clinical laboratory specialists and allows intuitive and interactive data stratification by patient factors (age, sex, and personal medical history) as well as laboratory analysis determinants (device, analyzer, and test kit identifier). This consolidated effort enables the creation of extremely detailed and patient group–specific queries, allowing the generation of individualized, covariate-adjusted RIs on the fly.

Conclusions: With the BioRef-TI4Health infrastructure, a framework for clinical physicians and researchers to define precise RIs immediately in a convenient, privacy-preserving, and reproducible manner has been implemented, promoting a vital part of practicing precision medicine while streamlining compliance and avoiding transfers of raw patient data. This new approach can provide a crucial update on RIs and improve patient care for personalized medicine.

J Med Internet Res 2023;25:e47254

doi:10.2196/47254


Reference Intervals in Clinical Diagnostics

The use of blood tests is a cornerstone of disease diagnosis and health assessment in clinical medicine. When clinicians try to assess the health status of patients, they heavily rely on laboratory tests and population-based measures such as the reference interval (RI). In their core concept, RIs enclose a fixed range of values from a predefined reference population (eg, 95%), and it has long been established that they are effective in clinical use as long as they are precise and accurate [1-4]. Clinical laboratories must independently establish and periodically verify their RIs in use through admissible guidelines [5]. The widely used guideline EP28-A3c developed by the Clinical and Laboratory Standards Institute (CLSI) and the International Federation of Clinical Chemistry (IFCC) states that RIs should be estimated from cohort-relevant reference populations, where not only patient group–specific covariates such as age, biological sex, ethnicity, and region are considered but also differences in preanalytical factors are accounted for [6]. The process is cumbersome, costly, and often beyond the scope and possibilities of many independently operating laboratories: cohort-specific analyses require stratification by a specific combination of the above-mentioned covariates. Therefore, these analyses are frequently limited to small sample sizes owing to a lack of available data [7,8].

In addition, most studies have been conducted with very lenient inclusion and exclusion criteria owing to the lack of an overarching definition of “health” that covers both the normative aspects (well-being and functioning) and the more descriptive aspects of health evaluation (test result assessment). This hinders the comparability of the generated RIs [9]. A common classification framework that defines the health status of the included participants based on predetermined medical conditions is required. In this context, the International Classification of Diseases (ICD) is a commonly used coding system that helps map specific diseases onto broader morbidity categories [10]. Inferring RIs with the exclusion or inclusion of specific combinations of diseases (representing the health status of the patient) might enable personalized diagnostic use and provide target ranges that allow the interpretation of results based on the specific condition of the individual patient [11]. This would essentially allow the creation of RIs as “expectation ranges” for “digital twins,” that is, patients who share similarities with the patient under observation but do not have a specific disease. Particularly for older patients or patients with multiple morbidities, this comparison appears more appropriate, as the concept of a “healthy reference” is inherently unattainable for these populations [12]. In addition, international efforts, such as that of the IFCC’s Task Force on Global Reference Interval Database, aim at generating resources for RIs at a global scale [13].

Harmonized RIs

The aforementioned limitations can be overcome through multicenter collaborative RI studies, where standardized protocols help derive harmonized RIs at a national level by pooling the appropriate number of patients from multiple cohorts [14]. Such standardization requires clear classification systems, for example, for the nomenclature, terminology, units, and formats used, to ensure the reproducibility of all the steps of the complete laboratory testing procedure, possibly for international application [15,16]. This is an ongoing global process, as laboratories in Europe [17-22], Africa [23-25], North America [26], Asia [27-29] and Australia [30] aim at deriving nation-specific RIs through multicenter studies.

The broader introduction of locally inferred RIs from harmonized data sets has not been observed across the board in clinical laboratories [31,32]. Endeavors estimating patient group–specific RIs from electronic health records have shown successful results yet remain sparse [33]. This is mostly due to a lack of sophisticated analysis tools connecting laboratory databases, where multidimensional data are readily available, to physicians in need of clinically relevant RIs. For each standardization effort, clinical physicians or laboratory specialists have to go through significant administrative burden, as they realign the laboratory data for each RI study individually.

The Need for a Streamlined Research Data-Sharing Infrastructure

Switzerland has one of the most restrictive laws surrounding the nature of the collection and sharing of identifying information and personal data, including health data (all referred to as sensitive data). Sensitive data require careful governance, covered by the Swiss Federal Act on Data Protection 1992 (article 3c) [34]. The processing of sensitive data for research is further referenced in the Human Research Act (Federal law 810.30). Unless clinical research data are anonymized, studies require the approval of an ethics committee. Owing to these prerequisites, intercohort data sharing mandates a Data Transfer and Use Agreement between the data provider and the recipient before any sensitive data can be exchanged. Such a practice is common in many other countries as well and causes significant administrative overhead, at times rendering potential stakeholders hesitant to join multiparty research projects. In Switzerland, a national IT environment for sensitive research data, the BioMedIT infrastructure, was established to ensure a backbone for the secure transfer, storage, management, and processing of confidential data [35]. Despite all the progress achieved by this streamlined infrastructure, the hurdle for nationwide data pooling is still relatively high. A recent effort to establish a Swiss multicohort resource in pharmacogenetics has been documented to take up to a year for just setting up the legal and scientific framework [36]. Novel privacy-preserving data exchange and data processing options or platforms could alleviate the regulatory burden imposed on multicohort projects.

The BioRef Vision

The need for an intercohort data-sharing infrastructure that streamlines how individual researchers access the relevant reference populations and estimate applicable RIs is apparent. The BioRef rationale is to establish an infrastructure that allows the creation of precise RIs from pooled data based on an interoperable semantic framework. Instead of placing the responsibility for data interoperability and aggregation on individual laboratory specialists, it is important to give clinical laboratories an opportunity to conveniently and reproducibly check whether their standard RIs apply to their patient populations. Such a check should be an essential part of precision medicine as practiced today. Ideally, this involves web applications with easily accessible graphical user interfaces (GUIs) that allow the recurrent aggregation of patient data in an accreditation-proof manner and the transfer of the aggregated data from all partners to the interested laboratory specialists (end users). The BioRef initiative relies on a federated and privacy-preserving approach for secure analytics based on multiparty homomorphic encryption [37]. Combining the data of multiple providers broadens a project’s data basis, that is, it results in higher data coverage. Moreover, it increases the chances of gaining insights from rare patient profiles. Data from diverse sources, however, tend to be heterogeneous, which makes it more difficult to extract interoperable insights. Our federated approach is implemented in the software system TI4Health, the commercial version of its open-source predecessor MedCo, a secure system for privacy-preserving federated data exploration and analyses based on advanced privacy-enhancing technologies [38,39]. With this, data remain on the premises and under the full control of the participating institutions.
Only the aggregated result of the requested computation over the entire distributed virtual database is released to an authorized user [40]. As RIs are essentially population aggregates, systems that operate on aggregate data, such as TI4Health, reduce the risk of patient reidentification that arises from potentially imperfect deidentification of clinical data.


BioRef Governance and Semantic Interoperability

The parties involved in the Swiss BioRef project have formed a multicenter research consortium, the BioRef consortium, which currently consists of 4 major cohort sources in Switzerland: the University Hospital Bern (“Inselspital”), the University Children’s Hospital Zürich (Kinderspital Zürich), Swiss Paraplegic Research, and the University Hospital Lausanne (Centre Hospitalier Universitaire Vaudois [“CHUV”]). The consortium agreement covers multiple aspects of this collaborative effort, including data governance, data delivery, and the required network infrastructure. Participating institutions agreed to contribute their data by making them accessible via a decentralized platform and transferring them to a centralized trusted data host for a validation approach.

The key component for creating a sustainable and expandable infrastructure is the definition of intercohort concepts regarding semantic interoperability, availability, dimensionality, and quality of the data provided by different cohorts. It is vital that each clinical partner involved is willing to process its data to adhere to harmonized and interoperable standards for data encoding, including Logical Observation Identifiers Names and Codes (LOINC [41]; for analyses); the ICD, Tenth Revision (ICD-10; for diagnoses); and the Anatomical Therapeutic Chemical classification system (for medication). The Resource Description Framework was chosen because it is the preferred semantics and data representation logic of the Swiss Personalized Health Network (SPHN); the underlying BioRef ontology is based on the SPHN ontology (release 2021-2) [42].

BioRef Data Recruitment

Data from each contributing cohort consist of quantitative laboratory test results (“measurements”) from 46 frequent laboratory variables uniquely defined by LOINC encoding (Multimedia Appendix 1). Data extraction from the clinical data warehouses and data deidentification (removal of direct identifiers) were exclusively carried out locally by the data scientists of each consortium partner. Data were included only if the patients provided written consent. Routine clinical laboratory data of inpatients were included if at least one LOINC-coded laboratory analysis of interest was performed during the administrative case (admission) and at least one diagnosis (ICD-10-German Modification [GM] coded) was recorded after the administrative case was closed. Notably, inpatients of Swiss hospitals always have at least one ICD-10 diagnosis assigned for billing purposes. To limit the bias caused by repeated measurements, only the first measurement of each LOINC of interest per administrative case was included in each contributing cohort data set. This first value out of a series of values is the least influenced by potential therapeutic measures. Such a practice is in line with previous cohort-specific RI studies [32,43,44]. Each laboratory measurement is currently enriched with patient record information from the clinical data warehouses of the involved hospitals, including age, sex, and the 5 most relevant previously established diagnoses using the ICD-10-GM codes [10]. Age is provided in years with a precision of 3 decimal places for patients aged <18 years and as whole numbers (integer) for patients aged ≥18 years. Attributes for sex are assigned from the set “female, male, other, or unknown,” as predefined in many hospital information systems. The diagnoses used in the BioRef data set represent those recorded at the discharge of the patient. 
The “relevance” of diagnoses follows the guidelines of the Swiss Federal Office of Public Health, that is, diagnoses represent the so-called “billing diagnoses” used by hospitals for reimbursement from health insurances. In general, the treatment effort required for a diagnosis and its severity are considered to correlate. This approach is widely and uniformly used across hospitals. Furthermore, information on the generation of the measurement (analytical factors) is included as linked metadata. This specifies the analyzer, the test kit, and the reagent used through the unique identifiers for medical devices from the Global Unique Device Identification Database [45] as well as the type identifiers from the Global Medical Device Nomenclature [46]. These additional metadata help overcome the sparsity of information associated with LOINCs with respect to the applied method. Data made available to the project under the consortium agreement span the time frame from June 2014 to February 2023.
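The inclusion rule for repeated measurements and the age precision rule can be illustrated with a minimal Python sketch. The record layout (`case_id`, `loinc`, `taken_at`, `value`) and both function names are hypothetical and chosen for illustration only; the sketch also assumes that adult ages are truncated to whole years, which the text does not specify.

```python
from datetime import datetime

def first_measurements(records):
    """Keep only the earliest measurement per (administrative case, LOINC) pair."""
    earliest = {}
    for rec in records:
        key = (rec["case_id"], rec["loinc"])
        if key not in earliest or rec["taken_at"] < earliest[key]["taken_at"]:
            earliest[key] = rec
    # Sort for a stable, reproducible output order.
    return sorted(earliest.values(), key=lambda r: (r["case_id"], r["loinc"]))

def format_age(age_years):
    """3 decimal places for patients aged <18 years, whole years otherwise.

    Assumption: adult ages are truncated (not rounded) to integers.
    """
    return round(age_years, 3) if age_years < 18 else int(age_years)

# Hypothetical example records: two chloride values in case c1, one in case c2.
records = [
    {"case_id": "c1", "loinc": "2075-0", "taken_at": datetime(2021, 3, 1, 8, 0), "value": 101.0},
    {"case_id": "c1", "loinc": "2075-0", "taken_at": datetime(2021, 3, 2, 8, 0), "value": 98.0},
    {"case_id": "c1", "loinc": "14646-4", "taken_at": datetime(2021, 3, 1, 9, 0), "value": 1.4},
    {"case_id": "c2", "loinc": "2075-0", "taken_at": datetime(2021, 4, 1, 8, 0), "value": 104.0},
]
kept = first_measurements(records)
```

In this example, the second chloride value of case c1 is dropped because an earlier value for the same LOINC exists within the same administrative case.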

Ethical Considerations

This study received an ethics waiver from the cantonal ethics committee of Bern (Business Administration System for Ethics Committees; BASEC-Nr: Req-2020-00630). The platform was initialized using a bulk data load. It is updated on a regular basis, although there is no particular pressure for frequent updates.

BioRef-Federated Analytics Approach

On the basis of common data semantics and under a common contractual architecture, the Swiss BioRef project relies on a decentralized approach for multicohort data pooling to align the BioRef data independently of the available IT resources at each cohort site. Consortium partners compile their data set on their own accord while maintaining full control over the data-sharing process.

The decentralized mechanism underpinning the BioRef infrastructure is based on a privacy-preserving protocol that uses a multiparty homomorphic encryption scheme and obfuscation techniques to allow privacy-preserving federated querying with secure aggregation [37]. It relies on a fully decentralized peer-to-peer infrastructure with no central node, enabling the processing of sensitive data under homomorphic encryption and release of results aggregated across all participating sites [37]. This federated approach follows a strict “no copy, no move” principle, where clinical data do not leave the local site’s database, and only encrypted aggregates are exchanged and further processed between different nodes, always under encryption. This information exchange system requires a minimum of IT components deployed locally. If a data holder is unable to provide the required infrastructure and personnel, node instances can also be installed off-premises within a trusted IT infrastructure.
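The idea that only aggregates become visible can be illustrated with a toy Python sketch. Note that this is not the multiparty homomorphic encryption scheme used by TI4Health: it substitutes a much simpler pairwise additive-masking scheme, and the modulus, function names, and the single shared random generator (which stands in for privately agreed pairwise secrets) are all assumptions made for illustration.

```python
import random

MODULUS = 2**61 - 1  # assumed large modulus so that masked counts look random

def mask_local_counts(site_counts, seed=0):
    """Return one masked count vector per site; pairwise masks cancel in the sum.

    Toy stand-in: a real deployment would derive each pairwise mask from a
    secret shared only by the two sites involved, not from one shared RNG.
    """
    rng = random.Random(seed)
    n_sites = len(site_counts)
    n_bins = len(site_counts[0])
    masked = [list(counts) for counts in site_counts]
    for i in range(n_sites):
        for j in range(i + 1, n_sites):
            for b in range(n_bins):
                r = rng.randrange(MODULUS)
                masked[i][b] = (masked[i][b] + r) % MODULUS  # site i adds the mask
                masked[j][b] = (masked[j][b] - r) % MODULUS  # site j subtracts it
    return masked

def aggregate(masked):
    """Sum the masked vectors; only the global histogram is recovered."""
    n_bins = len(masked[0])
    return [sum(vec[b] for vec in masked) % MODULUS for b in range(n_bins)]

# Hypothetical local histograms (3 sites, 3 bins each).
site_counts = [[3, 10, 4], [1, 7, 2], [0, 5, 6]]
masked = mask_local_counts(site_counts)
global_counts = aggregate(masked)
```

Each masked vector on its own reveals nothing useful about a site's local counts, whereas their sum equals the global frequency table, which mirrors the "no copy, no move" principle at the level of aggregates.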

A proven centralized approach involving a trusted data host system jointly used by the data providers was also implemented as a baseline reference for the verification and validation of the federated approach. This mechanism relies on the existing secure BioMedIT network set up by the Swiss Institute of Bioinformatics: data from BioRef consortium partners are locally collected, encrypted on site with traditional public key cryptography by the data providers, and subsequently securely transferred to a highly restricted project space within the BioMedIT network [47].

Statistical Analysis

Data Preprocessing

The BioRef platform allows the user to interactively design a cohort for querying an underlying “big data” source. To tidy up the input data, a preliminary data cleaning step was introduced to remove measurements from the raw data set that had missing or clearly erroneous entries, including occasional negative values where an analysis does not allow them or ICD-10-GM codes not in use as of May 2022. Furthermore, outlier detection was introduced as the first step of the interactive RI inference algorithms to limit the influence of extreme values (outliers) on the statistical inference. An outlier is informally defined as a data point that significantly deviates from most of the available data [48]. A 3-sigma range (based on the query sample’s mean and SD) was identified to generally detect data points from the harmonized multicohort data set that most likely stemmed from the patient population under consideration. Values outside this 3-sigma range were flagged and removed.
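The 3-sigma filter described above can be sketched in a few lines of Python. The function name and the use of the sample SD (n-1 denominator) are assumptions; the text only states that the range is based on the query sample's mean and SD.

```python
from statistics import mean, stdev

def remove_outliers_3sigma(values, k=3.0):
    """Split values into those inside and outside mean ± k·SD of the sample."""
    mu = mean(values)
    sd = stdev(values)  # assumption: sample SD with n-1 denominator
    lower, upper = mu - k * sd, mu + k * sd
    kept = [v for v in values if lower <= v <= upper]
    flagged = [v for v in values if not (lower <= v <= upper)]
    return kept, flagged

# Hypothetical query sample: 110 plausible values plus one gross error.
sample = [100 + (i % 11) - 5 for i in range(110)] + [400]
kept, flagged = remove_outliers_3sigma(sample)
```

Because the mean and SD are themselves affected by extreme values, a filter of this kind works best on the large query samples the platform targets; in very small samples a single outlier can inflate the SD enough to escape detection.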

RI Calculation Methods

The gold standard for inferring the RI has long been direct methodology, where test results are sampled from a homogeneous and presumably healthy reference population, and the 2.5th and 97.5th percentiles of the obtained sample are determined [49]. Owing to cohort-specific definitions of health, it is often difficult to harmonize RIs across different patient groups. Indirect methods of RI estimation offer a way to address this limitation [50]. Indirect methods sample and weight test results from a mixed clinical population, including both physiological and pathological test results from routine patient care (general admission to the hospital) [51]. In the context of BioRef, both direct and indirect RI inference methods (with parametric and nonparametric estimations) were adjusted to be fully automated. Following the official recommendation, the standard nonparametric quantile estimation method was implemented [6]. Various factors influence the precision and consistency of the inferred RIs, such as measurement variability; sample size; and, in general, the underlying reference value distribution. For skewed reference distributions exhibiting a single peak, an adaptation of the robust quantile estimator method was implemented [52]. This method contains a parametric Box-Cox transformation step and uses a biweight quantile estimator to calculate the appropriate ranks [53,54]. For analyte distributions that exhibit multiple peaks, an iterative method was proposed to resolve the Gaussian main mode from the distribution mixture [55]. This involves iteratively trimming the overall distribution, assuming a Gaussian distribution in the central region, and subsequently readjusting the SD to account for the trimmed data until convergence. Alternatively, a modified and fully automated Bhattacharya procedure was implemented, where binned data are used to decompose a distribution into Gaussian subcomponents [56]. 
The developed methods underwent internal testing to ensure their robustness toward outliers and their ability to handle varying degrees of skewness. Using bootstrapping techniques, it is possible to estimate the precision of all the implemented methods by generating 90% CIs for the RI boundaries. These CIs simultaneously reflect the precision of the pooled analyte data aggregate and the suitability of the RI methodology in the light of the overall estimation.
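As an illustration of the nonparametric approach with bootstrapped CIs, the following Python sketch estimates the 2.5th and 97.5th percentiles and 90% CIs for both limits. It is a simplified sketch, not the platform's implementation: the rank rule p·(n+1) with linear interpolation, the number of bootstrap resamples, and all function names are assumptions.

```python
import random

MIN_SAMPLE = 120  # minimum reference sample size named in the text

def rank_quantile(sorted_vals, p):
    """Quantile at rank p*(n+1), interpolated between order statistics."""
    n = len(sorted_vals)
    rank = min(max(p * (n + 1), 1.0), float(n))  # clamp to valid ranks 1..n
    lower = int(rank)
    if lower >= n:
        return float(sorted_vals[-1])
    frac = rank - lower
    return sorted_vals[lower - 1] + frac * (sorted_vals[lower] - sorted_vals[lower - 1])

def reference_interval(values):
    """Nonparametric RI: 2.5th and 97.5th percentiles of the sample."""
    if len(values) < MIN_SAMPLE:
        raise ValueError("at least 120 reference values are required")
    s = sorted(values)
    return rank_quantile(s, 0.025), rank_quantile(s, 0.975)

def bootstrap_ci(values, n_boot=500, level=0.90, seed=1):
    """Percentile-bootstrap CIs for the lower and upper RI limits."""
    rng = random.Random(seed)
    lows, highs = [], []
    for _ in range(n_boot):
        resample = rng.choices(values, k=len(values))  # sample with replacement
        lo, hi = reference_interval(resample)
        lows.append(lo)
        highs.append(hi)
    lows.sort()
    highs.sort()
    alpha = (1.0 - level) / 2.0
    return ((rank_quantile(lows, alpha), rank_quantile(lows, 1.0 - alpha)),
            (rank_quantile(highs, alpha), rank_quantile(highs, 1.0 - alpha)))

# Hypothetical sample of 200 values.
values = [float(v) for v in range(1, 201)]
ri_low, ri_high = reference_interval(values)
low_ci, high_ci = bootstrap_ci(values)
```

The width of the resulting CIs shrinks as the query sample grows, which is one way to see why highly stratified queries with few patients yield less precise RI limits.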

Power

The BioRef analytics platform does not estimate new RIs for reference samples of <120 patients, thereby considering the general statistical limitations of RI estimation in accordance with the CLSI guidelines [6]. This means that cohorts of interest with ≥120 individuals are considered sufficiently represented. An option for validating the existing RIs with population sizes <120 patients in line with the CLSI validation guidelines is planned for a future release.

Privacy Protection

With the underlying “big data” source, mechanisms that ensure end-to-end privacy protection are necessary when end users are allowed to highly stratify the patient population. The values returned by a patient query at each cohort are securely aggregated under multiparty homomorphic encryption across all cohorts into a joint frequency table (for histogram building), which can be decrypted only by authorized users. Thus, both patient-level information and local aggregates are protected: the former never leaves the data holder infrastructure, and the latter is always processed under encryption. The requirement of a minimum of 120 patients and the rounded bin width further limit the potential for reidentification of individual patients from the decrypted frequency table.
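To make the role of the joint frequency table concrete, the following Python sketch computes RI limits from merged histograms rather than from raw values. It operates on plaintext counts and therefore omits the encryption layer entirely; the bin indexing, the linear interpolation within a bin, and all names are assumptions for illustration.

```python
import math
from collections import Counter

MIN_PATIENTS = 120  # minimum sample size stated in the text

def local_histogram(values, bin_width):
    """Count values per bin; a bin is indexed by floor(value / bin_width)."""
    return Counter(math.floor(v / bin_width) for v in values)

def merge_histograms(histograms):
    """Add up per-site frequency tables into one joint table."""
    total = Counter()
    for hist in histograms:
        total.update(hist)
    return total

def quantile_from_histogram(hist, bin_width, p):
    """Estimate a quantile from binned counts, interpolating within the bin."""
    n = sum(hist.values())
    if n < MIN_PATIENTS:
        raise ValueError("query matches fewer than 120 patients")
    target = p * n
    cumulative = 0
    for idx in sorted(hist):
        count = hist[idx]
        if cumulative + count >= target:
            frac = (target - cumulative) / count
            return (idx + frac) * bin_width
        cumulative += count
    return (max(hist) + 1) * bin_width

# Hypothetical values at two sites, binned with a width of 1 unit.
site_a = [95 + (i % 10) for i in range(100)]
site_b = [100 + (i % 10) for i in range(100)]
joint = merge_histograms([local_histogram(site_a, 1.0), local_histogram(site_b, 1.0)])
ri_lower = quantile_from_histogram(joint, 1.0, 0.025)
ri_upper = quantile_from_histogram(joint, 1.0, 0.975)
```

Because the limits are derived from bin counts only, their resolution is bounded by the bin width, which is the price paid for never exposing individual measurements.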

When patient-level data are centralized into the BioRef secure project space on the BioMedIT infrastructure for validation purposes, further deidentification measures are implemented to minimize reidentification risks due to potential data leakages. Particularly, linkages between patients, administrative cases, and measurements had to be removed by the contributing cohorts after local data extraction (“local deidentification”). Measurements in the centralized BioRef data set for validation are, therefore, not linked at any level.


BioRef Architecture and Data Contributions

Currently, the BioRef analytics platform is deployed with harmonized and interoperable data contributions from all BioRef consortium members. The use of the TI4Health architecture allows patient-level data to stay on site at each participating institution regardless of its location, and aggregated frequency tables are computed under multiparty homomorphic encryption, thus ensuring end-to-end data protection (Figure 1). This enables the aggregation of clinical data in a unified manner to create a comprehensive database. User-requested patient queries initiated via the GUI are relayed to the TI4Health instances, which constitute a distributed network for federated confidential computing. Homomorphically encrypted local data aggregates are then exchanged among the network partners to form the global data aggregate. RI computation is carried out by the front end of Swiss BioRef TI4Health, which returns the global aggregate result to the user (Figure 1). Notably, the raw data of the data providers are never shared (the “no copy, no move” principle).

Data from the contributing cohorts consisted of quantitative results from >40 frequently requested key laboratory tests, including analyses from clinical chemistry, hematology, point-of-care testing, and coagulation. These pooled standardized data (approximately 9 million measurements) constituted the multicohort database available on the BioRef platform (Table 1). It currently entails not only data from 2 university hospitals (Inselspital and CHUV) reflecting a broad variety of patients from the general population but also more specific data of patient groups in need of particular care, specifically children (University Children’s Hospital Zürich) and patients with physical disabilities (Swiss Paraplegic Research). Together, this multifaceted, highly standardized data set represents a rich “big data” source ready for further analyses, including end user–driven patient query stratification for the definition of specific RIs.

Figure 1. Illustration of the BioRef federated analytics infrastructure. In the decentralized approach, data are deidentified on site by the individual data providers of the consortium (hospital A, hospital B, ...) and uploaded to the on-premises TI4Health instance. Data are analyzed via the federated confidential computing network without any raw data of the consortium members being revealed.
Table 1. Data contributions of the individual data providers for the BioRef infrastructure as of the time of publication.

| Characteristic | Inselspital | KiSpi^a | Swiss Paraplegic Research | CHUV^b | Total |
| --- | --- | --- | --- | --- | --- |
| Measurements, n | 6,793,937 | 454,155 | 35,271 | 1,708,454 | 8,991,817 |
| Unique patients, n | 205,437 | 17,179 | 887 | 56,809 | N/A^c |
| Patient sex, n (%)^d | | | | | |
| Female | 100,612 (49) | 7695 (44.8) | 278 (31.3) | 30,739 (54.1) | N/A |
| Male | 104,825 (51) | 9484 (55.2) | 609 (68.7) | 26,070 (45.9) | N/A |
| Patient age (years), median (IQR) | 57 (32-73) | 4.51 (0.76-10.97) | 58 (42-71) | 58 (39-73) | N/A |
| Administrative cases^e | 363,912 | 28,393 | 887 | 132,344 | 525,531 |
| Unique LOINC^f | 39 | 37 | 33 | 23 | 46 |
| Time span^g | June 2014 to February 2023 | April 2014 to May 2022 | Up to March 2022 | January 2020 to December 2022 | N/A |

^a KiSpi: “Kinderspital Zürich,” University Children’s Hospital Zurich.

^b CHUV: “Centre Hospitalier Universitaire Vaudois,” University Hospital Lausanne.

^c N/A: not applicable.

^d No nonbinary patients were reported at the time of publication.

^e Admissions.

^f LOINC: Logical Observation Identifiers Names and Codes.

^g Time span during which the measurements were collected.

BioRef-Federated Analytics Platform

The decentralized privacy-preserving approach was built on the TI4Health operational system (“Swiss BioRef TI4Health”; Figure 2). In the context of the BioRef platform, the extended TI4Health system comprises (1) the Informatics for Integrating Biology and the Bedside (i2b2) common data model running in a Postgres database [53]; (2) the TI4Health distributed backend; (3) a RESTful application programming interface; and (4) a customized TI4Health web client front end (Figure 2). The i2b2 model is one of the most widely used data models for storing observational longitudinal clinical data and related metadata; it is currently implemented in >300 hospitals worldwide and used by most of the Swiss university hospitals.

On the backend, TI4Health follows a modular design with strict separation of components: the front end query system never directly accesses the unencrypted data stored in the i2b2 data model but communicates with the backend only through the RESTful application programming interface. Once a request is received, the TI4Health backend module forwards it to an i2b2 connector for local data preprocessing and then starts the secure multiparty homomorphic encryption–based distributed aggregation protocol that involves all the other nodes in the network. The encryption protocols used in TI4Health are based on the Lattigo homomorphic encryption library [57]. The data were translated from the Resource Description Framework to the i2b2 format using a data converter module, which was developed during the course of the project [53].

On the front end, the TI4Health web client is the user-facing web application based on Glowing Bear, an open-source web-based GUI for cohort selection and analysis [58]. For BioRef, the Glowing Bear interface was tailored to allow the generation and visualization of precise RIs using an IFCC- and a CLSI-suggested method for nonparametric RI estimation. More specifically, the BioRef GUI allows for interactively setting and executing patient queries based on the covariates and running the statistical inference method on the returned measurements from the client side (Figure 2). It allows setting the patient’s “age” and “sex” as possible stratification variables and including not only diseases or risk factors, such as high blood pressure and diabetes (using the respective ICD-10 code) but also analysis-specific metainformation such as analyzer, test kit, and vendor information.

Figure 2. Graphical user interface of the Swiss BioRef TI4Health web client. The web application shows the estimates for reference intervals and an accompanying histogram for “chloride in serum or plasma” (LOINC 2075-0) for a female patient cohort aged 55 to 60 years as an exemplary query.

Centralized Validation Platform

A separate central validation platform (Swiss BioRef Central) was set up on the secure BioMedIT infrastructure for method development, benchmarking, and ensuring the correctness of multicohort federated and encrypted analyses (Figure 3). Such a platform enables performance and usability comparisons between decentralized and centralized approaches and the testing of the accuracy of the statistical methods in inferring precise RIs from multicohort resources. The platform offers both direct (IFCC and CLSI approved) and indirect (using newer data mining techniques) methods for the inference of RIs.

This reference platform was built on R Shiny (R Studio, Inc), an operative extension of the R programming language into web application development to allow reactive and interactive data analyses [56]. It runs fully dockerized on a virtual machine with full access to the centralized deidentified data stored in CSV format (Figure 4B). The web traffic of Swiss BioRef Central was implemented behind a reverse proxy layer in the application architecture. This hides server traffic and communication to the front end of the application, which further reduces the risk of exposing sensitive information to the front end.

Figure 3. Screenshot of the Swiss BioRef Central interface. Graphical user interface of the Swiss BioRef Central web application. The web application shows the estimates for reference intervals for “chloride in serum or plasma” (LOINC 2075-0) for a female patient cohort aged 55 to 60 years as an exemplary query.

Figure 4. BioRef platform architecture. Side-by-side comparison of (A) the BioRef decentralized privacy-preserving platform using federated confidential computation and decentralized data linking and (B) the centralized validation platform that enables evaluation from a centralized data pool located within a trusted data host system. Both infrastructures offer their own web applications capable of inferring highly relevant reference intervals from their respective linked data sources.

Targeted RIs for Diagnostic Application

Using the BioRef platform, it is possible to infer RIs for previously underrepresented patient populations in RI studies. For instance, RIs for “HDL cholesterol” (LOINC 14646-4) in male and female clinical patients aged 60 to 65 years were estimated. The resulting RIs (with 90% CIs) and the accompanying histograms were generated on the fly and visible in the web applications (Figure 5).

The estimated RIs for female patients are 0.54 (90% CI 0.51-0.56) to 2.47 (90% CI 2.42-2.51) mmol/L and for male patients are 0.52 (90% CI 0.51-0.53) to 1.92 (90% CI 1.89-1.93) mmol/L, derived from the local population. These results are comparable to those from a published RI study that used similar routine clinical data, the same analytical system (Roche Cobas 8000), and similar laboratory data mining techniques for the estimation of locally specific RIs (female patients: 0.72, 90% CI 0.50-0.80, to 2.02, 90% CI 1.83-2.09 mmol/L; male patients: 0.54, 90% CI 0.49-0.65, to 1.30, 90% CI 1.24-1.63 mmol/L) [59]. Although these RIs do not fully overlap, they are locally significant and stratified by age, in contrast to other published RIs. It is established that high-density lipoprotein decreases with age, and addressing this often-missing age stratification is crucial [60,61]. This example highlights the need for adapted target ranges that take into account the specific condition of the patient based on their risk and value distribution [11].

Figure 5. Personalized ranges for high-density lipoprotein cholesterol (in mmol/L). Estimated reference intervals for “cholesterol in HDL [moles/volume] in serum or plasma” (LOINC 14646-4) for female patients (n=1848, left) and male patients (n=5026, right) aged 60 to 65 years.

User Evaluation

During a follow-up project of Swiss BioRef (“BioRef - TI4Health”), Inselspital, CHUV, and Tune Insight (a spin-off of the Swiss Federal Institute of Technology Lausanne) collaborated to deploy and evaluate the TI4Health system. Reviewers from both the clinical side and clinical data science were onboarded for a preliminary evaluation of the deployed platform to assess its variable accessibility, usability, and performance. Users expressed appreciation for the easy and streamlined web application GUI that quickly filtered their population of interest. Striking a balance between a streamlined, intuitively usable GUI and one that supports a complex query selection process is challenging. Managing this balance is crucial because the growing and progressively varying user base will make it even more challenging to anticipate future requirements. Query execution time has emerged as a potential issue for federated systems. Notably, the delays do not stem from data processing under homomorphic encryption; rather, the i2b2 format is the performance bottleneck. Using other data formats for which TI4Health offers additional connectors will alleviate this problem.


Principal Finding: Federated Analytics Architecture

With the BioRef platform for federated confidential computing, an interoperable and secure framework for processing distributed multidimensional laboratory data from various cohorts forming a “big data” resource of laboratory measurements has been created and deployed for the first time in an operational setting. The use of a federated analytics approach allows the indirect provision of nonanonymized (ie, identifiable) patient data to a multicentric effort, which is, under the current data protection act, an almost impossible administrative task to tackle [34]. As sensitive data themselves are not shared between participating parties, the BioRef approach is compliant with both national and international data provision laws (ie, the European Union’s General Data Protection Regulation [GDPR]) [62]. Notably, the use of a distributed analytics system as deployed can significantly reduce the governance overhead for future multicohort collaborations [36]. It will also facilitate obtaining permission from ethical boards, as identifying information is retained only by the respective hospital.

Harmonizing Data Resources

The differing data management systems and formats at individual clinical data warehouses are a limiting factor for smooth data provision; significant efforts are required to harmonize the data contribution of all data providers and ensure interoperability before the entry of the data into the BioRef infrastructure. For example, the implementation of LOINC on a national level has advanced notably over the last few years but still requires serious effort to provide high-quality metadata and quality control for laboratory analyses [63]. However, these standardization efforts are not only beneficial for the scope of this project but are also essential for the ongoing digital transformation of laboratory medicine, especially in the age of machine learning and artificial intelligence [64]. Clearly coded, high-dimensional laboratory data can essentially contribute to clinical research in the age of personalization [65]. With increasing data sizes made available for clinical research projects, clear ethical guidelines for “big data” research need to be established [66].

Targeted RIs for Precision Medicine

Standard RIs are inferred under the assumption that an appropriate reference population can be defined as representing a “general” health status, either through a priori or a posteriori selection [67]. It is assumed that the only observed variation in the selected reference values stems from biological interindividual variation [68]. The use of newer methodologies allows the indirect estimation of RIs from real-world data that are considered a mixture of “pathological” and “nonpathological” values via various resolution techniques [69]. However, in the clinical context, where a variety of patient factors are considered during the physician’s anamnesis, RIs estimated from generally “nonpathological” reference individuals are seemingly not the most appropriate reference to compare patients’ blood test results with [12]. Especially in older patients, the differentiation between “disease” and the aging process is difficult; a functional decline observed in old age can originate either from a disease or from the aging process itself. The differentiation can be made using peptide biomarkers (eg, N-terminal pro-B-type natriuretic peptide [70,71]), hormones (eg, thyroid-stimulating hormone [72,73]), and lipids (eg, high-density lipoprotein cholesterol [60,61]). Age-related health concerns become prominent in aging populations, and appropriate “reference values” should comprise both values reflecting physiological changes and an increasing fraction of values that would generally be considered pathological to reflect the patient population [73]. Rather than trying to create RIs as “normal ranges” for aging populations, these “expectation ranges” help evaluate the specific patient’s test result in the appropriate context of similar patients (“digital twins”).
The possibility to include and exclude specific diagnoses allows the adjustment and fine-tuning of these expected ranges to a variety of multimorbid complexes (eg, diabetes, hyperlipidemia, coronary heart disease, or renal impairment). Here, we suggest that being able to map additional patient parameters such as age and sex as well as individual combinations of multiple morbidities on the analysis of locally derived RIs can essentially provide personalized target ranges fit for application in precision medicine. With the interactive GUI of the web client, these targeted RIs can be generated on the fly, which can then be effectively used when paired with established RIs. Although these are not “personalized” RIs per se, that is, referring to a single patient of interest, they provide second-level information regarding the particularities of a patient group of interest [74]. In cases where there are no RIs established locally for a particular age and sex group, these personalized target ranges can serve as a useful substitute.
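The cohort selection described above, with age and sex as stratification variables and diagnoses that can be included or excluded, can be sketched as a simple filter. The `Measurement` record, its field names, and the `select_cohort` function are hypothetical and serve only to illustrate the logic; they do not reflect the i2b2 data model or the TI4Health query interface. The ICD-10 codes E11 (type 2 diabetes mellitus) and I10 (essential hypertension) are used as examples.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Measurement:
    """Hypothetical flattened record: one test result with patient covariates."""
    loinc: str
    value: float
    age: int
    sex: str
    icd10: frozenset = field(default_factory=frozenset)


def select_cohort(records, loinc, sex, age_min, age_max, include=(), exclude=()):
    """Return the values of one analyte for patients matching the selected strata."""
    include, exclude = set(include), set(exclude)
    selected = []
    for record in records:
        if record.loinc != loinc or record.sex != sex:
            continue
        if not (age_min <= record.age <= age_max):
            continue
        # Every diagnosis in `include` must be present for the patient.
        if not include <= record.icd10:
            continue
        # No diagnosis in `exclude` may be present for the patient.
        if exclude & record.icd10:
            continue
        selected.append(record.value)
    return selected


records = [
    Measurement("14646-4", 1.2, 62, "F", frozenset({"E11"})),
    Measurement("14646-4", 1.5, 63, "F", frozenset({"E11", "I10"})),
    Measurement("14646-4", 1.1, 70, "F", frozenset({"E11"})),
    Measurement("14646-4", 0.9, 61, "M"),
]

# Female patients aged 60 to 65 years with diabetes but without hypertension.
diabetic_only = select_cohort(
    records, "14646-4", "F", 60, 65, include={"E11"}, exclude={"I10"}
)
```

The values returned for such a cohort would then be passed to the RI estimation, so that each change to the filter produces a new targeted range.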

Strengths and Limitations

Despite the many benefits that a decentralized data-sharing system offers, a stringent quality control step of centralized data alignment is missing. Therefore, local quality control at all participating sites following a standardized protocol, as well as establishing trust among collaborating partners for the continuation of data provision to the system, is a must. The basis for the overall BioRef data set is the local population, and a broad spectrum of diagnoses is covered in the data set. Very specific diagnoses (eg, psychiatric disorders) or complex combinations of diagnoses may still be underrepresented or even missing; however, this may be overcome in the future through the inclusion of specialized hospitals, broadening the data basis. Mutually beneficial collaborations between additional national and international hospitals and data providers are, therefore, encouraged. Although a centralized approach ensures easily verifiable results for testing and validation (each data holder has full access to the underlying data set), a federated approach allows the onboarding of institutions that are not willing to share data in a centralized setting. This allows for insights from more data than each individual data provider holds. The motivation and deployment conditions for federated and centralized approaches are slightly different, and their applicability depends on the context of the project. An in-depth comparison, scalability in a multinational context, and applicability in the clinical context will need to be addressed in a follow-up study, as these are beyond the scope of this pilot project presenting the first federated setup for RI estimation.

Another challenge for any international network is mirroring the ethnic diversity found in patients across countries, which can influence RIs [33]. The data should include information on the ethnic background of a patient, which needs to be gathered by hospitals. However, this information may not be routinely collected. Preanalytics, for example, sample collection or handling, are another factor that may vary between countries and may hamper data interoperability. Providing additional metainformation on the preanalytics akin to the implemented information on analyzer and reagent may be the way forward, for example, using Standard PREanalytical Codes [75].

Comparison With Prior Work

Multicenter studies operating under a common and centralized standardization effort have already aimed at estimating country-specific RIs [31,76], and previous studies have leveraged routine laboratory data to assess population-specific RIs to some extent [32,33]; however, to our knowledge, a federated query system has not been implemented so far. Although the Canadian Laboratory Initiative on Pediatric Reference Intervals and the Pediatric Reference Intervals Initiative in Germany provide RIs for laboratory analytes in pediatrics via interactive web applications, they both rely on the centralization of the data source [31,76]. The clear advantage of a federated approach, such as BioRef-TI4Health, is that hospitals can contribute data to evaluation without actually sharing them. In the era of “big data,” where an increasing amount of health data is available, this is especially useful, as full anonymization of sensitive data (ie, health data) can be difficult to attain [77].

The use of homomorphic encryption in addition to data aggregation adds an additional layer of security: several publications have shown that aggregated data have the potential to reveal information about individuals (eg, membership in a sensitive cohort and undisclosed private or sensitive attributes) through statistical inference even if the data themselves do not directly identify specific persons [78-80]. Users can only decrypt and see the result of the aggregation of each individual site’s response to the query. Unencrypted setups for remote federated analysis [81,82] cannot fulfill these requirements. In addition, the use of homomorphic encryption to protect site-level aggregated data helps comply with the “data minimization” principle (GDPR article 5) by revealing only the information that is needed for the user’s purposes. Moreover, it satisfies the “privacy by design” principle (GDPR article 25) by minimizing the risk that arises from personal data breaches by making personal information unintelligible to anyone not authorized to access it.
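The idea that a user learns only the aggregate across sites, and not any individual site's contribution, can be illustrated with a toy example. Note that the sketch below uses additive secret sharing, a simpler technique than the multiparty homomorphic encryption implemented in TI4Health with Lattigo, and it is not the protocol deployed in BioRef. The choice of per-site histogram counts as the aggregated quantity, the public modulus, and the function names are assumptions made for illustration.

```python
import secrets

# Public modulus (assumed); it must exceed any possible total count.
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split a non-negative integer into n_parties additive shares mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares sum to the value.
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(site_histograms):
    """Sum per-site histogram counts so that only the total is reconstructed."""
    n_sites = len(site_histograms)
    n_bins = len(site_histograms[0])
    # held[j][b] accumulates the shares that site j receives for bin b.
    held = [[0] * n_bins for _ in range(n_sites)]
    for histogram in site_histograms:
        for b, count in enumerate(histogram):
            for j, s in enumerate(share(count, n_sites)):
                held[j][b] = (held[j][b] + s) % MODULUS
    # Each site publishes only its accumulated shares; their sum is the total.
    return [
        sum(held[j][b] for j in range(n_sites)) % MODULUS
        for b in range(n_bins)
    ]


# Three sites, each holding counts for the same three histogram bins.
total = secure_sum([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Each individual share is uniformly random and therefore reveals nothing about a site's counts on its own; only the combination of all published partial sums yields the pooled histogram.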

Several different approaches for federated analytics have been implemented and applied to medicine, starting from off-the-shelf federated learning to advanced alternatives such as swarm learning [83-89]. However, most of the time, these approaches were limited to project-specific demonstrations and are not yet implemented in clinical operational settings through scalable and sustainable infrastructures. Examples of successful infrastructure implementations are the Accrual to Clinical Trials Network, TriNetx, and Clinerion [90-92]. However, none of them are particularly focused on laboratory medicine, and BioRef-TI4Health stands out by using state-of-the-art, advanced, and privacy-enhancing technologies to protect data and patient privacy. It will be interesting to compare published RIs on a broad scale with our cohort-specific target ranges in a follow-up study.

Conclusions and Outlook

Within the scope of the Swiss BioRef project, a privacy-preserving federated computing network accessible via a web-based GUI has been established. With BioRef, the SPHN’s long-term goal of transforming medicine toward precision and personalization has reached one of its first manifestations [93]. It allows physicians and clinical researchers to map the individual complexity of their patients to a rich multicohort data pool and permits a substantiated statistical analysis to infer precise and highly relevant RIs. The federated nature of the approach, together with the implemented cryptographic mechanisms, helps release the brakes that legislation and local data-sharing policies may at times apply to research and related ambitious projects. The federated setup will also facilitate an extension of the network, potentially on an international level.

Long-term sustainability is a widespread problem in academic projects, as the costs of both infrastructure operation and maintenance must be addressed. Here, the open architecture and simplified onboarding process of the BioRef platform offer a chance to include academic partners, professional clinical medicine societies, and the diagnostics industry. Tune Insight maintains the Swiss BioRef TI4Health codebase, provides support, and performs further customization for the future of BioRef.

Collaboration with a broad spectrum of stakeholders is fundamental to the continuation of the Swiss BioRef project. It is important not only to showcase the relative ease of use of the proposed platform to both health professionals and clinical researchers who could be potential new end users but also to establish trust regarding the novelty of the developed infrastructure of multicohort data sharing. A stakeholder dialogue could inform novel guidelines for specific health conditions that have applications in the clinical context, which could benefit the harmonization of both the estimation and use of RIs across multiple cohorts. Collaboration with the international Task Force on Global Reference Interval Database of the IFCC is currently being promoted to implement an international system for RI estimation [13].

Given the modularity of both the BioRef consortium and the BioRef-TI4Health system architecture (future national and international partners can join with relative ease) as well as the applications (extendable for additional types of statistical analyses or variables), we see a bright future for personalized target ranges in Switzerland and beyond.

Acknowledgments

The Swiss BioRef project was funded by the Swiss Personalized Health Network (2018DEV22), the University Hospital Bern, and Swiss Paraplegic Research. The Swiss BioRef project was led by a computational medicine group in Bern, which developed the necessary IT components in collaboration with the health informatics and data privacy group in Lausanne. The authors would like to thank their collaborators; Simon Le Bail-Collet; their partners from BioMedIT, DCC, SIB, and Unitecta; and, most importantly, all the patients who provided written consent for their data to be used in the study. The authors are indebted to Frédéric Erard, Julia Maurer, and Jana Rochlitz from SIB, DCC, and Insel, respectively, for their substantial support in establishing the BioRef consortium agreement.

The authors would like to extend their gratitude to Jivko Stoyanov and Gabriela Böhl from the SwiSCI Biobank; Martin Hersberger and Beat Bangerter from the Universitäts-Kinderspital Zürich; and Jeremy Koch, Barbara Jesacher, and Christel Quarré from the University Hospital Bern (Inselspital) for scientific input, data extraction, and cleaning and to Wolfgang Segerer, Beat Gurtner, and Sandra Hoffmeister from the SwiSCI Study Center (Swiss Paraplegic Research) and Yves Jaggi from the University Hospital Lausanne (Centre Hospitalier Universitaire Vaudois) for their assistance in data management.

The authors are indebted to Beatrice Willi, Iris Krüsi, and Cornelia Stress and Myriam Legros, Franziska Amiet, Christof Schild, and Monika Reusser for their excellent support with the extraction of analysis metadata for Swiss Paraplegic Research and Inselspital, respectively. The authors also thank the Swiss Spinal Cord Injury Cohort Study [SwiSCI] Steering Committee and its members Xavier Jordan and Fabienne Reynard (Clinique Romande de Réadaptation, Sion), Michael Baumberger and Luca Jelmoni (Swiss Paraplegic Center, Nottwil), Armin Curt and Martin Schubert (Balgrist University Hospital, Zürich), Margret Hund-Georgiadis and NN (REHAB Basel, Basel), Laurent Prince (Swiss Paraplegic Association, Nottwil), Daniel Joggi (representative of persons with Spinal Cord Injury), Mirjana Bosnjakovic (Parahelp, Nottwil), Mirjam Brach (Swiss Paraplegic Research, Nottwil), and Carla Sabariego (SwiSCI Coordination Group at Swiss Paraplegic Research, Nottwil).

The icons used in Figure 1 (BioRef infrastructure) and Figure 4 (BioRef platform architecture) were made by juicy_fish (TI4Health instance), ppangman (lock), and Freepik (all other icons) from Flaticon [94].

Conflicts of Interest

ABL is a member of the Task Force on Global Reference Interval Database (TF-GRID) of the International Federation of Clinical Chemistry.

Multimedia Appendix 1

The full list of the 46 laboratory variables, uniquely defined by Logical Observation Identifiers Names and Codes (LOINC) encoding, for which quantitative laboratory test results (“measurements”) were collected from each contributing data cohort individually.

XLSX File (Microsoft Excel File), 58 KB

  1. Gräsbeck R, Fellman J. Normal values and statistics. Scand J Clin Lab Invest. 1968;21(3):193-195. [CrossRef] [Medline]
  2. Solberg HE. International federation of clinical chemistry. Scientific committee, clinical section. Expert panel on theory of reference values and international committee for standardization in haematology standing committee on reference values. Approved recommendation (1986) on the theory of reference values. part 1. The concept of reference values. Clin Chim Acta. May 29, 1987;165(1):111-118. [CrossRef] [Medline]
  3. Horn PS, Pesce AJ. Reference intervals: an update. Clin Chim Acta. Aug 2003;334(1-2):5-23. [CrossRef] [Medline]
  4. Ceriotti F, Hinzmann R, Panteghini M. Reference intervals: the way forward. Ann Clin Biochem. Jan 2009;46(Pt 1):8-17. [CrossRef] [Medline]
  5. ISO 15189:2012(en) medical laboratories — requirements for quality and competence. International Organization for Standardization. 2012. URL: https://www.iso.org/obp/ui/#iso:std:iso:15189:ed-3:v2:en [accessed 2023-07-13]
  6. Wayne. Defining, establishing, and verifying reference intervals in the clinical laboratory: approved guideline. Clinical Laboratory Standards Institute. 2008. URL: https://clsi.org/standards/products/method-evaluation/documents/ep28/ [accessed 2023-09-15]
  7. Koerbin G, Sikaris KA, Jones GR, Ryan J, Reed M, Tate J, et al. AACB Committee for Common Reference Intervals. Evidence-based approach to harmonised reference intervals. Clin Chim Acta. May 15, 2014;432:99-107. [CrossRef] [Medline]
  8. Płaczkowska S, Terpińska M, Piwowar A. The importance of establishing reference intervals - is it still a current problem for laboratory and doctors? Clin Lab. Aug 01, 2020;66(8) [CrossRef] [Medline]
  9. Kratzsch J, Fiedler GM, Leichtle A, Brügel M, Buchbinder S, Otto L, et al. New reference intervals for thyrotropin and thyroid hormones based on National Academy of Clinical Biochemistry criteria and regular ultrasonography of the thyroid. Clin Chem. Aug 2005;51(8):1480-1486. [CrossRef] [Medline]
  10. Graubner B, Auhuber T. ICD-10-GM 2009, Systematisches Verzeichnis: Internationale statistische Klassifikation der Krankheiten und verwandter Gesundheitsprobleme 2005. In Memory Business Intelligence. 2005. URL: https://biom131.imbi.uni-freiburg.de/medinf/gmds-ag-mdk/archiv/2008/symposium_180908/3_Graubner_MUSTERDATEI_ICD-10-GM_2009_SYS_240+_080804_080901.Kap.I.pdf [accessed 2023-07-13]
  11. Cadamuro J, Hillarp A, Unger A, von Meyer A, Bauçà JM, Plekhanova O, et al. Presentation and formatting of laboratory results: a narrative review on behalf of the European federation of clinical chemistry and laboratory medicine (EFLM) working group "postanalytical phase" (WG-POST). Crit Rev Clin Lab Sci. Aug 2021;58(5):329-353. [CrossRef] [Medline]
  12. Jørgensen LG, Brandslund I, Hyltoft Petersen P. Should we maintain the 95 percent reference intervals in the era of wellness testing? A concept paper. Clin Chem Lab Med. 2004;42(7):747-751. [CrossRef] [Medline]
  13. Task force on global reference interval database (TF-GRID). The International Federation of Clinical Chemistry and Laboratory Medicine. URL: https://www.ifcc.org/executive-board-and-council/eb-task-forces/task-force-on-global-reference-interval-database-tf-grid/ [accessed 2023-03-04]
  14. Ozarda Y, Ichihara K, Barth JH, Klee G, Committee on Reference Intervals and Decision Limits (C-RIDL), International Federation for Clinical Chemistry and Laboratory Medicine. Protocol and standard operating procedures for common use in a worldwide multicenter study on reference values. Clin Chem Lab Med. May 2013;51(5):1027-1040. [FREE Full text] [CrossRef] [Medline]
  15. Plebani M. Harmonization in laboratory medicine: the complete picture. Clin Chem Lab Med. Apr 2013;51(4):741-751. [FREE Full text] [CrossRef] [Medline]
  16. Plebani M. Harmonization of clinical laboratory information - current and future strategies. EJIFCC. Feb 09, 2016;27(1):15-22. [FREE Full text] [Medline]
  17. Berg J, Lane V. Pathology harmony; a pragmatic and scientific approach to unfounded variation in the clinical laboratory. Ann Clin Biochem. May 2011;48(Pt 3):195-197. [CrossRef] [Medline]
  18. Evgina S, Ichihara K, Ruzhanskaya A, Skibo I, Vybornova N, Vasiliev A, et al. Establishing reference intervals for major biochemical analytes for the Russian population: a research conducted as a part of the IFCC global study on reference values. Clin Biochem. Jul 2020;81:47-58. [FREE Full text] [CrossRef] [Medline]
  19. Ozarda Y, Ichihara K, Jones G, Streichert T, Ahmadian R, IFCC Committee on Reference Intervals and Decision Limits (C-RIDL). Comparison of reference intervals derived by direct and indirect methods based on compatible datasets obtained in Turkey. Clin Chim Acta. Sep 2021;520:186-195. [CrossRef] [Medline]
  20. L van Pelt J, Klatte S, Hwandih T, Barcaru A, Riphagen IJ, Linssen J, et al. Reference intervals for Sysmex XN hematological parameters as assessed in the Dutch Lifelines cohort. Clin Chem Lab Med. May 25, 2022;60(6):907-920. [FREE Full text] [CrossRef] [Medline]
  21. Martinez-Sanchez L, Cobbaert CM, Noordam R, Brouwer N, Blanco-Grau A, Villena-Ortiz Y, et al. Indirect determination of biochemistry reference intervals using outpatient data. PLoS One. May 19, 2022;17(5):e0268522. [FREE Full text] [CrossRef] [Medline]
  22. Islam MN, Griffin TP, Whiriskey R, Hamon S, Cleary B, Blake L, et al. Reference intervals for commonly requested biochemical and haematological parameters in a healthy Irish adult Caucasian population. Ir J Med Sci. Feb 2022;191(1):301-311. [CrossRef] [Medline]
  23. Omuse G, Ichihara K, Maina D, Hoffman M, Kagotho E, Kanyua A, et al. Determination of reference intervals for common chemistry and immunoassay tests for Kenyan adults based on an internationally harmonized protocol and up-to-date statistical methods. PLoS One. Jul 09, 2020;15(7):e0235234. [FREE Full text] [CrossRef] [Medline]
  24. Smit F, Ichihara K, George J, Blanco-Blanco E, Hoffmann M, Erasmus R, et al. Establishment of reference intervals of biochemical analytes for South African adults: a study conducted as part of the IFCC global multicentre study on reference values. J Med Lab Sci Technol S Afr. Jan 2021;3(1):8-23. [FREE Full text] [CrossRef]
  25. Bawua SA, Ichihara K, Keatley R, Arko-Mensah J, Ayeh-Kumi PF, Erasmus R, et al. Derivation of sex and age-specific reference intervals for clinical chemistry analytes in healthy Ghanaian adults. Clin Chem Lab Med. Jul 04, 2022;60(9):1426-1439. [FREE Full text] [CrossRef] [Medline]
  26. Colantonio DA, Kyriakopoulou L, Chan MK, Daly CH, Brinc D, Venner AA, et al. Closing the gaps in pediatric laboratory reference intervals: a CALIPER database of 40 biochemical markers in a healthy and multiethnic population of children. Clin Chem. May 2012;58(5):854-868. [CrossRef] [Medline]
  27. Shah SA, Ichihara K, Dherai AJ, Ashavaid TF. Reference intervals for 33 biochemical analytes in healthy Indian population: C-RIDL IFCC initiative. Clin Chem Lab Med. Nov 27, 2018;56(12):2093-2103. [FREE Full text] [CrossRef] [Medline]
  28. Zeng X, Fang L, Peng Y, Zhang Y, Li X, Wang Z, et al. A multicenter reference interval study of thromboelastography in the Chinese adult population. Thromb Res. Nov 2020;195:180-186. [CrossRef] [Medline]
  29. Baz H, Ichihara K, Selim M, Awad A, Aglan S, Ramadan D, et al. Establishment of reference intervals of clinical chemistry analytes for the adult population in Egypt. PLoS One. Mar 19, 2021;16(3):e0236772. [FREE Full text] [CrossRef] [Medline]
  30. Tate JR, Sikaris KA, Jones GR, Yen T, Koerbin G, Ryan J, et al. Harmonising adult and paediatric reference intervals in australia and new zealand: an evidence-based approach for establishing a first panel of chemistry analytes. Clin Biochem Rev. Nov 2014;35(4):213-235. [FREE Full text] [Medline]
  31. Adeli K, Higgins V, Trajcevski K, White-Al Habeeb N. The Canadian laboratory initiative on pediatric reference intervals: a CALIPER white paper. Crit Rev Clin Lab Sci. Sep 2017;54(6):358-413. [CrossRef] [Medline]
  32. Zierk J, Baum H, Bertram A, Boeker M, Buchwald A, Cario H, et al. High-resolution pediatric reference intervals for 15 biochemical analytes described using fractional polynomials. Clin Chem Lab Med. Jun 25, 2021;59(7):1267-1278. [CrossRef] [Medline]
  33. Rappoport N, Paik H, Oskotsky B, Tor R, Ziv E, Zaitlen N, et al. Comparing ethnicity-specific reference intervals for clinical laboratory tests from EHR data. J Appl Lab Med. Nov 01, 2018;3(3):366-377. [FREE Full text] [CrossRef] [Medline]
  34. Martani A, Egli P, Widmer M, Elger B. Data protection and biomedical research in Switzerland: setting the record straight. Swiss Med Wkly. Aug 24, 2020;150:w20332. [FREE Full text] [CrossRef] [Medline]
  35. Coman Schmid D, Crameri K, Oesterle S, Rinn B, Sengstag T, Stockinger H, et al. BioMedIT network team. SPHN - the BioMedIT network: a secure IT platform for research with sensitive human data. Stud Health Technol Inform. Jun 16, 2020;270:1170-1174. [CrossRef] [Medline]
  36. Franchini F, Kusejko K, Marzolini C, Tellenbach C, Rossi S, Stampf S, et al. Collaborative challenges of multi-cohort projects in pharmacogenetics-why time is essential for meaningful collaborations. JMIR Form Res. Sep 29, 2022;6(9):e36759. [FREE Full text] [CrossRef] [Medline]
  37. Raisaro JL, Troncoso-Pastoriza JR, Misbach M, Sousa JS, Pradervand S, Missiaglia E, et al. MedCo: enabling secure and privacy-preserving exploration of distributed clinical and genomic data. IEEE/ACM Trans Comput Biol Bioinform. Jul 2019;16(4):1328-1341. [CrossRef] [Medline]
  38. Home page. MedCo. URL: https://medco-ch.github.io [accessed 2023-06-29]
  39. Tune insight. GitHub. URL: https://github.com/tuneinsight [accessed 2023-06-29]
  40. Froelicher D, Troncoso-Pastoriza JR, Raisaro JL, Cuendet MA, Sousa JS, Cho H, et al. Truly privacy-preserving federated analytics for precision medicine with multiparty homomorphic encryption. Nat Commun. Oct 11, 2021;12(1):5910. [FREE Full text] [CrossRef] [Medline]
  41. McDonald CJ, Huff SM, Suico JG, Hill G, Leavelle D, Aller R, et al. LOINC, a universal standard for identifying laboratory observations: a 5-year update. Clin Chem. Apr 2003;49(4):624-633. [CrossRef] [Medline]
  42. SPHN semantic framework: SPHN dataset and RDF schema. Data Coordination Center. 2021. URL: https://git.dcc.sib.swiss/sphn-semantic-framework/sphn-ontology [accessed 2022-08-15]
  43. Zierk J, Arzideh F, Kapsner LA, Prokosch HU, Metzler M, Rauh M. Reference interval estimation from mixed distributions using truncation points and the Kolmogorov Smirnov distance (kosmic). Sci Rep. Feb 03, 2020;10(1):1704. [FREE Full text] [CrossRef] [Medline]
  44. Chung JZ. Paediatric reference intervals for ionised calcium - a data mining approach. Clin Chem Lab Med. Jun 25, 2021;59(7):e271-e273. [CrossRef] [Medline]
  45. Global unique device identification database (GUDID). US Food and Drug Administration. 2023. URL: https://www.fda.gov/medical-devices/unique-device-identification-system-udi-system/global-unique-device-identification-database-gudid [accessed 2023-06-29]
  46. Home page. GMDN Agency. URL: https://www.gmdnagency.org [accessed 2023-06-29]
  47. SETT - secure encryption and transfer tool. BioMedIT Project. URL: https://sett.readthedocs.io/en/stable/ [accessed 2023-07-13]
  48. Pearson RK. Outliers in process modeling and identification. IEEE Trans Control Syst Technol. Jan 2022;10(1):551-563. [FREE Full text] [CrossRef]
  49. Henny J, Vassault A, Boursier G, Vukasovic I, Mesko Brguljan P, Lohmander M, et al. Working Group Accreditation and ISO/CEN standards (WG-A/ISO) of the EFLM. Recommendation for the review of biological reference intervals in medical laboratories. Clin Chem Lab Med. Dec 01, 2016;54(12):1893-1900. [FREE Full text] [CrossRef] [Medline]
  50. Jones GR, Haeckel R, Loh TP, Sikaris K, Streichert T, Katayev A, et al. IFCC Committee on Reference Intervals and Decision Limits. Indirect methods for reference interval determination - review and recommendations. Clin Chem Lab Med. Dec 19, 2018;57(1):20-29. [FREE Full text] [CrossRef] [Medline]
  51. Farrell CL, Nguyen L. Indirect reference intervals: harnessing the power of stored laboratory data. Clin Biochem Rev. May 2019;40(2):99-111. [FREE Full text] [CrossRef] [Medline]
  52. Beasley CMJ, Crowe B, Nilsson M, Wu L, Tabbey R, Hietpas RT, et al. Adaptation of the robust method to large distributions of reference values: program modifications and comparison of alternative computational methods. J Biopharm Stat. 2019;29(3):516-528. [CrossRef] [Medline]
  53. Horn PS. A biweight prediction interval for random samples. J Am Stat Assoc. 1988;83(401):249-256. [FREE Full text] [CrossRef]
  54. Horn PS. Robust quantile estimators for skewed populations. Biometrika. Sep 1990;77(3):631-636. [FREE Full text] [CrossRef]
  55. Ichihara K, Boyd JC, IFCC Committee on Reference Intervals and Decision Limits (C-RIDL). An appraisal of statistical procedures used in derivation of reference intervals. Clin Chem Lab Med. Nov 2010;48(11):1537-1551. [CrossRef] [Medline]
  56. Baadenhuijsen H, Smit JC. Indirect estimation of clinical chemical reference intervals from total hospital patient data: application of a modified Bhattacharya procedure. J Clin Chem Clin Biochem. Dec 1985;23(12):829-839. [CrossRef] [Medline]
  57. lattigo: a library for lattice-based multiparty homomorphic encryption in Go. GitHub. URL: https://github.com/tuneinsight/lattigo [accessed 2023-06-29]
  58. glowing-bear: the modern cohort selection and analysis interface. GitHub. URL: https://github.com/thehyve/glowing-bear [accessed 2022-08-15]
  59. Bakan E, Polat H, Ozarda Y, Ozturk N, Baygutalp NK, Umudum FZ, et al. A reference interval study for common biochemical analytes in Eastern Turkey: a comparison of a reference population with laboratory data mining. Biochem Med (Zagreb). 2016;26(2):210-223. [FREE Full text] [CrossRef] [Medline]
  60. Wilson PW, Anderson KM, Harris T, Kannel WB, Castelli WP. Determinants of change in total cholesterol and HDL-C with age: the Framingham Study. J Gerontol. Nov 1994;49(6):M252-M257. [CrossRef] [Medline]
  61. Ferrara A, Barrett-Connor E, Shan J. Total, LDL, and HDL cholesterol decrease with age in older men and women. The Rancho Bernardo Study 1984-1994. Circulation. Jul 01, 1997;96(1):37-43. [CrossRef] [Medline]
  62. Vlahou A, Hallinan D, Apweiler R, Argiles A, Beige J, Benigni A, et al. Data sharing under the general data protection regulation: time to harmonize law and research ethics? Hypertension. Apr 2021;77(4):1029-1035. [FREE Full text] [CrossRef] [Medline]
  63. Dahlweid F, Kämpf M, Leichtle A. Interoperability of laboratory data in Switzerland - a spotlight on Bern. J Lab Med. Sep 04, 2018;42(6):251-258. [FREE Full text] [CrossRef]
  64. Blatter TU, Witte H, Nakas CT, Leichtle AB. Big data in laboratory medicine - FAIR quality for AI? Diagnostics (Basel). Aug 09, 2022;12(8):1923. [FREE Full text] [CrossRef] [Medline]
  65. Goetz LH, Schork NJ. Personalized medicine: motivation, challenges, and progress. Fertil Steril. Jun 2018;109(6):952-963. [FREE Full text] [CrossRef] [Medline]
  66. Ferretti A, Ienca M, Velarde MR, Hurst S, Vayena E. The challenges of big data for research ethics committees: a qualitative Swiss study. J Empir Res Hum Res Ethics. Feb 2022;17(1-2):129-143. [FREE Full text] [CrossRef] [Medline]
  67. Ozarda Y. Reference intervals: current status, recent developments and future considerations. Biochem Med (Zagreb). 2016;26(1):5-16. [FREE Full text] [CrossRef] [Medline]
  68. Fraser CG. Biological Variation: From Principles to Practice. Washington, DC. AACC (American Association for Clinical Chemistry) Press; 2001.
  69. Martinez-Sanchez L, Marques-Garcia F, Ozarda Y, Blanco A, Brouwer N, Canalias F, et al. Big data and reference intervals: rationale, current practices, harmonization and standardization prerequisites and future perspectives of indirect determination of reference intervals using routine data. Adv Lab Med. Mar 2021;2(1):9-25. [FREE Full text] [CrossRef] [Medline]
  70. Hill SA, Booth RA, Santaguida PL, Don-Wauchope A, Brown JA, Oremus M, et al. Use of BNP and NT-proBNP for the diagnosis of heart failure in the emergency department: a systematic review of the evidence. Heart Fail Rev. Aug 2014;19(4):421-438. [CrossRef] [Medline]
  71. Mu S, Echouffo-Tcheugui JB, Ndumele CE, Coresh J, Juraschek S, Brady T, et al. NT-proBNP reference intervals in healthy U.S. children, adolescents, and adults. J Appl Lab Med. Jul 05, 2023;8(4):700-712. [CrossRef] [Medline]
  72. Katayev A, Balciza C, Seccombe DW. Establishing reference intervals for clinical laboratory test results: is there a better way? Am J Clin Pathol. Feb 2010;133(2):180-186. [CrossRef] [Medline]
  73. Raverot V, Bonjour M, Abeillon du Payrat J, Perrin P, Roucher-Boulez F, Lasolle H, et al. Age- and sex-specific TSH upper-limit reference intervals in the general French population: there is a need to adjust our actual practices. J Clin Med. Mar 14, 2020;9(3):792. [FREE Full text] [CrossRef] [Medline]
  74. Coskun A, Sandberg S, Unsal I, Serteser M, Aarsand AK. Personalized reference intervals: from theory to practice. Crit Rev Clin Lab Sci. Nov 2022;59(7):501-516. [CrossRef] [Medline]
  75. Betsou F, Lehmann S, Ashton G, Barnes M, Benson EE, Coppola D, et al. International Society for Biological and Environmental Repositories (ISBER) Working Group on Biospecimen Science. Standard preanalytical coding for biospecimens: defining the sample PREanalytical code. Cancer Epidemiol Biomarkers Prev. Apr 2010;19(4):1004-1011. [CrossRef] [Medline]
  76. Zierk J, Hirschmann J, Toddenroth D, Arzideh F, Haeckel R, Bertram A, et al. Next-generation reference intervals for pediatric hematology. Clin Chem Lab Med. Sep 25, 2019;57(10):1595-1607. [CrossRef] [Medline]
  77. Vokinger KN, Stekhoven DJ, Krauthammer M. Lost in anonymization - a data anonymization reference classification merging legal and technical considerations. J Law Med Ethics. Mar 2020;48(1):228-231. [FREE Full text] [CrossRef] [Medline]
  78. Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet. Aug 29, 2008;4(8):e1000167. [FREE Full text] [CrossRef] [Medline]
  79. Shringarpure SS, Bustamante CD. Privacy risks from genomic data-sharing beacons. Am J Hum Genet. Nov 05, 2015;97(5):631-646. [FREE Full text] [CrossRef] [Medline]
  80. Raisaro JL, Tramèr F, Ji Z, Bu D, Zhao Y, Carey K, et al. Addressing Beacon re-identification attacks: quantification and mitigation of privacy risks. J Am Med Inform Assoc. Jul 01, 2017;24(4):799-805. [FREE Full text] [CrossRef] [Medline]
  81. Wang Z, Song M, Zhang Z, Song Y, Wang Q, Qi H. Beyond inferring class representatives: user-level privacy leakage from federated learning. arXiv. Preprint posted online December 3, 2018. 2018 [FREE Full text] [CrossRef]
  82. Melis L, Song C, De Cristofaro E, Shmatikov V. Exploiting unintended feature leakage in collaborative learning. arXiv. Preprint posted online May 10, 2018. 2018 [FREE Full text] [CrossRef]
  83. Rieke N, Hancox J, Li W, Milletarì F, Roth HR, Albarqouni S, et al. The future of digital health with federated learning. NPJ Digit Med. Sep 14, 2020;3:119. [FREE Full text] [CrossRef] [Medline]
  84. Warnat-Herresthal S, Schultze H, Shastry KL, Manamohan S, Mukherjee S, Garg V, COVID-19 Aachen Study (COVAS); Deutsche COVID-19 Omics Initiative (DeCOI); et al. Swarm learning for decentralized and confidential clinical machine learning. Nature. Jun 2021;594(7862):265-270. [FREE Full text] [CrossRef] [Medline]
  85. Dayan I, Roth HR, Zhong A, Harouni A, Gentili A, Abidin AZ, et al. Federated learning for predicting clinical outcomes in patients with COVID-19. Nat Med. Oct 2021;27(10):1735-1743. [FREE Full text] [CrossRef] [Medline]
  86. Dou Q, So TY, Jiang M, Liu Q, Vardhanabhuti V, Kaissis G, et al. Federated deep learning for detecting COVID-19 lung abnormalities in CT: a privacy-preserving multinational validation study. NPJ Digit Med. Mar 29, 2021;4(1):60. [FREE Full text] [CrossRef] [Medline]
  87. Adnan M, Kalra S, Cresswell JC, Taylor GW, Tizhoosh HR. Federated learning and differential privacy for medical image analysis. Sci Rep. Feb 04, 2022;12(1):1953. [FREE Full text] [CrossRef] [Medline]
  88. Ogier du Terrail J, Leopold A, Joly C, Béguier C, Andreux M, Maussion C, et al. Federated learning for predicting histological response to neoadjuvant chemotherapy in triple-negative breast cancer. Nat Med. Jan 2023;29(1):135-146. [CrossRef] [Medline]
  89. Saldanha OL, Quirke P, West NP, James JA, Loughrey MB, Grabsch HI, et al. Swarm learning for decentralized artificial intelligence in cancer histopathology. Nat Med. Jun 2022;28(6):1232-1239. [FREE Full text] [CrossRef] [Medline]
  90. Home page. ENACT Network. URL: https://www.actnetwork.us [accessed 2023-03-13]
  91. Home page. TriNetX. URL: https://trinetx.com/ [accessed 2023-06-29]
  92. Clinerion home. Magnolia International Ltd. URL: https://www.clinerion.com/index.html [accessed 2023-06-29]
  93. Lawrence AK, Selter L, Frey U. SPHN - the Swiss personalized health network initiative. Stud Health Technol Inform. Jun 16, 2020;270:1156-1160. [CrossRef] [Medline]
  94. Home page. Flaticon. URL: https://www.flaticon.com/ [accessed 2023-09-20]


Abbreviations

CHUV: Centre Hospitalier Universitaire Vaudois
CLSI: Clinical and Laboratory Standards Institute
GDPR: General Data Protection Regulation
GM: German Modification
GUI: graphical user interface
i2b2: Informatics for Integrating Biology and the Bedside
ICD: International Classification of Diseases
IFCC: International Federation of Clinical Chemistry
LOINC: Logical Observation Identifiers Names and Codes
RI: reference interval
SPHN: Swiss Personalized Health Network


Edited by K Williams; submitted 13.03.23; peer-reviewed by J Cadamuro, S Ashraf, N Rappoport; comments to author 21.05.23; revised version received 13.07.23; accepted 14.07.23; published 18.10.23.

Copyright

©Tobias Ueli Blatter, Harald Witte, Jules Fasquelle-Lopez, Christos Theodoros Nakas, Jean Louis Raisaro, Alexander Benedikt Leichtle. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 18.10.2023.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.