%0 Journal Article
%@ 1438-8871
%I JMIR Publications
%V 21
%N 4
%P e13043
%T Health Care and Precision Medicine Research: Analysis of a Scalable Data Science Platform
%A McPadden,Jacob
%A Durant,Thomas JS
%A Bunch,Dustin R
%A Coppi,Andreas
%A Price,Nathaniel
%A Rodgerson,Kris
%A Torre Jr,Charles J
%A Byron,William
%A Hsiao,Allen L
%A Krumholz,Harlan M
%A Schulz,Wade L
%+ Department of Laboratory Medicine, Yale University School of Medicine, 55 Park Street PS345D, New Haven, CT, 06511, United States, 1 (203) 819 8609, wade.schulz@yale.edu
%K data science
%K monitoring, physiologic
%K computational health care
%K medical informatics computing
%K big data
%D 2019
%7 09.04.2019
%9 Original Paper
%J J Med Internet Res
%G English
%X Background: Health care data are increasing in volume and complexity. Storing and analyzing these data to implement precision medicine initiatives and data-driven research has exceeded the capabilities of traditional computer systems. Modern big data platforms must be adapted to the specific demands of health care and designed for scalability and growth. Objective: The objectives of our study were to (1) demonstrate the implementation of a data science platform built on open source technology within a large, academic health care system and (2) describe 2 computational health care applications built on such a platform. Methods: We deployed a data science platform based on several open source technologies to support real-time, big data workloads. We developed data-acquisition workflows for Apache Storm and NiFi in Java and Python to capture patient monitoring and laboratory data for downstream analytics. Results: Emerging data management approaches, along with open source technologies such as Hadoop, can be used to create integrated data lakes to store large, real-time datasets. This infrastructure also provides a robust analytics platform where health care and biomedical research data can be analyzed in near real time for precision medicine and computational health care use cases. Conclusions: The implementation and use of integrated data science platforms offer organizations the opportunity to combine traditional datasets, including data from the electronic health record, with emerging big data sources, such as continuous patient monitoring and real-time laboratory results. These platforms can enable cost-effective and scalable analytics for the information that will be key to the delivery of precision medicine initiatives. Organizations that can take advantage of the technical advances found in data science platforms will have the opportunity to provide comprehensive access to health care data for computational health care and precision medicine research.
%M 30964441
%R 10.2196/13043
%U https://www.jmir.org/2019/4/e13043/
%U https://doi.org/10.2196/13043
%U http://www.ncbi.nlm.nih.gov/pubmed/30964441