%0 Journal Article
%@ 1438-8871
%I JMIR Publications
%V 22
%N 8
%P e20285
%T Real-Time Forecasting of the COVID-19 Outbreak in Chinese Provinces: Machine Learning Approach Using Novel Digital Data and Estimates From Mechanistic Models
%A Liu,Dianbo
%A Clemente,Leonardo
%A Poirier,Canelle
%A Ding,Xiyu
%A Chinazzi,Matteo
%A Davis,Jessica
%A Vespignani,Alessandro
%A Santillana,Mauricio
%+ Computational Health Informatics Program, Boston Children’s Hospital, 300 Longwood Avenue, Landmark 5th Floor East, Boston, MA, 02215, United States, 1 (617) 919 1795, msantill@g.harvard.edu
%K COVID-19
%K coronavirus
%K digital epidemiology
%K modeling
%K modeling disease outbreaks
%K emerging outbreak
%K machine learning
%K precision public health
%K machine learning in public health
%K forecasting
%K digital data
%K mechanistic model
%K hybrid simulation
%K hybrid model
%K simulation
%D 2020
%7 17.8.2020
%9 Original Paper
%J J Med Internet Res
%G English
%X Background: The inherent difficulty of identifying and monitoring emerging outbreaks caused by novel pathogens can lead to their rapid spread; and if left unchecked, they may become major public health threats to the planet. The ongoing coronavirus disease (COVID-19) outbreak, which has infected over 2,300,000 individuals and caused over 150,000 deaths, is an example of one of these catastrophic events. Objective: We present a timely and novel methodology that combines disease estimates from mechanistic models and digital traces, via interpretable machine learning methodologies, to reliably forecast COVID-19 activity in Chinese provinces in real time. Methods: Our method uses the following as inputs: (a) official health reports, (b) COVID-19–related internet search activity, (c) news media activity, and (d) daily forecasts of COVID-19 activity from a metapopulation mechanistic model. Our machine learning methodology uses a clustering technique that enables the exploitation of geospatial synchronicities of COVID-19 activity across Chinese provinces and a data augmentation technique to deal with the small number of historical disease observations characteristic of emerging outbreaks. Results: Our model is able to produce stable and accurate forecasts 2 days ahead of the current time and outperforms a collection of baseline models in 27 out of 32 Chinese provinces. Conclusions: Our methodology could be easily extended to other geographies currently affected by COVID-19 to aid decision makers with monitoring and possibly prevention.
%M 32730217
%R 10.2196/20285
%U http://www.jmir.org/2020/8/e20285/
%U https://doi.org/10.2196/20285
%U http://www.ncbi.nlm.nih.gov/pubmed/32730217