Editorial
Abstract
The number of papers presenting machine learning (ML) models that are being submitted to and published in the Journal of Medical Internet Research and other JMIR Publications journals has steadily increased. Editors and peer reviewers involved in the review process for such manuscripts often go through multiple review cycles to enhance the quality and completeness of reporting. The use of reporting guidelines or checklists can help ensure consistency in the quality of submitted (and published) scientific manuscripts and, for example, help authors avoid omitting key information. In this Editorial, the editors of JMIR Publications journals discuss the general JMIR Publications policy regarding authors’ application of reporting guidelines and specifically focus on the reporting of ML studies in JMIR Publications journals, using the Consolidated Reporting of Machine Learning Studies (CREMLS) guidelines, with an example of how authors and other journals could use the CREMLS checklist to ensure transparency and rigor in reporting.
J Med Internet Res 2024;26:e52508. doi: 10.2196/52508
Introduction
The number of papers presenting machine learning (ML) models that are being submitted to and published in the Journal of Medical Internet Research and other JMIR Publications journals has steadily increased over time. The cross-journal JMIR Publications e-collection “Machine Learning” includes nearly 1300 articles as of April 1, 2024 [ ], and there are additional sections in other journals that collate articles related to the field (eg, “Machine Learning from Digital Images in Dermatology” [ ] in JMIR Dermatology). From 2015 to 2022, the number of published articles with “artificial intelligence” (AI) or “machine learning” in the title or abstract in JMIR Publications journals increased from 22 to 298 (13.5-fold growth), and 312 such articles appeared in 2023 (14-fold growth). For JMIR Medical Informatics alone, the number of such articles increased 16-fold between 2015 and 2022, from 10 to 160. This is consistent with the overall growth in the research and application of medical AI: a similar PubMed search (with the added keyword “medicine”) revealed a 22-fold increase (from 640 to 14,147 articles) between 2015 and 2022, with 11,272 matching articles in 2023.

Many papers reporting the use of ML models in medicine have used large clinical data sets to make diagnostic or prognostic predictions [ - ]. However, the use of data from electronic health records and other resources is often not without pitfalls, as these data are typically collected and optimized for other purposes (eg, medical billing) [ ].

Editors and peer reviewers involved in the review process for such manuscripts often go through multiple review cycles to enhance the quality and completeness of reporting [ ]. The use of reporting guidelines or checklists can help ensure consistency in the quality of submitted (and published) scientific manuscripts and, for instance, help authors avoid omitting key information. In the experience of the editors-in-chief of JMIR AI, missing information in manuscripts reporting on ML models is especially consequential because it can lengthen the overall review interval by adding revision cycles.

According to the EQUATOR (Enhancing the Quality and Transparency of Health Research) Network, a reporting guideline is “a simple, structured tool for health researchers to use while writing manuscripts. A reporting guideline provides a minimum list of information needed to ensure a manuscript can be, for example: understood by a reader, replicated by a researcher, used by a doctor to make a clinical decision, and included in a systematic review” [ ]. Reporting guidelines can be presented in the form of a checklist, flow diagram, or structured text.

In this Editorial, we discuss the general JMIR Publications policy regarding authors’ application of reporting guidelines. We then focus specifically on the reporting of ML studies in JMIR Publications journals.
JMIR Publications Policy on the Use of Reporting Guidelines
Accumulating evidence suggests that the application of reporting guidelines and checklists in health research benefits authors, readers, and the discipline overall by enabling the replication or reproduction of studies. Recent evidence suggests that asking reviewers, rather than authors, to use reporting checklists offers no added benefit in terms of reporting quality [ ]. However, Botos [ ] reported a positive association between reviewer ratings of adherence to reporting guidelines and favorable editorial decisions, and Stevanovic et al [ ] reported significant positive correlations between adherence to reporting guidelines and citations, as well as between adherence and publication in higher-impact-factor journals.

JMIR Publications’ editorial policy recommends that authors adhere to applicable study design and reporting guidelines when preparing manuscripts for submission [ ]. Authors should note that most reporting guidelines are strongly recommended, particularly because they can improve the quality, completeness, and organization of the presented work. At this time, JMIR Publications requires completed reporting checklists, supplied as multimedia appendices, for randomized controlled trials without [ - ] or with eHealth or mobile health components [ ], for systematic and scoping literature reviews across the portfolio, and for Implementation Reports in JMIR Medical Informatics [ ]. Although some medical journals have mandated the use of certain reporting guidelines and checklists, JMIR Publications recognizes that authors may have concerns about the additional burden that the formalized use of checklists may add to the submission process. As such, JMIR Publications has chosen to begin by recommending the use of ML reporting guidelines and will evaluate their benefits and gather feedback on implementation costs before considering more stringent requirements.

Reporting on ML Models
Multiple directly relevant checklists have been developed for the reporting of prognostic and diagnostic ML studies. Klement and El Emam [ ] consolidated these guidelines and checklists into a single set that we refer to as the Consolidated Reporting of Machine Learning Studies (CREMLS) checklist. CREMLS serves as a reporting checklist for journals publishing research describing the development, evaluation, and application of ML models, including all JMIR Publications journals, which have officially adopted these guidelines. CREMLS was developed by first identifying existing relevant reporting guidelines and checklists through a structured literature review and expert curation. The quality of the methods used to develop these guidelines was then assessed to narrow them down to a high-quality subset, whose items were further filtered against specific inclusion and exclusion criteria. The resulting items were converted into guidelines and a checklist, which was reviewed by the editorial board of JMIR AI and then applied in a preliminary assessment of articles published in JMIR AI. The final checklist offers present-day best practices for high-quality reporting of studies using ML models.

Examples of the application of the CREMLS checklist are presented in the table below. To compile these examples, we identified 7 articles published in JMIR Publications journals that exemplify the checklist items. Note that not all items are relevant to each article, and some articles are particularly good examples of how to operationalize a given item. A brief code sketch after the table illustrates how several of the methodology items can be made explicit in practice.

| Item number | Item | Example illustrating the item |
| --- | --- | --- |
| Study details | | |
| 1.1 | The medical or clinical task of interest | Examines chronic disease management, a clinical problem, with 4 example solutions using ML^a models [ ] |
| 1.2 | The research question | Proposes a framework to transfer old knowledge to a new environment to manage drifts [ ] |
| 1.3 | Current medical or clinical practice | Provides a review of current practice and issues associated with chronic disease management [ ] |
| 1.4 | The known predictors and confounders of what is being predicted or diagnosed | Describes variables defined as part of a well-established health test available to the public [ ] |
| 1.5 | The overall study design | Presents the experimental design with data flow and data partitions used at various steps of the experiment (Figure 1 [ ]) |
| 1.6 | The medical institutional settings | Describes the institution as an academic (teaching) community hospital where the data were collected [ ] |
| 1.7 | The target patient population | Clear partitioning of target patient populations and the comparator group [ ] |
| 1.8 | The intended use of the ML model | Describes how the prediction model fits in the clinical practice of scheduling operating theater procedures [ ] |
| 1.9 | Existing model performance benchmarks for this task | Reviews existing research and presents achieved performance (eg, AUC^b) [ ] |
| 1.10 | Ethical and other regulatory approvals obtained | Ethics approvals [ ] |
| The data | | |
| 2.1 | Inclusion or exclusion criteria for the patient cohort | Defined in Figure 1 in the paper by Kendale et al [ ] |
| 2.2 | Methods of data collection | Describes sources and methods of data collection, what type of data were used, and potential implied bias in interpretation [ ] |
| 2.3 | Bias introduced due to the method of data collection used | Discusses potential bias in data collection and outcome definition [ ] |
| 2.4 | Data characteristics | Uses descriptive statistics to show data characteristics for different types of data (demographics and clinical measurements) [ ] |
| 2.5 | Methods of data transformation and preprocessing applied | Imputation is discussed [ ] |
| 2.6 | Known quality issues with the data | Missingness and outlier detection are discussed [ ] |
| 2.7 | Sample size calculation | Brief section dedicated to power analysis [ ] |
| 2.8 | Data availability | Explains how to obtain a copy of the data [ ] |
| Methodology | | |
| 3.1 | Strategies for handling missing data | Describes how missing values were replaced [ ] |
| 3.2 | Strategies for addressing class imbalance | Describes the approach of using SMOTE^c to adjust class ratios to address imbalance [ ] |
| 3.3 | Strategies for reducing dimensionality of data | Describes the vectorization of a dimension of 100 into a 2D space using an established algorithm [ ] |
| 3.4 | Strategies for handling outliers | States the threshold values used to detect outliers [ ] |
| 3.5 | Strategies for data augmentation | Shows how variable similarity is achieved between synthetic and real data in the context of augmentation [ ] |
| 3.6 | Strategies for model pretraining | Describes and illustrates (Figure 1) how models from other data sets were trained and used in the new model [ ] |
| 3.7 | The rationale for selecting the ML algorithm | Discusses properties of the selected algorithm relevant to the problem at hand as motivation [ ] |
| 3.8 | The method of evaluating model performance during training | Presents a separate discussion of evaluation in cross-validation settings and external evaluation while also describing hyperparameter tuning [ ] |
| 3.9 | The method used for hyperparameter tuning | Comprehensive description of tuning within nested cross-validation (this is a tutorial but illustrates how to describe the process) [ ] |
| 3.10 | Model’s output adjustments | Describes the final model and how it was calibrated, and discusses the impact of embedding on patient data for interpretation [ ] |
| Evaluation | | |
| 4.1 | Performance metrics used to evaluate the model | Comprehensive and detailed discussion of evaluation and quality metrics [ ] |
| 4.2 | The cost or consequence of errors | Comprehensive error analysis [ ] |
| 4.3 | The results of internal validation | Detailed discussion of validation (internal and external) [ ] |
| 4.4 | The final model hyperparameters | Presents details of the final model and the winning parameters [ ] |
| 4.5 | Model evaluation on an external data set | Detailed and comprehensive external validation that is separate from model testing [ ] |
| 4.6 | Characteristics relevant for detecting data shift and drift | Implements performance monitoring, addresses data shifts over time, and illustrates them in detail [ ] |
| Explainability and transparency | | |
| 5.1 | The most important features and how they relate to the outcomes | Presents variable importance (SHAP^d values) in the context of interpretation and compares it to existing literature [ ] |
| 5.2 | Plausibility of model outputs | Shows sample output (Figure 4 in the paper by Kendale et al [ ]) |
| 5.3 | Interpretation of a model’s results by an end user | Good discussion about interpretability and use of the final model [ ] |
^a ML: machine learning.
^b AUC: area under the curve.
^c SMOTE: synthetic minority oversampling technique.
^d SHAP: Shapley additive explanations.
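To make the methodology items above more concrete, the following minimal sketch, written for illustration rather than taken from any of the cited studies, shows how items 2.5 and 3.1 (preprocessing and missing data), 3.2 (class imbalance), and 3.9 (hyperparameter tuning within nested cross-validation) can be expressed as explicit, reportable code. It assumes the scikit-learn and imbalanced-learn libraries; the data set is synthetic, and all parameter values are our own illustrative choices.

```python
# Minimal illustrative sketch (not from any cited study): several CREMLS
# methodology items expressed as explicit, reportable code.
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer            # items 2.5/3.1: imputation
from sklearn.preprocessing import StandardScaler    # item 2.5: preprocessing
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from imblearn.over_sampling import SMOTE            # item 3.2: class imbalance
from imblearn.pipeline import Pipeline              # keeps resampling inside CV folds

# Synthetic stand-in for a clinical data set with roughly 10% positive cases.
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.9, 0.1], random_state=42)

# All preprocessing lives inside the pipeline, so every cross-validation fold
# refits the imputer, scaler, and SMOTE on its own training split only --
# a directly reportable way of showing how data leakage was avoided.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("smote", SMOTE(random_state=42)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Item 3.9: nested cross-validation -- the inner GridSearchCV tunes the
# regularization strength C; the outer loop estimates generalization error.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}
inner = GridSearchCV(pipeline, param_grid, cv=5, scoring="roc_auc")
outer_scores = cross_val_score(inner, X, y, cv=5, scoring="roc_auc")
print(f"Nested CV AUC: {outer_scores.mean():.3f} (SD {outer_scores.std():.3f})")
```

Reporting the analysis in this form answers several checklist items at once: the imputation strategy, the resampling method and its parameters, and the tuning grid are all stated explicitly rather than implied.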
We strongly advise authors submitting manuscripts that describe the development, evaluation, or application of ML models to the Journal of Medical Internet Research, JMIR AI, JMIR Medical Informatics, or other JMIR Publications journals to adhere to the CREMLS guidelines and checklist, ensuring that all relevant details of their work are considered and addressed before the submission and review process begins. More complete, higher-quality reporting benefits authors by accelerating the review cycle and reducing the burden on reviewers. Hence, there is a clear need for reporting guidelines and checklists for papers describing prognostic and diagnostic ML studies; these are expected to help, for example, by reducing missing documentation of an ML model’s hyperparameters and by clarifying how data leakage was avoided. We have observed that peer reviewers have, in practice, been asking authors to improve reporting on the same topics covered by the CREMLS checklist. This is not surprising, given that peer reviewers are experts in the field who notice when important information is missing. Nevertheless, we encourage reviewers to use the checklist regularly to ensure completeness and consistency.
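Relatedly, documentation of the final hyperparameters (checklist item 4.4) can be generated directly from the fitted model rather than transcribed by hand. The hypothetical snippet below, which reuses the fitted search object (inner) from the sketch above, writes the tuned pipeline’s parameters to a supplementary JSON file; the variable and file names are our own illustrative choices, not part of CREMLS.

```python
# Hypothetical continuation of the sketch above: export the final model's
# hyperparameters (CREMLS item 4.4) as a machine-readable supplement.
import json

inner.fit(X, y)  # refit the tuned pipeline on the full training data
final_params = {name: str(value) for name, value in
                inner.best_estimator_.get_params().items()}
with open("final_model_hyperparameters.json", "w") as f:
    json.dump(final_params, f, indent=2)
print("Selected regularization strength:", inner.best_params_["clf__C"])
```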
The CREMLS checklist’s scope is limited to ML models using structured data that are trained and evaluated in silico and in shadow mode. This leaves a significant opportunity to extend CREMLS to other data modalities and to additional phases of model deployment. Should such extended reporting guidelines and checklists be developed, they may be considered for recommendation for submissions to JMIR Publications journals, incorporating lessons learned from this initial checklist for studies reporting the use of ML models.
Conclusion
There is evidence that complete reporting of research studies benefits the authors and the broader scientific community. For prognostic and diagnostic ML studies, many reporting guidelines have been developed, and these have been consolidated into CREMLS, which captures the combined value of the source guidelines and checklists in one place. In this Editorial, we extend journal policy and recommend that authors follow these guidelines when submitting articles to journals in the JMIR Publications portfolio. This will improve the reproducibility of research studies using ML methods, accelerate review cycles, and improve the overall quality of published papers. Given the rapid growth in the number of studies developing, evaluating, and applying ML models, it is important to establish reporting standards early.
Authors' Contributions
KEE and BM conceptualized this study and drafted, reviewed, and edited the manuscript. TIL and GE reviewed and edited the manuscript. WK prepared the literature summary and reviewed the manuscript.
Conflicts of Interest
KEE and BM are co–editors-in-chief of JMIR AI. KEE is the cofounder of Replica Analytics, an Aetion company, and has financial interests in the company. TIL is the scientific editorial director at JMIR Publications, Inc. GE is the executive editor and publisher at JMIR Publications, Inc, receives a salary, and owns equity.
References
- Machine Learning. JMIR Medical Informatics. URL: https://medinform.jmir.org/themes/500-machine-learning [accessed 2024-04-01]
- Machine Learning from Digital Images in Dermatology. JMIR Dermatology. URL: https://derma.jmir.org/themes/922-machine-learning-from-digital-images-in-dermatology [accessed 2023-09-22]
- Lee S, Kang WS, Kim DW, Seo SH, Kim J, Jeong ST, et al. An artificial intelligence model for predicting trauma mortality among emergency department patients in South Korea: retrospective cohort study. J Med Internet Res. Aug 29, 2023;25:e49283. [FREE Full text] [CrossRef] [Medline]
- Deng Y, Ma Y, Fu J, Wang X, Yu C, Lv J, et al. Combinatorial use of machine learning and logistic regression for predicting carotid plaque risk among 5.4 million adults with fatty liver disease receiving health check-ups: population-based cross-sectional study. JMIR Public Health Surveill. Sep 07, 2023;9:e47095. [FREE Full text] [CrossRef] [Medline]
- Kendale S, Bishara A, Burns M, Solomon S, Corriere M, Mathis M. Machine learning for the prediction of procedural case durations developed using a large multicenter database: algorithm development and validation study. JMIR AI. Sep 8, 2023;2:e44909. [CrossRef]
- Williams DD, Ferro D, Mullaney C, Skrabonja L, Barnes MS, Patton SR, et al. An "All-Data-on-Hand" deep learning model to predict hospitalization for diabetic ketoacidosis in youth with type 1 diabetes: development and validation study. JMIR Diabetes. Jul 18, 2023;8:e47592. [FREE Full text] [CrossRef] [Medline]
- Maletzky A, Böck C, Tschoellitsch T, Roland T, Ludwig H, Thumfart S, et al. Lifting hospital electronic health record data treasures: challenges and opportunities. JMIR Med Inform. Oct 21, 2022;10(10):e38557. [FREE Full text] [CrossRef] [Medline]
- El Emam K, Klement W, Malin B. Reporting and methodological observations on prognostic and diagnostic machine learning studies. JMIR AI. 2023;2:e47995. [FREE Full text]
- What is a reporting guideline? Enhancing the QUAlity and Transparency Of health Research. URL: https://www.equator-network.org/about-us/what-is-a-reporting-guideline/ [accessed 2023-09-22]
- Speich B, Mann E, Schönenberger CM, Mellor K, Griessbach AN, Dhiman P, et al. Reminding peer reviewers of reporting guideline items to improve completeness in published articles: primary results of 2 randomized trials. JAMA Netw Open. Jun 01, 2023;6(6):e2317651. [FREE Full text] [CrossRef] [Medline]
- Botos J. Reported use of reporting guidelines among authors, editorial outcomes, and reviewer ratings related to adherence to guidelines and clarity of presentation. Res Integr Peer Rev. Sep 27, 2018;3(1):7. [FREE Full text] [CrossRef] [Medline]
- Stevanovic A, Schmitz S, Rossaint R, Schürholz T, Coburn M. CONSORT item reporting quality in the top ten ranked journals of critical care medicine in 2011: a retrospective analysis. PLoS One. May 28, 2015;10(5):e0128061. [FREE Full text] [CrossRef] [Medline]
- What reporting guidelines should I follow for my article? JMIR Publications Knowledge Base and Help Center. URL: https://support.jmir.org/hc/en-us/articles/115001575267-What-reporting-guidelines-should-I-follow-for-my-article [accessed 2024-01-30]
- Schulz KF, Altman D, Moher D, CONSORT Group. CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials. Trials. Mar 24, 2010;11:32. [FREE Full text] [CrossRef] [Medline]
- Schulz KF, Altman D, Moher D, CONSORT Group. CONSORT 2010 statement: updated guidelines for reporting parallel group randomized trials. Ann Intern Med. Jun 01, 2010;152(11):726-732. [FREE Full text] [CrossRef] [Medline]
- Moher D, Hopewell S, Schulz KF, Montori V, Gøtzsche PC, Devereaux PJ, et al. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. BMJ. Mar 23, 2010;340:c869. [FREE Full text] [CrossRef] [Medline]
- Eysenbach G, CONSORT-EHEALTH Group. CONSORT-EHEALTH: improving and standardizing evaluation reports of Web-based and mobile health interventions. J Med Internet Res. Dec 31, 2011;13(4):e126. [FREE Full text] [CrossRef] [Medline]
- Perrin Franck C, Babington-Ashaye A, Dietrich D, Bediang G, Veltsos P, Gupta PP, et al. iCHECK-DH: Guidelines and Checklist for the Reporting on Digital Health Implementations. J Med Internet Res. May 10, 2023;25:e46694. [FREE Full text] [CrossRef] [Medline]
- Klement W, El Emam K. Consolidated reporting guidelines for prognostic and diagnostic machine learning modeling studies: development and validation. J Med Internet Res. Aug 31, 2023;25:e48763. [FREE Full text] [CrossRef] [Medline]
- Lee C, Jo B, Woo H, Im Y, Park RW, Park C. Chronic disease prediction using the common data model: development study. JMIR AI. Dec 22, 2022;1(1):e41030. [FREE Full text] [CrossRef]
- Zhang X, Xue Y, Su X, Chen S, Liu K, Chen W, et al. A transfer learning approach to correct the temporal performance drift of clinical prediction models: retrospective cohort study. JMIR Med Inform. Nov 09, 2022;10(11):e38053. [FREE Full text] [CrossRef] [Medline]
- Steiger E, Kroll LE. Patient embeddings from diagnosis codes for health care prediction tasks: Pat2Vec machine learning framework. JMIR AI. Apr 21, 2023;2:e40755. [FREE Full text] [CrossRef]
- Sang S, Sun R, Coquet J, Carmichael H, Seto T, Hernandez-Boussard T. Learning from past respiratory infections to predict COVID-19 outcomes: retrospective study. J Med Internet Res. Feb 22, 2021;23(2):e23026. [FREE Full text] [CrossRef] [Medline]
- Kang HYJ, Batbaatar E, Choi D, Choi KS, Ko M, Ryu KS. Synthetic tabular data based on generative adversarial networks in health care: generation and validation using the divide-and-conquer strategy. JMIR Med Inform. Nov 24, 2023;11:e47859. [FREE Full text] [CrossRef] [Medline]
- Wilimitis D, Walsh CG. Practical considerations and applied examples of cross-validation for model development and evaluation in health care: tutorial. JMIR AI. Dec 18, 2023;2:e49023. [FREE Full text] [CrossRef]
Abbreviations
AI: artificial intelligence
CREMLS: Consolidated Reporting of Machine Learning Studies
EQUATOR: Enhancing the Quality and Transparency of Health Research
ML: machine learning
Edited by T Leung; this is a non–peer-reviewed article. Submitted 04.04.24; accepted 04.04.24; published 02.05.24.
Copyright © Khaled El Emam, Tiffany I Leung, Bradley Malin, William Klement, Gunther Eysenbach. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 02.05.2024.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.