Published in Vol 26 (2024)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/52935.
Evaluation of Large Language Model Performance and Reliability for Citations and References in Scholarly Writing: Cross-Disciplinary Study

Journals

  1. Luo X, Chen F, Zhu D, Wang L, Wang Z, Liu H, Lyu M, Wang Y, Wang Q, Chen Y. Potential Roles of Large Language Models in the Production of Systematic Reviews and Meta-Analyses. Journal of Medical Internet Research 2024;26:e56780
  2. Norberg K, Almoubayyed H, De Ley L, Murphy A, Weldon K, Ritter S. Rewriting Content with GPT-4 to Support Emerging Readers in Adaptive Mathematics Software. International Journal of Artificial Intelligence in Education 2025;35(2):587
  3. Uribe S, Maldupa I. Estimating the use of ChatGPT in dental research publications. Journal of Dentistry 2024;149:105275
  4. Oermann M. You Cannot Search the Literature Using Artificial Intelligence, and This Is Why. Nursing Education Perspectives 2024;45(6):337
  5. Sun S, Huynh K, Cortes G, Hill R, Tran J, Yeh L, Ngo A, Houshyar R, Yaghmai V, Tran M. Testing the Ability and Limitations of ChatGPT to Generate Differential Diagnoses from Transcribed Radiologic Findings. Radiology 2024;313(1)
  6. Kayabaşı M, Köksaldı S, Durmaz Engin C. Evaluating the reliability of the responses of large language models to keratoconus-related questions. Clinical and Experimental Optometry 2025;108(7):784
  7. Chang Y, Yin J, Li J, Liu C, Cao L, Lin S. Applications and Future Prospects of Medical LLMs: A Survey Based on the M-KAT Conceptual Framework. Journal of Medical Systems 2024;48(1)
  8. Luo Z, Qiao Y, Xu X, Li X, Xiao M, Kang A, Wang D, Pang Y, Xie X, Xie S, Luo D, Ding X, Liu Z, Liu Y, Hu A, Ren Y, Xie J. Cross sectional pilot study on clinical review generation using large language models. npj Digital Medicine 2025;8(1)
  9. Jongkind R, Elings E, Joukes E, Broens T, Leopold H, Wiesman F, Meinema J. Is your curriculum GenAI-proof? A method for GenAI impact assessment and a case study. MedEdPublish 2025;15:11
  10. Oladokun B, Enakrire R, Emmanuel A, Ajani Y, Adetayo A. Hallucitation in Scientific Writing: Exploring Evidence from ChatGPT Versions 3.5 and 4o in Responses to Selected Questions in Librarianship. Journal of Web Librarianship 2025;19(1):62
  11. Spinellis D. False authorship: an explorative case study around an AI-generated article published under my name. Research Integrity and Peer Review 2025;10(1)
  12. See Y, Lim K, Au W, Chia S, Fan X, Li Z. The Use of Large Language Models in Ophthalmology: A Scoping Review on Current Use-Cases and Considerations for Future Works in This Field. Big Data and Cognitive Computing 2025;9(6):151
  13. Spitsberg T, Kettler T, McKamie J. Large language model AI-guided creative writing co-creation in secondary schools. Theory Into Practice 2025;64(4):374
  14. Taloni A, Sangregorio A, Alessio G, Romeo M, Coco G, Busin L, Sollazzo A, Scorcia V, Giannaccare G. Large language models provide discordant information compared to ophthalmology guidelines. Scientific Reports 2025;15(1)
  15. Tsai C, Lin Y, Hou J, Tsai S, Yeh P, Kao C. Optimizing patient education for radioactive iodine therapy and the role of ChatGPT incorporating chain-of-thought technique: ChatGPT questionnaire. DIGITAL HEALTH 2025;11
  16. Asiri S. Assessing the Reliability of ChatGPT and Gemini in Identifying Relevant Orthodontic Literature. European Journal of General Dentistry 2025
  17. Çamlar M, Sevgi U, Erol G, Karakaş F, Doğruel Y, Güngör A. Comparative performance of neurosurgery-specific, peer-reviewed versus general AI chatbots in bilingual board examinations: evaluating accuracy, consistency, and error minimization strategies. Acta Neurochirurgica 2025;167(1)
  18. Ikhtiar I, Fidiana, Islamiyah W. Development and content validity analysis of artificial intelligence-generated Indonesian language insomnia questionnaire based on the International Classification of Sleep Disorders, Third Edition. Journal of Neurosciences in Rural Practice 2025;0:1
  19. Kim E, Kipchumba F, Min S. Geographic Variation in LLM DOI Fabrication: Cross-Country Analysis of Citation Accuracy Across Four Large Language Models. Publications 2025;13(4):49
  20. Moulaison‐Sandy H, Thach H. The Wicked Problem of AI: Information Avoidance, Uncomfortable Knowledge, and ChatGPT in Scholarly Communication. Proceedings of the Association for Information Science and Technology 2025;62(1):1030
  21. Oermann M, Owens J, Carter-Templeton H, Peterson G, Bailey H. Using Artificial Intelligence for Scholarly Writing. AJN, American Journal of Nursing 2025;125(11):52
  22. Bai J, Ji X, Yu J, Wang Y, Guo Y, Xue C, Zhang W, Zhu J. From Patient Concerns to AI Responses: A Delphi-Based Quality Assessment for Axial Spondyloarthritis (Preprint). JMIR AI 2025
  23. Linardon J, Jarman H, McClure Z, Anderson C, Liu C, Messer M. Influence of Topic Familiarity and Prompt Specificity on Citation Fabrication in Mental Health Research Using Large Language Models: Experimental Study. JMIR Mental Health 2025;12:e80371

Books/Policy Documents

  1. Tariciotti L, Zohdy Y, Riva M, Levi R, Pessina F, Pradilla G. Neurosurgery's Frontline Role in Gliomas Treatment.