%0 Journal Article
%@ 1438-8871
%I JMIR Publications
%V 27
%N
%P e64364
%T Retrieval Augmented Therapy Suggestion for Molecular Tumor Boards: Algorithmic Development and Validation Study
%A Berman,Eliza
%A Sundberg Malek,Holly
%A Bitzer,Michael
%A Malek,Nisar
%A Eickhoff,Carsten
%+ Center for Digital Health, University Hospital Tuebingen, Schaffhausenstrasse 77, Tuebingen, 72072, Germany, 49 70712984350, eliza_berman@alumni.brown.edu
%K large language models
%K retrieval augmented generation
%K LLaMA
%K precision oncology
%K molecular tumor board
%K molecular tumor
%K LLMs
%K augmented therapy
%K MTB
%K oncology
%K tumor
%K clinical trials
%K patient care
%K treatment
%K evidence-based
%K accessibility to care
%D 2025
%7 5.3.2025
%9 Original Paper
%J J Med Internet Res
%G English
%X Background: Molecular tumor boards (MTBs) require intensive manual investigation to generate optimal treatment recommendations for patients. Large language models (LLMs) can catalyze MTB recommendations, decrease human error, improve accessibility to care, and enhance the efficiency of precision oncology. Objective: In this study, we aimed to investigate the efficacy of LLM-generated treatments for MTB patients. We specifically investigate the LLMs’ ability to generate evidence-based treatment recommendations using PubMed references. Methods: We built a retrieval augmented generation pipeline using PubMed data. We prompted the resulting LLM to generate treatment recommendations with PubMed references using a test set of patients from an MTB conference at a large comprehensive cancer center at a tertiary care institution. Members of the MTB manually assessed the relevancy and correctness of the generated responses. Results: A total of 75% of the referenced articles were properly cited from PubMed, while 17% of the referenced articles were hallucinations, and the remaining were not properly cited from PubMed. Clinician-generated LLM queries achieved higher accuracy through clinician evaluation than automated queries, with clinicians labeling 25% of LLM responses as equal to their recommendations and 37.5% as alternative plausible treatments. Conclusions: This study demonstrates how retrieval augmented generation–enhanced LLMs can be a powerful tool in accelerating MTB conferences, as LLMs are sometimes capable of achieving clinician-equal treatment recommendations. However, further investigation is required to achieve stable results with zero hallucinations. LLMs signify a scalable solution to the time-intensive process of MTB investigations. However, LLM performance demonstrates that they must be used with heavy clinician supervision, and cannot yet fully automate the MTB pipeline.
%M 40053768
%R 10.2196/64364
%U https://www.jmir.org/2025/1/e64364
%U https://doi.org/10.2196/64364
%U http://www.ncbi.nlm.nih.gov/pubmed/40053768