Large language model-based platforms provide largely inaccurate responses to questions concerning vitreoretinal disease and remain inconsistent on repeat queries.
New research suggests current large language model (LLM)-based platforms provide largely inaccurate responses to questions concerning vitreoretinal disease and show inconsistencies on repeat queries.1
The analysis showed 50.0% of answers were materially different despite no functional changes to the platform between the first and second question submissions, indicating a lack of consistency in the generated information.
“A greater degree of subspecialization in the field of vitreoretinal disease might explain the differences in accuracy,” wrote the investigative team, led by Peter Y. Zhao, MD, New England Eye Center, Tufts Medical Center. “Hallucination, generating factually inaccurate responses, is a known issue with LLM-based platforms but has the potential to cause patient harm in the domain of medical knowledge.”
Patients often require accurate information on ophthalmic conditions to make informed medical decisions, but information on the internet often comes from unregulated or unverified sources, decreasing its reliability. Artificial intelligence (AI)-based language platforms, which are increasing in popularity, respond to user inquiries by generating paragraph-length responses.
This cross-sectional analysis evaluated the accuracy and reproducibility of a single chatbot’s responses to commonly asked patient questions about vitreoretinal disease. Investigators collected frequently asked questions from the internet on various vitreoretinal conditions and procedures.
All questions were posed to the AI chatbot ChatGPT in January 2023. Responses were evaluated by 2 fellowship-trained vitreoretinal surgeons and graded as accurate if the entirety of the response was considered appropriate. To determine whether answers could change over time, investigators resubmitted questions to the same platform 14 days after the initial inquiry and compared the responses.
Upon analysis, only 8 (15.4%) of the 52 questions submitted initially were graded as completely accurate. After the resubmission of questions, all 52 responses were found to have changed, with 26 responses (50.0%) materially changing.
For 16 of these responses (30.8%), the accuracy materially improved, while for 10 responses (19.2%), the accuracy materially worsened. Investigators noted some responses contained inappropriate or potentially harmful medical advice.
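The reported percentages are simple fractions of the 52 submitted questions; a brief sketch (illustrative only, using the counts stated above) confirms how the figures round:

```python
# Counts reported in the study (52 questions total).
TOTAL = 52
counts = {
    "completely accurate initially": 8,
    "materially changed on resubmission": 26,
    "accuracy materially improved": 16,
    "accuracy materially worsened": 10,
}

def as_percent(n: int, total: int = TOTAL) -> float:
    """Express a count as a percentage of all questions, to one decimal place."""
    return round(100 * n / total, 1)

for label, n in counts.items():
    print(f"{label}: {as_percent(n)}%")
# 8/52 -> 15.4%, 26/52 -> 50.0%, 16/52 -> 30.8%, 10/52 -> 19.2%
```

These match the 15.4%, 50.0%, 30.8%, and 19.2% figures cited in the article.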
In response to “How do you get rid of epiretinal membrane?”, the chatbot described vitrectomy but also included incorrect options of injection therapy and laser therapy. Then, in response to “What are the treatment options for central serous chorioretinopathy?” the platform included an incorrect statement on corticosteroids being used to reduce inflammation and fluid accumulation in the retina.
In fact, investigators noted corticosteroid therapy can exacerbate central serous chorioretinopathy.
One limitation of the chatbot is that it was designed for research purposes rather than for medical use. Investigators noted the same chatbot provided largely accurate responses in the field of preventive cardiovascular disease.2
However, as the chatbot is being continually updated and revised, investigators suggest these findings could be different for vitreoretinal disease in a future investigation.
“Overall, LLM-based platforms could be used by patients to obtain medical advice,” investigators wrote. “Ophthalmologists need to be aware of the limitations and potential for dissemination of misinformation associated with these AI platforms.”