Chatbots as Accurate as Ophthalmologists in Giving Advice

Richard Mark Kirkner

August 25, 2023

For patients with questions about their eyes, chatbots may be as good as physicians at dispensing advice.

That's the conclusion of a new study that found that a form of the ChatGPT algorithm is about as accurate as humans when responding to patient queries, providing answers that specialists had difficulty differentiating from responses from a panel of their peers.

The cross-sectional study, published August 22 in JAMA Network Open, evaluated chatbot responses to 200 eye care questions from an online advice forum. Eight ophthalmologists reviewed the answers and were able to discern human from bot-generated responses with an accuracy of 61.3%.

Sophia Y. Wang, MD, an assistant professor of ophthalmology at Byers Eye Institute of Stanford University, in Stanford, California, and colleagues report that theirs is the first study to compare the quality of ophthalmology advice from a chatbot with ophthalmologist-written content.

"When we input patients' ophthalmology-related medical questions from a medical advice forum into ChatGPT to assess its answers, the quality of those answers was surprisingly good," Wang told Medscape Medical News. "The quality of the answers was especially impressive given how long and complicated some of the patent queries were."

Chatbot on Par With Human Answers

The quality of the chatbot answers was on par with human answers, Wang's group found. The likelihood of answers containing incorrect or inappropriate material was 77.4% for the chatbot and 75.4% for humans. The risk for potential harm from the answers also was similar. Harm was deemed unlikely in 86.5% and 84% of the chatbot and human answers, respectively, according to the researchers. The level of potentially harmful information was 12.6% and 15.1%, while the level of definitely harmful information was 0.9% in both forms of response.

The chatbot was prone to occasional "hallucinations" — fabricated responses — which at times had the potential to cause harm. One example of such behavior: In response to a question about whether cataract surgery could "shrink" the eye, the bot replied that "removal of the cataract can cause a decrease in the size of the eye."

Previous studies of chatbots in ophthalmology, using varying methodologies, have yielded varying results. A 2023 study from Emory University in Atlanta, Georgia, reported that ophthalmologists in training and ChatGPT-4 (the most recent iteration of the platform) listed the appropriate diagnosis among the top three possible choices 95% and 93% of the time, respectively. Researchers from Canada who fed ChatGPT questions from an ophthalmology board certification test prep module reported correct answers 46% of the time.

A cross-sectional study of chatbot responses to patient questions posted to Reddit r/AskDocs reported that evaluators preferred chatbot responses to physician responses in 78.6% of evaluations. Evaluators also rated the chatbot responses as more empathetic: 45.1% of chatbot responses were rated empathetic, compared with 4.6% of physician responses.

While the Stanford researchers tested ChatGPT-3.5, the high level of accuracy and similarity to human responses are more in line with studies using the newer GPT-4 technology, Riley Lyons, MD, a resident at Emory Eye Center and lead author of the ChatGPT-4 study from that institution, told Medscape.

"The fact that ophthalmologists were only able to distinguish between the chatbot and human responses correctly 61% of the time speaks to the accuracy of AI chatbot responses," Lyons said. "I am surprised the graders did not have more success distinguishing between the human and chatbot responses."

In addition to possible inaccuracies in the responses from the chatbot, he added, "I would imagine human responses could contain typos or grammatical errors that were likely not present in AI chatbot responses."

Some of the chatbot errors the Stanford investigators reported "are just silly misinformation," said Tina Felfeli, MD, an ophthalmologist at the University of Toronto, Canada, who has participated in ChatGPT research.

"If humans have a debate on specific topics, we can imagine the chatbot will also make these errors, and so this is where physicians' oversight of these responses is important," Felfeli said. "ChatGPT is not fully up to date and trained on the latest data available, and so that is where physicians have the upper hand."

Potential Applications, Future Research

For now, Felfeli said, chatbots seem to have the most potential for decreasing physician workloads by streamlining tasks, such as generating radiology reports, composing discharge summaries, and transcribing patient notes.

"This is certainly a very exciting field which is constantly evolving," Wang said. Future research might include finding ways to limit hallucinations, which would make the technology safer for use in medicine. Other areas to study are patient attitudes toward the health advice chatbots generate, as well as ethical issues regarding the use of AI in medicine, she said.

The National Eye Institute and Prevent Blindness provided funding for the study. Wang, Lyons, and Felfeli have disclosed no relevant financial relationships.

JAMA Netw Open. Published online August 22, 2023.

Richard Mark Kirkner is a medical journalist based in the Philadelphia area.
