How It Works
Three AI agents work together behind the scenes during each conversation. When a user describes their symptoms, the first agent identifies the issue and selects the appropriate AMA flowchart, factoring in details like age and sex. The second agent interprets the patient's responses — and crucially, it can do so even when those responses are not clean yes-or-no answers. The third agent translates clinical questions into plain language before presenting them to the user, converting something like "Is the pain severe?" into "How bad is the pain on a scale of 1 to 10?"
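To make that division of labor concrete, here is a minimal sketch of the three agent roles in Python. The function names, signatures, and keyword logic are our own illustration, not code from the study; in the actual system each role is played by a large language model rather than these trivial stand-ins.

```python
# Minimal sketch of the three agent roles described above.
# All names are hypothetical, and the bodies are simple stand-ins
# for what would be LLM calls in the real system.

def select_flowchart(complaint: str, age: int, sex: str) -> str:
    """Agent 1: choose the flowchart that matches the chief complaint,
    factoring in details such as age and sex."""
    if "chest" in complaint.lower():
        return "chest_pain_adult" if age >= 18 else "chest_pain_child"
    return "general_symptoms"


def interpret_response(question: str, reply: str) -> bool:
    """Agent 2: map a free-text patient reply onto the yes/no branch a
    flowchart node expects, even when the reply is not a clean yes-or-no."""
    affirmative = ("yes", "yeah", "it does", "pretty bad", "a lot")
    return any(phrase in reply.lower() for phrase in affirmative)


def rephrase_question(clinical_question: str) -> str:
    """Agent 3: translate a clinical question into plain language
    before it is shown to the patient."""
    plain = {"Is the pain severe?": "How bad is the pain on a scale of 1 to 10?"}
    return plain.get(clinical_question, clinical_question)
```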
The chatbot continues through the flowchart until it can make a recommendation: monitor at home, schedule a regular appointment, or seek emergency care. "Our system uses these flowcharts to ground the conversation with the patient," said study first author Yujia Liu, a PhD student in electrical and computer engineering at UC San Diego.
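A rough sketch of that flowchart walk, building on the hypothetical helpers above, might look like the following. The node encoding and the `run_triage` function are illustrative assumptions rather than the AMA's format or the researchers' implementation; the path of visited nodes is recorded so that each recommendation can be traced back to specific branches of the flowchart.

```python
# Toy flowchart walk using the hypothetical helpers sketched above.
# The node encoding is our own illustration, not the AMA's format.

FLOWCHART = {
    "start": {"question": "Is the pain severe?", "yes": "radiating", "no": "monitor"},
    "radiating": {"question": "Does the pain spread to your arm or jaw?",
                  "yes": "emergency", "no": "appointment"},
    # Leaf nodes hold one of the three possible recommendations.
    "monitor": {"recommendation": "monitor at home"},
    "appointment": {"recommendation": "schedule a regular appointment"},
    "emergency": {"recommendation": "seek emergency care"},
}


def run_triage(flowchart: dict, ask_patient) -> tuple[str, list[str]]:
    """Walk the flowchart until a leaf is reached. Returns the recommendation
    plus the path of visited nodes, so the outcome can be audited."""
    node_id, path = "start", []
    while True:
        node = flowchart[node_id]
        path.append(node_id)
        if "recommendation" in node:          # leaf reached
            return node["recommendation"], path
        reply = ask_patient(rephrase_question(node["question"]))
        branch = "yes" if interpret_response(node["question"], reply) else "no"
        node_id = node[branch]


# Example run with canned replies standing in for a real conversation.
replies = iter(["it hurts a lot", "no, just my chest"])
recommendation, path = run_triage(FLOWCHART, lambda q: next(replies))
print(recommendation)   # -> "schedule a regular appointment"
print(path)             # -> ["start", "radiating", "appointment"]
```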
The transparency is intentional. "Large language models are powerful, but they're a black box," said senior author Edward Wang, a professor at UC San Diego's Jacobs School of Engineering. "We do not know how they generate their responses, and that makes it hard to verify or trust them. But with this system, every recommendation can be traced back to a clinician-validated flowchart."
What the Testing Showed
The team tested the chatbot across more than 30,000 simulated conversations. It selected the correct medical flowchart approximately 84% of the time and followed the prescribed decision-making steps with over 99% accuracy — even when patients described symptoms in varied or informal ways. The system's protocol-anchored design also means healthcare organizations can adapt it to their own clinical logic, replacing the default AMA flowcharts with institution-specific protocols while keeping the same conversational interface.
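As one illustration of that adaptability, and assuming the hypothetical `run_triage` loop sketched earlier, plugging in an institution's own protocol would amount to supplying a different flowchart definition while the conversational layer stays untouched. The protocol below is invented purely for the example.

```python
# Hypothetical institution-specific protocol plugged into the same loop.
# Only the flowchart data changes; the conversational interface stays the same.
PEDIATRIC_FEVER_PROTOCOL = {
    "start": {"question": "Is the child's temperature above 40 °C?",
              "yes": "emergency", "no": "duration"},
    "duration": {"question": "Has the fever lasted more than three days?",
                 "yes": "appointment", "no": "monitor"},
    "monitor": {"recommendation": "monitor at home"},
    "appointment": {"recommendation": "schedule a regular appointment"},
    "emergency": {"recommendation": "seek emergency care"},
}

# Interactive run: input() shows each plain-language question and reads the reply.
recommendation, path = run_triage(PEDIATRIC_FEVER_PROTOCOL, ask_patient=input)
```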
The accuracy numbers matter because the underlying problem is real. Self-triage through general web searches or consumer chatbots carries genuine risk: people with serious symptoms delay care because something online told them it was probably nothing, while people with minor symptoms arrive in emergency departments that are already stretched. A tool that reliably distinguishes between those situations at scale addresses a real gap.
What Comes Next
So far, all testing has been done using simulated conversations rather than real patients. The team's next step is partnering with hospitals to test the chatbot in live clinical settings — the harder validation that will determine whether the system's lab performance holds up under the messiness of actual patient interactions.
Longer term, the researchers plan to develop a mobile app with voice input, multi-language support, and image sharing — features that would extend the tool's reach to older adults, non-English speakers, and users with limited digital literacy. The eventual goal is integration directly into electronic health record systems, where the chatbot could hand off triage data to clinicians before or after a patient visit.
The researchers are careful to frame the chatbot as support infrastructure, not a replacement. "It can offload triage tasks from clinicians by providing patients reliable medical guidance at home," Liu said. "Clinicians could also review the conversations and step in when needed." That framing is both accurate and politically necessary; a tool that claims to replace clinical judgment faces a much steeper adoption path than one that extends it.
This analysis is based on reporting from News Medical and the original research published in Nature Health.
Image courtesy of Natanael Melchor and Unsplash.
This article was generated with AI assistance and reviewed for accuracy and quality.