AI Scribes Lag Clinicians on Note Quality
Conexiant
April 17, 2026

This VHA-sponsored cross-sectional study compared the documentation quality of 11 ambient AI scribe tools with that of 18 human clinicians across 5 standardized primary care scenarios. Using a modified PDQI-9 instrument, 30 blinded raters consistently scored human-generated notes higher than AI-generated ones across all cases and all 10 quality domains, with the largest gaps occurring under challenging audio conditions (background noise, masked speakers). The AI notes fell furthest behind in thoroughness, organization, and usefulness.