- AI notes scored lower across the board. Human-generated notes outperformed AI on all 10 PDQI-9 quality domains, with the largest gaps in thoroughness, organization, and usefulness, the qualities most critical to clinical decision-making.
- Audio quality matters a lot. AI performance degraded most sharply under challenging acoustic conditions, such as background noise and masked speakers, suggesting current tools remain sensitive to real-world clinical environments.
- AI scribe output should be treated as a draft. The authors explicitly recommend clinicians view AI-generated notes as starting points requiring review and editing, not finished documentation.
- Vendor-neutral evaluation is essential. Prior literature has been largely single-vendor and efficiency-focused; this study highlights that independent, multi-vendor quality assessments are needed before widespread deployment.
- Efficiency gains may come at a quality cost. While AI scribes demonstrably reduce documentation burden and after-hours work, this study suggests those benefits should not be assumed to come free of tradeoffs in note quality and completeness.
Source: "AI Scribes Lag Clinicians on Note Quality," Conexiant, April 17, 2026.