Artificial intelligence scribes are rapidly being adopted across US health care systems to address documentation burden, but their implementation is occurring with limited empirical evaluation and unresolved ethical, clinical, and regulatory concerns, according to an opinion article published in Annals of Internal Medicine.
The authors called for ex ante regulatory approval and post-deployment quality assurance to align tools with their intended goals. Uncorrected inaccuracies in AI-generated notes can compromise safety, therapeutic efficacy, trust, and health equity, they cautioned.
AI scribes use automated speech recognition and large language models to convert clinical conversations into draft medical notes. The tools are promoted as improving workflow efficiency, physician well-being, and patient-centered care, yet adoption largely precedes empirical evidence of benefit. Systematic errors are a central concern. AI scribes are prone to hallucinated content, false inferences, and attribution errors that can persist in the medical record if clinicians do not consistently review and correct generated notes. Such inaccuracies may erode confidence in documentation and, in some cases, contribute to sentinel safety events through misrepresentation of diagnoses, medications, or clinical reasoning.
AI scribes fail to capture nuances of human communication relevant to clinical evaluation and documentation, including paralinguistic and pragmatic elements such as tone, facial expression, gesture, sarcasm, and metaphor. These deficiencies were particularly relevant in pediatrics, psychiatry, and encounters involving patients with nonnormative speech patterns, raising concerns about bias and ableism. The opinion also highlighted the problem of over-capture, in which AI scribes produce excessively detailed notes that obscure salient clinical information. As lead author Ursula M. Francis, JD, PhD, MSc, of the MacLean Center for Clinical Medical Ethics at the University of Chicago, and colleagues noted, "Adverse impacts of documenting sensitive information, like immigration status, often outweigh the benefits—but AI scribes might include such information reflexively."
Privacy and transparency were identified as additional ethical challenges. Cloud-based storage of full encounter recordings and the involvement of third-party contractors in transcription review expand privacy risks beyond traditional documentation practices. Although these arrangements are often disclosed to institutions and clinicians, patients are rarely informed that nonclinical personnel can access their conversations. The authors described consent processes, which are frequently embedded in boilerplate privacy notices or conveyed in high-acuity settings, as inadequate to ensure meaningful understanding and voluntariness.
AI scribes are marketed in the US as administrative tools and are largely exempt from medical device regulation, despite producing medical device–like output. The commentary authors called for standardized performance metrics, independent reader studies, and clearer regulatory frameworks to guide evaluation and oversight of these technologies.
The authors disclosed having no conflicts of interest.
Source: Annals of Internal Medicine