Artificial intelligence (AI) scribes have been rapidly adopted across US health care systems to address documentation burden, but their implementation has outpaced empirical evaluation and left ethical, clinical, and regulatory concerns unresolved, according to an opinion article published in Annals of Internal Medicine.
The authors called for ex ante regulatory approval and postdeployment quality assurance to align these tools with their intended goals. Uncorrected inaccuracies in AI-generated notes can compromise safety, therapeutic efficacy, trust, and health equity, they cautioned.
AI scribes use automated speech recognition and large language models to convert clinical conversations into draft medical notes. These tools were promoted as improving workflow efficiency, physician well-being, and patient-centered care, yet adoption largely preceded empirical evidence of benefit. Systematic errors are a central concern. AI scribes are prone to hallucinated content, false inferences, and attribution errors that can persist in the medical record if clinicians do not consistently review and correct generated notes. Such inaccuracies could erode confidence in documentation and, in some cases, contribute to sentinel safety events through misrepresentation of diagnoses, medications, or clinical reasoning.
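The article does not describe any vendor's implementation, but the basic pipeline it references can be sketched in a few lines of Python. The sketch below is purely illustrative: every name (transcribe, draft_note, file_note) is a hypothetical placeholder rather than a real product API, and the review gate reflects the authors' point that generated drafts should not enter the record without clinician review and correction.

```python
# Illustrative sketch of an AI-scribe pipeline: automated speech recognition
# (ASR) feeds a large language model (LLM) that drafts a clinical note, which
# is held for mandatory clinician review before it can be filed. All names
# here are hypothetical stand-ins, not a real API.

from dataclasses import dataclass, field


@dataclass
class DraftNote:
    text: str
    clinician_approved: bool = False      # drafts may not be filed unreviewed
    corrections: list[str] = field(default_factory=list)


def transcribe(audio: bytes) -> str:
    """Stand-in for an ASR step that turns encounter audio into a transcript."""
    return "Patient reports intermittent chest pain for two weeks ..."


def draft_note(transcript: str) -> DraftNote:
    """Stand-in for an LLM step that converts a transcript into a draft note."""
    return DraftNote(text=f"HPI: {transcript}\nAssessment: ...\nPlan: ...")


def file_note(note: DraftNote) -> None:
    """Refuse to write unreviewed output to the record: the clinician, not the
    scribe, remains the author of the note."""
    if not note.clinician_approved:
        raise PermissionError("Draft note requires clinician review before filing.")
    print("Filed:", note.text)


if __name__ == "__main__":
    draft = draft_note(transcribe(b"<encounter audio>"))
    draft.corrections.append("Removed medication never discussed in the visit.")
    draft.clinician_approved = True
    file_note(draft)
```

The hard-coded review gate is the operative detail: the hallucinations, false inferences, and attribution errors described above persist precisely when this step is skipped or performed inconsistently.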
The authors argued that AI scribes fail to capture nuances of human communication relevant to clinical evaluation and documentation, including paralinguistic and pragmatic elements such as tone, facial expression, gesture, sarcasm, and metaphor. These deficiencies are particularly relevant in pediatrics, psychiatry, and encounters involving patients with nonnormative speech patterns, raising concerns about bias and ableism. The opinion also highlighted the problem of overcapture, in which AI scribes produce excessively detailed notes that obscure salient clinical information. As lead author Ursula M. Francis, JD, PhD, MSc, of the MacLean Center for Clinical Medical Ethics at the University of Chicago, and colleagues noted, "Adverse impacts of documenting sensitive information, like immigration status, often outweigh the benefits—but AI scribes might include such information reflexively."
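The authors do not propose a mechanism for catching reflexive overcapture; purely to make the concern concrete, a post-generation screen might look something like the hypothetical sketch below. The category list and keyword matching are placeholder logic, not a real product feature or the authors' recommendation.

```python
# Hypothetical illustration of the overcapture concern: a post-generation
# screen that flags sensitive categories (e.g., immigration status) a scribe
# might transcribe reflexively, so a clinician can decide whether documenting
# them is appropriate. Patterns are toy stand-ins for a real classifier.

import re

SENSITIVE_PATTERNS = {
    "immigration status": re.compile(r"\b(undocumented|visa|immigration)\b", re.I),
    "housing status": re.compile(r"\b(homeless|eviction)\b", re.I),
}


def flag_sensitive_content(draft_text: str) -> list[str]:
    """Return the sensitive categories mentioned in a draft note, for review."""
    return [name for name, pat in SENSITIVE_PATTERNS.items() if pat.search(draft_text)]


if __name__ == "__main__":
    draft = "HPI: Patient is undocumented and fears eviction from current housing."
    for category in flag_sensitive_content(draft):
        print(f"Review before filing: draft mentions {category}.")
```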
Privacy and transparency were identified as additional ethical challenges. Cloud-based storage of full encounter recordings and the involvement of third-party contractors in transcription review expanded privacy risks beyond traditional documentation practices. Although these arrangements were often disclosed to institutions and clinicians, patients were rarely informed that nonclinical personnel could access their conversations. Consent processes, frequently embedded in boilerplate privacy notices or conveyed in high-acuity settings, were described as inadequate to ensure meaningful understanding and voluntariness.
AI scribes are marketed in the US as administrative tools and are largely exempt from medical device regulation, despite producing medical device–like output. The authors called for standardized performance metrics, independent reader studies, and clearer regulatory frameworks to guide evaluation and oversight of these technologies.
Full disclosures can be found in the published opinion article.
Source: Annals of Internal Medicine