
Once upon a time, the operative report was a surgeon’s record of what happened in the OR—a document that dictates billing and can make or break a malpractice case. Like all human recollections, especially those dictated hours (or days) after the fact, it accumulates errors. Enter AI video review, a new hero swooping in to fix documentation mistakes… replacing the last hero, the operative report template, which was supposed to fix the same problem. It's funny how technology always finds a way to make itself indispensable—at an additional cost.
“Operative reports are the only window into what transpires inside the operating room during surgery. They are the foundation for surgical documentation, and are responsible for capturing the ‘what,’ ‘why,’ and ‘how’ of surgery.”
As a document vital for billing and malpractice litigation, it is not unreasonable to consider it the most important document generated by the operating surgeon. [1] Like all human endeavors, it is prone to error, especially when dictated many hours or days after the actual surgery. Like a white knight, AI video review is being championed as a solution to reduce the errors in these reports. As we will see, ironically, it is being offered in place of a previous white knight, the operative report template, in which a surgeon edits a standard report covering all operative eventualities.
The study, in the Journal of the American College of Surgeons (JACS), looked at operative reports on robotic-assisted radical prostatectomy (RARP) at an academic medical center over a year. This operation, which removes a cancerous prostate and is more likely than the traditional open version to preserve a patient’s sexual function and urinary continence, was chosen for two reasons. First, RARP is performed entirely under video guidance, so the camera captures everything the surgeon sees in the operative field; there is no need for external cameras or other monitoring that might miss key steps. Second, the researchers had already developed an algorithm to generate operative reports from RARP video footage.
For the study, the entire video was reviewed by humans who determined whether the procedure's key steps [2] were or were not performed. In the parlance of the field, their assessment represented the “ground truth.” Discrepancies in either the surgeon-written or AI-generated operative reports were classified as clinically significant or not, with a clinically significant discrepancy defined as:
- Omission from the report of a key step present in the video
- Inclusion in the report of a key step not present in the video
- An error reflecting a meaningful difference in technique
Accuracy was the proportion of operative reports without a clinically significant discrepancy.
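To make that scoring concrete, here is a minimal Python sketch of how accuracy could be computed under these definitions. The step names, data structures, and function names are illustrative assumptions for this article, not the study’s actual pipeline or code.

```python
# Illustrative sketch (not the study's code): scoring reports against the
# video-derived "ground truth" using the discrepancy rules described above.

# Hypothetical subset of RARP key steps, abbreviated from footnote [2].
KEY_STEPS = {
    "pelvic lymph node dissection",
    "dorsal vein ligation",
    "urethral transection",
    "vesicourethral anastomosis",
}

def clinically_significant_discrepancy(reported_steps, video_steps, technique_errors=0):
    """True if the report omits a performed step, includes an unperformed
    step, or describes a meaningfully different technique."""
    omissions = video_steps - reported_steps    # done on video, missing from report
    inclusions = reported_steps - video_steps   # in report, not seen on video
    return bool(omissions or inclusions or technique_errors)

def accuracy(reports):
    """Accuracy = proportion of reports with no clinically significant discrepancy."""
    clean = sum(
        not clinically_significant_discrepancy(
            r["reported"], r["video"], r.get("technique_errors", 0)
        )
        for r in reports
    )
    return clean / len(reports)

# Toy example: one clean report, one report that omits a performed step.
reports = [
    {"reported": KEY_STEPS, "video": KEY_STEPS},
    {"reported": KEY_STEPS - {"dorsal vein ligation"}, "video": KEY_STEPS},
]
print(accuracy(reports))  # 0.5
```

Under this framing, a single omitted or invented key step is enough to flag an entire report as inaccurate, which is why the headline accuracy numbers hinge so heavily on what counts as “clinically significant.”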
As might be anticipated, the AI reported more accurately than the surgeon authors. The error rates reported refer to clinically significant errors, and of course clinical significance is, like the devil, in the details.
The surgeons’ errors typically described a step in the procedure that did not occur; the AI tended to err in the opposite direction, omitting a step that was performed. Interestingly, while human error was greater, both the AI and the humans made the most mistakes on the same step. [3]
Coding for these procedures is part art, part documentation. For those cynical enough to believe physicians frequently “upcode” a procedure to financial advantage, taking a page from the Medicare Advantage playbook, that is not the case here: I could find no evidence that adding or subtracting these steps changed payment. What, then, was the source of physician error? The researchers offer this explanation:
“We suspect that inaccuracies related to these nuances in surgical technique are unintended consequences of surgeons relying on note templates for surgical documentation.”
Translated: the templates developed by software companies, in conjunction with physicians, to improve operative reports may increase errors by letting boilerplate descriptions fade into the background of day-to-day practice. Do not worry; the same automation behind the templates is now poised, through video review, to correct them, at some additional cost to be determined. Ever the techno-optimists, the researchers conclude that the fault lies with the algorithm (not, notably, with the authors of that algorithm) and that the lower rate of errors “highlight the promise of AI-generated operative reports.”
That said, I agree that operative reports are vital documentation and that any means of improving their accuracy is a step in the right direction. There is no doubt that templates significantly reduce the time needed to complete and verify an operative report. Here is the finding of one study from 2005:
“Templates resulted in dramatically faster times to the presence of a verified operative report in the medical record compared to dictation services (mean 28 v. 22,440 minutes). Templates increased overall compliance with national standards for operative note documentation and avoided transcription costs.”
Nary a word about accuracy. And Mayo Clinic, presumably the same institution that generated the current study, has also been using software’s latest shiny object, ChatGPT. In a study involving “simple” plastic surgical procedures, ChatGPT was the clear efficiency winner over dictation.
“The average time taken by human surgeons to create a note was 7.10 minutes .... By contrast, the ChatGPT platform generated notes in an average of 5.1 seconds, with … an average of 2.1 edits were needed per note, and 100% of the notes generated by the ChatGPT adhered to the guidelines. This indicated that the ChatGPT platform is considerably faster in generating operation notes when compared with humans, with greater accuracy.”
The guidelines referred to what an operative report must contain, and “greater accuracy” was never defined or evaluated; there was no human or video observer to establish “ground truth.”
You can see a video of the prompt and output here.
Our current researchers also offer this bit of techno-optimism.
“In addition to being inherently subjective, the manual creation of operative reports is a tedious clerical task that contributes to the increasing administrative burden placed on physicians. Indeed, recent literature suggests medical documentation to be a primary driver of surgeon burnout.”
While the researchers note that the impact of this AI automation on burnout is unknown, we have a wealth of medical literature suggesting that software’s last gift to medicine, the electronic health record, is responsible for physician burnout and for an all-eyes-on-the-screen physician “encounter” that obliterates any hope of a real physician-patient relationship, the true heart of medical care as opposed to health services. Maybe instead of layering automation on top of automation, we should reconsider what we are actually trying to fix.
[1] To be accurate, a surgical trainee often generates the document, which is then “overread” and, more importantly, signed by the attending surgeon. So, for our purposes, we are discussing the work of the surgeon responsible for the patient’s surgery.
[2] “Pre-defined surgical steps included pelvic lymph node dissection, Space of Retzius dissection, dorsal vein ligation, anterior bladder neck transection, posterior bladder neck transection, seminal vesicle and posterior dissection, lateral/pedicle and apical dissection, urethral transection, vesicourethral anastomosis, and final inspection/extraction.”
[3] The step involves ligating (tying) a vascular area that theoretically reduces intra-operative blood loss. Studies of the modified technique show no essential difference in blood loss and an earlier return of continence, making this an accepted alternative.
Source: Enhancing Accuracy of Operative Reports with Automated Artificial Intelligence Analysis of Surgical Video, Journal of the American College of Surgeons. DOI: 10.1097/XCS.0000000000001352