AANS Beyond 2021: Full Collection
Deep Neural Networks Can Perform Automated Instrument Detection in Endoscopic Skull Base Surgery
Video Transcription
My name is Guillaume Kugener, and I will be discussing my team's project, Deep Neural Networks Can Perform Automated Instrument Detection in Endoscopic Skull Base Surgery. We are grateful to the AANS for awarding this project the Mizuho Minimally Invasive Brain Tumor Surgery Award, and we would like to thank Mizuho for their support. The objectives of this presentation are to highlight computer vision as a tool for surgical research, describe how to develop an intraoperative video dataset and analytical pipeline, and demonstrate that intraoperative video is worth storing and analyzing.
Endoscopic endonasal surgery is a common neurosurgical procedure that provides access to the cranial base. One of the key steps of the procedure is identifying and preserving each internal carotid artery (ICA), the main arterial blood supply to each cerebral hemisphere. Injury to the ICA can result in stroke, neurologic injury, and/or death. While this injury occurs in only 1% of surgical cases, as highlighted in red, over half of surgeons can expect to encounter this complication at least once in their career. However, given its rarity, many surgeons may not develop proficiency in hemorrhage control during their training or clinical practice.
We developed and validated a high-fidelity simulator to teach surgeons to manage this rare complication. In this exercise, an endonasal surgical approach was prepared in a cadaveric head specimen, which was then perfused with an artificial blood substitute. A deliberate injury was made to the ICA, and surgeons attempted to manage it twice, with 5 minutes allotted per attempt: once before receiving any coaching and once after expert coaching. Successful management of this injury involves first using the hover technique with a large-bore suction, then achieving initial hemostasis with hover-and-cover using a cottonoid, and finally achieving durable hemostasis by switching to a muscle patch that serves as a vascular patch.
This simulation is complete with a mock OR environment, including vital signs. Trial success or failure and total blood loss are recorded at the end of each trial. Analysis of surgeon performance, covering both resident and attending surgeons, showed a significant improvement in surgeons' ability to manage the simulated injury from their first to their second attempt. After a single trial, participants fell into two main populations: those who achieved hemostasis and those who did not. The same split existed within the attending and resident subgroups. After receiving expert coaching, the large majority of participants were able to complete this task successfully. Participants also described this cadaveric simulation as having exceptionally high realism and transferability to the operating room, as highlighted in red.
Over the course of running these simulations, we saved endoscopic video of participant trials. By the end of this project, we had close to 150 recorded trials. We felt that these videos could show us how participants performed, but we did not know how to perform this analysis. This is a relatively common problem in surgery: surgeons record hundreds of hours of intraoperative video that ends up sitting unused on a hard drive somewhere. Until recently, one of the main barriers to using this type of data was the lack of appropriate and accessible analytical methods that could process large video files. Computers were not fast enough and did not have the memory capacity to process images, let alone video. However, with the development and availability of GPUs, computing power is much less of a barrier. This increase in computing power has catalyzed recent advances in methods such as machine learning and deep neural networks, which have significantly lowered the barrier to entry for analyzing this data.
Specifically, many surgical groups have become interested in applying computer vision methods to intraoperative video analysis. A recent Viewpoint article in Surgery highlighted some of the work that has been done, including surgical action recognition, surgical phase detection, and automated identification of surgical tools in view. The authors go so far as to suggest that AI may eventually become the ultimate surgical mentor, helping guide us through cases and making surgery safer for patients everywhere.
One of the remaining challenges in analyzing surgical video is access to high-quality data to which these methods can be applied. For many deep learning methods, having access to some video or imaging data is not enough. Success depends on large, clean, high-quality datasets with a consistent setup and labeling scheme. The data we had in hand uniquely positioned us to pursue this type of project.
The dataset is large: we had nearly 150 separate video trials from over 65 surgeons, with 9 unique sites, 2 video resolutions, and variations in tool appearance. The videos are consistent: all surgeons performed the same task, and while lighting, resolution, and tools varied, the anatomical environment was the same. And the videos are labeled: this exercise involved collecting outcome variables across all cohorts and participants, and all recorded trials were hand-annotated with bounding boxes marking the surgical tools in view. This complete dataset, named Simulated Outcomes Following Carotid Artery Laceration (SOCAL), was used to train a deep learning object detection network to automatically identify tools in video.
Having generated our dataset, we could now use it to develop a deep learning model. Deep neural networks are machine learning models inspired by the biological neural networks found in animal brains.
They contain an input layer, an output layer, and many hidden layers in between, consisting of mathematical functions. Using large datasets and powerful computing hardware such as GPUs, the coefficients, or weights, of these functions are optimized so that the network learns to perform a particular task. The models used in our work are based on a subset of deep neural networks called convolutional neural networks, which are particularly well suited to processing image and video data. We used two previously validated convolutional neural networks, RetinaNet and YOLOv3, and trained them to perform our particular task of interest: identifying surgical tools in surgical video. To develop the model, individual frames from the video were fed into a network that uses a series of mathematical operations called convolutions to recognize spatial patterns within images. By seeing many examples of the different tool types used in a procedure, the model learns to associate particular patterns in a frame with a specific tool type. Eventually it can accurately identify where a tool is in the video and what type of tool it is. Our results from running the model on video it had not seen before indicate that this approach identifies certain surgical tools, such as the suction and grasper, quite reliably, whereas there is still much room for improvement on others.
Having created a pipeline that can automatically detect tools in view, we examined how tool usage patterns related to blood loss during a trial. We aggregated the model's detections across frames to generate a tool-usage signature for each video. We then compared these signatures across trials to look for associations with outcomes. Indeed, we found that there was a great deal of information stored within surgical video.
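The talk does not spell out how frame-level detections were combined into a per-video signature or how signatures were compared with outcomes. The minimal sketch below makes several assumptions:

* the detector outputs a list of (label, confidence) pairs for every frame;
* the four-tool label set is hypothetical;
* the 0.5 confidence cutoff is hypothetical;
* Pearson correlation stands in for whatever statistic the project actually used.

Under those assumptions, the sketch computes the fraction of frames in which each tool is in view, plus a "none" fraction. It then correlates each entry with blood loss across trials. It illustrates the general idea and is not the project's published method.

```python
from collections import Counter
from math import sqrt

# Hypothetical tool vocabulary for illustration; the real label set may differ.
TOOLS = ("suction", "grasper", "cottonoid", "muscle")


def tool_signature(frames, threshold=0.5):
    """Return the fraction of frames in which each tool is detected.

    `frames` holds one entry per video frame; each entry is a list of
    (label, confidence) detections from an object detector. A frame with no
    detection at or above the (assumed) `threshold` counts toward "none".
    """
    if not frames:
        raise ValueError("video has no frames")
    counts = Counter()
    for detections in frames:
        present = {label for label, conf in detections if conf >= threshold}
        if present:
            counts.update(present)  # each tool counted at most once per frame
        else:
            counts["none"] += 1
    return {key: counts[key] / len(frames) for key in (*TOOLS, "none")}


def pearson(xs, ys):
    """Pearson correlation coefficient; None when either series is constant."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    if len(set(xs)) == 1 or len(set(ys)) == 1:
        return None  # correlation is undefined for a constant series
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def tool_outcome_correlations(signatures, blood_loss):
    """Correlate each signature entry with blood loss across trials."""
    return {
        key: pearson([sig[key] for sig in signatures], blood_loss)
        for key in (*TOOLS, "none")
    }
```

The choice of statistic matters. With a small number of trials and skewed outcomes, a rank-based measure such as Spearman correlation may be a better fit. So may a regression that adjusts for other factors. The plain Pearson version is used here only to keep the sketch dependency-free.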
For this task specifically, we found that more time with the cottonoid, suction, and grasper in view was generally associated with less blood loss, whereas more time with no tools in view was associated with greater blood loss. As surgeons, we may find these observations obvious. The important point is that a model with no surgical expertise or manual training, relying only on recognizing patterns within visual images, was able to make them. These associations only scratch the surface of what this technique can do.
To quickly recap:
* We created a publicly available annotated neurosurgical dataset with outcomes data.
* We implemented a deep learning model to extract clinically relevant features from intraoperative video.
* We used video-based features to better understand differences in blood loss across trials.
So where do we go from here? There are many potential applications of automated intraoperative video analysis. These include virtual coaching to improve surgeon training, real-time anatomy detection to help surgeons avoid complications, and automated summarization of long procedures into the key events associated with outcomes. However, several challenges remain:
* improving the accuracy and precision of existing models and methods;
* creating interpretable metrics;
* making intraoperative video available from multiple institutions, covering various surgical techniques and pathologies.
There is a tremendous need for more neurosurgeon involvement to address these challenges. Clinicians are in the best position to determine what questions can be asked and answered through the analysis of video-based data. Specific questions our group has recently tried to address include: Can we extend these methods to real intraoperative video? Can we predict outcomes, such as blood loss and task success, directly with deep learning methods, without the use of annotations?
How does a model trained on video compare with trained surgeons at predicting outcomes on the same task? In addition, endoscopic endonasal surgery is not the only type of surgery with recording capabilities. We are working on applying our approach to other settings within neurosurgery, such as minimally invasive spine surgery. These are just a few of the things our group is interested in at the moment, far from an exhaustive list of the potential of this approach within neurosurgery. Hopefully, this talk has gotten people thinking about what clinical data they may be sitting on that could be used in this way, or what they could learn from analyzing their own operative video. Thanks to the team and everyone involved for making this all happen. The data used in this project is available for download now on Figshare. We hope that it can be of use in your own projects.
Video Summary
In this video, Guillaume Kugener discusses his team's project on automated instrument detection in endoscopic skull base surgery using deep neural networks. The presentation aims to highlight the use of computer vision in surgical research. He explains the importance of preserving the internal carotid artery in endonasal surgery and describes the development of an intraoperative video dataset and analytical pipeline. The team developed a simulator to train surgeons in managing internal carotid artery injury, recording nearly 150 trials. They used deep learning models to automatically identify surgical tools in the videos and analyzed tool usage patterns in relation to blood loss. The speaker also discusses future applications and challenges in intraoperative video analysis. The dataset used in the project is publicly available for download.
Keywords
automated instrument detection
endoscopic skull base surgery
deep neural networks
computer vision
surgical research