TY - JOUR
T1 - Comparative validation of machine learning algorithms for surgical workflow and skill analysis with the HeiChole benchmark
AU - Wagner, Martin
AU - Müller-Stich, Beat Peter
AU - Kisilenko, Anna
AU - Tran, Duc
AU - Heger, Patrick
AU - Mündermann, Lars
AU - Lubotsky, David M.
AU - Müller, Benjamin
AU - Davitashvili, Tornike
AU - Capek, Manuela
AU - Reinke, Annika
AU - Reid, Carissa
AU - Yu, Tong
AU - Vardazaryan, Armine
AU - Nwoye, Chinedu Innocent
AU - Padoy, Nicolas
AU - Liu, Xinyang
AU - Lee, Eung Joo
AU - Disch, Constantin
AU - Meine, Hans
AU - Xia, Tong
AU - Jia, Fucang
AU - Kondo, Satoshi
AU - Reiter, Wolfgang
AU - Jin, Yueming
AU - Long, Yonghao
AU - Jiang, Meirui
AU - Dou, Qi
AU - Heng, Pheng Ann
AU - Twick, Isabell
AU - Kirtac, Kadir
AU - Hosgor, Enes
AU - Bolmgren, Jon Lindström
AU - Stenzel, Michael
AU - von Siemens, Björn
AU - Zhao, Long
AU - Ge, Zhenxiao
AU - Sun, Haiming
AU - Xie, Di
AU - Guo, Mengqi
AU - Liu, Daochang
AU - Kenngott, Hannes G.
AU - Nickel, Felix
AU - Frankenberg, Moritz von
AU - Mathis-Ullrich, Franziska
AU - Kopp-Schneider, Annette
AU - Maier-Hein, Lena
AU - Speidel, Stefanie
AU - Bodenstedt, Sebastian
N1 - Publisher Copyright:
© 2023
PY - 2023/5
Y1 - 2023/5
N2 - Purpose: Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase the safety of the operation through context-sensitive warnings and semi-autonomous robotic assistance or improve training of surgeons via data-driven feedback. In surgical workflow analysis, up to 91% average precision has been reported for phase recognition on an open-data, single-center video dataset. In this work, we investigated the generalizability of phase recognition algorithms in a multicenter setting, including more difficult recognition tasks such as surgical action and surgical skill. Methods: To achieve this goal, a dataset with 33 laparoscopic cholecystectomy videos from three surgical centers with a total operation time of 22 h was created. Labels included framewise annotation of seven surgical phases with 250 phase transitions, 5514 occurrences of four surgical actions, 6980 occurrences of 21 surgical instruments from seven instrument categories and 495 skill classifications in five skill dimensions. The dataset was used in the 2019 international Endoscopic Vision challenge, sub-challenge for surgical workflow and skill analysis. Here, 12 research teams trained and submitted their machine learning algorithms for recognition of phase, action, instrument and/or skill assessment. Results: F1-scores for phase recognition ranged from 23.9% to 67.7% (n = 9 teams) and for instrument presence detection from 38.5% to 63.8% (n = 8 teams), but for action recognition only from 21.8% to 23.3% (n = 5 teams). The average absolute error for skill assessment was 0.78 (n = 1 team). Conclusion: Surgical workflow and skill analysis are promising technologies to support the surgical team, but there is still room for improvement, as shown by our comparison of machine learning algorithms. This novel HeiChole benchmark can be used for comparable evaluation and validation of future work.
In future studies, it is of utmost importance to create more open, high-quality datasets in order to allow the development of artificial intelligence and cognitive robotics in surgery.
AB - Purpose: Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase the safety of the operation through context-sensitive warnings and semi-autonomous robotic assistance or improve training of surgeons via data-driven feedback. In surgical workflow analysis, up to 91% average precision has been reported for phase recognition on an open-data, single-center video dataset. In this work, we investigated the generalizability of phase recognition algorithms in a multicenter setting, including more difficult recognition tasks such as surgical action and surgical skill. Methods: To achieve this goal, a dataset with 33 laparoscopic cholecystectomy videos from three surgical centers with a total operation time of 22 h was created. Labels included framewise annotation of seven surgical phases with 250 phase transitions, 5514 occurrences of four surgical actions, 6980 occurrences of 21 surgical instruments from seven instrument categories and 495 skill classifications in five skill dimensions. The dataset was used in the 2019 international Endoscopic Vision challenge, sub-challenge for surgical workflow and skill analysis. Here, 12 research teams trained and submitted their machine learning algorithms for recognition of phase, action, instrument and/or skill assessment. Results: F1-scores for phase recognition ranged from 23.9% to 67.7% (n = 9 teams) and for instrument presence detection from 38.5% to 63.8% (n = 8 teams), but for action recognition only from 21.8% to 23.3% (n = 5 teams). The average absolute error for skill assessment was 0.78 (n = 1 team). Conclusion: Surgical workflow and skill analysis are promising technologies to support the surgical team, but there is still room for improvement, as shown by our comparison of machine learning algorithms. This novel HeiChole benchmark can be used for comparable evaluation and validation of future work.
In future studies, it is of utmost importance to create more open, high-quality datasets in order to allow the development of artificial intelligence and cognitive robotics in surgery.
KW - Endoscopic vision
KW - Laparoscopic cholecystectomy
KW - Surgical data science
KW - Surgical workflow analysis
UR - https://www.scopus.com/pages/publications/85150887760
U2 - 10.1016/j.media.2023.102770
DO - 10.1016/j.media.2023.102770
M3 - Article
C2 - 36889206
AN - SCOPUS:85150887760
SN - 1361-8415
VL - 86
JO - Medical Image Analysis
JF - Medical Image Analysis
M1 - 102770
ER -