Benchmarking saliency methods for chest X-ray interpretation

Nguyen, Do Trung Chanh; Saporta, Adriel; Agrawal, Ashwin; Pareek, Anuj; Truong, Steven Q. H.; Ngo, Van Doan; Seekins, Jayne; Blankenberg, Francis G.; Ng, Andrew Y.; Lungren, Matthew P.; Rajpurkar, Pranav

dc.contributor.author	Nguyen, Do Trung Chanh
dc.contributor.author	Saporta, Adriel
dc.contributor.author	Agrawal, Ashwin
dc.contributor.author	Pareek, Anuj
dc.contributor.author	Truong, Steven Q. H.
dc.contributor.author	Ngo, Van Doan
dc.contributor.author	Seekins, Jayne
dc.contributor.author	Blankenberg, Francis G.
dc.contributor.author	Ng, Andrew Y.
dc.contributor.author	Lungren, Matthew P.
dc.contributor.author	Rajpurkar, Pranav
dc.date.accessioned	2024-06-10T04:48:16Z
dc.date.available	2024-06-10T04:48:16Z
dc.date.issued	2022-08
dc.identifier.uri	https://vinspace.edu.vn/handle/VIN/79
dc.description.abstract	Saliency methods, which produce heat maps that highlight the areas of the medical image that influence model prediction, are often presented to clinicians as an aid in diagnostic decision-making. However, rigorous investigation of the accuracy and reliability of these strategies is necessary before they are integrated into the clinical setting. In this work, we quantitatively evaluate seven saliency methods, including Grad-CAM, across multiple neural network architectures using two evaluation metrics. We establish the first human benchmark for chest X-ray segmentation in a multilabel classification set-up, and examine under what clinical conditions saliency maps might be more prone to failure in localizing important pathologies compared with a human expert benchmark. We find that (1) while Grad-CAM generally localized pathologies better than the other evaluated saliency methods, all seven performed significantly worse compared with the human benchmark, (2) the gap in localization performance between Grad-CAM and the human benchmark was largest for pathologies that were smaller in size and had shapes that were more complex, and (3) model confidence was positively correlated with Grad-CAM localization performance. Our work demonstrates that several important limitations of saliency methods must be addressed before we can rely on them for deep learning explainability in medical imaging.	en_US
dc.language.iso	en_US	en_US
dc.title	Benchmarking saliency methods for chest X-ray interpretation	en_US
dc.type	Article	en_US

Files in this item

Name: Benchmarking saliency methods ...
Size: 4.137Mb
Format: PDF

This item appears in the following Collection(s)

Nguyen Do Trung Chanh, PhD [11]
Model Development Manager - College of Engineering and Computer Science
