Interpretable artificial intelligence-based app assists inexperienced radiologists in diagnosing biliary atresia from sonographic gallbladder images

Table 3 The diagnostic performance of senior radiologists alone, smartphone app alone, and senior radiologists with smartphone app’s assistance in diagnosing biliary atresia

Radiologist^a	AUC	Sensitivity (%)	Specificity (%)	Accuracy (%)	P1 value^#	P2 value^#	Not gallbladder
J-initial	0.816 (0.770, 0.856)	88.9 (83.5, 93.0)	74.3 (66.4, 81.2)	82.6 (73.1, 92.9)	0.01	0.18	0
Model	0.874 (0.834, 0.908)	91.5 (86.6, 95.1)	83.3 (76.2, 89.0)	88.0 (78.2, 98.7)	–	–	–
J-final	0.841 (0.797, 0.879)	88.4 (82.9, 92.6)	79.9 (72.4, 86.1)	84.7 (78.1, 95.2)	–	–	–
K-initial	0.714 (0.662, 0.762)	94.2 (89.8, 97.1)	48.6 (40.2, 57.1)	66.1 (57.6, 75.4)	< 0.001	0.01	2
Model	0.859 (0.817, 0.895)	90.0 (84.7, 93.8)	81.9 (74.7, 87.9)	79.3 (70.0, 89.4)	–	–	–
K-final	0.753 (0.703, 0.798)	93.7 (89.2, 96.7)	56.9 (48.4, 65.2)	81.4 (72.0, 91.7)	–	–	–
L-initial	0.670 (0.616, 0.720)	60.3 (53.0, 67.3)	73.6 (65.6, 80.6)	74.5 (65.5, 84.3)	< 0.001	< 0.001	8
Model	0.785 (0.737, 0.828)	84.1 (78.1, 89.0)	72.9 (64.9, 80.0)	86.5 (76.8, 97.1)	–	–	–
L-final	0.810 (0.763, 0.850)	84.1 (78.1, 89.0)	77.8 (70.1, 84.3)	77.8 (68.6, 87.9)	–	–	–
M-initial	0.731 (0.680, 0.778)	58.7 (51.4, 65.8)	87.5 (81.0, 92.4)	71.2 (62.4, 80.8)	0.001	0.002	4
Model	0.818 (0.773, 0.858)	87.3 (81.7, 91.7)	76.4 (68.6, 83.1)	82.6 (73.1, 92.9)	–	–	–
M-final	0.791 (0.744, 0.834)	71.4 (64.4, 77.8)	86.8 (80.2, 91.9)	78.1 (68.9, 88.2)	–	–	–
N-initial	0.812 (0.766, 0.853)	88.9 (83.5, 93.0)	73.6 (65.6, 80.6)	82.3 (72.8, 92.7)	0.91	0.08	3
Model	0.810 (0.764, 0.851)	90.5 (85.4, 94.3)	71.5 (63.4, 78.7)	82.3 (72.8, 92.7)	–	–	–
N-final	0.830 (0.785, 0.869)	91.0 (86.0, 94.7)	75.0 (67.1, 81.8)	84.1 (74.5, 94.5)	–	–	–

Values in parentheses are 95% confidence intervals
^aThe five senior radiologists are labeled “J” to “N”. “-initial” denotes the diagnosis of the radiologist alone; “Model” denotes the diagnosis of the smartphone app tested with the photos taken by that radiologist; “-final” denotes the diagnosis of the radiologist with the smartphone app’s assistance
^#P1 values are from the comparison between the AUC of each radiologist alone and the AUC of the smartphone app alone. P2 values are from the comparison between the AUC of each radiologist alone and the AUC of the same radiologist assisted by the smartphone app. AUCs were compared using the DeLong test
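
The AUC comparison behind the P1 and P2 columns can be illustrated with a minimal sketch of the DeLong test for two correlated AUCs. It assumes the paired variant, in which both readers score the same cases; the table does not state which variant was used or what scores entered the test. The function names (`delong_test`, `_components`, `_var_of_difference`) and the toy data are hypothetical and are not taken from the study.

```python
from math import erfc, sqrt


def _components(pos, neg):
    """Return the AUC and DeLong's per-case components for one set of scores."""
    def psi(x, y):
        # Kernel of the Mann-Whitney statistic: 1 if the positive case
        # outranks the negative case, 0.5 on a tie, 0 otherwise.
        return 1.0 if x > y else 0.5 if x == y else 0.0

    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return sum(v10) / len(v10), v10, v01


def _var_of_difference(u, v):
    """Sample variance of (u - v), equal to S_uu + S_vv - 2 * S_uv."""
    d = [a - b for a, b in zip(u, v)]
    mean = sum(d) / len(d)
    return sum((x - mean) ** 2 for x in d) / (len(d) - 1)


def delong_test(labels, scores_a, scores_b):
    """Paired DeLong test; labels are 1 (disease) or 0 (no disease).

    Returns (auc_a, auc_b, z, two_sided_p).
    """
    pos_a = [s for s, y in zip(scores_a, labels) if y == 1]
    neg_a = [s for s, y in zip(scores_a, labels) if y == 0]
    pos_b = [s for s, y in zip(scores_b, labels) if y == 1]
    neg_b = [s for s, y in zip(scores_b, labels) if y == 0]

    auc_a, v10_a, v01_a = _components(pos_a, neg_a)
    auc_b, v10_b, v01_b = _components(pos_b, neg_b)

    # Variance of the AUC difference, accounting for the shared cases.
    var = (_var_of_difference(v10_a, v10_b) / len(pos_a)
           + _var_of_difference(v01_a, v01_b) / len(neg_a))
    if var == 0:
        raise ValueError("zero variance: the test statistic is undefined")

    z = (auc_a - auc_b) / sqrt(var)
    p = erfc(abs(z) / sqrt(2))  # two-sided p-value from the standard normal
    return auc_a, auc_b, z, p


# Toy data only (not from the study): 4 positive and 4 negative cases.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
reader = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
model = [0.6, 0.4, 0.7, 0.3, 0.5, 0.8, 0.2, 0.1]
auc_reader, auc_model, z, p = delong_test(labels, reader, model)
print(auc_reader, auc_model, z, p)
```

Because both sets of scores come from the same cases, the variance term subtracts the covariance between the two AUC estimates. Ignoring that pairing would generally give a different p-value.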

ISSN: 1741-7015