Table 2 The diagnostic performance of junior radiologists alone, smartphone app alone, and junior radiologists with smartphone app’s assistance in diagnosing biliary atresia

From: Interpretable artificial intelligence-based app assists inexperienced radiologists in diagnosing biliary atresia from sonographic gallbladder images

| Radiologist^a | AUC | Sensitivity (%) | Specificity (%) | Accuracy (%) | P1 value# | P2 value# | Not gallbladder |
|---|---|---|---|---|---|---|---|
| A-initial | 0.798 (0.751, 0.840) | 78.3 (71.7, 84.0) | 81.3 (73.9, 87.3) | 79.6 (70.3, 89.8) | 0.11 | 0.01 | 3 |
| Model | 0.838 (0.794, 0.876) | 87.8 (82.3, 92.1) | 79.9 (72.4, 86.1) | 84.4 (74.8, 94.9) | | | |
| A-final | 0.863 (0.821, 0.898) | 90.0 (84.7, 93.8) | 82.6 (75.4, 88.4) | 86.8 (77.1, 97.4) | | | |
| B-initial | 0.784 (0.735, 0.827) | 85.2 (79.3, 89.9) | 71.5 (63.4, 78.7) | 79.3 (70.0, 89.4) | 0.02 | 0.09 | 9 |
| Model | 0.841 (0.797, 0.879) | 88.4 (82.9, 92.6) | 79.9 (72.4, 86.1) | 84.7 (75.1, 95.2) | | | |
| B-final | 0.819 (0.773, 0.859) | 89.4 (84.1, 93.4) | 74.3 (66.4, 81.2) | 82.9 (73.4, 93.3) | | | |
| C-initial | 0.853 (0.811, 0.890) | 89.4 (84.1, 93.4) | 81.3 (73.9, 87.3) | 85.9 (76.2, 96.4) | 0.66 | 0.87 | 5 |
| Model | 0.850 (0.807, 0.887) | 91.5 (86.6, 95.1) | 78.5 (70.9, 84.9) | 85.9 (76.2, 96.4) | | | |
| C-final | 0.860 (0.818, 0.896) | 86.4 (81.0, 90.8) | 87.4 (80.3, 92.6) | 86.5 (76.8, 97.1) | | | |
| D-initial | 0.725 (0.674, 0.772) | 80.4 (74.0, 85.8) | 64.6 (56.2, 72.4) | 73.6 (64.7, 83.4) | < 0.001 | < 0.001 | 7 |
| Model | 0.849 (0.806, 0.886) | 92.1 (87.2, 95.5) | 77.8 (70.1, 84.3) | 85.9 (76.2, 96.4) | | | |
| D-final | 0.810 (0.764, 0.851) | 88.4 (82.9, 92.6) | 73.6 (65.6, 80.6) | 82.0 (72.5, 92.3) | | | |
| E-initial | 0.777 (0.728, 0.821) | 89.4 (84.1, 93.4) | 66.0 (57.6, 73.7) | 79.3 (70.0, 89.4) | 0.12 | 0.06 | 8 |
| Model | 0.820 (0.775, 0.860) | 88.4 (82.9, 92.6) | 75.7 (67.9, 82.4) | 82.9 (73.4, 93.3) | | | |
| E-final | 0.828 (0.783, 0.867) | 90.0 (84.7, 93.8) | 75.7 (67.9, 82.4) | 83.8 (74.2, 94.2) | | | |
| F-initial | 0.821 (0.776, 0.861) | 92.1 (87.2, 95.5) | 72.2 (64.2, 79.4) | 83.5 (74.0, 93.9) | 0.02 | 0.003 | 8 |
| Model | 0.866 (0.824, 0.900) | 90.5 (85.4, 94.3) | 82.6 (75.4, 88.4) | 87.1 (77.4, 97.7) | | | |
| F-final | 0.876 (0.836, 0.909) | 90.5 (85.4, 94.3) | 84.7 (77.8, 90.2) | 88.0 (78.2, 98.7) | | | |
| G-initial | 0.769 (0.720, 0.813) | 85.7 (79.9, 90.4) | 68.1 (59.8, 75.6) | 78.1 (68.9, 88.2) | 0.01 | 0.01 | 7 |
| Model | 0.835 (0.791, 0.873) | 87.8 (82.3, 92.1) | 79.2 (71.6, 85.5) | 84.1 (74.5, 94.5) | | | |
| G-final | 0.806 (0.760, 0.847) | 88.4 (82.9, 92.6) | 72.9 (64.9, 80.0) | 81.7 (72.3, 92.0) | | | |
| H-initial | 0.743 (0.693, 0.789) | 77.8 (71.2, 83.5) | 70.8 (62.7, 78.1) | 74.8 (65.8, 84.7) | 0.01 | < 0.001 | 13 |
| Model | 0.817 (0.771, 0.857) | 86.2 (80.5, 90.8) | 77.1 (69.3, 83.7) | 82.3 (72.8, 92.7) | | | |
| H-final | 0.820 (0.775, 0.860) | 86.2 (80.5, 90.8) | 77.8 (70.1, 84.3) | 82.6 (73.1, 92.9) | | | |
| I-initial | 0.686 (0.633, 0.735) | 63.5 (56.2, 70.4) | 73.6 (65.6, 80.6) | 67.9 (59.3, 77.3) | < 0.001 | < 0.001 | 21 |
| Model | 0.830 (0.785, 0.868) | 86.8 (81.1, 91.3) | 79.2 (71.6, 85.5) | 83.5 (74.0, 93.9) | | | |
| I-final | 0.829 (0.784, 0.868) | 87.3 (81.7, 91.7) | 78.5 (70.9, 84.9) | 83.5 (74.0, 93.9) | | | |

  1. 95% confidence intervals are included in brackets
  2. ^a Nine junior radiologists were labeled “A” to “I”. “-initial” denotes the diagnosis made by the radiologist alone; “Model” denotes the diagnosis made by the smartphone app from the photos taken by that radiologist; “-final” denotes the diagnosis made by the radiologist with the smartphone app’s assistance
  3. # The P1 values compare the AUC of each radiologist alone with the AUC of the smartphone app alone. The P2 values compare the AUC of each radiologist alone with the AUC of the same radiologist assisted by the smartphone app. AUCs were compared using the DeLong test
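
The comparison described in the last footnote can be sketched in a few lines of Python. This is a minimal illustration of a paired DeLong test, not the authors' analysis code: the function names (`placements`, `delong_paired`) and the toy yes/no calls are hypothetical and are not taken from the study. For a yes/no diagnosis the empirical AUC reduces to the mean of sensitivity and specificity, which is consistent with the table (for A-initial, (78.3 + 81.3) / 2 ≈ 79.8). The sketch assumes the two readers do not give identical calls, so the variance of the difference is non-zero.

```python
import math


def psi(x, y):
    """Mann-Whitney kernel: 1 if the positive case scores higher, 0.5 on a tie, else 0."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def placements(pos, neg):
    """Return the DeLong structural components (V10 per positive case, V01 per negative case)."""
    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def sample_var(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def delong_paired(pos_a, neg_a, pos_b, neg_b):
    """Compare two AUCs measured on the same cases; returns (auc_a, auc_b, z, p).

    pos_* and neg_* hold each reader's scores for the diseased and
    non-diseased cases, listed in the same case order for both readers.
    """
    v10_a, v01_a = placements(pos_a, neg_a)
    v10_b, v01_b = placements(pos_b, neg_b)
    auc_a = sum(v10_a) / len(v10_a)
    auc_b = sum(v10_b) / len(v10_b)
    # Var(AUC_a - AUC_b) from the per-case differences of the components;
    # this equals S_aa + S_bb - 2 * S_ab in the usual covariance notation.
    d10 = [a - b for a, b in zip(v10_a, v10_b)]
    d01 = [a - b for a, b in zip(v01_a, v01_b)]
    var = sample_var(d10) / len(d10) + sample_var(d01) / len(d01)
    z = (auc_a - auc_b) / math.sqrt(var)  # assumes var > 0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p


# Hypothetical binary calls (1 = biliary atresia) for six diseased and
# four non-diseased cases; illustrative only, not data from the study.
initial_pos, initial_neg = [1, 1, 1, 0, 1, 0], [0, 0, 1, 0]
final_pos, final_neg = [1, 1, 1, 1, 1, 0], [0, 0, 0, 1]

auc_initial, auc_final, z, p = delong_paired(
    initial_pos, initial_neg, final_pos, final_neg
)
print(f"AUC initial={auc_initial:.3f}, final={auc_final:.3f}, z={z:.2f}, p={p:.2f}")
```

With only ten toy cases the test has little power; the sketch is meant to show the mechanics of the comparison rather than to reproduce any P value in the table.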