Score items | Grade | Specific evaluation criteria | References |
---|---|---|---|
Transparency of algorithms | I | Publish trained models that other researchers can load directly for independent validation, or provide user-friendly online/mobile calculators that allow batch processing of participant information (e.g., prediction software or a tool) | ∙ APPRAISE-AI [31] ∙ MI-CLAIM [32] ∙ AI-TREE [33] |
| | II | Apply and report classic algorithms available in common tools/platforms, OR report complete code and hyperparameters with the required description, allowing independent researchers to run the pipeline end to end | |
| | III | Report formulas and/or incomplete hyperparameters without the required description, making replication difficult or reproducibility incomplete | |
| | IV | Incomplete reporting that cannot be used for reproduction | |
Performance of models | I | Report at least the discrimination (preferably c-index) and calibration (preferably a calibration plot/table) of the model; the version of the performance index is clearly reported and the index is excellent (e.g., 0.9 < c-index ≤ 1.0; calibration intercept close to 0 and calibration slope close to 1) | ∙ TRIPOD [34] ∙ CHARMS checklist [35] ∙ Official statement [36] ∙ AI-TREE [33] ∙ Expert comment [37] |
| | II | Report at least the discrimination (preferably c-index) and calibration (preferably a calibration plot/table) of the model; the version of the performance index is clearly reported and the index is good (e.g., 0.7 < c-index ≤ 0.9; calibration intercept deviates moderately from 0 and calibration slope deviates moderately from 1) | |
| | III | Do not report the discrimination or calibration of the model; OR the version of the performance index is not clearly reported; OR the value of the index is unknown | |
| | IV | Model performance is low (e.g., c-index ≤ 0.7; calibration intercept deviates severely from 0 and calibration slope deviates severely from 1) | |
Feasibility of reproduction | I | Office-based models that require no laboratory or inspection data (also known as non-laboratory models) | ∙ Validation and evaluation framework [38] ∙ AI standardization [39] ∙ AI-TREE [33] ∙ MI-CLAIM [32] ∙ CONSORT-AI [40] ∙ MAIC-10 [41] ∙ SR of validity and clinical utility [11] ∙ WHO laboratory-based and non-laboratory models [42] ∙ Laboratory-based and non-laboratory models [43] |
| | II | Laboratory-based models that require only routine structured clinical data, which are easy to obtain and need no secondary processing (e.g., image pre-processing or annotation) | |
| | III | Include data derived from unconventional laboratory tests and inspections, complex gene-related testing, tissue specimens, or other resource-intensive applications, which are hard to obtain or require secondary processing (e.g., labeling) | |
| | IV | Do not report the variables | |
Risk of reproduction | I | No domain is at high risk (evaluated using PROBAST) | ∙ PROBAST [30] |
| | II | Only one domain is at high risk (evaluated using PROBAST) | |
| | III | Two domains are at high risk (evaluated using PROBAST) | |
| | IV | More than two domains are at high risk (evaluated using PROBAST) | |
Clinical implication | I | Identify novel risk markers or novel risk standards that will optimize existing clinical preventive strategies and contribute to patient benefit for the general population and major CVDs, similar to classical T-Ms (e.g., Framingham Score) | ∙ SR of T-Ms [29] ∙ Biomedical research AI guideline [44] ∙ BS30440 [45] ∙ APPRAISE-AI [31] ∙ Consolidated AI reporting guideline [46] ∙ AI-TREE [33] ∙ SR of validity and clinical utility [11] |
| | II | Do not identify novel risk markers or novel risk standards, but enhance predictive capacity beyond that of existing methods, which may optimize existing clinical preventive measures or offer additional benefits for a non-rare population or non-rare subset of CVDs (more than 1/2000 of the general population) | |
| | III | Only enhance predictive capacity beyond that of existing methods, without altering existing preventive interventions or providing additional benefits for a non-rare population or non-rare subset of CVDs (more than 1/2000 of the general population) | |
| | IV | Do not enhance predictive performance beyond that of existing methods, OR target only a rare population or rare subset of CVDs (fewer than 1/2000 of the general population, e.g., infiltrative cardiac diseases), leading to inadequate validation and a lack of clinical utility for a broader population | |
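
Two of the criteria above are numeric and can be applied mechanically. The following is a minimal sketch, not part of the original grading tool, of how the PROBAST domain count and the c-index thresholds in the table might be mapped to grades. The function names `grade_probast_domains` and `grade_c_index` are hypothetical, and the c-index function covers only the discrimination example in the table: the calibration criteria ("close to", "moderately", "severely") are qualitative and are not encoded here.

```python
from typing import Optional


def grade_probast_domains(high_risk_domains: int) -> str:
    """Grade by the number of PROBAST domains rated high risk (hypothetical helper)."""
    if high_risk_domains < 0:
        raise ValueError("number of high-risk domains cannot be negative")
    if high_risk_domains == 0:
        return "I"
    if high_risk_domains == 1:
        return "II"
    if high_risk_domains == 2:
        return "III"
    return "IV"  # more than two domains at high risk


def grade_c_index(c_index: Optional[float]) -> str:
    """Grade discrimination by c-index only (hypothetical helper).

    Assumption: None stands for an unreported or unknown index (grade III).
    Calibration is not considered in this sketch.
    """
    if c_index is None:
        return "III"
    if not 0.0 <= c_index <= 1.0:
        raise ValueError("c-index must lie in [0, 1]")
    if c_index > 0.9:
        return "I"   # 0.9 < c-index <= 1.0
    if c_index > 0.7:
        return "II"  # 0.7 < c-index <= 0.9
    return "IV"      # c-index <= 0.7


if __name__ == "__main__":
    print(grade_probast_domains(1), grade_c_index(0.85))
```

Boundary values follow the table's example ranges, so a c-index of exactly 0.9 falls in grade II and exactly 0.7 falls in grade IV.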