Quality item | Comment |
---|---|
Internal validity | |
Sample cohort | Prospectively collected data are of greater quality than retrospectively collected data and are preferred for model development [14]. |
Loss to follow-up | Loss to follow-up is common. Investigators should state the number of patients lost (or else the completeness of follow-up [15], which takes into account the duration of follow-up) along with the reasons. A commonly used, though arbitrary, threshold for adequate follow-up is 90% completeness [7]. |
Predictive/outcome variables | Predictors, outcomes, and follow-up time should be explicitly defined; otherwise invalid predictions may be produced. |
Missing values | A transparent summary of missing data and the methods used to handle them should be provided. Complete-case analysis should be avoided in favor of multiple imputation methods [16, 17]. A general rule of thumb suggests that imputation should be considered if the proportion of missingness exceeds 5% of the data [18]. |
Statistical validity | |
Model building strategy | A priori clinical knowledge should be used to inform selection of risk factors. Data-driven predictor selection (for example, stepwise selection) should be avoided where possible [19, 20]. |
Handling of continuous variables | Arbitrary categorization should be avoided [21]. Defined cut-points must be based on clinical reasoning. |
Sample size | The sample size used in derivation (derivation sample) must be reported along with a sufficient description of baseline characteristics. The number of patients with the outcome event during follow-up (effective sample size) should be reported; at least 10 events per fitted parameter is often used as a minimum [22]. |
Model evaluation | |
Evaluation | Internal validation techniques (for example, bootstrap sampling or cross-validation) provide a minimum check of overfitting and optimism. External evaluation in new data is the most rigorous assessment of model generalizability. |
Description of external cohort | A description of the baseline characteristics should be reported to enable a comparison of the validation cohort to the development cohort. |
Discrimination and calibration | Discrimination metrics should be provided, for example, the area under the receiver operating characteristic curve (AUROCC). Model calibration should be studied using a calibration plot with estimated slope and intercept provided. |
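
Two of the quantitative items in the table, the events-per-parameter rule of thumb and the AUROCC, can be illustrated with a minimal sketch. The function names, the toy outcomes, and the predicted risks below are hypothetical and chosen only for illustration; the AUROCC is computed here as the proportion of event/non-event pairs in which the event patient has the higher predicted risk, with ties counted as one half.

```python
from typing import Sequence


def events_per_parameter(n_events: int, n_parameters: int) -> float:
    """Number of outcome events per fitted model parameter."""
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    return n_events / n_parameters


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen patient with the event
    has a higher predicted risk than a randomly chosen patient without it
    (ties count as 0.5).
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = outcome event observed, 0 = no event.
outcomes = [0, 0, 1, 1, 0, 1]
predicted_risk = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]

# Hypothetical derivation sample: 120 events, 8 fitted parameters.
epp = events_per_parameter(n_events=120, n_parameters=8)
print(f"Events per parameter: {epp:.1f} (rule of thumb: at least 10)")
print(f"AUROCC: {auroc(outcomes, predicted_risk):.3f}")
```

The pairwise loop is quadratic in the sample size, which is fine for illustration; for real cohorts an established statistical package would normally be used, and calibration (slope and intercept) would be assessed separately.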