To demonstrate the importance of using representative data, including unconfirmed cases (neither positive nor negative), to assess recall rate (RR) and avoid obscuring real-world performance.
The performance of a commercially available AI system was evaluated in a large-scale retrospective study of unenriched, representative real-world data (275,900 cases) from seven sites, two countries (UK, Hungary), and four device vendors (Hologic, GE, Siemens, IMS Giotto), acquired from 2009 to 2019. Positives were pathology-proven malignancies. Negatives had 3-year negative follow-up results. Recall rate was assessed in two ways: 1) using the unenriched representative dataset, including positives, negatives, and unconfirmed cases, and 2) after removing unconfirmed cases and artificially scaling up negatives to reconstruct the screening cancer prevalence found in method 1.
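The two recall-rate calculations described above can be sketched as follows. This is a minimal illustration and not the study's actual code: the `Case` type, the function names, and the weighting scheme used to "scale up" negatives are assumptions, and the counts are toy values, not study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    label: str      # "positive", "negative", or "unconfirmed" (hypothetical labels)
    recalled: bool  # True if the AI flagged the case for recall


def recall_rate_representative(cases):
    """Method 1: fraction of all cases recalled, unconfirmed cases included."""
    return sum(c.recalled for c in cases) / len(cases)


def recall_rate_constructed(cases, target_prevalence):
    """Method 2: drop unconfirmed cases, then weight negatives so that
    positives make up `target_prevalence` of the weighted dataset.
    Weighting is an assumed stand-in for the abstract's 'scaling up'."""
    pos = [c for c in cases if c.label == "positive"]
    neg = [c for c in cases if c.label == "negative"]
    # Weight w solves: len(pos) / (len(pos) + w * len(neg)) == target_prevalence
    w = len(pos) * (1 - target_prevalence) / (target_prevalence * len(neg))
    recalled = sum(c.recalled for c in pos) + w * sum(c.recalled for c in neg)
    total = len(pos) + w * len(neg)
    return recalled / total


def make(label, n, n_recalled):
    """Build n toy cases with the first n_recalled marked as recalled."""
    return [Case(label, i < n_recalled) for i in range(n)]


# Toy dataset (illustrative numbers only, not taken from the study).
cases = (
    make("positive", 10, 8)
    + make("negative", 240, 12)
    + make("unconfirmed", 750, 90)
)

prevalence = sum(c.label == "positive" for c in cases) / len(cases)
print(f"Method 1 RR: {recall_rate_representative(cases):.2%}")
print(f"Method 2 RR: {recall_rate_constructed(cases, prevalence):.2%}")
```

In this toy example the unconfirmed cases are recalled more often than the confirmed negatives, so dropping them lowers the apparent recall rate even though the cancer prevalence is restored by the weighting.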
The representative dataset included 74.6% unconfirmed cases. Cancer prevalence in method 1 versus method 2 (pre-scaling) was 1.0% and 4.5%, respectively. The AI’s standalone RR in method 1 versus method 2 (post-scaling) was 11.5% and 9.5%, respectively, demonstrating an apparent 17.6% relative reduction when using a constructed dataset.
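The relative reduction can be checked with a short calculation. The sketch below uses the rounded recall rates reported above, and the function name is hypothetical. With these rounded inputs the result is about 17.4%; the reported 17.6% presumably reflects unrounded rates, which is an assumption rather than something stated in the abstract.

```python
def relative_reduction(baseline, comparison):
    """Relative reduction of `comparison` with respect to `baseline`."""
    return (baseline - comparison) / baseline


rr_method_1 = 0.115  # representative dataset (rounded, as reported)
rr_method_2 = 0.095  # constructed dataset, post-scaling (rounded, as reported)

reduction = relative_reduction(rr_method_1, rr_method_2)
print(f"Relative reduction in RR: {reduction:.1%}")
```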
When assessing AI performance on RR, it is important to include unconfirmed cases, which are likely to be more difficult for AI to assess correctly. Validating AI on non-representative, constructed datasets that exclude unconfirmed cases may otherwise show an optimistically low RR that would not translate to screening practice. Subsequent implementation would pose a significant risk of overdiagnosing patients, leading to unnecessary use of resources and unnecessary patient anxiety. Studies should use unbiased metrics with minimal truthing requirements, such as RR, together with representative real-world data that have not been artificially constructed, to assess AI performance in breast cancer screening.
Results may not be representative of other AI systems.
Ethics committee approval
UK HRA (REC reference: 19/HRA/0376) and ETT-TUKEB (Hungary) approval (Reg no: OGYÉI/46651-4/2020).
Funding for this study
Kheiron Medical Technologies