Abstract: On the importance of including unconfirmed cases when assessing the effect of AI on the recall rate in breast cancer screening

Presented at ECR 2022

Purpose

To demonstrate the importance of using representative data, including unconfirmed cases (neither positive nor negative), to assess recall rate (RR) and avoid obscuring real-world performance.

Method

The performance of a commercially available AI system was evaluated in a large-scale retrospective study of unenriched, representative real-world data (275,900 cases) collected from 2009 to 2019 at seven sites in two countries (UK, Hungary), using devices from four vendors (Hologic, GE, Siemens, IMS Giotto). Positives were pathology-proven malignancies. Negatives had 3-year negative follow-up results. Recall rate was assessed in two ways: (1) using the unenriched representative dataset, including positives, negatives, and unconfirmed cases, and (2) after removing unconfirmed cases and artificially scaling up negatives to reconstruct the screening cancer prevalence found in method 1.
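The two ways of assessing recall rate can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual analysis code: the `Case` type, the function names, and the toy data are hypothetical, and the scaling of negatives is modelled here as a single weight chosen so that prevalence matches method 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    label: str      # "positive", "negative", or "unconfirmed" (hypothetical encoding)
    recalled: bool  # True if the AI flagged the case for recall


def recall_rate_representative(cases):
    """Method 1: fraction of all cases recalled, unconfirmed cases included."""
    return sum(c.recalled for c in cases) / len(cases)


def recall_rate_constructed(cases):
    """Method 2: drop unconfirmed cases, then weight negatives so that
    cancer prevalence matches the full representative dataset."""
    positives = [c for c in cases if c.label == "positive"]
    negatives = [c for c in cases if c.label == "negative"]
    target_prevalence = len(positives) / len(cases)
    # The weight w solves P / (P + w * N) = target_prevalence.
    w = len(positives) * (1 - target_prevalence) / (target_prevalence * len(negatives))
    recalled = sum(c.recalled for c in positives) + w * sum(c.recalled for c in negatives)
    return recalled / (len(positives) + w * len(negatives))


# Toy data (invented for illustration only): 2 positives, 38 negatives,
# 60 unconfirmed cases, with unconfirmed cases recalled more often than negatives.
toy_cases = (
    [Case("positive", True)] * 2
    + [Case("negative", True)] * 2 + [Case("negative", False)] * 36
    + [Case("unconfirmed", True)] * 12 + [Case("unconfirmed", False)] * 48
)

rr_method_1 = recall_rate_representative(toy_cases)
rr_method_2 = recall_rate_constructed(toy_cases)
print(f"method 1: {rr_method_1:.3f}, method 2: {rr_method_2:.3f}")
```

In this toy example, the unconfirmed cases are recalled at a higher rate than the confirmed negatives, so removing them and scaling up the negatives yields a lower recall rate than the representative dataset does.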

Results

The representative dataset included 74.6% unconfirmed cases. Cancer prevalence in method 1 versus method 2 (pre-scaling) was 1.0% and 4.5%, respectively. The AI's standalone RR in method 1 versus method 2 (post-scaling) was 11.5% and 9.5%, respectively, demonstrating an apparent 17.6% relative reduction in RR when using the constructed dataset.
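The relative reduction is the difference between the two recall rates divided by the method 1 recall rate. The short check below uses the rounded rates quoted above; the function name is hypothetical. From the rounded figures the result is about 17.4%, so the reported 17.6% was presumably computed from unrounded rates (an assumption, since the unrounded values are not given here).

```python
def relative_reduction(baseline, comparison):
    """Relative reduction of `comparison` with respect to `baseline`."""
    return (baseline - comparison) / baseline


rr_representative = 0.115  # method 1, rounded value from the abstract
rr_constructed = 0.095     # method 2 (post-scaling), rounded value from the abstract

reduction = relative_reduction(rr_representative, rr_constructed)
print(f"{reduction:.1%}")  # prints 17.4%
```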

Conclusion

When assessing the effect of AI on RR, it is important to include unconfirmed cases, which are likely to be more difficult for AI to assess correctly. Validating AI on non-representative, constructed datasets that exclude unconfirmed cases may otherwise show an optimistically low RR that would not translate to screening practice. Subsequent implementation would pose a significant risk of overdiagnosing patients, leading to unnecessary use of resources and unnecessary patient anxiety. To assess AI performance in breast cancer screening, studies should use unbiased metrics with minimal truthing requirements, such as RR, together with representative, real-world data, and should avoid artificially constructed datasets.

Limitation

Results may not be representative of other AI systems.

Ethics committee approval

UK HRA (REC reference: 19/HRA/0376) and ETT-TUKEB (Hungary) approval (Reg no: OGYÉI/46651-4/2020).

Funding for this study

Kheiron Medical Technologies