Validation of an Algorithm to Identify Patients at Risk for Colorectal Cancer Based on Laboratory Test and Demographic Data in Diverse, Community-Based Population

Authors:

Jennifer L. Schneider

Evan Layefsky

Natalia Udaltsova

Yaron Kinar

Theodore R. Levin

Douglas A. Corley

Abstract

Background & Aims

Approximately 30%–40% of screening-eligible adults in the United States are not up to date
with colorectal cancer (CRC) screening. We aimed to validate a predictive score, generated by a
machine learning algorithm with common laboratory test data, to identify patients at high risk
for CRC in a large, community-based, ethnically diverse cohort.

Methods

We performed a nested case–control study using data from members of Kaiser Permanente
Northern California (1996–2015). Cases were cohort members who received a complete blood
cell count at ages 50–75 years, did not have a prior or current diagnosis of CRC at the
time of the blood cell count, and were subsequently diagnosed with CRC. We used data from the
cohort to validate the ability of an algorithm that uses laboratory and demographic information
to identify patients at increased risk for CRC. Test performance was evaluated using area under
the receiver operating characteristic curve (AUROC) and odds ratios (ORs) with 95% CIs to
compare high scores (defined by a cutoff giving 97% specificity or more) vs low scores.
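
To illustrate the metrics named above, the following Python sketch computes a score cutoff at a target specificity, the resulting sensitivity, an odds ratio with a 95% CI, and an AUROC. It is a minimal example on made-up toy scores, not the study's analysis code. All function names are hypothetical. The Wald (log-scale) interval and the rank-based AUROC are assumptions chosen for simplicity, since the abstract does not state how these values were computed.

```python
import math

def threshold_at_specificity(control_scores, specificity=0.97):
    """Smallest cutoff such that at least `specificity` of controls score at or below it.
    A score is called "high" when it is strictly above this cutoff."""
    ordered = sorted(control_scores)
    # round() guards against floating-point noise before taking the ceiling
    k = math.ceil(round(specificity * len(ordered), 9))
    return ordered[k - 1]

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald 95% CI (an assumed method, not taken from the study).
    a: cases with high score, b: cases with low score,
    c: controls with high score, d: controls with low score."""
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper

def auroc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count half)."""
    wins = 0.0
    for s in case_scores:
        for t in control_scores:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Toy data (fabricated for illustration only, not from the study)
controls = list(range(100))                      # scores 0..99
cases = [10, 40, 60, 95, 97.5, 98.5, 150, 200]

cutoff = threshold_at_specificity(controls, 0.97)
cases_high = sum(s > cutoff for s in cases)
cases_low = len(cases) - cases_high
controls_high = sum(s > cutoff for s in controls)
controls_low = len(controls) - controls_high

sensitivity = cases_high / len(cases)
specificity = controls_low / len(controls)
or_estimate, or_lower, or_upper = odds_ratio_ci(
    cases_high, cases_low, controls_high, controls_low
)
auc = auroc(cases, controls)

print(f"cutoff={cutoff}, sensitivity={sensitivity:.3f}, specificity={specificity:.3f}")
print(f"OR={or_estimate:.1f} (95% CI, {or_lower:.1f}-{or_upper:.1f}), AUROC={auc:.3f}")
```

Fixing the cutoff on the control distribution keeps the false-positive rate at or below 3% by construction, so sensitivity and the odds ratio are the quantities that vary with the algorithm's discriminative ability.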

Results

A high score from the algorithm identified patients with a CRC diagnosis within the next 6
months with 35.4% sensitivity (95% CI, 33.8–36.7) and an AUROC of 0.78 (95% CI, 0.77–0.78).
Patients with a high score had an increased risk of diagnosis with early-stage CRC (OR, 13.1;
95% CI, 11.8–14.3) and advanced stage CRC (OR, 24.8; 95% CI, 22.4–27.3) within the next 6
months. In patients with high scores, the ORs for proximal and distal cancers were 34.7 (95%
CI, 31.5–37.7) and 12.1 (95% CI, 10.1–13.9), respectively. The algorithm’s accuracy decreased
as the time interval between blood test result and CRC diagnosis increased; performance did not
differ by sex or race.

Conclusions

We validated a predictive model that uses complete blood cell count and demographic data to
identify patients at high risk of CRC. The algorithm flagged 3% of the population as requiring
investigation, and this group included 35% of patients who received a diagnosis of CRC within
the next 6 months.