Validation of an Algorithm to Identify Patients at Risk for Colorectal Cancer Based on Laboratory Test and Demographic Data in Diverse, Community-Based Population


Authors

Jennifer L. Schneider

Evan Layefsky

Natalia Udaltsova

Theodore R. Levin

Douglas A. Corley

Division of Research, Kaiser Permanente Northern California, Oakland, California

BACKGROUND & AIMS

Approximately 30%–40% of screening-eligible adults in the United States are not up to date with colorectal cancer (CRC) screening. We aimed to validate a predictive score, generated by a machine learning algorithm with common laboratory test data, to identify patients at high risk for CRC in a large, community-based, ethnically diverse cohort.

METHODS

We performed a nested case–control study using data from members of Kaiser Permanente Northern California (1996–2015). Cases were cohort members who received a complete blood cell count at ages 50–75 years, did not have a prior or current diagnosis of CRC at the time of the blood cell count, and were subsequently diagnosed with CRC. We used data from the cohort to validate the ability of an algorithm that uses laboratory and demographic information to identify patients at increased risk for CRC. Test performance was evaluated using area under the receiver operating characteristic curve (AUROC) and odds ratios (OR) with 95% CI values to compare high (defined as 97% specificity or more) vs low scores.

RESULTS

A high score from the algorithm identified patients with a CRC diagnosis within the next 6 months with 35.4% sensitivity (95% CI, 33.8–36.7) and an AUROC of 0.78 (95% CI, 0.77–0.78). Patients with a high score had an increased risk of diagnosis with early-stage CRC (OR, 13.1; 95% CI, 11.8–14.3) and advanced stage CRC (OR, 24.8; 95% CI, 22.4–27.3) within the next 6 months. In patients with high scores, the ORs for proximal and distal cancers were 34.7 (95% CI, 31.5–37.7) and 12.1 (95% CI, 10.1–13.9), respectively. The algorithm’s accuracy decreased with the time interval between blood test result and CRC diagnosis; performance did not differ by sex or race.

CONCLUSIONS

We validated a predictive model that uses complete blood cell count and demographic data to identify patients at high risk of CRC. The algorithm identified 3% of the population who require an investigation and identified 35% of patients who received a diagnosis of CRC within the next 6 months.

Keywords: Prognostic Factor; Early Detection; Stratification; Hematology.

See editorial on page 2683.

Colorectal cancer (CRC) is the second leading cause of cancer death in the United States.1 An estimated 1 in 22 men and 1 in 24 women will be diagnosed with CRC at some point in their lives, and 1.4 million Americans are currently living with a personal history of CRC.2,3 The American Cancer Society estimates that in 2017, there were 135,460 incident cases of CRC and 50,260 deaths in the US.1 CRC-associated deaths accounted for approximately 8.3% of all cancer mortality.4

Decreasing mortality from CRC is largely dependent on the removal of precancerous polyps and the detection of early, treatable cancers, which are associated with more favorable health outcomes.5-9 The United States Preventive Services Task Force recommends screening for CRC using techniques such as fecal testing, sigmoidoscopy, or colonoscopy, beginning when adults reach the age of 50 and continuing until they reach the age of 75.10 There has been extensive research to support these recommendations; systematic reviews and meta-analyses indicate significant reductions in CRC-specific mortality associated with fecal testing strategies, colonoscopy, and sigmoidoscopy.7,11,12 Patients diagnosed with early stage CRC have a 5-year survival rate of 90% compared with 14% among those with late stage disease.13 However, only 39% of patients are diagnosed with CRC when it is still in a localized stage.13

Abbreviations used in this paper: AUC, area under the curve; BMI, body mass index; CBC, complete blood cell count; CI, confidence interval; CRC, colorectal cancer; KPNC, Kaiser Permanente Northern California; OR, odds ratio; PPV, positive predictive value; ROC, receiver operating characteristic; SEER, Surveillance, Epidemiology, and End Results Program.

© 2020 by the AGA Institute 1542-3565/$36.00
https://doi.org/10.1016/j.cgh.2020.04.054

There is a need for simple tests that require minimal patient compliance to identify unscreened patients at high risk for CRC to target for additional outreach. The American Cancer Society estimates that only approximately 60% of US adults older than the age of 50 are up to date with CRC screening by colonoscopy, sigmoidoscopy, or fecal testing.2 A simple test could also be used in settings with a limited colonoscopy capacity or to identify persons at high risk for postcolonoscopy cancers during the long durations between colonoscopy screening exams. Risk scoring systems using factors such as age, alcohol/smoking status, and body mass index (BMI) have been developed, but these require active data acquisition and calculation by patients or providers.14 A potential inexpensive and widely available strategy is through electronically scanning commonly used laboratory tests for high-risk features, including tests obtained for other indications. A machine learning analysis in Israel evaluated multiple tests obtained during routine health checkups to identify those associated with future risk of CRC; the optimal combination used demographic and hematologic parameters from complete blood cell counts (CBCs), including red blood cell distribution width, hemoglobin, and mean corpuscular volume.15 The likely biologic mechanism is that even among those without overt anemia, low levels of blood loss from colorectal neoplasms may cause subtle changes in such test profiles.16
The current study evaluated the algorithm’s predictive score performance in an extremely large, community-based cohort with substantial diversity by race/ethnicity, comprehensive electronic health records, and long-term cancer follow-up.

Methods

Study Population

We used a nested case-control design within a retrospective cohort of health plan members from Kaiser Permanente Northern California (KPNC). This integrated health care delivery organization serves approximately 4 million current members in urban, suburban, and semirural regions throughout Northern California. The membership is diverse and similar in socioeconomic characteristics to the region’s census demographics, including the proportions with commercial insurance, Medicare, and Medicaid.17 The study was approved by the KPNC institutional review board, which waived the requirement for individual informed consent.
What You Need to Know
Background
A predictive score, generated by a machine learning algorithm with common laboratory test data, was previously developed to identify patients at risk for colorectal cancer. This algorithm requires validation in a large, ethnically diverse cohort.
Findings
The authors validated a predictive model that uses complete blood cell count and demographic data to identify patients at high risk of CRC. The algorithm identified 3% of the population who require an investigation and identified 35% of patients who received a diagnosis of CRC within the next 6 months.
Implications for patient care
This algorithm might be used to identify patients in the community who have an increased risk of CRC and should undergo immediate screening.

Study Cohort

The eligible study cohort population included 2,855,994 KPNC Health Plan members between 1996 and 2015 who had at least 1 outpatient CBC test performed at age 37 years or older for any indication to allow for analyses as young as age 40 with at least 3 years of prior CBC data. The primary analyses for the current study used a 40% random subset of the study population, which was restricted to patients at least 50 years of age.
Colorectal cancer cases and controls. For the primary analysis, cases were selected from eligible cohort members who received a CBC between 50 and 75 years of age, did not have a prior or current CRC diagnosis by the CBC date, and were subsequently diagnosed with CRC (defined by International Classification of Diseases-Oncology 3 codes C18.0, C18.2–18.9, C19.9, or C20.9 and histology/morphology codes limited to adenocarcinoma). For patients with multiple eligible blood counts before their cancer diagnosis, one was chosen at random. Controls were individuals between 50 and 75 years of age at the date of a randomly selected blood count test with no CRC diagnosis.
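
The selection rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's code: the record layout (`CbcRecord`), the function name `select_index_cbc`, and the use of a seeded `random.Random` are all hypothetical.

```python
import random
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CbcRecord:
    """Hypothetical minimal CBC record; field names are assumptions."""
    patient_id: str
    test_date: date
    age_at_test: int


def select_index_cbc(
    cbcs: List[CbcRecord],
    crc_diagnosis_date: Optional[date],
    rng: random.Random,
) -> Optional[CbcRecord]:
    """Pick one eligible CBC at random, or None if no CBC qualifies.

    Eligible means age 50-75 at the test and, for cases (a diagnosis
    date is given), drawn strictly before the CRC diagnosis date.
    Controls pass crc_diagnosis_date=None.
    """
    eligible = [
        c for c in cbcs
        if 50 <= c.age_at_test <= 75
        and (crc_diagnosis_date is None or c.test_date < crc_diagnosis_date)
    ]
    return rng.choice(eligible) if eligible else None
```

Passing an explicit random generator keeps the draw reproducible, which matters when the same cohort is re-analyzed at several thresholds.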

Data Definitions and Transfer

Demographics (age, gender, race/ethnicity), laboratory data (CBC), pathology data (adenoma), and BMI were extracted from the electronic medical record. Cancer-related variables (diagnosis dates, location, and stage information) were provided from the Kaiser Permanente Cancer registry, which reports to the California Surveillance Epidemiology and End Results cancer

Prediction Model Parameters

The prediction model performs algorithm-based analyses of medical information and produces a predictive score to identify patients who may be at increased risk for CRC. The score was intended to supplement, not replace, physician assessment and other diagnostic and clinical screening procedures. To calculate a score, the model requires, at minimum, gender, year of birth, and at least 1 CBC test, including cell parameters; additional data elements are included in the sensitivity and exploratory analyses. The output file contains a single record for each patient, consisting of either a score (on a scale that ranges from 0 to 100) or an error code indicating why a score could not be produced. A single cutoff score was identified on the basis of prior training and validation data.
The model, which was trained on a large cohort of individuals from Israel, generates a constant length vector of features, representing present and past values of each of the blood count parameters from the full history of CBCs per individual and time point. It then uses an ensemble of decision trees on the features vector (augmented by age and gender) to evaluate risk of CRC.15
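
To make the idea of a constant-length feature vector concrete, the sketch below turns a CBC history of any length into a fixed number of features. The actual features used by the model are not described here, so everything in this example is an assumption: the three parameters (chosen because the Introduction names them), the "latest value plus change from the earlier mean" summary, and the function name `feature_vector`. The decision-tree ensemble itself is not shown.

```python
import math
from statistics import mean
from typing import Dict, List

# Illustrative subset of CBC parameters; the real model's inputs may differ.
PARAMETERS = ("hemoglobin", "mcv", "rdw")


def feature_vector(
    history: List[Dict[str, float]], age: float, is_male: bool
) -> List[float]:
    """Summarize a CBC history (oldest first) as a fixed-length vector.

    For each parameter: the latest value and its change from the mean of
    earlier values (NaN when there is no earlier value). Age and sex are
    appended, so the length is always 2 * len(PARAMETERS) + 2.
    """
    if not history:
        raise ValueError("at least one CBC is required")
    latest, past = history[-1], history[:-1]
    features: List[float] = []
    for name in PARAMETERS:
        current = latest[name]
        earlier = [h[name] for h in past if name in h]
        trend = current - mean(earlier) if earlier else math.nan
        features.extend([current, trend])
    features.extend([float(age), 1.0 if is_male else 0.0])
    return features
```

The point of the design is that a patient with one CBC and a patient with twenty produce vectors of the same length, which is what a tree ensemble needs as input.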

Endpoints


Exploratory and Sensitivity Analyses

Colorectal adenoma cases. Cases were individuals who were diagnosed with adenomatous polyp(s) on a screening colonoscopy, between 50 and 75 years of age, and who lacked a prior adenomatous polyp or any diagnosis of CRC. Controls were individuals between 50 and 75 years of age who had a screening colonoscopy without cancers or adenomas after the selected blood count test. Screening colonoscopies were used for both groups to decrease potential confounding by indication (eg, whereby colonoscopies were performed for indications such as anemia). For the adenoma analyses, we considered blood tests taken up to 182 days before a screening colonoscopy. Both cases and controls must have had at least 6 months of continuous health plan membership and at least 1 CBC test before their screening colonoscopy.
Gastrointestinal disorders. Several lower gastrointestinal disorders, specifically those with bleeding tendencies such as ulcer and angiodysplasia, evaluated in exploratory analyses were defined by International Classification of Diseases diagnostic codes (Supplementary Table 1).
The endpoints considered in exploratory analyses included overall performance for detecting adenomatous polyp(s) and lower gastrointestinal disorders. Additional sensitivity analyses evaluated the performance of the algorithm to detect CRC by sex, race, BMI category, CRC location, and CRC stage. Finally, analyses were performed over a range of score thresholds and blood count test time windows (ie, CBC test performed 0–182 days and 183–365 days before cancer detection).
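
The two time windows can be expressed as a small helper. This is an illustrative sketch only; the function name `time_window` and the string labels are assumptions, and the study's handling of boundary days is not stated beyond the ranges given above.

```python
from datetime import date
from typing import Optional


def time_window(cbc_date: date, diagnosis_date: date) -> Optional[str]:
    """Label a CBC by how many days before the diagnosis it was drawn.

    Returns "0-182" or "183-365", or None when the CBC falls outside
    both windows (drawn after the diagnosis or more than 365 days before).
    """
    days_before = (diagnosis_date - cbc_date).days
    if 0 <= days_before <= 182:
        return "0-182"
    if 183 <= days_before <= 365:
        return "183-365"
    return None
```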

Statistical Methods


Results

The study cohort (Table 1) consisted of 52% female subjects; 65.1% were white, 12.3% Asian, 7.3% black, and 15.3% with other or unknown race; 14% of the population was Hispanic. The study cohort’s demographics (n = 308,721) by age and sex were similar to the underlying KPNC population at mid-study period (mean age, 59.6; 51.8% female) (Figure 1). The sensitivity of the predictive index for future CRC was 35.4% (95% CI, 33.8%–36.7%) for within 6 months after the CBC (Table 2). The AUC of 0.78 (95% CI, 0.77–0.78) was significantly higher than the AUC of 0.65 for a model using age alone and the AUCs of 0.61 and 0.60 for age-alone models stratified by male and female, respectively (Figure 2).
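
For readers less familiar with these metrics, the sketch below shows one common way to obtain a cutoff at a target specificity, the sensitivity at that cutoff, and a rank-based AUC from raw scores. It is a generic illustration on toy numbers, not the study's statistical code; the function names and the convention that a score strictly above the cutoff counts as "high" are assumptions.

```python
import math
from typing import Sequence


def threshold_at_specificity(control_scores: Sequence[float], specificity: float) -> float:
    """Smallest control score such that at least `specificity` of controls
    are at or below it (scores strictly above the cutoff are 'high')."""
    ordered = sorted(control_scores)
    index = math.ceil(specificity * len(ordered)) - 1
    return ordered[max(index, 0)]


def sensitivity(case_scores: Sequence[float], cutoff: float) -> float:
    """Fraction of cases with a score strictly above the cutoff."""
    return sum(s > cutoff for s in case_scores) / len(case_scores)


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Probability that a random case outscores a random control
    (ties count one half). Quadratic time; fine for a small example."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy data, not study data.
controls = [10, 20, 30, 40, 50, 60, 70, 80]
cases = [55, 65, 75, 90]
cutoff = threshold_at_specificity(controls, 0.75)
print(cutoff, sensitivity(cases, cutoff), auc(cases, controls))
```

Fixing the cutoff from the controls first, and only then reading off sensitivity among cases, mirrors how "sensitivity at 97% specificity" is reported in this section.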

Exploratory and Sensitivity Analyses

Subgroups (sex, race, body mass index). For the main model, the AUCs for male and female subjects were comparable, 0.76 and 0.78, respectively (Table 2).
Table 1. KPNC Cohort Characteristics, Limited to Age 50–75 at Complete Blood Counts Test

BMI, body mass index; CBC, complete blood cell count; CRC, colorectal cancer; KPNC, Kaiser Permanente Northern California.

Similarly, there was no significant difference in performance by race. When stratified by BMI, results were similar among individuals classified as normal, overweight, or obese type I. However, the score had a higher OR and sensitivity for the group of obese type II+III (Table 2).

Colorectal cancer location. The model had a higher sensitivity for proximal cancer (51.8%; 95% CI, 49.4%–53.9%) than for distal cancer (27.3%; 95% CI, 23.6%–30.0%). The predictive model for above vs below the cutoff performed significantly better at identifying those at risk of proximal cancers (OR, 34.7; 95% CI, 31.5–37.7) than distal cancers (OR, 12.1; 95% CI, 10.1–13.9; P difference <.01; Table 2).

Colorectal cancer stage. The model had lower sensitivity for early stage CRC (Surveillance, Epidemiology, and End Results Program [SEER] stage 0, 1, and 2; sensitivity, 28.8%; OR, 13.1; 95% CI, 11.8–14.3) than for more advanced stages of CRC (SEER stage 3, 4, and 7; sensitivity, 43.4%; OR, 24.8; 95% CI, 22.4–27.3; P difference <.01).

Predictive ability over time. Increasing time between CBC test and cancer diagnosis decreased sensitivity, whereas a significantly increased risk was maintained between high vs low results. The sensitivity for 0–182 days before CRC diagnosis was 35.4% (33.8%–36.7%), and for 183–365 days before CRC diagnosis it was 21.0% (19.1%–23.3%) at a specificity of 97% (Supplementary Figure 1). For high vs low scores, these corresponded to approximately 18-fold vs 9-fold increased risks, respectively (Table 3). For proximal CRC the OR was 34.7 (95% CI, 31.5–37.7) for CBC drawn 0–182 days before the CRC diagnosis compared with 14.2 (95% CI, 11.5–16.5) for 183–365 days (Table 3). For distal CRC, the OR was 12.1 (95% CI, 10.1–13.9) for CBC drawn 0–182 days before the CRC diagnosis compared with 5.4 (95% CI, 3.7–7.0) for 183–365 days (Supplementary Figure 2).
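
The reported ORs and sensitivities are linked by simple arithmetic. For a score dichotomized at a fixed specificity, the odds ratio for high vs low scores equals the odds of a high score among cases divided by the odds of a high score among controls. The sketch below applies that identity to the point estimates quoted in this section; it is an arithmetic check only, not the authors' statistical method, and it does not reproduce the confidence intervals. The function name `odds_ratio` is an assumption.

```python
def odds_ratio(sensitivity: float, specificity: float) -> float:
    """OR for high vs low score from sensitivity and specificity.

    Odds of a high score in cases:    sens / (1 - sens)
    Odds of a high score in controls: (1 - spec) / spec
    """
    odds_high_in_cases = sensitivity / (1.0 - sensitivity)
    odds_high_in_controls = (1.0 - specificity) / specificity
    return odds_high_in_cases / odds_high_in_controls


# Sensitivities at 97% specificity, as quoted in the Results text.
reported_sensitivity = {
    "any CRC, 0-182 days": 0.354,
    "any CRC, 183-365 days": 0.210,
    "proximal, 0-182 days": 0.518,
    "distal, 0-182 days": 0.273,
    "early stage": 0.288,
    "advanced stage": 0.434,
}

for label, sens in reported_sensitivity.items():
    print(f"{label}: OR about {odds_ratio(sens, 0.97):.1f}")
```

The computed values land close to the published ORs (for example, roughly 18-fold and 9-fold for the two time windows), which is a useful sanity check when reading Tables 2 and 3.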

Predictive ability by score cutoff threshold. For proximal CRC at a (higher) 98% specificity level, the OR was 44.2 (95% CI, 40.5–48.7) for the time window of 0–182 days between CBC test and CRC diagnosis and 17.9 (95% CI, 14.6–21.1) for the time window of 183–365 days (Table 3). For proximal CRC at a 99% specificity level, an elevated score had an OR of 64.1 (95% CI, 58.1–71.2) for the time window of 0–182 days and 23.7 (95% CI, 19.0–28.9) for the time window of 183–365 days (Table 3). Similar trends for increasing sensitivity were seen for both distal and proximal CRC locations across the range of specificity levels presented (Table 3).

Adenomatous polyps and other outcomes. Among patients receiving a screening colonoscopy, a CBC within 6 months predicted the presence of precancerous adenomas with AUC of 0.57 and sensitivity of 3.8% (Table 4). The model also identified persons with other lower gastrointestinal disorders, specifically those with bleeding tendencies such as ulcer and angiodysplasia (Table 4).

Discussion

This study validated the performance of a machine learning algorithm, previously derived in an Israeli population, for predicting subsequent detection of CRC and precancerous adenomas in a separate, demographically diverse community population. The algorithm identified patients at 10 times increased risk of CRC by using CBC profiles, a level that would likely warrant referral to a gastroenterologist for further evaluation within the 50–75 age group. The model had significantly higher predictive ability than age alone and performed better for CBCs collected closer to CRC diagnosis (0–182 vs 183–365 days). The model was more sensitive for advanced stage CRC than for early stage, as would be expected. The model’s performance for detection of any adenomatous polyp was lower, although greater than random chance, in analyses restricted to patients receiving a screening colonoscopy. The model’s performance did not differ by gender, race, or BMI (for normal, overweight, and obese type I).

This validation of the predictive model among a screening-eligible population supports prior studies with varying follow-up times in other populations.15,18-20 An initial validation within an Israeli population of 112,584 subjects with available CBCs from the Maccabi Health Services showed that individuals in the 99th percentile of scores were more than 20 times (OR, 21.8; 95% CI, 13.8–34.2) more likely to be diagnosed with CRC in the next 12–18 months.15 Within data from the Clinical Practice Data Link from the United Kingdom, at scores associated with a specificity level of 99.5%, the OR for a CRC diagnosis was 26.5 (95% CI, 23.3–30.2) in a group of patients older than 40 years with 24 months of follow-up.20 In a US community-based population, the OR for detecting CRC was 7.1 at 97% specificity in patients aged 50–74 years, with CBC taken between 6 and 12 months before a CRC diagnosis,18 which was fairly similar to findings of the current study at the same specificity within the same time window and age group. Cumulatively, these studies suggest that the predictive model can identify selected individuals at an increased risk of undetected CRC who can be targeted for more specific follow-up, such as colonoscopy. The replication across several populations, with slightly different follow-up intervals, including the current large, diverse community-based population, suggests the results should be generalizable to multiple settings.17

Figure 1. Cohort flow diagram. CBC, complete blood cell count; KPNC, Kaiser Permanente Northern California.

Table 2. Sensitivity Analysis by Gender, Race, BMI, Colorectal Cancer Location, and Stage for Persons Aged 50–75 With Time Window 0–182 Days and Specificity of 97%
BMI, body mass index; CRC, colorectal cancer; SEER, Surveillance, Epidemiology, and End Results Program.

Over the past decade, CRC incidence and mortality in the US have declined for both men and women, at least in part associated with increased screening.21 A stronger emphasis has been placed on screening by national organizations and initiatives and funding authorized by legislation such as the Affordable Care Act, which provided tens of millions of uninsured Americans access to health care at low cost.10,22

Figure 2. Receiver operating characteristics (ROC) curves for CBC obtained 0–182 days before colorectal cancer diagnosis, ages 50–75. AUC, area under the curve; CBC, complete blood cell count.

These actions have reduced the burden of CRC in the US; however, disparities and inadequate screening rates are still common within many settings. It is estimated that differences in screening are responsible for 42% of the disparity between blacks and whites in CRC incidence and 19% of the difference in CRC mortality.23 Globally, the incidence of CRC is increasing; worldwide incidence and mortality are projected to increase by approximately 60% by 2030.24,25 These increases in developing countries with inadequate health care are likely due to limited access to early detection and substandard treatment when malignancy is identified.25 The use of inexpensive tests, which can be used on new or existing samples, may provide a low cost, readily available method to supplement existing screening efforts. The predictive model validated in this article is different from other screening modalities in that it requires no active participation from the patient. Instead, persons at higher risk of CRC can be identified by using existing CBC tests and basic patient data. The simplicity of the test may be acceptable to patients reluctant to undergo more intensive screening with colonoscopy; however, screening-eligible people should not routinely opt for this test in lieu of more proven tests with higher sensitivity, such as fecal immunochemical test and colonoscopy.26,27

Strengths of this study include the large study population, which is representative of the demographics of Northern California, a diverse subsection of California.17 The population has high background rates of CRC screening (>80%) among screening-eligible persons. We used readily available electronic medical records to efficiently gather information for the cohort. In this setting, CBC data may be more readily available than in other community-based health care settings. Weaknesses of the study include the retrospective design, lack of comparable diagnostic data (eg, colonoscopy) on all the cancer controls, and inability to ascertain the specific reasons for blood testing.

Table 3. Model Performance at Various Specificity Levels for Distal and Proximal Colorectal Cancer at Different Time Windows For Persons Aged 50-75

In conclusion, this study validated the ability of a predictive model, derived from machine learning approaches, to identify persons at increased risk of future CRC diagnoses in a large, diverse, community-based population. This approach’s low sensitivity does not recommend it over effective primary screening methods such as fecal immunochemical test or colonoscopy; however, the model’s ability to detect CRCs before their clinical diagnosis suggests a potential for identifying more CRCs than would be recognized by regular screening alone. These results support further prospective evaluation to determine the method’s feasibility, efficiency, accuracy, and effectiveness in different clinical settings, as well as research to evaluate the influence of additional medical record data on test performance.

Table 4. Sensitivity Analysis for Adenomatous Polyp/s (Identified on Screening Colonoscopy) and Other Gastrointestinal Diagnoses for CBC Within 0–182 Days for Persons Aged 50–75 at Specificity Level of 97%

Adenoma diagnoses evaluated test performance for CBC within 182 days for both adenoma patients (cases with an adenoma on screening colonoscopy) and controls (normal screening colonoscopy). AUC, area under the curve; IBD, inflammatory bowel disease; OR, odds ratio.

Supplementary Material

Note: To access the supplementary material accompanying this article, visit the online version of Clinical Gastroenterology and Hepatology at www.cghjournal.org, and at https://doi.org/10.1016/j.cgh.2020.04.054.

References

  1. Cancer Facts & Figures 2017. Atlanta, GA: American Cancer Society, 2017. Available from: https://www.cancer.org/research/cancer-facts-statistics/all-cancer-facts-figures/cancer-facts-figures-2017.html. Accessed February 21, 2019.
  2. Colorectal Cancer Facts & Figures 2017-2019. Atlanta, GA: American Cancer Society, 2017. Available from: https://www.cancer.org/research/cancer-facts-statistics/colorectal-cancer-facts-figures.html. Accessed February 21, 2019.
  3. Miller KD, Siegel RL, Lin CC, et al. Cancer treatment and survivorship statistics, 2016. CA Cancer J Clin 2016;66:271–289.
  4. National Cancer Institute. Surveillance, Epidemiology, and End Results Program. Cancer stat facts: colorectal cancer [cited Feb 21, 2019]. Available from: https://seer.cancer.gov/statfacts/html/colorect.html. Accessed February 21, 2019.
  5. Gross CP, Andersen MS, Krumholz HM, et al. Relation between Medicare screening reimbursement and stage at diagnosis for older patients with colon cancer. JAMA 2006;296:2815–2822.
  6. Corley DA, Jensen CD, Marks AR, et al. Adenoma detection rate and risk of colorectal cancer and death. N Engl J Med 2014; 370:1298–1306.
  7. Doubeni CA, Corley DA, Quinn VP, et al. Effectiveness of screening colonoscopy in reducing the risk of death from right and left colon cancer: a large community-based study. Gut 2018;67:291–298.
  8. Levin TR, Corley DA, Jensen CD, et al. Effects of organized colorectal cancer screening on cancer incidence and mortality in a large community-based population. Gastroenterology 2018; 155:1383–1391 e5.
  9. Doubeni CA, Fedewa SA, Levin TR, et al. Modifiable failures in the colorectal cancer screening process and their association with risk of death. Gastroenterology 2019;156:63–74 e6.
  10. US Preventive Services Task Force. Screening for colorectal cancer: U.S. Preventive Services Task Force recommendation statement. Ann Intern Med 2008;149:627–637.
  11. Lauby-Secretan B, Vilahur N, Bianchini F, et al. The IARC perspective on colorectal cancer screening. N Engl J Med 2018; 378:1734–1740.
  12. Hewitson P, Glasziou P, Irwig L, et al. Screening for colorectal cancer using the faecal occult blood test, Hemoccult. Cochrane Database Syst Rev 2007;(1)CD001216.
  13. Siegel RL, Miller KD, Fedewa SA, et al. Colorectal cancer statistics, 2017. CA Cancer J Clin 2017;67:177–193.
  14. Driver JA, Gaziano JM, Gelber RP, et al. Development of a risk score for colorectal cancer in men. Am J Med 2007; 120:257–263.
  15. Kinar Y, Akiva P, Choman E, et al. Performance analysis of a machine learning flagging system used to identify a group of individuals at a high risk for colorectal cancer. PLoS One 2017; 12:e0171759.
  16. Spell DW, Jones DV Jr, Harper WF, et al. The value of a complete blood count in predicting cancer of the colon. Cancer Detect Prev 2004;28:37–42.
  17. Gordon N. How does the adult Kaiser Permanente membership in Northern California compare with the larger community?, 2006. Available from: http://www.dor.kaiser.org/dor/mhsnet/public/kpnc_community.htm. Accessed February 21, 2019.
  18. Hornbrook MC, Goshen R, Choman E, et al. Early colorectal cancer detected by machine learning model using gender, age, and complete blood count data. Dig Dis Sci 2017; 62:2719–2727.
  19. Kinar Y, Kalkstein N, Akiva P, et al. Development and validation of a predictive model for detection of colorectal cancer in primary care by analysis of complete blood counts: a binational retrospective study. J Am Med Inform Assoc 2016;23:879–890.
  20. Birks J, Bankhead C, Holt TA, et al. Evaluation of a prediction model for colorectal cancer: retrospective analysis of 2.5 million patient records. Cancer Med 2017;6:2453–2460.
  21. Ansa BE, Coughlin SS, Alema-Mensah E, et al. Evaluation of colorectal cancer incidence trends in the United States (2000-2014). J Clin Med 2018;7:22.
  22. Sommers BD, Musco T, Finegold K, et al. Health reform and changes in health insurance coverage in 2014. N Engl J Med 2014;371:867–874.
  23. Lansdorp-Vogelaar I, Kuntz KM, Knudsen AB, et al. Contribution of screening and survival differences to racial disparities in colorectal cancer rates. Cancer Epidemiol Biomarkers Prev 2012;21:728–736.
  24. Boakye D, Rillmann B, Walter V, et al. Impact of comorbidity and frailty on prognosis in colorectal cancer patients: a systematic review and meta-analysis. Cancer Treat Rev 2018;64:30–39.
  25. Arnold M, Sierra MS, Laversanne M, et al. Global patterns and trends in colorectal cancer incidence and mortality. Gut 2017; 66:683–691.
  26. Lee JK, Jensen CD, Levin TR, et al. Long-term risk of colorectal cancer and related death after adenoma removal in a large, community-based population. Gastroenterology 2019;158.
  27. Lee JK, Jensen CD, Levin TR, et al. Long-term risk of colorectal cancer and related deaths after a colonoscopy with normal findings. JAMA Intern Med 2019;179:153–160.

Reprint requests

Address requests for reprints to: Jennifer L. Schneider, MPH, 2000 Broadway, Oakland, California 94612. e-mail: Jennifer.L1.Schneider@kp.org, fax: (510) 891-3802.

CRediT authorship contributions

Jennifer Schneider – Study Concept and Design, Drafting and Editing of the Manuscript
Evan Layefsky – Drafting of the Manuscript
Natalia Udaltsova – Analysis and Interpretation of Data
Theodore Levin – Study Concept and Design, Editing of the Manuscript
Douglas Corley – Study Design and Concept, Editing of the Manuscript

Conflicts of interest

The authors disclose no conflicts.

Funding

Supported by a contract to the Kaiser Foundation Research Institute, Oakland, CA, from Medial Early Sign Inc, Kfar Malal, Israel. The analyses were funded by Medial Research, which developed the diagnostic algorithm being evaluated.
Supplementary Figure 1. Receiver operating characteristics curves for complete blood cell counts obtained 0–365 days before colorectal cancer diagnosis and stratified by time windows 0–182 days and 183–365 days before colorectal cancer diagnosis, ages 50–75. AUC, area under the curve.
Supplementary Figure 2. Receiver operating characteristics curves for complete blood cell counts obtained 0–365 days before colorectal cancer diagnosis and stratified by time windows 0–182 days and 183–365 days before distal and proximal colorectal cancer diagnosis, ages 50–75. AUC, area under the curve.
Supplementary Table 1. ICD-9 Diagnosis Codes Used to Define Each of the Gastrointestinal Disorders
