Computer-assisted Flagging of Never Smokers at High Risk of NSCLC in a Large US-based HMO using the LungFlag Model


Eran N. Choman

Alon Lanyado


Background & Aims

Smoking is considered the major cause of lung cancer, but lung cancer is not only a smokers’ disease. The prevalence of lung cancer in never-smokers is gradually rising: around 20% of lung cancers in the UK and US occur in people who have never smoked[1][2], and this figure rises to around 50% in some Asian countries[3][4].


A machine-learning algorithm based on routine EHR and laboratory data, previously developed and validated on an ever-smoker population[5], was evaluated for its accuracy in detecting non-small cell lung cancer (NSCLC) among individuals who never smoked. The cohort came from a large US health system and included 509 case patients with NSCLC and 50,001 contemporaneous NSCLC-free controls. We compared the performance of two risk prediction models: LungFlag and the PLCOm2012 model adapted to EHR data (mPLCOm2012).


Model performance was assessed for the age group 40 and above using the area under the receiver operating characteristic curve (AUC), positive predictive value (PPV), and diagnostic odds ratio (OR). The risk predictors were calculated for multiple time windows prior to the diagnosis date (Dx), using cut-offs yielding specificities of 90%, 95%, 97%, or 99%. (Details in PDF)
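As an illustration of how these measures relate to one another, the following minimal Python sketch picks a cut-off that reaches a target specificity among controls and then reports sensitivity, PPV, and the diagnostic OR at that cut-off, alongside a rank-based AUC. It is not the study's analysis code. The function names, the rule of flagging scores strictly above the cut-off, the 0.5 continuity correction for empty cells, and the toy scores are all assumptions made for the example; none of the numbers are study data.

```python
import math
from dataclasses import dataclass


@dataclass
class CutoffMetrics:
    threshold: float
    specificity: float
    sensitivity: float
    ppv: float
    diagnostic_or: float


def auc(case_scores, control_scores):
    """Rank-based AUC: share of case/control pairs in which the case
    scores higher than the control (ties count as half a win)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def metrics_at_specificity(case_scores, control_scores, target_specificity):
    """Choose the lowest cut-off whose specificity among controls meets the
    target, then compute the 2x2 table with 'flagged' meaning score > cut-off."""
    controls = sorted(control_scores)
    # Number of controls that must fall at or below the cut-off.
    # round() guards against floating-point noise such as 7.000000000000001.
    needed = max(1, math.ceil(round(target_specificity * len(controls), 9)))
    threshold = controls[needed - 1]

    tn = sum(1 for s in control_scores if s <= threshold)
    fp = len(control_scores) - tn
    tp = sum(1 for s in case_scores if s > threshold)
    fn = len(case_scores) - tp

    ppv = tp / (tp + fp) if (tp + fp) else float("nan")

    # Diagnostic OR = (TP * TN) / (FP * FN). If any cell is empty, add 0.5
    # to every cell (Haldane-Anscombe correction); this is an assumption of
    # the sketch, not something stated in the abstract.
    a, b, c, d = tp, fp, fn, tn
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    diagnostic_or = (a * d) / (b * c)

    return CutoffMetrics(
        threshold=threshold,
        specificity=tn / len(control_scores),
        sensitivity=tp / len(case_scores),
        ppv=ppv,
        diagnostic_or=diagnostic_or,
    )


# Toy scores, invented purely for illustration.
toy_cases = [0.9, 0.8, 0.4, 0.7]
toy_controls = [0.1, 0.2, 0.3, 0.35, 0.5, 0.15, 0.25, 0.05, 0.6, 0.45]

toy_metrics = metrics_at_specificity(toy_cases, toy_controls, 0.90)
toy_auc = auc(toy_cases, toy_controls)
print(toy_metrics)
print(f"AUC = {toy_auc:.3f}")
```

Note that a PPV computed directly on a case-control sample reflects the case-to-control ratio of that sample, so it would normally need to be interpreted against the prevalence in the target population.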


Using information already available in the EHR, the model demonstrated high accuracy (OR > 6) in the early detection of NSCLC among never-smokers, with data going back up to 24 months before diagnosis. LungFlag therefore creates an opportunity for case finding in a population that has growing rates of lung cancer and is currently not offered any screening, although additional local validations are recommended.