Identification of Secondary Crash Risk Factors using Penalized Logistic Regression Model

Document Type


Publication Date



Secondary crashes (SCs) are increasingly recognized as a major problem that reduces roadway capacity and causes additional traffic delays. However, limited knowledge of the nature and characteristics of SCs has largely impeded the development of mitigation strategies. Analyzing SCs poses two main challenges. First, the relevant variables are not known in advance, and most of the candidate variables considered in the models are highly correlated. Second, only a small proportion of incidents result in SCs, making SC prediction an imbalanced classification problem. This study developed a reliable SC risk prediction model using Least Absolute Shrinkage and Selection Operator (LASSO) penalized logistic regression with the Synthetic Minority Oversampling TEchnique-Nominal Continuous (SMOTE-NC). The proposed model is expected to improve the predictive accuracy of SC risk modeling because it accounts for the asymmetric nature of SCs, performs variable selection, and removes highly correlated variables. The study data were collected over 3 years on a 35-mi section of I-95 in Jacksonville, Florida. SCs were identified based on real-time speed data. The results indicated that real-time traffic variables and primary incident characteristics significantly affect the likelihood of SCs. The most influential variables included mean detector occupancy, coefficient of variation of equivalent hourly volume, mean speed, primary incident type, percentage of lanes closed, incident occurrence time, shoulder blocked, number of responding agencies, incident impact duration, incident clearance duration, and roadway alignment. Agencies can use the study results to develop SC mitigation strategies and thereby improve the operational and safety performance of freeways.
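To illustrate why an L1 (LASSO) penalty suits correlated candidate variables, the following is a minimal sketch of LASSO-penalized logistic regression fitted by proximal gradient descent. It is not the study's implementation: the data are synthetic, the function names (`fit_lasso_logistic`, `make_data`) and all parameter values are assumptions chosen for illustration, and the SMOTE-NC oversampling step is omitted. The objective minimized is the mean log-loss plus `lam` times the sum of absolute coefficient values, with the intercept left unpenalized.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    # Proximal operator of the L1 penalty: shrink toward zero, clip at zero.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(X, y, lam, step=0.5, n_iter=500):
    """Minimize mean log-loss + lam * sum(|w_j|) by proximal gradient descent.

    The intercept b is not penalized. Returns (weights, intercept).
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        # Gradient step on the smooth loss, then soft-thresholding for the L1 term.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def make_data(n=400, seed=0):
    # Synthetic stand-in for incident records (hypothetical, not the study data):
    # x0 drives the outcome, x1 is a near-copy of x0 (highly correlated),
    # x2 and x3 are pure noise. Positive outcomes are the minority class.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x0 = rng.gauss(0, 1)
        x1 = x0 + 0.05 * rng.gauss(0, 1)
        x2, x3 = rng.gauss(0, 1), rng.gauss(0, 1)
        prob = sigmoid(-2.5 + 2.0 * x0)
        X.append([x0, x1, x2, x3])
        y.append(1 if rng.random() < prob else 0)
    return X, y


X, y = make_data()
w, b = fit_lasso_logistic(X, y, lam=0.05)
print("event rate:", sum(y) / len(y))
print("coefficients:", [round(wj, 3) for wj in w])
print("intercept:", round(b, 3))
```

Coefficients that print as exactly zero have been dropped by the penalty. Increasing `lam` removes more variables, and a sufficiently large value sets every coefficient to zero. In practice, the penalty strength would be tuned (for example, by cross-validation), and any oversampling would be applied to the training data only.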

Publication Title

Transportation Research Record

Digital Object Identifier (DOI)