Enhancing Mode-Choice Models with Conformal Prediction: Uncertainty Quantification and Decision Support using Tree-Based Machine Learning / Bohlouli, Ramin; Varghese, Ken Koshy; Gentile, Guido; Eldafrawi, Mohamed. - In: TRANSPORT AND TELECOMMUNICATION. - ISSN 1407-6179. - (2025).
Enhancing Mode-Choice Models with Conformal Prediction: Uncertainty Quantification and Decision Support using Tree-Based Machine Learning
Ramin Bohlouli; Ken Koshy Varghese; Guido Gentile; Mohamed Eldafrawi
2025
Abstract
Accurate mode-choice forecasts are vital for effective transportation planning. Transit agencies and city planners rely on precise predictions, but unreliable forecasts can misdirect even the most behaviorally grounded insights. For decades, discrete choice models (DCMs), notably Multinomial Logit (MNL) and Mixed Multinomial Logit (MMNL), have explained why travelers choose particular modes via interpretable parameters, yet they often underperform in forecast accuracy. More recently, machine learning methods (e.g., tree-based algorithms) have captured complex, nonlinear patterns and often outperform DCMs in point-prediction accuracy. However, they lack built-in confidence measures, which limits their use in risk-aware decision making. In this work, we help narrow this gap by wrapping our best ML model in an Inductive Mondrian Conformal Prediction (IMCP) layer with per-mode calibration at 90% nominal coverage. We leverage a survey of approximately 8,000 Italian employees that captures their socio-economic attributes and travel habits. Using a tailored preprocessing pipeline, we compare XGBoost, Random Forest, and CatBoost, and observe that XGBoost performs best on the test set, with an overall accuracy of 89.7% and a macro-average F1 score of 83.6%. Our IMCP layer then produces distribution-free prediction sets that contain the true mode at least 90% of the time, both overall and within each individual mode category. Singleton prediction sets can be treated as high-confidence forecasts for capacity planning, while multi-label sets (and the occasional empty set for highly ambiguous cases) highlight where uncertainty is greatest and pinpoint exactly which individuals merit follow-up surveys or targeted incentives.
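To make the per-mode calibration idea concrete, the following is a minimal sketch of a class-conditional (Mondrian) inductive conformal layer placed on top of any classifier that outputs class probabilities. It is not the authors' implementation: the nonconformity score (one minus the predicted probability of the class), the function names `mondrian_thresholds` and `prediction_sets`, the mode labels, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np


def mondrian_thresholds(cal_probs, cal_labels, alpha=0.10):
    """Compute one nonconformity threshold per class from a calibration set.

    Assumed score: 1 - predicted probability of the class under consideration.
    """
    n_classes = cal_probs.shape[1]
    # inf means "always include this class" when calibration data are too scarce
    thresholds = np.full(n_classes, np.inf)
    for k in range(n_classes):
        # Scores of calibration examples whose true class is k
        scores = np.sort(1.0 - cal_probs[cal_labels == k, k])
        n = scores.size
        # Finite-sample corrected rank of the (1 - alpha) quantile
        rank = int(np.ceil((n + 1) * (1.0 - alpha)))
        if 0 < rank <= n:
            thresholds[k] = scores[rank - 1]
    return thresholds


def prediction_sets(test_probs, thresholds):
    """Return a boolean matrix; entry (i, k) is True if class k is in set i."""
    return (1.0 - test_probs) <= thresholds


if __name__ == "__main__":
    # Synthetic stand-in for classifier outputs; not data from the survey
    rng = np.random.default_rng(0)
    modes = ["car", "transit", "walk"]  # hypothetical mode labels
    cal_probs = rng.dirichlet(np.ones(3), size=300)
    cal_labels = np.array([rng.choice(3, p=p) for p in cal_probs])
    thr = mondrian_thresholds(cal_probs, cal_labels, alpha=0.10)
    test_probs = rng.dirichlet(np.ones(3), size=5)
    for row in prediction_sets(test_probs, thr):
        print([m for m, keep in zip(modes, row) if keep])
```

Because each class gets its own threshold, a row of the returned matrix can contain one `True` (a singleton set), several (a multi-label set), or none (an empty set), which matches the three cases distinguished in the abstract.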


