Title: A Real-time bus arrival time predictive system based on spark framework and machine learning approaches: A case study in Chiang Mai
Other Titles: Bus arrival prediction system using cluster computing with the Spark framework and machine learning: A case study in Chiang Mai Province
Authors: Ye Li, Pree Thiengburanathum
Issue Date: May-2021
Publisher: Chiang Mai: Graduate School, Chiang Mai University
Abstract: As living standards have improved, more people own private vehicles, which leads to traffic congestion, environmental pollution, and even traffic accidents. According to Chiang Mai's public transportation master plan, the city's main modes of public transportation are red taxis, Grab, tuk-tuks, and buses, and most people travel around the city by red taxi. The bus system has been deployed as a new public transport option in the city. However, potential passengers in Chiang Mai are hesitant to take the bus because they lack confidence in the bus schedule. An intelligent system, such as a bus arrival time (BAT) prediction system, is needed to give bus passengers accurate BATs. Predicting the BAT at a given bus station is challenging because of real-time data processing requirements, numerous data inputs, and the need for predictive accuracy. For instance, a bus's arrival time at a particular station is affected by the preceding station or the preceding few stations. One challenge, therefore, is to improve BAT prediction while accounting for the impact of previous-station data. Because few features could be collected, some previous studies made predictions from a small number of features, such as bus location. However, BAT is affected by many features, including bus travel time, travel speed, and travel distance, so constructing these related features from a small number of raw features is a challenge. Another challenge is that previous studies analyzed only small amounts of data to save costs and speed up forecasting, which resulted in poor predictive power. Moreover, most previous studies used the Pandas library to process their data, focusing only on modeling and ignoring the need for real-time data processing. This research proposes a real-time BAT prediction system.
We collected 78 days of real-world data from the Chiang Mai R3-Y bus route, consisting of real-time bus locations and their timestamps, and we collected bus station locations from Google Maps. We used these data for feature engineering, proposing algorithms to extract features including BAT, departure time, travel time, travel speed, travel distance, and dwell time. The research follows the cross-industry standard process for data mining (CRISP-DM): business understanding, data understanding, data preparation (using the Spark framework), modeling (using the Autoregressive Integrated Moving Average with Explanatory Variable (ARIMAX) and Support Vector Regression (SVR) algorithms), and evaluation (using mean absolute error (MAE), mean squared error (MSE), root mean square error (RMSE), and the coefficient of determination, R²). We evaluated the real-time performance of Spark-based bus data processing. The experimental results show that for a file of about 900 KB, Spark's running time is about 3 seconds, and Pandas takes about 3 times as long. When the file size increases about threefold, Spark's running time is about 7 seconds, and Pandas takes about 60 times as long as Spark. The SVR model achieved an accuracy of 99.5%, 25% higher than that of the ARIMAX model. This research shows that combining the Spark framework with the SVR model to predict time-series data can reduce data processing time and achieve high predictive power.
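The abstract names the extracted features and evaluation metrics but not how they are computed. The sketch below is a plain-Python illustration under stated assumptions, not the thesis's own algorithms or its Spark pipeline. It assumes a bus is "at" a station when a GPS ping falls within a hypothetical 50 m radius (`STOP_RADIUS_M`), takes the first and last pings of that consecutive run as arrival and departure, measures travel distance as the sum of haversine distances between consecutive pings, and computes MAE, MSE, RMSE, and R² from their standard definitions. All names (`Ping`, `station_visits`, `segment_features`, `regression_metrics`) and the coordinates in the usage example are hypothetical.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres
STOP_RADIUS_M = 50.0  # hypothetical threshold: a ping this close counts as "at the station"


@dataclass
class Ping:
    t: float    # timestamp in seconds
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def _near(p, station, radius_m):
    return haversine_m(p.lat, p.lon, station[0], station[1]) <= radius_m


def station_visits(pings, stations, radius_m=STOP_RADIUS_M):
    """Match a time-ordered trip trace to stations listed in route order.

    Returns (arrival_idx, departure_idx) pairs: the first ping inside the radius
    and the last ping of that consecutive run. Stops early if a station is never reached.
    """
    visits, i = [], 0
    for station in stations:
        while i < len(pings) and not _near(pings[i], station, radius_m):
            i += 1  # skip pings until the bus enters this station's radius
        if i == len(pings):
            break
        j = i
        while j + 1 < len(pings) and _near(pings[j + 1], station, radius_m):
            j += 1  # extend while the bus is still inside the radius
        visits.append((i, j))
        i = j + 1
    return visits


def segment_features(pings, visits):
    """Dwell time at each station, plus travel time, distance and speed
    from the previous station's departure to this station's arrival."""
    rows = []
    for k, (arr, dep) in enumerate(visits):
        row = {"arrival": pings[arr].t, "departure": pings[dep].t,
               "dwell_s": pings[dep].t - pings[arr].t}
        if k > 0:
            prev_dep = visits[k - 1][1]
            travel_s = pings[arr].t - pings[prev_dep].t
            dist_m = sum(haversine_m(pings[n].lat, pings[n].lon,
                                     pings[n + 1].lat, pings[n + 1].lon)
                         for n in range(prev_dep, arr))
            row["travel_s"] = travel_s
            row["distance_m"] = dist_m
            row["speed_mps"] = dist_m / travel_s if travel_s > 0 else 0.0
        rows.append(row)
    return rows


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and the coefficient of determination R^2."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
    }


if __name__ == "__main__":
    # Hypothetical trace: a bus moving north past two made-up stations.
    stations = [(18.7900, 98.98), (18.7950, 98.98)]
    pings = [Ping(0, 18.7900, 98.98), Ping(30, 18.7902, 98.98),
             Ping(60, 18.7910, 98.98), Ping(120, 18.7930, 98.98),
             Ping(180, 18.7948, 98.98), Ping(240, 18.7950, 98.98),
             Ping(270, 18.7960, 98.98)]
    visits = station_visits(pings, stations)
    print(visits)  # [(0, 1), (4, 5)]
    for row in segment_features(pings, visits):
        print(row)
    print(regression_metrics([3, 5, 7], [2.5, 5, 8]))
```

Measuring each segment's travel time from the previous station's departure is one simple way to tie a station's features to the preceding station, as the abstract describes. In a Spark pipeline, the same per-trip logic could be applied to trips grouped by vehicle and day; the stop-radius rule here is only an assumption and may differ from the thesis's extraction algorithms.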
Appears in Collections: CAMT: Theses

Files in This Item:
612131005 YE LI.pdf (3.36 MB, Adobe PDF)

Items in CMUIR are protected by copyright, with all rights reserved, unless otherwise indicated.