Please use this identifier to cite or link to this item:
http://cmuir.cmu.ac.th/jspui/handle/6653943832/79202
Title: | Development of batch data pipeline system for flight delay prediction |
Other Titles: | Development of a batch data transfer system for forecasting flight delay times |
Authors: | Suchada Manowon |
Authors: | Pruet Boonma; Suchada Manowon |
Issue Date: | Oct-2023 |
Publisher: | Chiang Mai : Graduate School, Chiang Mai University |
Abstract: | Flight delays remain a persistent challenge, affecting airline and airport productivity, passenger experience, and financial resources. Air transportation data currently rely predominantly on administrative records from various institutions. This study aims to design and implement an effective data pipeline system capable of capturing high-frequency data from diverse sources through batch processing. The pipeline covers every stage of an end-to-end data pipeline: data sourcing, ingestion, processing, storage, and analysis. It extracts data from several datasets, including flight data, airport information, airline details, airplane specifications, and routes, and ingests them through web scraping, APIs, and database loading. It consolidates flight information, transforms and cleans the data, and loads the result into a designated destination database. The study also establishes an automated batch processing platform using Apache Airflow. The platform is evaluated on three aspects: (1) system metrics, including memory and disk usage; (2) job metrics extracted from Airflow metrics, which are used to monitor processes and ensure smooth execution; and (3) data quality metrics that assess six dimensions (accuracy, validity, completeness, consistency, uniqueness, and timeliness) to ensure the usability of the defined data. The resulting flight dataset is used for data analysis and visualization, including a comparison of various base regression models for flight delay prediction and flight data dashboards that offer data insights. The implications of this multifaceted approach extend to enhancing air transportation statistics, improving predictive modeling capabilities, and facilitating data-driven decision-making. |
URI: | http://cmuir.cmu.ac.th/jspui/handle/6653943832/79202 |
Appears in Collections: | ENG: Independent Study (IS) |
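
Two of the data quality dimensions named in the abstract, completeness and uniqueness, can be illustrated with a short sketch. This is not the author's implementation: the function names, the record fields (`flight_no`, `date`, `delay_min`), and the sample rows are all hypothetical, and the definitions used here (share of non-missing values, share of distinct keys) are one common way to score these dimensions.

```python
from typing import Mapping, Optional, Sequence

Row = Mapping[str, Optional[object]]


def completeness(rows: Sequence[Row], field: str) -> float:
    """Share of rows in which `field` is present and not None."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(field) is not None)
    return filled / len(rows)


def uniqueness(rows: Sequence[Row], key_fields: Sequence[str]) -> float:
    """Number of distinct keys divided by the total number of rows."""
    if not rows:
        return 0.0
    keys = {tuple(row.get(f) for f in key_fields) for row in rows}
    return len(keys) / len(rows)


# Hypothetical flight records, invented for illustration only.
flights = [
    {"flight_no": "TG102", "date": "2023-10-01", "delay_min": 15},
    {"flight_no": "TG102", "date": "2023-10-01", "delay_min": 15},    # duplicate key
    {"flight_no": "FD3437", "date": "2023-10-01", "delay_min": None},  # missing delay
    {"flight_no": "PG216", "date": "2023-10-02", "delay_min": 0},
]

delay_completeness = completeness(flights, "delay_min")
key_uniqueness = uniqueness(flights, ["flight_no", "date"])
print(f"completeness(delay_min) = {delay_completeness:.2f}")
print(f"uniqueness(flight_no, date) = {key_uniqueness:.2f}")
```

In a batch setting, scores like these could be computed after each load and compared against a threshold before the data is released for analysis; the threshold and the point in the pipeline where the check runs are design choices this record does not specify.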
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
640632023-SUCHADA MANOWON.pdf | | 4.12 MB | Adobe PDF |
Items in CMUIR are protected by copyright, with all rights reserved, unless otherwise indicated.