Comparison of Efficiency Improvement Techniques for MapReduce with Skewed Data

นักปราชญ์ กันตีวงศ์

Please use this identifier to cite or link to this item: http://cmuir.cmu.ac.th/jspui/handle/6653943832/79399

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	พฤษภ์ บุญมา	-
dc.contributor.author	นักปราชญ์ กันตีวงศ์	en_US
dc.date.accessioned	2024-01-11T10:32:13Z	-
dc.date.available	2024-01-11T10:32:13Z	-
dc.date.issued	2566-11-11	-
dc.identifier.uri	http://cmuir.cmu.ac.th/jspui/handle/6653943832/79399	-
dc.description.abstract	Today's data is large in volume, varied in format, and rapidly changing. Processing is therefore needed to make the data accurate and precise, so that it can be used in a timely manner and to maximum benefit. MapReduce is a framework for processing big data that consists of two functions: Map and Reduce. The Map function processes the input data set as key/value pairs and generates intermediate key/value pairs as output. The Reduce function merges all intermediate values that share the same key. In this process, the reducers must wait for all map tasks to finish before they can start. Because big data is often non-uniformly distributed, MapReduce processing is delayed: the data splits assigned to the nodes in the map stage are not the same size, so the nodes complete processing at different times. Nodes that finish earlier must wait for the last mapper node to complete before they can proceed with reducing. This study compiles approaches to this problem, which can arise in either the map or the reduce stage. The results show that all of the algorithms studied can improve the execution time of MapReduce on skewed data. However, the improvement has limits: when the data is not heavily skewed, the overhead of the algorithms may outweigh their benefits.	en_US
dc.language.iso	other	en_US
dc.publisher	Chiang Mai : Graduate School, Chiang Mai University	en_US
dc.title	Comparison of Efficiency Improvement Techniques for MapReduce with Skewed Data	en_US
dc.title.alternative	Comparison of Efficiency Improvement Techniques for MapReduce with Skewed Data	en_US
dc.type	Thesis
thailis.controlvocab.thash	Distributed databases	-
thailis.controlvocab.thash	Big data	-
thesis.degree	master	en_US
thesis.description.thaiAbstract	Data today is large and varied in format, and it changes rapidly. To use such data to maximum benefit and in a timely manner, processing is needed to make the data accurate and precise. MapReduce is a framework for processing big data that consists of two functions, Map and Reduce: Map pairs keys with intermediate values, and Reduce combines the intermediate values of each key. Reduce can proceed only after the map process has finished. When big data is not normally distributed, MapReduce processing is delayed, because the data fed to the map process is divided among the nodes in unequal sizes. As a result, the nodes do not finish processing at the same time, and nodes that finish first must wait for the last mapping node to finish before reducing can begin. This research compiles approaches to this problem, which can arise in either the map or the reduce stage. The experimental results show that the proposed approaches perform better than plain MapReduce on heavily skewed data. However, on lightly skewed data, some approaches may take longer than plain MapReduce.	en_US
Appears in Collections:	ENG: Theses
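
The straggler effect described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names, the toy word-count job, and the sum-of-code-points hash are all assumptions chosen for illustration, and the sketch shows key skew on the reduce side only. The same barrier argument applies to unequal splits in the map stage.

```python
from collections import Counter, defaultdict


def map_phase(records):
    """Map: emit one intermediate (key, 1) pair per input record."""
    return [(key, 1) for key in records]


def partition(pairs, num_reducers):
    """Shuffle: route each pair to a reducer by a deterministic hash of its key."""
    buckets = defaultdict(list)
    for key, value in pairs:
        # Assumption: sum of code points stands in for a real hash function.
        reducer_id = sum(map(ord, key)) % num_reducers
        buckets[reducer_id].append((key, value))
    return buckets


def reduce_phase(bucket):
    """Reduce: merge all intermediate values that share the same key."""
    totals = Counter()
    for key, value in bucket:
        totals[key] += value
    return dict(totals)


def reducer_loads(records, num_reducers):
    """Number of intermediate pairs each reducer has to process."""
    buckets = partition(map_phase(records), num_reducers)
    return [len(buckets.get(r, [])) for r in range(num_reducers)]


def idle_time(loads):
    """Work units each node spends waiting for the slowest node at a barrier."""
    slowest = max(loads)
    return [slowest - load for load in loads]


# Hypothetical inputs: 100 records each, evenly spread vs. dominated by one key.
uniform = ["a", "b", "c", "d"] * 25
skewed = ["a"] * 85 + ["b"] * 5 + ["c"] * 5 + ["d"] * 5

print(reducer_loads(uniform, 4), idle_time(reducer_loads(uniform, 4)))
print(reducer_loads(skewed, 4), idle_time(reducer_loads(skewed, 4)))
```

With the uniform input every reducer receives the same number of pairs and no node waits. With the skewed input one reducer receives most of the pairs, so the other three sit idle until it finishes, which is the delay that the techniques compared in the thesis aim to reduce.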

Files in This Item:

File	Description	Size	Format
620631056 nakprad kanteewong.pdf		9.51 MB	Adobe PDF
