Development of deep reinforcement learning method for production scheduling in a Two-stage flow production system with parallel machines and sequence-dependent setup times

Gerpott, Falk Torsten

Please use this identifier to cite or link to this item: http://cmuir.cmu.ac.th/jspui/handle/6653943832/79358

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Poti Chaopaisarn	-
dc.contributor.advisor	Zadek, Hartmut	-
dc.contributor.author	Gerpott, Falk Torsten	en_US
dc.date.accessioned	2024-01-02T17:00:13Z	-
dc.date.available	2024-01-02T17:00:13Z	-
dc.date.issued	2021-11-10	-
dc.identifier.uri	http://cmuir.cmu.ac.th/jspui/handle/6653943832/79358	-
dc.description.abstract	Production scheduling covers the allocation and sequencing of jobs in a production system. Traditional solution methods for this optimization problem often face a tradeoff between acceptable solution quality and short computation time. In addition, current trends in production and logistics amplify the need for real-time solutions that can also adapt to changes in demand, products, or the production system. Reinforcement learning, a machine learning method, offers a promising alternative approach for coping with both this well-known tradeoff and the challenges posed by these new trends. In a reinforcement learning framework, a production system is construed as an environment in which an agent makes decisions regarding job allocation and sequencing. This agent can be represented by a (deep) neural network, which learns through a reward which decisions lead to good results in terms of certain goal criteria. Different algorithms can be used for learning purposes to achieve stable and efficient training progress. The present work applies the Advantage Actor Critic (A2C) algorithm, which combines two learning approaches and parallelizes its learning process. A2C is investigated for the first time for a scheduling problem with sequence-dependent setup times in a hybrid flow shop system. A pre-implemented A2C algorithm from the Python library Stable Baselines3 is used. As a specific application, a real production system is modeled in a salabim simulation model, with the agent making decisions via the OpenAI Gym interface. The results indicate that deep reinforcement learning keeps up with or even outperforms previous solution approaches in terms of solution quality as well as computational efficiency.	en_US
dc.language.iso	en	en_US
dc.publisher	Chiang Mai : Graduate School, Chiang Mai University	en_US
dc.title	Development of deep reinforcement learning method for production scheduling in a Two-stage flow production system with parallel machines and sequence-dependent setup times	en_US
dc.title.alternative	การพัฒนาวิธีการเรียนรู้ของการเสริมแรงเชิงลึกเพื่อการจัดตารางการผลิตในระบบของการผลิตแบบสองขั้นตอนด้วยเครื่องจักรแบบขนานและเวลาตั้งค่าขึ้นกับลำดับงาน	en_US
dc.type	Thesis
thailis.controlvocab.lcsh	Production planning	-
thailis.controlvocab.lcsh	Reinforcement learning	-
thailis.controlvocab.lcsh	Machine learning	-
thesis.degree	master	en_US
thesis.description.thaiAbstract	Production scheduling covers both the allocation and the sequencing of jobs in a production system. Traditional solution methods for this optimization problem usually have to trade off solution quality against an acceptably short computation time. In addition, current trends in production and logistics call for real-time methods that can adapt to changes in demand, products, and the production system. Reinforcement learning, a machine learning method, offers an alternative that copes better with both this long-standing tradeoff and the newer challenges. In a reinforcement learning framework, the production system is interpreted as an environment in which an agent makes the allocation and sequencing decisions. The agent is represented by a (deep) neural network that learns, through rewards, which decisions lead to good results with respect to given goal criteria. Different algorithms can be used for learning in order to achieve stable and efficient training progress. This work uses the Advantage Actor Critic (A2C) algorithm, which combines two learning approaches and parallelizes the learning process. A2C is tested here for the first time on a scheduling problem with sequence-dependent setup times in a hybrid flow shop system. A pre-implemented A2C algorithm from the Python library Stable Baselines3 is used. As a specific application, a real production system is modeled in a salabim simulation model, with the agent making decisions through the OpenAI Gym interface. The results show that deep reinforcement learning keeps up with or outperforms traditional solution approaches in terms of both solution quality and computational efficiency.	en_US
Appears in Collections:	ENG: Theses
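
The environment and agent framing described in the abstract can be illustrated with a small sketch. This record does not state the thesis's actual state, action, or reward design, so everything below is an assumption made for illustration: the class `ToyTwoStageEnv`, the two hand-written policies, the reward (negative makespan increase), and the four-job instance are all hypothetical. The sketch uses only the Python standard library and merely mimics the `reset`/`step` shape of the OpenAI Gym interface; it does not use salabim, Gym, or Stable Baselines3. A learned agent such as A2C would take the place of the hand-written policies.

```python
class ToyTwoStageEnv:
    """Hypothetical Gym-style toy model of a two-stage flow system.

    Each action picks one waiting job. The job runs on the earliest-free
    stage-1 machine (with a sequence-dependent setup) and then on the
    earliest-free stage-2 machine (no setup, a simplifying assumption).
    """

    def __init__(self, families, proc1, proc2, setup, machines1=2, machines2=2):
        self.families = families  # product family index of each job
        self.proc1 = proc1        # stage-1 processing time of each job
        self.proc2 = proc2        # stage-2 processing time of each job
        self.setup = setup        # setup[a][b]: changeover time from family a to b
        self.m1, self.m2 = machines1, machines2

    def reset(self):
        self.waiting = list(range(len(self.families)))
        self.free1 = [0.0] * self.m1   # time at which each stage-1 machine is free
        self.last1 = [None] * self.m1  # last family run on each stage-1 machine
        self.free2 = [0.0] * self.m2
        self.makespan = 0.0
        return self._obs()

    def _obs(self):
        return (tuple(self.waiting), tuple(self.free1), tuple(self.last1))

    def step(self, action):
        job = self.waiting.pop(action)  # action is an index into the waiting list
        fam = self.families[job]
        k = min(range(self.m1), key=lambda i: self.free1[i])
        prev = self.last1[k]
        setup_time = 0.0 if prev is None else self.setup[prev][fam]
        done1 = self.free1[k] + setup_time + self.proc1[job]
        self.free1[k], self.last1[k] = done1, fam
        j = min(range(self.m2), key=lambda i: self.free2[i])
        done2 = max(done1, self.free2[j]) + self.proc2[job]
        self.free2[j] = done2
        new_makespan = max(self.makespan, done2)
        reward = -(new_makespan - self.makespan)  # penalize makespan growth
        self.makespan = new_makespan
        return self._obs(), reward, not self.waiting, {"makespan": self.makespan}


def fifo_policy(env):
    """Always dispatch the first waiting job."""
    return 0


def min_setup_policy(env):
    """Dispatch the waiting job with the smallest setup on the next free machine."""
    k = min(range(env.m1), key=lambda i: env.free1[i])
    prev = env.last1[k]

    def cost(idx):
        fam = env.families[env.waiting[idx]]
        return 0.0 if prev is None else env.setup[prev][fam]

    return min(range(len(env.waiting)), key=cost)


def run_episode(env, policy):
    """Run one episode and return (total reward, final makespan)."""
    env.reset()
    total, done, info = 0.0, False, {}
    while not done:
        _, reward, done, info = env.step(policy(env))
        total += reward
    return total, info["makespan"]


def demo_env():
    """Hypothetical instance: four jobs, two families, one machine per stage."""
    return ToyTwoStageEnv(
        families=[0, 1, 0, 1],
        proc1=[2, 2, 2, 2],
        proc2=[1, 1, 1, 1],
        setup=[[0, 3], [3, 0]],
        machines1=1,
        machines2=1,
    )


if __name__ == "__main__":
    print("FIFO:", run_episode(demo_env(), fifo_policy))
    print("Min setup:", run_episode(demo_env(), min_setup_policy))
```

Because the rewards sum to the negative of the final makespan, maximizing the episode return in this toy is the same as minimizing the makespan. On the hypothetical instance, grouping jobs of the same family avoids changeovers, which is why the setup-aware rule finishes earlier than first-in-first-out dispatching.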

Files in This Item:

File	Description	Size	Format
640631140 FALK TORSTEN GERPOTT.pdf		28.81 MB	Adobe PDF
