Application of item response theory to improve the score received from the nine-questions depression-rating scale in the Northern Thai dialect

Suttipong Kawilapat

Please use this identifier to cite or link to this item: http://cmuir.cmu.ac.th/jspui/handle/6653943832/80003

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Patrinee Traisathit	-
dc.contributor.advisor	Sukon Prasitwattanaseree	-
dc.contributor.advisor	Benchalak Maneeton	-
dc.contributor.advisor	Narong Maneeton	-
dc.contributor.author	Suttipong Kawilapat	en_US
dc.date.accessioned	2024-08-28T01:25:19Z	-
dc.date.available	2024-08-28T01:25:19Z	-
dc.date.issued	2024-06-25	-
dc.identifier.uri	http://cmuir.cmu.ac.th/jspui/handle/6653943832/80003	-
dc.description.abstract	Depressive disorders, among the most common mental disorders, are a leading cause of the global disease burden, self-harm, and death by suicide. Measurement tools such as the Patient Health Questionnaire (PHQ-9) and the Hamilton Rating Scale for Depression (HRSD-17) have been established to screen for or diagnose depressive disorders. Because the PHQ-9 accounts only for the frequency of nine depressive symptoms over the previous two weeks, the Nine-Questions Depression-Rating Scale (9Q) was developed to account for both the frequency and the severity of each symptom. The 9Q was first established among participants in the central region of Thailand and was later adapted into a northern Thai dialect version for use in the northern region. Although the 9Q is appropriate for assessing the severity of depressive symptoms, standard sum scoring, which weights all items equally under classical test theory (CTT), might not represent the true trait level of depressive symptoms well and might bias the assessment. Therefore, we aimed to develop a new scoring method based on item response theory (IRT) that takes the different weights of each item and each response category into account. The optimal cut-off points for depressive symptom severity under the proposed IRT-based scoring method were also considered. We hypothesized that using IRT parameters as weights in weighted sum scoring would mitigate this issue better than the CTT approach. Secondary data from a study on the criterion-related validity of a revised 9Q in the northern Thai dialect, comprising 1,527 participants aged 13 years or older in the northern region of Thailand, were used in the study. 
All participants were first interviewed to obtain their demographic data and screened for depression using the 9Q, then assessed for depression using the HRSD-17 as the gold standard, and finally diagnosed by physicians for depressive symptoms. The analysis was conducted in four phases. Phase 1: This phase explored differential item functioning (DIF) in each item of the 9Q related to participant characteristics, e.g., gender, age, chronic diseases, and income. The appropriate IRT model was determined among the 1,475 participants who responded to all items of the 9Q. The generalized partial credit model (GPCM) was the most appropriate model for this phase. We used both IRT-based and ordinal logistic regression (OLR)-based approaches to detect DIF between groups, and we compared and reviewed the findings from both approaches. The results showed that the DIF detected by the IRT- and OLR-based approaches differed, possibly because the IRT-based approach accounts for the discrimination and threshold parameters. Scoring with discrimination and threshold parameters that differ across characteristics under the IRT approach might be useful for reducing bias in depression measurement. Phase 2: This phase investigated the factor structure of the 9Q to find the best-fitting model for assessing depressive symptoms. Using confirmatory factor analysis (CFA), it compared the one- and two-factor models proposed in previous studies with the model suggested by exploratory factor analysis (EFA), among 1,346 participants aged 19 years or older who had no history of psychiatric disorders. The relationships between participant characteristics (i.e., gender, age, marital status, income, educational level, occupation, and chronic diseases) and the study model were explored using the Multiple Indicators Multiple Causes (MIMIC) model. 
According to the EFA, the items separated into a two-factor model with a cognitive-affective dimension (6 items) and a somatic dimension (3 items), which were correlated with each other (r = 0.771) (RMSEA = 0.077, CFI = 0.953, TLI = 0.936). The CFA results indicated that the two-factor EFA model provided better fit index values than the previously published two-factor models. According to the MIMIC model, dyslipidemia was positively associated with both cognitive-affective symptoms (β = 0.120) and somatic depressive symptoms (β = 0.080). Allergies were associated with a higher level of cognitive-affective depressive symptoms (β = 0.087), while migraine (β = 0.114) and peptic ulcer disease (β = 0.062) were associated with a higher level of somatic symptoms. Increased age was associated with a lower level of somatic symptoms (β = -0.088). The findings illustrate that treating depressive symptoms as two dimensions yields a better fit and that a multidimensional IRT model could give more precise scoring with the 9Q. Phase 3: An IRT-based weighted sum scoring approach was developed to provide a new scoring method for the 9Q. The assumptions of the IRT model were examined separately for each age group; participants younger than 19 years did not fit a unidimensional IRT model and were excluded from this phase. Of the 1,355 participants included, 1,000 and 355 were randomly assigned to the developmental and validation groups for IRT-based weighted scoring, respectively. Only one-factor IRT models were considered in this phase, in keeping with the unidimensionality assumption. Based on the model selection, the graded response model (GRM), which accounts for a discrimination parameter for each item and threshold parameters for the ordered categories within each item, was used in this phase. 
The nominal response model (NRM), which estimates the category parameters independently, was also considered in this phase. The scoring models considered in this study were the GRM (9Q-GRM), the GRM accounting for DIF (9Q-GRM-DIF), the NRM (9Q-NRM), and the NRM accounting for DIF (9Q-NRM-DIF). The results showed that the 9Q-GRM model accounting for DIF had a precision 16.7% higher than that of the traditional sum-score approach. The findings suggest that weighted sum scoring with IRT parameters accounting for DIF between genders improves the precision of scoring. Phase 4: This phase aimed to identify optimal cut-off points for the severity of depressive symptoms based on the IRT-based weighted scoring. The participants were separated into two groups, developmental (n = 1,000) and validation (n = 355), as in Phase 3. Liu's and Youden's methods, which consider the sensitivity and specificity from the receiver operating characteristic (ROC) curve, were applied and compared with an IRT-based method that relates the theta parameter to the prevalence of each severity level. Three dummy variables for major depressive disorder (MDD) severity were defined and used in the determination: severe MDD versus the rest, moderate-and-severe MDD versus the rest, and any MDD (i.e., mild, moderate, or severe MDD) versus no MDD. The results showed that the IRT-Theta-based method yielded the highest agreement (98.04%) with the gold standard classification using the HRSD-17. The findings from the four-phase study indicated that the IRT-based approach, which accounts for the discrimination and threshold parameters and for DIF between groups, classified the severity of depressive symptoms with higher precision than the traditional CTT approach. 
However, the study that established the criterion-related validity of the revised 9Q in the northern Thai dialect was conducted in a northern Thai population with only a few cases of severe depressive symptoms. Hence, the IRT parameter estimates for some categories might be biased. These findings should be referenced with caution. Further studies should consider the model fit indices when applying the method in different settings.	en_US
dc.language.iso	en	en_US
dc.publisher	Chiang Mai : Graduate School, Chiang Mai University	en_US
dc.title	Application of item response theory to improve the score received from the nine-questions depression-rating scale in the Northern Thai dialect	en_US
dc.title.alternative	การประยุกต์ทฤษฎีการตอบสนองข้อสอบเพื่อปรับปรุงคะแนนที่ได้จากแบบประเมินอาการซึมเศร้า 9 คำถามฉบับภาษาเหนือ	en_US
dc.type	Thesis
thailis.controlvocab.lcsh	Depression, Mental	-
thailis.controlvocab.lcsh	Depression, Mental -- Response rate	-
thailis.controlvocab.lcsh	Patient Health Questionnaire	-
thailis.controlvocab.lcsh	Depression, Mental -- Evaluation	-
thesis.degree	doctoral	en_US
thesis.description.thaiAbstract	Depression is a psychiatric condition that is a major public health problem and may lead to self-harm or suicide. Several instruments are currently used to screen for or diagnose depression, such as the Patient Health Questionnaire (PHQ-9) and the Hamilton Rating Scale for Depression (HRSD-17). The PHQ-9 assesses the frequency of depressive symptoms over two weeks through nine items and is based on frequency alone. A new depression assessment was therefore created that considers the severity of each symptom in addition to its frequency, namely the Nine-Questions Depression-Rating Scale (9Q), and its properties were tested in a population in central Thailand. A northern Thai dialect version was later developed, and data were collected to assess the general population in the northern region. Although the 9Q can assess depressive symptoms and classify their severity fairly well, using a total score in which every symptom is weighted equally, as in the traditional method (classical test theory: CTT), may not reflect the true level of symptoms and may cause screening errors. The researcher therefore sought a new approach to calculating scores by applying item response theory (IRT), which considers the weights of the items and of the response options within each item, and also sought appropriate score criteria for classifying the severity of depression under this IRT scoring. The hypothesis was that IRT scoring would give more accurate screening results than the CTT method. The study used secondary data from research using the northern Thai dialect version of the 9Q, collected from 1,527 members of the general population in the northern region aged 13 years or older. Participants were asked about their personal information, were screened and assessed for depressive symptoms with the 9Q and the HRSD-17, and were then referred to physicians for a confirmatory diagnosis of depressive symptoms. The study was carried out in four phases. 
Phase 1 assessed differential item functioning (DIF) in the 9Q to determine whether each item functioned differently between groups defined by demographic characteristics: gender, age, having a chronic disease, and income. The appropriate IRT model was first evaluated among the 1,475 participants who answered every item, and the generalized partial credit model (GPCM) was found to be the appropriate model. DIF in each item was then assessed with IRT analysis by group and with ordinal logistic regression (OLR). Both techniques found DIF in some items between gender, age, and chronic disease groups, but DIF between income groups was found only with the OLR method. The two methods thus gave different results, possibly because the IRT method also considers parameters that differ for each response option and each item. The results of this phase therefore support assessing DIF with the IRT method, which may help reduce estimation bias. Phase 2 analyzed the structure and factors of the 9Q to identify the appropriate model. Using confirmatory factor analysis (CFA), it compared a one-factor baseline model, which treats all nine items as a single factor, with two-factor models proposed in other research, and it also considered a model obtained from exploratory factor analysis (EFA). The sample comprised 1,346 participants aged 19 years or older who completed the 9Q and had no history of psychiatric symptoms. Once the appropriate model was obtained, a Multiple Indicators and Multiple Causes (MIMIC) model was analyzed to examine the relationships between the explanatory variables and the factors in the structural model. The explanatory variables considered were gender, age, marital status, income, educational level, occupation, and various chronic diseases. The EFA showed that a two-factor model explained depressive symptoms in the sample better than the one-factor baseline model; it consisted of a cognitive-affective factor with 6 items and a somatic factor with 3 items (RMSEA = 0.077, CFI = 0.953, TLI = 0.936). The two factors were correlated (r = 0.771), and the model from the EFA was more appropriate than the other two-factor models that have been proposed. 
When this factor structure was entered into the MIMIC analysis with the explanatory variables, dyslipidemia was positively associated with both cognitive-affective symptoms (β = 0.120) and somatic depressive symptoms (β = 0.080). Allergies were associated with a higher level of cognitive-affective depressive symptoms (β = 0.087), while migraine (β = 0.114) and peptic ulcer disease (β = 0.062) were associated with a higher level of somatic symptoms. Increased age was associated with a lower level of somatic symptoms (β = -0.088). The results of this phase indicate that treating depressive symptoms as two factors is better for assessing depressive symptoms, and that chronic diseases associated with the assessment of depressive symptoms should be monitored for appropriate surveillance. Phase 3 developed an appropriate IRT model for calculating new scores for the 9Q. The IRT model assumptions were first examined by age group, and adolescents younger than 19 years were found to be unsuitable for description by an IRT model, so this group was excluded, leaving 1,355 participants in this phase. They were randomly divided into two groups: 1,000 participants for developing the scoring model and 355 for checking its performance. Depressive symptoms were analyzed as a single factor only in this phase because of limitations in multidimensional IRT analysis. In the model selection within the developmental group, the appropriate model was the graded response model (GRM), which considers parameters for each item (discrimination parameters) and parameters for the ordered response options within each item (threshold parameters). The researcher also used the nominal response model (NRM), which treats the response options as independent rather than ordered by score, for additional consideration. DIF between gender groups was also analyzed. 
Four scoring models were then built: a GRM without DIF (9Q-GRM), a GRM accounting for DIF between genders (9Q-GRM-DIF), an NRM without DIF (9Q-NRM), and an NRM accounting for DIF between genders (9Q-NRM-DIF). The results indicated that the 9Q-GRM-DIF model was 16.7% more precise in classifying the severity of depressive symptoms than the traditional sum score, which shows that IRT scoring combined with consideration of DIF between gender groups makes the new score calculation more precise. Phase 4 sought appropriate cut-off points for establishing criteria to classify the severity of depressive symptoms from the IRT scores obtained in Phase 3. The sample was divided into developmental and validation groups of 1,000 and 355 participants, as in Phase 3. Appropriate cut-off points were sought with Liu's method or Youden's method, which jointly consider the sensitivity and specificity of the receiver operating characteristic (ROC) curve, and were compared with an IRT method that references the latent variable parameter (theta) divided according to the prevalence of each severity level. Three dummy variables were created for classifying severity: severe symptoms (versus no symptoms up to moderate symptoms), moderate or more severe symptoms (versus no symptoms or mild symptoms), and mild symptoms (versus no symptoms), in order to create a cut-off point for each severity level. The results showed that the severity cut-off points from the theta-based IRT method had the highest agreement with the grouping of HRSD-17 scores (98.03%). The results of all four phases indicate that IRT scoring, which weights the score calculation differently between items and between the options within each item, together with consideration of differences in item responses between groups through DIF, classifies the severity of depressive symptoms more effectively than the traditional method. However, because the data used in this study included relatively few participants with severe symptoms, IRT estimation may be biased for that group. The results should therefore be cited with caution, and the suitability of the model should be examined further when it is applied to other samples.	en_US
Appears in Collections:	SCIENCE: Theses

Files in This Item:

File	Description	Size	Format
Thesis_620551009_Suttipong Kawilapat.pdf	Thesis	1.36 MB	Adobe PDF
