Please use this identifier to cite or link to this item: http://cmuir.cmu.ac.th/jspui/handle/6653943832/80003
Title: Application of item response theory to improve the score received from the nine-questions depression-rating scale in the Northern Thai dialect
Other Titles: การประยุกต์ทฤษฎีการตอบสนองข้อสอบเพื่อปรับปรุงคะแนนที่ได้จากแบบประเมินอาการซึมเศร้า 9 คำถามฉบับภาษาเหนือ
Authors: Suttipong Kawilapat
Authors: Patrinee Traisathit
Sukon Prasitwattanaseree
Benchalak Maneeton
Narong Maneeton
Suttipong Kawilapat
Issue Date: 25-Jun-2024
Publisher: Chiang Mai : Graduate School, Chiang Mai University
Abstract: Depressive disorders, among the most common mental disorders, are a leading cause of the global disease burden, self-harm, and deaths by suicide. Measurement tools such as the Patient Health Questionnaire (PHQ-9) and the Hamilton Rating Scale for Depression (HRSD-17) were established to screen for or diagnose depressive disorders. Because the PHQ-9 accounts only for the frequency of nine depressive symptoms in the previous two weeks, the Nine-Questions Depression-Rating Scale (9Q) was developed to account for both the frequency and the severity of each symptom. The 9Q was first established among participants in the central region of Thailand and then adapted into a northern Thai dialect version for use in the northern region. Although the 9Q is appropriate for assessing the severity of depressive symptoms, the standard sum score, which weights all items equally under classical test theory (CTT), might not represent the true trait level of depressive symptoms well and might bias the assessment. Therefore, we aimed to develop a new scoring method based on item response theory (IRT) that takes the different weights of each item and each category into account. Optimal cut-off points for depressive symptom severity under the proposed IRT-based scoring method were also considered. We hypothesized that applying IRT parameters as weights in weighted sum scoring would mitigate this issue better than the CTT approach. Secondary data from a study on the criterion-related validity of a revised 9Q in the northern Thai dialect, comprising 1,527 participants aged 13 years or older in the northern region of Thailand, were used in this study.
All participants were first interviewed to obtain their demographic data and screened for depression using the 9Q, then assessed for depression using the HRSD-17 as the gold standard, and finally diagnosed by physicians for depressive symptoms. A four-phase analysis was conducted.
Phase 1: This phase explored differential item functioning (DIF) for each item of the 9Q in relation to participant characteristics, e.g., gender, age, chronic diseases, and income. The appropriate IRT model was determined among 1,475 participants who responded to all items of the 9Q. The generalized partial credit model (GPCM) was the most appropriate model for this phase. We used both IRT-based and ordinal logistic regression (OLR)-based approaches to detect DIF between groups, and compared and reviewed the findings from both approaches. The results showed that the DIF detected by the IRT- and OLR-based approaches differed, possibly because the IRT-based approach accounts for the discrimination and threshold parameters. Scoring with the discrimination and threshold parameters across characteristics based on the IRT approach might be useful for reducing bias in depression measurement.
Phase 2: This phase investigated the factor structure of the 9Q to find the best-fitting model for assessing depressive symptoms. Using confirmatory factor analysis (CFA), the one- and two-factor models reported in previous studies were compared with the model proposed by exploratory factor analysis (EFA) among 1,346 participants aged 19 years or older who took part in this study and had not experienced any psychiatric disorders. The relationships between participant characteristics (i.e., gender, age, marital status, income, educational level, occupation, and chronic diseases) and the study model were explored using the Multiple Indicators Multiple Causes (MIMIC) model.
According to the EFA, a two-factor model was identified, separated into a cognitive-affective dimension (6 items) and a somatic dimension (3 items) that correlated with each other (r = 0.771) (RMSEA = 0.077, CFI = 0.953, TLI = 0.936). The CFA results indicated that the two-factor EFA model provided better fit index values than the previously published two-factor models. According to the MIMIC model, dyslipidemia was positively associated with both cognitive-affective (β = 0.120) and somatic (β = 0.080) depressive symptoms. Allergies were associated with a higher level of cognitive-affective depressive symptoms (β = 0.087), while migraine (β = 0.114) and peptic ulcer disease (β = 0.062) were associated with a higher level of somatic symptoms. Increased age was associated with a lower level of somatic symptoms (β = -0.088). The findings illustrate that treating depressive symptoms as two dimensions yields a better fit and that a multidimensional IRT model is beneficial for more precise scoring with the 9Q.
Phase 3: An IRT-based weighted sum scoring approach was developed to provide a new scoring method for the 9Q. The assumptions of the IRT model were examined separately for each age group; the data from participants younger than 19 years were not suited to a unidimensional IRT model, so these participants were excluded from this phase. Of the 1,355 participants included, 1,000 and 355 were randomly assigned to the development and validation groups for IRT-based weighted scoring, respectively. Only one-factor IRT models were considered in this phase, in keeping with the unidimensionality assumption. According to the model selection, the graded response model (GRM), which accounts for a discrimination parameter for each item and threshold parameters for the ordered categories in each item, was used in this phase.
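As a rough illustration of how a graded response model combines one discrimination parameter per item with ordered threshold parameters, the sketch below computes category probabilities for a single item. It is a minimal textbook-style example, not the thesis's scoring procedure: the function names and the parameter values (a = 1.0, thresholds -1, 0, 1, four response categories) are hypothetical and are not estimates from the 9Q data.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under a graded response model.

    theta      -- latent trait level (e.g., depressive symptom severity)
    a          -- item discrimination parameter (hypothetical value here)
    thresholds -- increasing threshold parameters b_1 < ... < b_{K-1}
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in increasing order")
    # Cumulative probabilities P(X >= k) for k = 1 .. K-1 (logistic form).
    cumulative = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    # P(X >= 0) = 1 and P(X >= K) = 0 close the sequence.
    bounds = [1.0] + cumulative + [0.0]
    # Each category probability is the difference of adjacent cumulatives.
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]


def expected_item_score(theta, a, thresholds):
    """Model-implied expected score of the item at a given theta."""
    probs = grm_category_probs(theta, a, thresholds)
    return sum(k * p for k, p in enumerate(probs))


if __name__ == "__main__":
    # Hypothetical item: illustrative parameters only, not 9Q estimates.
    print(grm_category_probs(0.0, 1.0, [-1.0, 0.0, 1.0]))
    print(expected_item_score(0.0, 1.0, [-1.0, 0.0, 1.0]))
```

Items with a larger discrimination parameter separate respondents more sharply around their thresholds, which is the intuition behind weighting items differently rather than summing them equally.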
The nominal response model (NRM), which estimates the category parameters independently, was also considered in this phase. The scoring models considered in this study were the GRM (9Q-GRM), the GRM accounting for DIF (9Q-GRM-DIF), the NRM (9Q-NRM), and the NRM accounting for DIF (9Q-NRM-DIF). The results showed that the 9Q-GRM model accounting for DIF had higher precision (16.7%) than the traditional sum-score approach. The findings suggest that weighted sum scoring with IRT parameters accounting for DIF between genders improves the precision of scoring.
Phase 4: This phase aimed to identify optimal cut-off points for the severity of depressive symptoms under the IRT-based weighted scoring. As in Phase 3, the participants were separated into development (n = 1,000) and validation (n = 355) groups. Liu's and Youden's methods, which consider the sensitivity and specificity from the receiver operating characteristic (ROC) curve, were compared with an IRT-based method that relates to the theta parameter and accounts for the prevalence of each severity level. Three dummy variables of major depressive disorder (MDD) severity were defined for this determination: severe MDD versus the rest, moderate-to-severe MDD versus the rest, and any MDD (i.e., mild, moderate, or severe) versus no MDD. The results showed that the IRT-Theta-based method yielded the highest agreement (98.04%) with the gold standard classification using the HRSD-17.
The findings from the four-phase study indicate that an IRT-based approach accounting for the discrimination and threshold parameters and for DIF between groups classified the severity of depressive symptoms more precisely than the traditional CTT approach.
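The two ROC-based criteria mentioned in Phase 4 can be sketched as follows, assuming the usual definitions: Youden's index maximizes sensitivity + specificity - 1, and Liu's method maximizes sensitivity × specificity. This is only an illustration of the general technique; the function name, the toy scores, and the labels are hypothetical and do not come from the study data.

```python
def optimal_cutoffs(scores, labels):
    """Return the cut-offs that maximize Youden's J and Liu's product.

    scores -- numeric test scores (higher means more severe)
    labels -- 1 for cases and 0 for non-cases by the reference standard
    A participant is classified positive when score >= cut-off.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both cases and non-cases are required")

    best = {"youden": (None, float("-inf")), "liu": (None, float("-inf"))}
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        true_neg = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sensitivity = true_pos / n_pos
        specificity = true_neg / n_neg
        youden_j = sensitivity + specificity - 1.0   # Youden's index
        liu_product = sensitivity * specificity      # Liu's criterion
        # Strict comparison keeps the lowest cut-off when values tie.
        if youden_j > best["youden"][1]:
            best["youden"] = (cutoff, youden_j)
        if liu_product > best["liu"][1]:
            best["liu"] = (cutoff, liu_product)
    return best


if __name__ == "__main__":
    # Toy data for illustration only (not from the 9Q study).
    toy_scores = [1, 2, 3, 4, 5, 6, 7, 8]
    toy_labels = [0, 0, 0, 1, 1, 0, 1, 1]
    print(optimal_cutoffs(toy_scores, toy_labels))
```

In practice the same search would be repeated for each of the three severity contrasts, with cut-offs chosen in the development group and checked in the validation group.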
However, the study establishing the criterion-related validity of the revised 9Q in the northern Thai dialect was conducted in a northern Thai population with only a few cases of severe depressive symptoms. Hence, the IRT parameter estimates for some categories might be biased. These findings should therefore be referenced with caution. Further studies should examine the model fit indices when the method is used in different settings.
URI: http://cmuir.cmu.ac.th/jspui/handle/6653943832/80003
Appears in Collections:SCIENCE: Theses

Files in This Item:
File: Thesis_620551009_Suttipong Kawilapat.pdf
Description: Thesis
Size: 1.36 MB
Format: Adobe PDF


Items in CMUIR are protected by copyright, with all rights reserved, unless otherwise indicated.