Title: A Performance Comparison of Supervised Classifiers and Deep-learning Approaches for Predicting Toxicity in Thai Tweets
Authors: Pree Thiengburanathum
Phasit Charoenkwan
Keywords: Arts and Humanities;Computer Science;Engineering
Issue Date: 3-Mar-2021
Abstract: There are numerous Twitter user accounts in Thailand, and many toxic comments are generated on the platform every day. Sentiment analysis can be used as a tool to identify toxic comments. In this study, two feature extraction techniques, Bag of Words (BOW) and term frequency-inverse document frequency (TF-IDF), were investigated. Additionally, the performance of ten well-known traditional classifiers and three deep-learning approaches, namely Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) and pretrained Bidirectional Encoder Representations from Transformers (BERT), was compared on the public Thai Toxicity Tweet corpus. The experiments reveal that combining Bag of Words (BOW) with the Extra-Tree classifier achieved the highest F1-score of 0.72, a classification accuracy of 72.27% and an AUC value of 0.77 on the test set, outperforming the other classifiers and the deep-learning techniques. Feature importance, correlation and impacts were also investigated using SHapley Additive exPlanations (SHAP) diagrams.
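As an illustration of the two feature extraction techniques named in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. The function names, the toy English documents and the idf formula ln(N / df) are assumptions chosen for the example. Thai text is not whitespace-delimited, so a real pipeline would need a word segmenter; the whitespace split here is a simplification.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for whitespace-tokenised documents."""
    tokenised = [doc.split() for doc in docs]
    # Sorted vocabulary gives every term a stable column index.
    vocab = sorted({tok for toks in tokenised for tok in toks})
    vectors = []
    for toks in tokenised:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(vectors):
    """Reweight raw counts as tf * idf, assuming idf = ln(N / df)."""
    n_docs = len(vectors)
    n_terms = len(vectors[0]) if vectors else 0
    # Document frequency: number of documents containing each term.
    df = [sum(1 for vec in vectors if vec[j] > 0) for j in range(n_terms)]
    weighted = []
    for vec in vectors:
        total = sum(vec) or 1  # avoid division by zero for empty documents
        weighted.append([
            (vec[j] / total) * math.log(n_docs / df[j]) if df[j] else 0.0
            for j in range(n_terms)
        ])
    return weighted


# Hypothetical toy documents, already segmented into space-separated tokens.
docs = ["good day", "bad day", "bad bad word"]
vocab, counts = bag_of_words(docs)
weights = tf_idf(counts)
```

With BOW, each tweet becomes a vector of raw term counts. TF-IDF scales those counts down for terms that occur in many documents. Either matrix could then be passed to a classifier such as the Extra-Tree model mentioned in the abstract.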
Appears in Collections: CMUL: Journal Articles

Files in This Item:
There are no files associated with this item.

Items in CMUIR are protected by copyright, with all rights reserved, unless otherwise indicated.