A Performance Comparison of Supervised Classifiers and Deep-learning Approaches for Predicting Toxicity in Thai Tweets

Pree Thiengburanathum; Phasit Charoenkwan

Please use this identifier to cite or link to this item: http://cmuir.cmu.ac.th/jspui/handle/6653943832/75447

Title:	A Performance Comparison of Supervised Classifiers and Deep-learning Approaches for Predicting Toxicity in Thai Tweets
Authors:	Pree Thiengburanathum; Phasit Charoenkwan
Keywords:	Arts and Humanities;Computer Science;Engineering
Issue Date:	3-Mar-2021
Abstract:	Twitter has numerous user accounts in Thailand, and many toxic comments are generated on the platform every day. Sentiment analysis can be used as a tool to identify toxic comments. In this study, two feature extraction techniques, Bag of Words (BOW) and term frequency-inverse document frequency (TF-IDF), were investigated. Additionally, the performance of ten well-known traditional classifiers, along with three deep-learning approaches, namely Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and pretrained Bidirectional Encoder Representations from Transformers (BERT), was compared on the public Thai Toxicity Tweet corpus. The experiments reveal that combining BOW with the Extra-Tree classifier achieved the highest F1-score of 0.72, a classification accuracy of 72.27%, and an AUC of 0.77 on the test set, outperforming the other classifiers and the deep-learning techniques. Feature importance, correlation, and impact were also investigated using SHapley Additive exPlanations (SHAP) diagrams.
URI:	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85106616298&origin=inward http://cmuir.cmu.ac.th/jspui/handle/6653943832/75447
Appears in Collections:	CMUL: Journal Articles
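
Illustrative note: the abstract names two feature extraction techniques, BOW and TF-IDF, without showing how they differ. The sketch below is not the authors' code. It assumes documents are already tokenized (Thai word segmentation is a separate step not shown), uses hypothetical English placeholder tokens, and uses one common TF-IDF variant (raw count times log(N/df)); the paper may use a different weighting.

```python
import math
from collections import Counter

def bag_of_words(docs):
    """Return (vocabulary, count vectors) for pre-tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)  # missing tokens count as 0
        vectors.append([counts[tok] for tok in vocab])
    return vocab, vectors

def tf_idf(docs):
    """Weight each BOW count by log(N / document frequency)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Number of documents in which each vocabulary term appears.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    vectors = [[c * w for c, w in zip(row, idf)] for row in counts]
    return vocab, vectors

# Hypothetical toy corpus (placeholder tokens, not from the Thai dataset).
docs = [["good", "day"], ["bad", "day"], ["bad", "bad", "words"]]
vocab, bow = bag_of_words(docs)
_, tfidf = tf_idf(docs)
```

BOW keeps raw term counts, while TF-IDF down-weights terms that occur in many documents; either matrix could then be passed to a classifier such as the one named in the abstract.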

