Please use this identifier to cite or link to this item:
http://cmuir.cmu.ac.th/jspui/handle/6653943832/71882
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Pree Thiengburanathum | en_US |
dc.date.accessioned | 2021-01-27T04:17:00Z | - |
dc.date.available | 2021-01-27T04:17:00Z | - |
dc.date.issued | 2021-01-01 | en_US |
dc.identifier.issn | 21945365 | en_US |
dc.identifier.issn | 21945357 | en_US |
dc.identifier.other | 2-s2.0-85090049696 | en_US |
dc.identifier.other | 10.1007/978-3-030-57811-4_40 | en_US |
dc.identifier.uri | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85090049696&origin=inward | en_US |
dc.identifier.uri | http://cmuir.cmu.ac.th/jspui/handle/6653943832/71882 | - |
dc.description.abstract | © 2021, Springer Nature Switzerland AG. In Natural Language Processing (NLP), the goal of sentence boundary detection (SBD) is to identify sentence boundaries in a phrase, paragraph, or document. The resulting boundaries are used in current NLP applications such as sentiment analysis, contextual chatbots, and machine translation. Previous studies and existing NLP libraries often take a straightforward approach to the task; for instance, they assume that a sentence always ends with certain punctuation symbols such as a period, a semicolon, an exclamation mark, or a question mark. This approach is impractical for languages such as Thai, which has no symbol to mark where a sentence ends. Developing effective sentiment analysis or machine translation for the Thai language therefore requires a solid effort in detecting sentence boundaries. There is also a need to validate SBD models against a real-world dataset drawn from an online textual corpus. This paper compares Conditional Random Fields (CRF) and Bidirectional Long Short-Term Memory with a CRF layer (BiLSTM-CRF) on an online textual dataset. We scraped our own corpus from the top Thai web forums using the Scrapy web-crawling framework. In the paper, 2,496 comments related to beauty product reviews were manually segmented by a Thai linguistic expert. Our experimental results revealed that the CRF based on the word-based labelling approach with window size outperformed the BiLSTM-CRF. | en_US |
dc.subject | Computer Science | en_US |
dc.subject | Engineering | en_US |
dc.title | A Comparison of Thai Sentence Boundary Detection Approaches Using Online Product Review Data | en_US |
dc.type | Book Series | en_US |
article.title.sourcetitle | Advances in Intelligent Systems and Computing | en_US |
article.volume | 1264 AISC | en_US |
article.stream.affiliations | Chiang Mai University | en_US |
Appears in Collections: | CMUL: Journal Articles |