Please use this identifier to cite or link to this item: http://cmuir.cmu.ac.th/jspui/handle/6653943832/72914
Full metadata record
DC Field | Value | Language
dc.contributor.author | Khishigsuren Davagdorj | en_US
dc.contributor.author | Kwang Ho Park | en_US
dc.contributor.author | Tsatsral Amarbayasgalan | en_US
dc.contributor.author | Lkhagvadorj Munkhdalai | en_US
dc.contributor.author | Ling Wang | en_US
dc.contributor.author | Meijing Li | en_US
dc.contributor.author | Keun Ho Ryu | en_US
dc.date.accessioned | 2022-05-27T08:31:50Z | -
dc.date.available | 2022-05-27T08:31:50Z | -
dc.date.issued | 2022-01-01 | en_US
dc.identifier.issn | 18761119 | en_US
dc.identifier.issn | 18761100 | en_US
dc.identifier.other | 2-s2.0-85123298081 | en_US
dc.identifier.other | 10.1007/978-981-16-8430-2_17 | en_US
dc.identifier.uri | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85123298081&origin=inward | en_US
dc.identifier.uri | http://cmuir.cmu.ac.th/jspui/handle/6653943832/72914 | -
dc.description.abstract | Biomedical documents are being generated at an exponentially growing rate. Document clustering methods play an important role in gathering textual documents into a few meaningful, coherent groups. However, clustering unstructured, unlabeled text is challenging because it requires extracting informative representations and finding relevant articles in the rapidly growing biomedical literature. Traditional text document clustering methods often yield unsatisfactory results because they rely on general, non-contextualized vector space representations, which neglect the semantic relations between biomedical texts. Pre-trained language models have recently been gaining attention in a variety of natural language processing tasks. In this paper, we propose a clustering framework for biomedical document analysis based on BioBERT, a heavily pre-trained language representation, in order to improve clustering accuracy. In the experimental architecture, we provide benchmarks of the pre-trained transformer model, a statistical technique, and word-embedding methods combined with clustering algorithms. To compare the effectiveness of the models, the Fowlkes-Mallows score (FM), silhouette coefficient (SC), adjusted Rand index (ARI), and Davies-Bouldin score (DB) are used. The comprehensive experimental results show that the BioBERT-based K-means model achieves better clustering accuracy than the other models. | en_US
dc.subject | Engineering | en_US
dc.title | BioBERT Based Efficient Clustering Framework for Biomedical Document Analysis | en_US
dc.type | Book Series | en_US
article.title.sourcetitle | Lecture Notes in Electrical Engineering | en_US
article.volume | 833 LNEE | en_US
article.stream.affiliations | Ton-Duc-Thang University | en_US
article.stream.affiliations | Northeast China Institute of Electric Power Engineering | en_US
article.stream.affiliations | Shanghai Maritime University | en_US
article.stream.affiliations | Chungbuk National University | en_US
article.stream.affiliations | Chiang Mai University | en_US
Appears in Collections: CMUL: Journal Articles
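
The abstract names the Fowlkes-Mallows score (FM) and the adjusted Rand index (ARI) as two of its evaluation metrics. This record does not include the authors' implementation, so the following is only a minimal sketch of the standard pair-counting definitions of those two scores, using the Python standard library. The function names and the toy labels are illustrative assumptions, not taken from the paper. Both scores compare predicted cluster assignments against reference labels; the silhouette coefficient and Davies-Bouldin score work on the document vectors themselves and are not sketched here.

```python
from collections import Counter
from math import comb, sqrt


def pair_counts(labels_true, labels_pred):
    """Count document pairs that share a group (hypothetical helper).

    Returns (pairs together in both labelings, pairs together in the
    reference labeling, pairs together in the predicted clustering,
    total number of pairs).
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    return same_both, same_true, same_pred, comb(n, 2)


def fowlkes_mallows(labels_true, labels_pred):
    """Geometric mean of pairwise precision and recall."""
    same_both, same_true, same_pred, _ = pair_counts(labels_true, labels_pred)
    if same_true == 0 or same_pred == 0:
        return 0.0  # no co-clustered pairs in one of the labelings
    return same_both / sqrt(same_true * same_pred)


def adjusted_rand_index(labels_true, labels_pred):
    """Rand index corrected for chance agreement."""
    same_both, same_true, same_pred, total = pair_counts(labels_true, labels_pred)
    if total == 0:
        return 1.0  # fewer than two documents: trivially identical
    expected = same_true * same_pred / total
    max_index = (same_true + same_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (same_both - expected) / (max_index - expected)


# Toy example: six documents, two reference topics, three predicted clusters.
reference = [0, 0, 0, 1, 1, 1]
predicted = [0, 0, 1, 1, 2, 2]
fm = fowlkes_mallows(reference, predicted)
ari = adjusted_rand_index(reference, predicted)
print(f"FM = {fm:.3f}, ARI = {ari:.3f}")
```

Both scores depend only on which documents end up grouped together, so relabeling the clusters (for example swapping cluster 0 and cluster 1) leaves them unchanged. This matters for K-means, whose cluster numbers are arbitrary.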
