BioBERT Based Efficient Clustering Framework for Biomedical Document Analysis

Khishigsuren Davagdorj; Kwang Ho Park; Tsatsral Amarbayasgalan; Lkhagvadorj Munkhdalai; Ling Wang; Meijing Li; Keun Ho Ryu

Please use this identifier to cite or link to this item: http://cmuir.cmu.ac.th/jspui/handle/6653943832/72914

Full metadata record

DC Field	Value	Language
dc.contributor.author	Khishigsuren Davagdorj	en_US
dc.contributor.author	Kwang Ho Park	en_US
dc.contributor.author	Tsatsral Amarbayasgalan	en_US
dc.contributor.author	Lkhagvadorj Munkhdalai	en_US
dc.contributor.author	Ling Wang	en_US
dc.contributor.author	Meijing Li	en_US
dc.contributor.author	Keun Ho Ryu	en_US
dc.date.accessioned	2022-05-27T08:31:50Z	-
dc.date.available	2022-05-27T08:31:50Z	-
dc.date.issued	2022-01-01	en_US
dc.identifier.issn	18761119	en_US
dc.identifier.issn	18761100	en_US
dc.identifier.other	2-s2.0-85123298081	en_US
dc.identifier.other	10.1007/978-981-16-8430-2_17	en_US
dc.identifier.uri	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85123298081&origin=inward	en_US
dc.identifier.uri	http://cmuir.cmu.ac.th/jspui/handle/6653943832/72914	-
dc.description.abstract	The volume of biomedical documents is growing exponentially in modern applications. Document clustering methods play an important role in gathering textual documents into a few meaningful, coherent groups. However, clustering unstructured, unlabeled text is challenging because it is difficult to extract informative representations and to find relevant articles in the rapidly growing biomedical literature. Traditional text document clustering methods often produce unsatisfactory results because they rely on general, non-contextualized vector space representations, which neglect the semantic relations between biomedical texts. Pre-trained language models have recently been gaining attention in a variety of natural language processing tasks. In this paper, we propose a clustering framework for biomedical document analysis based on BioBERT, a heavily pre-trained language representation model, in order to improve clustering accuracy. In the experimental architecture, we provide benchmarks of the pre-trained transformer model, a statistical technique, and word-embedding methods in combination with clustering algorithms. To compare the efficiency of the models, the Fowlkes-Mallows score (FM), silhouette coefficient (SC), adjusted Rand index (ARI), and Davies-Bouldin score (DB) are used. The comprehensive experimental results show that the BioBERT-based K-means model achieves better clustering accuracy than the other models.	en_US
dc.subject	Engineering	en_US
dc.title	BioBERT Based Efficient Clustering Framework for Biomedical Document Analysis	en_US
dc.type	Book Series	en_US
article.title.sourcetitle	Lecture Notes in Electrical Engineering	en_US
article.volume	833 LNEE	en_US
article.stream.affiliations	Ton-Duc-Thang University	en_US
article.stream.affiliations	Northeast China Institute of Electric Power Engineering	en_US
article.stream.affiliations	Shanghai Maritime University	en_US
article.stream.affiliations	Chungbuk National University	en_US
article.stream.affiliations	Chiang Mai University	en_US
Appears in Collections:	CMUL: Journal Articles
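
The abstract names four clustering metrics (FM, SC, ARI, DB) without defining them. The sketch below is one way to compute their textbook definitions in plain Python; it is not the authors' code. The function names, the toy 2-D points (standing in for document embeddings such as BioBERT vectors), and the label lists are all assumptions made for illustration. FM and ARI compare predicted clusters against reference labels; SC and DB use only the points and the predicted clusters. Higher is better for FM, SC, and ARI; lower is better for DB.

```python
from itertools import combinations
from math import comb, dist, sqrt


def pair_counts(true_labels, pred_labels):
    """Count document pairs grouped together in both, only pred, only true."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    return tp, fp, fn


def fowlkes_mallows(true_labels, pred_labels):
    """Geometric mean of pairwise precision and recall."""
    tp, fp, fn = pair_counts(true_labels, pred_labels)
    return tp / sqrt((tp + fp) * (tp + fn)) if tp else 0.0


def adjusted_rand_index(true_labels, pred_labels):
    """Pair agreement corrected for the agreement expected by chance."""
    tp, fp, fn = pair_counts(true_labels, pred_labels)
    expected = (tp + fp) * (tp + fn) / comb(len(true_labels), 2)
    max_index = ((tp + fp) + (tp + fn)) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both
        return 1.0
    return (tp - expected) / (max_index - expected)


def group_by_label(points, labels):
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def silhouette(points, labels):
    """Mean of (b - a) / max(a, b) per point; needs at least two clusters."""
    clusters = group_by_label(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:  # singleton clusters score 0 by convention
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def davies_bouldin(points, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / separation."""
    clusters = group_by_label(points, labels)
    centroids = {
        k: [sum(axis) / len(v) for axis in zip(*v)] for k, v in clusters.items()
    }
    scatter = {
        k: sum(dist(p, centroids[k]) for p in v) / len(v)
        for k, v in clusters.items()
    }
    worst = [
        max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return sum(worst) / len(worst)


# Toy data (hypothetical): two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
true_labels = [0, 0, 1, 1]
pred_labels = [1, 1, 0, 0]  # same grouping, different cluster ids

print("FM :", fowlkes_mallows(true_labels, pred_labels))
print("ARI:", adjusted_rand_index(true_labels, pred_labels))
print("SC :", silhouette(points, pred_labels))
print("DB :", davies_bouldin(points, pred_labels))
```

The pair-counting loop is quadratic in the number of documents, which is fine for a small illustration; for real corpora a library implementation such as the ones in scikit-learn's `sklearn.metrics` module would be the more practical choice.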

Files in This Item:

There are no files associated with this item.
