An Efficient Parallelized Ontology Network-Based Semantic Similarity Measure for Big Biomedical Document Clustering

Meijing Li; Tianjie Chen; Keun Ho Ryu; Cheng Hao Jin

Please use this identifier to cite or link to this item: http://cmuir.cmu.ac.th/jspui/handle/6653943832/75766

Full metadata record

DC Field	Value	Language
dc.contributor.author	Meijing Li	en_US
dc.contributor.author	Tianjie Chen	en_US
dc.contributor.author	Keun Ho Ryu	en_US
dc.contributor.author	Cheng Hao Jin	en_US
dc.date.accessioned	2022-10-16T07:02:34Z	-
dc.date.available	2022-10-16T07:02:34Z	-
dc.date.issued	2021-01-01	en_US
dc.identifier.issn	1748-6718	en_US
dc.identifier.issn	1748-670X	en_US
dc.identifier.other	2-s2.0-85119969977	en_US
dc.identifier.other	10.1155/2021/7937573	en_US
dc.identifier.uri	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85119969977&origin=inward	en_US
dc.identifier.uri	http://cmuir.cmu.ac.th/jspui/handle/6653943832/75766	-
dc.description.abstract	Semantic mining of big biomedical text data remains a challenge. Ontologies have been widely used, and proven effective, for extracting semantic information. However, ontology-based semantic similarity calculation is so computationally complex that it cannot scale to big text data. To solve this problem, we propose a parallelized semantic similarity measurement method for big text data based on Hadoop MapReduce. First, we preprocess the documents and extract their semantic features. Then, we calculate document semantic similarity based on the ontology network structure within the MapReduce framework. Finally, document clusters are generated from the resulting semantic similarities using clustering algorithms. To validate the method's effectiveness, we use two types of open datasets. The experimental results show that traditional methods can hardly handle more than ten thousand biomedical documents, whereas the proposed method remains efficient and accurate on big datasets and offers high parallelism and scalability.	en_US
dc.subject	Biochemistry, Genetics and Molecular Biology	en_US
dc.subject	Immunology and Microbiology	en_US
dc.subject	Mathematics	en_US
dc.title	An Efficient Parallelized Ontology Network-Based Semantic Similarity Measure for Big Biomedical Document Clustering	en_US
dc.type	Journal	en_US
article.title.sourcetitle	Computational and Mathematical Methods in Medicine	en_US
article.volume	2021	en_US
article.stream.affiliations	Ton Duc Thang University	en_US
article.stream.affiliations	Shanghai Maritime University	en_US
article.stream.affiliations	Chungbuk National University	en_US
article.stream.affiliations	Chiang Mai University	en_US
article.stream.affiliations	ENN Research Institute of Digital Technology	en_US
Appears in Collections:	CMUL: Journal Articles

Files in This Item:

There are no files associated with this item.
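
The abstract describes the pipeline only at a high level, so the following is an illustrative sketch, not the authors' implementation. It assumes a hypothetical toy ontology (`ONTOLOGY_EDGES`), an assumed path-based concept measure (1 / (1 + shortest path length)), and best-match averaging to combine concept scores into a document score. The map, shuffle, and reduce phases are simulated in a single Python process rather than on Hadoop; each map call handles one document pair, which is the unit that could be distributed across map tasks.

```python
from collections import defaultdict, deque
from itertools import combinations

# Hypothetical toy ontology: undirected edges between concept identifiers.
ONTOLOGY_EDGES = [
    ("disease", "cancer"),
    ("cancer", "lung_cancer"),
    ("cancer", "breast_cancer"),
    ("disease", "infection"),
    ("infection", "influenza"),
]

def build_graph(edges):
    """Adjacency sets for an undirected concept graph."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def path_length(graph, source, target):
    """Breadth-first search; returns None if the concepts are not connected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph[node]:
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def concept_similarity(graph, a, b):
    # Assumed measure: 1 / (1 + shortest path length); 0.0 if unreachable.
    d = path_length(graph, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)

def mapper(graph, pair, docs):
    """Emit ((doc_i, doc_j), best-match score) for every concept of both documents."""
    i, j = pair
    for src, dst in ((i, j), (j, i)):
        for c in docs[src]:
            best = max((concept_similarity(graph, c, d) for d in docs[dst]), default=0.0)
            yield (i, j), best

def reducer(key, values):
    """Average the best-match scores into one symmetric document similarity."""
    return key, sum(values) / len(values)

def document_similarities(docs, edges):
    graph = build_graph(edges)
    shuffled = defaultdict(list)
    # Map phase: on a cluster, each document pair could go to a separate map task.
    for pair in combinations(sorted(docs), 2):
        for key, value in mapper(graph, pair, docs):
            shuffled[key].append(value)  # shuffle: group values by pair key
    # Reduce phase: one reduce call per document pair.
    return dict(reducer(k, v) for k, v in shuffled.items())

# Usage with hypothetical documents already reduced to ontology concepts.
docs = {"d1": {"lung_cancer"}, "d2": {"breast_cancer"}, "d3": {"influenza"}}
sims = document_similarities(docs, ONTOLOGY_EDGES)
for pair, score in sorted(sims.items()):
    print(pair, round(score, 3))
# ('d1', 'd2') scores 1/3, higher than either pair involving d3 (0.2 each).
```

The resulting pairwise scores could then be passed to any clustering algorithm that accepts precomputed similarities; the abstract does not name the specific clustering algorithms the authors used.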
