A method for k-means-like clustering of categorical data

Thu Hien Thi Nguyen; Duy Tai Dinh; Songsak Sriboonchitta; Van Nam Huynh

Please use this identifier to cite or link to this item: http://cmuir.cmu.ac.th/jspui/handle/6653943832/67757

Full metadata record

DC Field	Value	Language
dc.contributor.author	Thu Hien Thi Nguyen	en_US
dc.contributor.author	Duy Tai Dinh	en_US
dc.contributor.author	Songsak Sriboonchitta	en_US
dc.contributor.author	Van Nam Huynh	en_US
dc.date.accessioned	2020-04-02T15:02:51Z	-
dc.date.available	2020-04-02T15:02:51Z	-
dc.date.issued	2019-01-01	en_US
dc.identifier.issn	18685145	en_US
dc.identifier.issn	18685137	en_US
dc.identifier.other	2-s2.0-85073982951	en_US
dc.identifier.other	10.1007/s12652-019-01445-5	en_US
dc.identifier.uri	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85073982951&origin=inward	en_US
dc.identifier.uri	http://cmuir.cmu.ac.th/jspui/handle/6653943832/67757	-
dc.description.abstract	© 2019, Springer-Verlag GmbH Germany, part of Springer Nature. Despite recent efforts, clustering categorical and mixed data in the context of big data remains a challenge, owing to the lack of an inherently meaningful measure of similarity between categorical objects and the high computational complexity of existing clustering techniques. While the k-means method is well known for its efficiency in clustering large data sets, its restriction to numerical data prevents it from being applied to categorical data. In this paper, we develop a novel extension of the k-means method for clustering categorical data, making use of an information-theoretic dissimilarity measure and a kernel-based method for representing cluster means for categorical objects. This approach allows us to formulate the problem of clustering categorical data in a fashion similar to k-means clustering, while the kernel-based definition of centers also provides an interpretation of cluster means that is consistent with the statistical interpretation of cluster means for numerical data. To demonstrate the performance of the new clustering method, a series of experiments on real datasets from the UCI Machine Learning Repository is conducted, and the results are compared with those of several previously developed algorithms for clustering categorical data.	en_US
dc.subject	Computer Science	en_US
dc.title	A method for k-means-like clustering of categorical data	en_US
dc.type	Journal	en_US
article.title.sourcetitle	Journal of Ambient Intelligence and Humanized Computing	en_US
article.stream.affiliations	Japan Advanced Institute of Science and Technology	en_US
article.stream.affiliations	Chiang Mai University	en_US
Appears in Collections:	CMUL: Journal Articles

