Under-sampling by algorithm with performance guaranteed for class-imbalance problem

Wattana Jindaluang; Varin Chouvatut; Sanpawat Kantabutra

Please use this identifier to cite or link to this item: http://cmuir.cmu.ac.th/jspui/handle/6653943832/53415

Full metadata record

DC Field	Value	Language
dc.contributor.author	Wattana Jindaluang	en_US
dc.contributor.author	Varin Chouvatut	en_US
dc.contributor.author	Sanpawat Kantabutra	en_US
dc.date.accessioned	2018-09-04T09:48:55Z	-
dc.date.available	2018-09-04T09:48:55Z	-
dc.date.issued	2014-01-01	en_US
dc.identifier.other	2-s2.0-84942909601	en_US
dc.identifier.other	10.1109/ICSEC.2014.6978197	en_US
dc.identifier.uri	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84942909601&origin=inward	en_US
dc.identifier.uri	http://cmuir.cmu.ac.th/jspui/handle/6653943832/53415	-
dc.description.abstract	© 2014 IEEE. The class-imbalance problem arises when the number of data in the majority class is much larger than in the minority class. Traditional classifiers cannot handle this problem because they focus on the data in the majority class more than on the data in the minority class, and so they tend to predict new data as belonging to the majority class. Under-sampling is an efficient way to handle this problem because it selects representatives of the data in the majority class. For this reason, under-sampling requires a shorter training period than over-sampling. The main problem with under-sampling is that selecting representatives is likely to throw away important information in the majority class. To overcome this problem, we propose a cluster-based under-sampling method. We use a clustering algorithm with a performance guarantee, the k-centers algorithm, to cluster the data in the majority class and select representative data in various proportions, and then combine them with all the data in the minority class to form a training set. In this paper, we compare our approach with k-means on five datasets from UCI with two classifiers: 5-nearest neighbors and the C4.5 decision tree. Performance is measured by Precision, Recall, F-measure, and Accuracy. The experimental results show that our approach achieves higher scores than the k-means approach on every measure except Precision, where the two approaches have the same rate.	en_US
dc.subject	Computer Science	en_US
dc.subject	Mathematics	en_US
dc.subject	Medicine	en_US
dc.title	Under-sampling by algorithm with performance guaranteed for class-imbalance problem	en_US
dc.type	Conference Proceeding	en_US
article.title.sourcetitle	2014 International Computer Science and Engineering Conference, ICSEC 2014	en_US
article.stream.affiliations	Chiang Mai University	en_US
Appears in Collections:	CMUL: Journal Articles
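
Illustrative sketch (not part of the original record): the abstract describes clustering the majority class with a k-centers algorithm, keeping the selected representatives, and combining them with all minority-class data. The Python below is a minimal sketch of that idea under stated assumptions. It assumes the greedy farthest-first traversal, a commonly used 2-approximation for metric k-center; the record does not say which k-centers variant the authors used. The function names k_centers and undersample, the toy points, and the choice k=3 are all hypothetical.

```python
import math


def k_centers(points, k, first=0):
    """Greedy farthest-first traversal (assumed variant).

    Returns the indices of k points chosen as centers.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centers = [first]
    # dist[i] = distance from point i to its nearest chosen center so far
    dist = [math.dist(p, points[first]) for p in points]
    while len(centers) < k:
        # The next center is the point farthest from all current centers.
        nxt = max(range(len(points)), key=lambda i: dist[i])
        centers.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return centers


def undersample(majority, minority, k):
    """Keep k majority-class representatives plus every minority sample."""
    reps = [majority[i] for i in k_centers(majority, k)]
    X = reps + list(minority)
    y = [0] * len(reps) + [1] * len(minority)  # 0 = majority, 1 = minority
    return X, y


# Toy data (hypothetical): three tight pairs of majority points, two minority points.
majority = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0), (10.0, 0.2)]
minority = [(2.0, 2.0), (2.5, 2.0)]
X, y = undersample(majority, minority, k=3)
```

Here k controls how many majority-class representatives survive. The resulting X and y could then be passed to any classifier, such as the 5-nearest-neighbors or C4.5 classifiers named in the abstract.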

