Instance reduction for supervised learning using input-output clustering method

Anusorn Yodjaiphet; Nipon Theera-Umpon; Sansanee Auephanwiriyakul

Please use this identifier to cite or link to this item: http://cmuir.cmu.ac.th/jspui/handle/6653943832/54470

Full metadata record

DC Field	Value	Language
dc.contributor.author	Anusorn Yodjaiphet	en_US
dc.contributor.author	Nipon Theera-Umpon	en_US
dc.contributor.author	Sansanee Auephanwiriyakul	en_US
dc.date.accessioned	2018-09-04T10:14:13Z	-
dc.date.available	2018-09-04T10:14:13Z	-
dc.date.issued	2015-12-01	en_US
dc.identifier.issn	22275223	en_US
dc.identifier.issn	20952899	en_US
dc.identifier.other	2-s2.0-84949987838	en_US
dc.identifier.other	10.1007/s11771-015-3026-4	en_US
dc.identifier.uri	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84949987838&origin=inward	en_US
dc.identifier.uri	http://cmuir.cmu.ac.th/jspui/handle/6653943832/54470	-
dc.description.abstract	© 2015, Central South University Press and Springer-Verlag Berlin Heidelberg. A method that applies a clustering technique to reduce the number of samples in large data sets using input-output clustering is proposed. The proposed method clusters the output data into groups and clusters the input data in accordance with the groups of output data. Then, a set of prototypes is selected from the clustered input data. The inessential data can ultimately be discarded from the data set. The proposed method can reduce the effect of outliers because only the prototypes are used. This method is applied to reduce the data set in regression problems. Two standard synthetic data sets and three standard real-world data sets are used for evaluation. The root-mean-square errors of support vector regression models trained with the original data sets and with the corresponding instance-reduced data sets are compared. In the experiments, the proposed method provides good results on the reduction and the reconstruction of the standard synthetic and real-world data sets. The number of instances in the synthetic data sets is decreased by 25%-69%. The reduction rates for the real-world data sets of the automobile miles per gallon and the 1990 census in CA are 46% and 57%, respectively. The reduction rate of 96% for the electrocardiogram (ECG) data set is very good because of the redundant and periodic nature of ECG signals. For all of the data sets, the regression results are similar to those from the corresponding original data sets. Therefore, the regression performance of the proposed method is good while only a fraction of the data is needed in the training process.	en_US
dc.subject	Engineering	en_US
dc.subject	Materials Science	en_US
dc.title	Instance reduction for supervised learning using input-output clustering method	en_US
dc.type	Journal	en_US
article.title.sourcetitle	Journal of Central South University	en_US
article.volume	22	en_US
article.stream.affiliations	Chiang Mai University	en_US
Appears in Collections:	CMUL: Journal Articles
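
The reduction steps summarized in the abstract (cluster the outputs, cluster the inputs within each output group, keep a set of prototypes) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not name the clustering algorithm, the number of clusters, or the prototype selection rule, so plain k-means, the parameters k_out and k_in, and the "sample nearest to each centroid" rule are all assumptions, as are the function names kmeans and reduce_instances.

```python
import random
from math import dist


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples (assumed algorithm)."""
    rng = random.Random(seed)
    k = min(k, len(points))
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def reduce_instances(X, y, k_out=3, k_in=2, seed=0):
    """Return sorted indices of the samples kept as prototypes.

    k_out, k_in and the nearest-to-centroid rule are hypothetical choices.
    """
    # Step 1: cluster the (scalar) outputs into groups.
    _, out_labels = kmeans([(v,) for v in y], k_out, seed=seed)
    keep = []
    for group in sorted(set(out_labels)):
        idx = [i for i, lab in enumerate(out_labels) if lab == group]
        # Step 2: cluster the inputs that belong to this output group.
        centroids, in_labels = kmeans([X[i] for i in idx], k_in, seed=seed)
        # Step 3: keep the real sample closest to each input-cluster centroid.
        for j, c in enumerate(centroids):
            members = [i for i, lab in zip(idx, in_labels) if lab == j]
            if members:
                keep.append(min(members, key=lambda i: dist(X[i], c)))
    return sorted(keep)


# Toy regression data: y = x^2 sampled on [0, 10).
X = [(x / 10,) for x in range(100)]
y = [x[0] ** 2 for x in X]
kept = reduce_instances(X, y)
print(len(kept), "of", len(X), "samples kept")
```

The returned indices would then select the reduced training set for a regression model such as the support vector regression mentioned in the abstract. The achievable reduction rate depends on the chosen cluster counts, which here are arbitrary.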
