Density based semi-automatic labeling on multi-feature representations for ground truth generation: Application to handwritten character recognition

Papangkorn Inkeaw; Piyachat Udomwong; Jeerayut Chaijaruwanich

Please use this identifier to cite or link to this item: http://cmuir.cmu.ac.th/jspui/handle/6653943832/75881

Full metadata record

DC Field	Value	Language
dc.contributor.author	Papangkorn Inkeaw	en_US
dc.contributor.author	Piyachat Udomwong	en_US
dc.contributor.author	Jeerayut Chaijaruwanich	en_US
dc.date.accessioned	2022-10-16T07:03:24Z	-
dc.date.available	2022-10-16T07:03:24Z	-
dc.date.issued	2021-05-23	en_US
dc.identifier.issn	09507051	en_US
dc.identifier.other	2-s2.0-85102878142	en_US
dc.identifier.other	10.1016/j.knosys.2021.106953	en_US
dc.identifier.uri	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85102878142&origin=inward	en_US
dc.identifier.uri	http://cmuir.cmu.ac.th/jspui/handle/6653943832/75881	-
dc.description.abstract	A huge number of labeled samples are required as training data to construct an efficient recognition mechanism for an optical character recognition system. Although samples of characters can be easily collected from available manuscripts, they often lack class labels, especially for ancient and local alphabets. Creating a training dataset requires experts to manually annotate a great number of characters, which is costly and time-consuming. To considerably reduce the human effort required in the construction of training datasets, a novel semi-automatic labeling method is proposed in this work under the assumption that there are no initial labeled samples. The proposed method performs an iterative procedure on a nearest neighbor graph that views samples in multiple feature spaces. In each iteration, an expert is first called upon to label a relevant unlabeled sample that is automatically selected from the highest-density area of unlabeled samples. Then, the manually annotated label is propagated to the neighboring samples under safe conditions based on sample density and multiple views. The procedure is repeated until all unlabeled samples are labeled. The labeling procedure of the proposed method is evaluated on the MNIST, Devanagari, Thai, and Lanna Dhamma datasets. The results show that the proposed method outperforms state-of-the-art labeling methods, achieving the highest labeling accuracy. In addition, it can handle outlier samples and deal with alphabets that include visually similar characters. Moreover, the recognition performance of a classifier trained on the semi-automatically generated training dataset is comparable to that of a classifier trained on the actual ground truth.	en_US
dc.subject	Business, Management and Accounting	en_US
dc.subject	Computer Science	en_US
dc.subject	Decision Sciences	en_US
dc.title	Density based semi-automatic labeling on multi-feature representations for ground truth generation: Application to handwritten character recognition	en_US
dc.type	Journal	en_US
article.title.sourcetitle	Knowledge-Based Systems	en_US
article.volume	220	en_US
article.stream.affiliations	Chiang Mai University	en_US
Appears in Collections:	CMUL: Journal Articles
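
The iterative procedure summarized in the abstract (query an expert for the densest unlabeled sample, then propagate that label to neighbors under density and multi-view conditions) can be illustrated with a minimal sketch. This is not the authors' implementation: the density estimate (inverse mean k-nearest-neighbor distance, summed over views), the "safe" rule (a neighbor must be a k-nearest neighbor in every view and must not be denser than the sample it is reached from), and the names `knn_indices`, `semi_automatic_labeling`, and `ask_expert` are all assumptions made for illustration.

```python
import numpy as np


def knn_indices(X, k):
    """Return (indices, distances) of the k nearest neighbors of each row of X."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbor
    order = np.argsort(d, axis=1)[:, :k]
    return order, np.take_along_axis(d, order, axis=1)


def semi_automatic_labeling(views, ask_expert, k=5):
    """Label all samples using expert queries plus label propagation.

    views      : list of (n, d_v) arrays, one per feature representation
    ask_expert : callable(sample_index) -> integer class label (hypothetical oracle)
    Returns (labels, number_of_expert_queries).
    """
    n = views[0].shape[0]
    neigh = []
    density = np.zeros(n)
    for X in views:
        idx, dist = knn_indices(X, k)
        neigh.append(idx)
        # Assumed density estimate: inverse of the mean k-NN distance.
        density += 1.0 / (dist.mean(axis=1) + 1e-12)

    labels = np.full(n, -1)  # -1 marks "unlabeled"
    queries = 0
    while (labels == -1).any():
        # 1. Pick the unlabeled sample in the highest-density area.
        unlabeled = np.flatnonzero(labels == -1)
        seed = unlabeled[np.argmax(density[unlabeled])]
        labels[seed] = ask_expert(seed)
        queries += 1

        # 2. Propagate the expert's label along the neighbor graph.
        frontier = [int(seed)]
        while frontier:
            i = frontier.pop()
            # Assumed safe condition, part 1: neighbor of i in every view.
            common = set(neigh[0][i].tolist())
            for idx in neigh[1:]:
                common &= set(idx[i].tolist())
            for j in common:
                # Assumed safe condition, part 2: move only toward
                # equal or lower density, away from the queried sample.
                if labels[j] == -1 and density[j] <= density[i]:
                    labels[j] = labels[seed]
                    frontier.append(j)
    return labels, queries


if __name__ == "__main__":
    # Toy data: two well-separated clusters seen through two feature views.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(100.0, 1.0, (20, 2))])
    truth = np.array([0] * 20 + [1] * 20)
    views = [X, X * np.array([2.0, 0.5])]  # second view is a rescaled copy
    labels, queries = semi_automatic_labeling(views, lambda i: int(truth[i]))
    print("expert queries:", queries, "of", len(truth), "samples")
    print("labeling accuracy:", (labels == truth).mean())
```

In this sketch the loop always terminates, because each iteration labels at least the queried sample. Samples that fail the propagation conditions (for example, outliers that are not shared neighbors across views) are simply left for a later expert query.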

Files in This Item:

There are no files associated with this item.
