Density based semi-automatic labeling on multi-feature representations for ground truth generation: Application to handwritten character recognition

Papangkorn Inkeaw; Piyachat Udomwong; Jeerayut Chaijaruwanich

Please use this identifier to cite or link to this item: http://cmuir.cmu.ac.th/jspui/handle/6653943832/75881

Title:	Density based semi-automatic labeling on multi-feature representations for ground truth generation: Application to handwritten character recognition
Authors:	Papangkorn Inkeaw; Piyachat Udomwong; Jeerayut Chaijaruwanich
Keywords:	Business, Management and Accounting;Computer Science;Decision Sciences
Issue Date:	23-May-2021
Abstract:	A large number of labeled samples is required as training data to build an effective recognizer for an optical character recognition system. Although character samples can be easily collected from available manuscripts, they often lack class labels, especially for ancient and local alphabets. Creating a training dataset therefore requires experts to annotate a great number of characters manually, which is a costly and time-consuming process. To considerably reduce the human effort required to construct training datasets, this work proposes a novel semi-automatic labeling method under the assumption that no initial labeled samples are available. The proposed method performs an iterative procedure on a nearest neighbor graph that views samples in multiple feature spaces. In each iteration, an expert is first asked to label a relevant unlabeled sample that is automatically selected from the highest-density area of unlabeled samples. Then, the manually annotated label is propagated to neighboring samples under safe conditions based on sample density and multiple views. The procedure is repeated until all unlabeled samples are labeled. The labeling procedure is evaluated on the MNIST, Devanagari, Thai, and Lanna Dhamma datasets. The results show that the proposed method outperforms state-of-the-art labeling methods, achieving the highest labeling accuracy. In addition, it can handle outlier samples and deal with alphabets that include visually similar characters. Moreover, the recognition performance of a classifier trained on the semi-automatically generated training dataset is comparable to that of a classifier trained on the actual ground truth.
URI:	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85102878142&origin=inward http://cmuir.cmu.ac.th/jspui/handle/6653943832/75881
ISSN:	09507051
Appears in Collections:	CMUL: Journal Articles
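
The iterative procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the density estimate, the neighborhood size, or the exact "safe conditions", so everything specific here is an assumption made for illustration. Density is taken as the inverse mean distance to the k nearest neighbors, averaged over views. A label is propagated only to samples that are among the seed's k nearest neighbors in every view and whose density is at least `density_ratio` times the seed's. The names `knn`, `semi_automatic_labeling`, `oracle`, `k`, and `density_ratio` are all hypothetical.

```python
import math


def knn(points, k):
    """For each point, return its k nearest neighbors as (distance, index) pairs."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result


def semi_automatic_labeling(views, oracle, k=3, density_ratio=0.5):
    """Label all samples, querying the expert (oracle) only for density peaks.

    views         -- list of feature spaces; each is a list of n feature vectors
    oracle        -- callable mapping a sample index to its label (the expert)
    k             -- neighborhood size (assumed, not from the source)
    density_ratio -- assumed safe-propagation threshold relative to the seed
    """
    n = len(views[0])
    graphs = [knn(view, k) for view in views]

    # Assumed density estimate: inverse mean k-NN distance, averaged over views.
    density = []
    for i in range(n):
        per_view = []
        for g in graphs:
            mean_dist = sum(d for d, _ in g[i]) / max(len(g[i]), 1)
            per_view.append(1.0 / (mean_dist + 1e-12))
        density.append(sum(per_view) / len(per_view))

    labels = [None] * n
    queries = 0
    while any(label is None for label in labels):
        # Select the unlabeled sample lying in the densest region.
        unlabeled = [i for i in range(n) if labels[i] is None]
        seed = max(unlabeled, key=lambda i: density[i])
        labels[seed] = oracle(seed)
        queries += 1

        # Propagate only to neighbors that every view agrees on (multi-view
        # condition) and that are not much sparser than the seed (density
        # condition), so likely outliers stay unlabeled for the expert.
        shared = set.intersection(*({j for _, j in g[seed]} for g in graphs))
        for j in shared:
            if labels[j] is None and density[j] >= density_ratio * density[seed]:
                labels[j] = labels[seed]
    return labels, queries
```

Calling `semi_automatic_labeling(views, oracle)` returns the full label list together with the number of expert queries, which is the quantity such a method tries to keep small. Samples that fail either propagation condition are never labeled automatically; they are eventually selected as seeds and passed to the expert.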
