Please use this identifier to cite or link to this item: http://cmuir.cmu.ac.th/jspui/handle/6653943832/75881
Full metadata record
DC Field | Value | Language
dc.contributor.author | Papangkorn Inkeaw | en_US
dc.contributor.author | Piyachat Udomwong | en_US
dc.contributor.author | Jeerayut Chaijaruwanich | en_US
dc.date.accessioned | 2022-10-16T07:03:24Z | -
dc.date.available | 2022-10-16T07:03:24Z | -
dc.date.issued | 2021-05-23 | en_US
dc.identifier.issn | 09507051 | en_US
dc.identifier.other | 2-s2.0-85102878142 | en_US
dc.identifier.other | 10.1016/j.knosys.2021.106953 | en_US
dc.identifier.uri | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85102878142&origin=inward | en_US
dc.identifier.uri | http://cmuir.cmu.ac.th/jspui/handle/6653943832/75881 | -
dc.description.abstract | A huge number of labeled samples are required as training data to construct an efficient recognition mechanism for an optical character recognition system. Although samples of characters can be easily collected from available manuscripts, they often lack class labels, especially for ancient and local alphabets. Creating a training dataset requires a great number of characters to be manually annotated by experts, which is a costly and time-consuming process. To considerably reduce the human effort required in the construction of training datasets, a novel semi-automatic labeling method is proposed in this work under the assumption that there are no initial labeled samples. The proposed method performs an iterative procedure on a nearest neighbor graph that views samples in multiple feature spaces. In each iteration, an expert is first called upon to label a relevant unlabeled sample that is automatically selected from the highest density area of unlabeled samples. Then, the manually annotated label is propagated to neighboring samples under safe conditions based on sample density and multi-views. The procedure is repeated until all unlabeled samples are labeled. The labeling procedure of the proposed method is evaluated on the MNIST, Devanagari, Thai, and Lanna Dhamma datasets. The results show that the proposed method outperforms state-of-the-art labeling methods, achieving the highest labeling accuracy. In addition, it can handle outlier samples and deal with alphabets that include visually similar characters. Moreover, the recognition performance of a classifier trained on the semi-automatically generated training dataset is comparable to that of a classifier trained on the actual ground truth. | en_US
dc.subject | Business, Management and Accounting | en_US
dc.subject | Computer Science | en_US
dc.subject | Decision Sciences | en_US
dc.title | Density based semi-automatic labeling on multi-feature representations for ground truth generation: Application to handwritten character recognition | en_US
dc.type | Journal | en_US
article.title.sourcetitle | Knowledge-Based Systems | en_US
article.volume | 220 | en_US
article.stream.affiliations | Chiang Mai University | en_US
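
The iterative procedure summarized in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record gives no formulas, so the density estimate (inverse mean distance to the k nearest neighbors, averaged over views), the "safe" propagation rule (a neighbor must be a k-nearest neighbor in every view and must not be denser than the sample it is reached from), and the names knn, semi_automatic_labeling, and ask_expert are all assumptions made for illustration.

```python
import math
from collections import deque


def knn(points, k):
    """For each point, return (set of its k nearest neighbour indices, mean distance to them)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        mean_d = sum(d for d, _ in nearest) / max(len(nearest), 1)
        result.append(({j for _, j in nearest}, mean_d))
    return result


def semi_automatic_labeling(views, ask_expert, k=3):
    """Label every sample, querying the expert only at density peaks.

    views      -- list of feature representations; each is a list of vectors,
                  one vector per sample, in the same sample order (assumed layout)
    ask_expert -- hypothetical callback: sample index -> class label
    Returns (labels, number_of_expert_queries).
    """
    n = len(views[0])
    per_view = [knn(v, k) for v in views]

    # Assumed density estimate: inverse mean k-NN distance, averaged over views.
    density = [
        sum(1.0 / (pv[i][1] + 1e-12) for pv in per_view) / len(per_view)
        for i in range(n)
    ]
    # Assumed multi-view condition: neighbours shared by every view.
    shared = [set.intersection(*(pv[i][0] for pv in per_view)) for i in range(n)]

    labels = [None] * n
    queries = 0
    while any(label is None for label in labels):
        # Select the unlabeled sample lying in the densest region.
        seed = max((i for i in range(n) if labels[i] is None), key=lambda i: density[i])
        labels[seed] = ask_expert(seed)
        queries += 1

        # Propagate the expert's label outward from the seed, moving only
        # toward samples of equal or lower density (assumed safety rule).
        queue = deque([seed])
        while queue:
            cur = queue.popleft()
            for nb in shared[cur]:
                if labels[nb] is None and density[nb] <= density[cur]:
                    labels[nb] = labels[seed]
                    queue.append(nb)
    return labels, queries
```

In this sketch the expert is consulted once per density peak that propagation cannot reach from an earlier seed, so well-separated classes need few queries, while isolated samples (outliers) end up being labeled by the expert directly.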
Appears in Collections: CMUL: Journal Articles


