Vietnamese Handwriting Recognition Project
We present our efforts to create a database of unconstrained Vietnamese online handwritten text sampled from pen-based devices. The database stores handwritten text at the paragraph, line, word, and character levels, with ground truth associated with every paragraph and line. Overall, it contains over 480,000 strokes from more than 380,000 characters, which makes it, at present, the largest database of Vietnamese online handwritten text. Although Vietnamese script is based on a fixed set of alphabet letters, recognizing Vietnamese online handwritten text is difficult because of its many diacritical marks, which usually result in delayed strokes during writing. We designed and implemented an online handwriting-collection tool to gather data, as well as a line-segmentation tool and a delayed-stroke-detection tool to analyze the collected handwritten text. We present a detailed statistical analysis of the handwritten text in the database, including an analysis based on writer profiles. We then applied several state-of-the-art recognition methods to unconstrained Vietnamese handwriting to evaluate their performance. These methods include the Bidirectional Long Short-Term Memory (BLSTM) network, an efficient architecture derived from the Recurrent Neural Network (RNN) that is often applied to sequence labeling problems. The BLSTM network achieved 90% character recognition accuracy despite many long sequences containing several delayed strokes. The database is openly accessible for research purposes to stimulate further work on handwriting recognition.
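The abstract mentions a delayed-stroke-detection tool but does not describe how it works. A minimal sketch of one simple heuristic follows. It assumes left-to-right writing on a single segmented text line and flags a stroke as delayed when it lies entirely to the left of the furthest point already reached by earlier strokes, as happens when a writer goes back to add a diacritic. The function name `find_delayed_strokes` and the `tolerance` parameter are hypothetical illustrations, not the project's actual tool.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) pen coordinate
Stroke = List[Point]         # points from one pen-down to the next pen-up


def find_delayed_strokes(strokes: List[Stroke], tolerance: float = 0.0) -> List[int]:
    """Return indices of strokes that look delayed (hypothetical heuristic).

    Assumes left-to-right writing on one text line. A stroke is flagged
    when its rightmost x lies more than `tolerance` to the left of the
    furthest x reached by any earlier stroke, i.e. the pen moved back.
    """
    delayed: List[int] = []
    max_x_so_far = float("-inf")
    for i, stroke in enumerate(strokes):
        if not stroke:
            continue  # ignore empty strokes
        stroke_max_x = max(x for x, _ in stroke)
        if stroke_max_x + tolerance < max_x_so_far:
            delayed.append(i)
        max_x_so_far = max(max_x_so_far, stroke_max_x)
    return delayed


# Two letter bodies, a diacritic added afterwards over the first letter,
# then the next letter.
strokes = [
    [(0, 0), (10, 5)],
    [(12, 0), (20, 5)],
    [(3, 8), (6, 9)],   # written late, back over x = 3..6
    [(22, 0), (30, 5)],
]
print(find_delayed_strokes(strokes))
```

Comparing only horizontal extents keeps the rule cheap, but it misses delayed strokes that extend past the current pen position, such as a long horizontal bar. A practical detector would likely also use vertical position and stroke size.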
We also present the results of the VOHTR 2018 competition on Vietnamese Online Handwritten Text Recognition. The goal of this competition is to evaluate and compare recent online handwritten text recognition systems on Vietnamese online handwritten text, which contains many delayed strokes caused by diacritical marks. More broadly, the competition aims to encourage research on Vietnamese online handwritten text recognition based on the large Vietnamese handwriting database collected from 200 writers. The competition comprises three tasks, each described in detail: word recognition (task 1), text-line recognition (task 2), and paragraph recognition (task 3). We then describe the evaluation metrics and present comparative results for the participants, along with brief descriptions of their methods.
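The summary does not spell out the evaluation metrics. A common choice for text recognition is the character error rate (CER), defined as the Levenshtein edit distance between the reference and the recognized string divided by the reference length. Character accuracy is then 1 − CER. The sketch below assumes this definition; whether VOHTR 2018 uses exactly this formula is an assumption. The helper names are illustrative. For Vietnamese, both strings are normalized to Unicode NFC first. Without this step, a precomposed "ệ" and its decomposed base-plus-combining-marks form would be counted as different characters.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (unit-cost insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))  # distances from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete r from ref
                curr[j - 1] + 1,           # insert h into ref
                prev[j - 1] + int(r != h), # substitute (free if equal)
            )
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / reference length, after NFC normalization."""
    ref = unicodedata.normalize("NFC", ref)
    hyp = unicodedata.normalize("NFC", hyp)
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# A missing diacritic costs one substitution out of 8 characters.
cer = character_error_rate("Việt Nam", "Viet Nam")
print(cer, 1 - cer)  # CER and the corresponding character accuracy
```

The same function applies unchanged to words, text lines, or paragraphs. For multi-line tasks, errors are often summed over the whole test set and divided by the total reference length, rather than averaged per sample.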
Access to the database, competition, and paper.