A contribution to automatic recognition of
handwritten Arabic scripts
Contributions and outline
The outline of this dissertation is presented below. We note that the content of each chapter corresponds to original contribution. In chapter 2, we will first review and classify the two-dimensional-based recognizers. Our goal is to clearly show the value of our contribution and to make the comparison with other related approaches easier for the reader to understand. We emphasize that the contents of this chapter offer, to our knowledge, the first complete overview of 2-D based recognition approaches [Boukerma et al., 2018b]. Therefore, the review part of this dissertation is far to be just a list of references. Below are more specific arguments: (i) We propose a classification of 2-D recognition approaches that have not been published in literature before. The proposed taxonomy involves two levels: Non-HMM and HMM-based approaches. For the second class, we have two sub-categories: MRF for 2-D modeling at the A-level and MRF for 2-D modeling at the B-level. (ii) In analyzing the existing works of HMM-based 2-D recognizers, we define a general criterion that considers the optimality issue of the training and decoding algorithms. We discuss the advantages and the weakness of the existing approaches in the category ”MRF for 2-D modeling at the A-level” and come to the conclusion that efficient algorithms to solve the three problems of HMM do not exist for the proposed 2-D models of this category. (iii) We introduce then the existing systems of the second category ”MRF for 2-D modeling at the B-level”. At this stage, we define another criterion to evaluate the existing works. This criterion considers the four drawbacks of the classical NSHP-HMM. We evaluate the existing systems in terms of their ability to overcome these four drawbacks. (iv) At this point of discussion, we introduce our model and demonstrate how it advances the state of the art by resolving the four mentioned drawbacks. Through chapter 3, we define in more detail the proposed model and derive their re-estimation formulas. We then provide a complete description of the principal steps involved in the image recognition system based on the NSHPZ -HMM. In order to evaluate our proposed model in ascending order of difficulty, the first series of experiments are performed on handwritten digits recognition [Boukerma et al., 2014] [Boukerma et al., 2018b]. In chapter 4, we consider the problem of Arabic handwritten word recognition [Boukerma et al., 2015]. We introduce a new zoning design approach based on base-line localization [Hanene et al., 2018]. Our goal is to appropriately partition the image of Arabic word into zones that facilitates its modelization using the NSHPZ – HMM. Through chapter 5, we introduce a hybrid NSHP-HMM combining pixel-based and zone-based observation probabilities [Boukerma et al., 2018a]. Despite the promising results of the hybrid model, further investigations are required. This is why we have decided to present the preliminary results of the hybrid model in the appendix part of our dissertation. Finally, in chapter 6, we conclude this dissertation
Two-dimensional based recognition approaches
the State of the Art 2.1 Introduction Before presenting our model, namely the NSHPZ -HMM (see chapter 3), we are aware that it is necessary to clearly show the value of our contribution regarding other existing works. For this reason, we present in this chapter a methodology overview of 2-D recognition engines. To the best of our knowledge, this overview presents the first complete survey of 2-D recognition approaches. Our goal is to clearly show the value and the novelty of our research work by defining the drawbacks of the existing methodologies and discussing how the proposed method solves the issue.
Classification of 2-D based recognition approaches
Our own interest is the 2-D HMM models; but in order to give a complete review, we also discuss non-HMM based 2-D recognizers. Thus, the first level of the proposed classification is HMM-based and non-HMM-based 2-D recognizers. The taxonomy of 2-D based recognition approaches is shown in Fig. 2.1. In order to present our contribution methodically, we will first start with the non-HMM based 2-D recognition approaches.
2-D BASED RECOGNITION APPROACHES:THE STATE OF THE ART
Non-HMM based 2-D recognition approaches
In [Uchida and Sakoe, 1999] [Uchida and Sakoe, 2005] [Ronee et al., 2001], a pixelto-pixel mapping technique using dynamic programming based Piecewise Linear two-Dimensional Warping (PL2DW) is presented. In PL2DW, the mapping of each column of one image into another image is given by the linear interpolation of the mapping of some specific points, called pivots, on that column. Thus, the mapping is controlled by K pivot-points with K less than or equal to the image size. The computational complexity of this technique is exponential order of the number of pivots K and polynomial order of the image size. The PL2DW was tested on English handwritten character recognition. In order to keep computations tractable, small K is used with additional constraints for individual categories of similar characters like ‘H’ and ‘M’. In [Chevalier et al., 2003], the authors proposed a 2-D approach for handwriting recognition based on Markov Random Fields models and 2-D dynamic programming (DP). The 2-D DP is used to compute the optimal configuration of an MRF model. However, to keep the model computationally tractable, the authors used a pruning strategy in order to retain, at each step of the computation, only the most probable configuration. Tested on the MNIST database, the error rate of the proposed system was 5.4%. An extension of this technique to handwritten words recognition was presented in [Chevalier et al., 2005]. A fuzzy approach to 2-D shape recognition was introduced in [Lazzerini and Marcelloni, 2001]. This approach is based on the fuzzy description of shapes in 2-D space. The horizontal and vertical coordinates of shape points and the triangular membership functions in horizontal and vertical spaces are the two basic elements for modeling shape instances. The work was tested on two applications: recognition of olfactory signals and recognition of Latin handwritten characters, with recognition rate equals to 87% and 75.86%, respectively. Graves et al. [Graves et al., 2007] introduced Multi-Dimensional Recurrent Neural Networks (MDRNN) as an efficient extension of the Recurrent Neural Network (RNN) to multi-dimensional data. In MDRNN, the single recurrent connection found in standard RNNs is replaced by as many recurrent connections as there are dimensions in the data. The overall complexity of MDRNN training is linear in the number of data points and the number of network weights. In this work, Figure 2.1: Two-dimensional based recognition approaches: a proposed classification and important advancements the proposed MDRNN is, specifically, Multi-Dimensional Long Short-Term Memory (MDLSTM). Applied to the image segmentation problem (recognition of the MNIST digit is regarded as a segmentation task), the MDLSTM outperforms the Convolution Neural Network (CNN); its recognition error rate was 0.9%. However, the CNN outperformed the MDRNN in terms of training time. On the MNIST database, the MDRNN training time was over two weeks. In [Graves and Schmidhuber, 2009], the MDLSTM was combined with Connectionist Temporal Classification (CTC) and a hierarchical layer structure to create an efficient 2-D recognizer. More details can be found in [Graves, 2008]. The work in [Graves and Schmidhuber, 2009] and [Graves, 2008] was applied to Arabic script; a sort discription of this work with their obtained results will be presented in Section 4.3.1. Brand et al. [Brand et al., 1997] proposed an algorithm for coupling and training HMMs. The proposed technique consists of introducing ”tables conditional probabilities” between the state variables of the two coupled HMMs; this is done by taking the Cartesian product of their states and transition parameters. The proposed coupled HMM (CHMM) was used to classify two-hand gestures from Chinese martial art and meditative exercise. On this action recognition task, the CHMM outperforms the 1D HMM and linked HMMs and offers superior training speeds, model likelihoods, and robustness to initial conditions. In [Likforman-Sulem and Sigelle, 2007] and [Likforman-Sulem and Sigelle, 2008], the authors tried to capture the 2-D nature of character images by coupling 1-D separate HMMs. These latter are a vertical HMM and horizontal HMM whose observable outputs are the image columns and image rows, respectively. The coupling method was done according to Dynamic Bayesian Networks (DBNs) formalism. Two coupled architectures were proposed: state-coupled between the vertical and the horizontal HMMs and the Auto-Regressive (AR) model, which coupled vertical and horizontal AR models. Size normalization of images is mandatory in this system. Experimented tests conducted on the MNIST database show that the AR-coupled models perform better than independent ones and outperform the SVM classifier on degraded characters. Even though the HMM is used in the two last mentioned systems [Brand et al., 1997] and [Likforman-Sulem and Sigelle, 2008] , we decided to cite these works in the nonHMM-based category because the 2-D property in these models is not modeled at either the A-level or the B-level.
HMM-based 2-D recognition approaches
We point out at the beginning of this section that our key vision in reviewing the proposed works for HMM-based 2-D recognizers is formulated as follows: extending 1-D HMM to truly 2-D HMM requires an efficient extension of the two algorithms, namely, the Viterbi and the Baum-Welch algorithms to the 2-D case. In other words, the optimality issue of the training and decoding algorithms in the 2-D case is our main objective. According to this vision, we evaluate older and more recent published literature in HMM-based 2-D recognition approaches.
Pseudo 2-D HMM
Talking about the research effort in 2-D HMM traditionally requires starting with the Pseudo 2-D HMM (PHMM), also sometimes referred to as Planar or embedded HMM [Agazzi et al., 1993]. This model presents the first attempt to extend 1-D HMM to 2-D. Theoretically speaking, PHMM is treated as nested one-dimensional models, rather than being truly two dimensional. However, the advantage of PHMM compared to HMM lies in its nice elastic matching property in both the horizontal and vertical directions, which makes this recognizer less sensitive to size normalization and slant correction. In practice, such as keyword spotting in ’printed’ documents [Kuo and Agazzi, 1994] and ’printed’ word/character recognition [Agazzi et al., 1993], PHMM performs much better than HMM. However, the drawback of PHMM lies in the hypothesis of lines (or columns) independency which constitutes the main limitation of this model for non-regular images such as ’handwritten’ text recognition. To overcome this drawback, Gilloux [Gilloux, 1995] proposed a combination of PHMM with Markov meshes for handwritten character recognition. Gilloux’s model is based on the assumption that the assignment of hidden states to image pixels is properly performed by PHMM, which cannot be guaranteed. Markov meshes are then used to estimate the generation probability of an image and its associated states. In [Li et al., 2017], the likelihood probability of twofold HMM was used as the health index for the rolling element bearings. The proposed model is composed of a supper 1-D HMM with a simple 1-D HMM embedded. The simple HMM was used to deal with the internal data property among the multiple features and the supper HMM was used to deal with the integral property of the features.
Truly 2-D HMM
Contrary to PHMM, causal Markov random fields (MRFs) are actually 2-D statistical models. Two types of causal MRFs have been extensively used in image processing: the Markov mesh random fields (MMRFs) (see Definition 2.1 ) and the unilateral Markov random fields which is also called non-symmetric half-plane (NSHP) Markov chain (see Definition 2.2 ). Jeng and Woods make in [Jeng and Woods, 1987] a comparative study between both models: MMRFs and NSHP Markov model differ in their choice of past and local state (see Fig. 2.2), they are equivalent when each has a quarter-plane local state. However, the MMRFs are conditionally independent on 40◦ diagonal which reduces their capacity to capture strokes having these orientations. The authors concluded that the NSHP Markov chain is the better model to use when an accurate model of the spatial data is required. In image recognition, truly 2-D HMM can be built by integrating causal MRFs at two distinguish modeling outlooks: – a MMRF characterizing the 2-D state- 28CHAPTER 2. 2-D BASED RECOGNITION APPROACHES:THE STATE OF THE ART transition probability distribution (MRF for 2-D modeling at the A-level), and – a NSHP Markov random field realization (pattern image) that is considered as an observation sequence of columns (NSHP for 2-D modeling at the B-level).
1 Introduction |