In this section, we introduce the Optimized LM-WLCSS (OLM-WLCSS), our proposed approach for online gesture recognition. Like the methods described previously, this technique is robust against noisy signals and strong variability in gesture execution. This section first describes the quantization step, followed by the training phase. Then, the recognition block for one class and the optimization process are presented. Finally, we describe the decision-making module.
QUANTIZATION
Similarly to the WLCSS, we use the K-Means algorithm to cluster the sensor data into Nc clusters in the quantization step. Each sample from the sensor is represented as a vector (e.g., an accelerometer sample is a 3D vector). Each sensor vector is then associated with its closest cluster centroid by comparing Euclidean distances. Whereas the WLCSS stores symbols (as representations of centroids), we suggest preserving the centroids themselves instead.
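To make the step concrete, the following sketch illustrates one possible implementation of this nearest-centroid quantization; the function name and array layout are our own, and the centroids are assumed to come from a prior K-Means fit.

```python
import numpy as np

def quantize(samples, centroids):
    """Map each sensor sample (e.g., a 3D accelerometer vector) to its
    nearest cluster centroid by Euclidean distance.

    samples:   (n, d) array of raw sensor vectors
    centroids: (Nc, d) array of centroids from a prior K-Means fit
    returns:   (n, d) array where each row is the closest centroid
    """
    # Pairwise Euclidean distances between every sample and every centroid.
    dists = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
    # Keep the centroid itself rather than its symbol/index, as suggested above.
    return centroids[np.argmin(dists, axis=1)]
```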
TRAINING
This subsection presents the overall view of our offline training method for one class c. In the case of two or more classes, the process is repeated for each class. Template matching methods find similarities in the signal and detect gestures via a motif. During the training phase, the template is elected as the best representation among all possible alternatives of the gesture; such a pattern maximizes the recognition performance. Raw signals are first quantized to create a transformed training set. Next, this new data set is used to elect a template. Finally, the resulting motif is given, as a parameter, to the rejection threshold calculation method, which outputs the tuple (template, threshold).
TEMPLATE ELECTION
Once the quantization phase is complete, the next step is to elect the best template. As described in (Long-Van et al., 2012), this process is performed via the LCSS method, modified to handle vectors instead of symbols. Each instance is defined as a temporary template and compared to all the other instances; the reference template is the one achieving the best mean matching score.
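A minimal sketch of this election, assuming a matching_score callable that implements the vector-based LCSS variant and that the candidate with the highest mean score wins, might read:

```python
import numpy as np

def elect_template(instances, matching_score):
    """Elect the reference template for one class.

    instances:      list of quantized gesture instances
    matching_score: callable (template, instance) -> similarity score,
                    assumed here to be the vector-based LCSS variant
    """
    best_template, best_mean = None, float("-inf")
    for i, candidate in enumerate(instances):
        # Score the temporary template against every other instance.
        scores = [matching_score(candidate, inst)
                  for j, inst in enumerate(instances) if j != i]
        mean_score = float(np.mean(scores))
        if mean_score > best_mean:
            best_template, best_mean = candidate, mean_score
    return best_template
```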
REJECTION THRESHOLD CALCULATION
The rejection threshold calculation is similar to the one presented in the LM-WLCSS algorithm. The score between the template and all the gesture instances of class c is computed with the core component of our algorithm. Then, the matching score mean µc and the standard deviation σc are calculated. The resulting threshold is determined by the following formula:
Thdc = µc − h ∙ σc, h ∈ ℕ
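In code, the calculation is straightforward; the matching_score callable and the default value of h below are illustrative assumptions:

```python
import numpy as np

def rejection_threshold(template, instances, matching_score, h=2):
    """Compute Thdc = µc - h * σc for one class; h=2 is only a placeholder."""
    scores = np.array([matching_score(template, inst) for inst in instances])
    return scores.mean() - h * scores.std()
```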
RECOGNITION BLOCKS FOR ONE CLASS
The outcome of the previous phase is the best tuple (template, rejection threshold) for each class. These two elements are the parameters that allow matching a gesture against the incoming stream. As in the training phase, raw signals are first quantized. The resulting sample and the previously elected template are given to the OLM-WLCSS method presented in the training phase. Next, the matching score is passed to the SearchMax algorithm, which emits a binary event.
SEARCHMAX
The matching score computed in the previous steps should increase and exceed the threshold when a gesture is performed. However, noisy signals imply fluctuations and undesired detections. To overcome such issues, we use the SearchMax algorithm introduced in (Roggen et al., 2015). Its goal is to find local maxima among the matching scores within a sliding window Wf. SearchMax loops over the scores and compares the last and the current score to set a flag: 1 for a new local maximum (Maxsm) and 0 for a lower value. A counter (Ksm) is increased at each loop. When Ksm exceeds the size of Wf, the value of Maxsm is compared to the threshold Thd. Eventually, the algorithm returns a binary result: 1 if the local maximum is above Thd, indicating that a gesture has been recognized, and 0 otherwise.
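A minimal streaming sketch of this logic could look as follows; the class interface and the reset behaviour after a decision are our assumptions, not part of the original description.

```python
class SearchMax:
    """Streaming local-maximum detector over matching scores."""

    def __init__(self, window_size, threshold):
        self.wf = window_size         # size of the sliding window Wf
        self.thd = threshold          # rejection threshold Thd
        self.k_sm = 0                 # counter Ksm, increased at each loop
        self.max_sm = float("-inf")   # current local maximum Maxsm

    def step(self, score):
        """Consume one matching score and return a binary event."""
        self.k_sm += 1
        if score > self.max_sm:       # flag = 1: new local maximum
            self.max_sm = score
            self.k_sm = 0
        if self.k_sm > self.wf:       # no higher score for a whole window
            detected = 1 if self.max_sm > self.thd else 0
            self.k_sm = 0             # reset for the next candidate peak
            self.max_sm = float("-inf")
            return detected
        return 0
```

Feeding each new matching score to step() then yields the binary event stream consumed by the decision-making module described below.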
QUANTIZATION AND SEARCHMAX OPTIMIZATION
The previously described quantization phase associates each new sample with the nearest centroid of the class c. Thus, each class has a parameter Nc that defines the number of clusters generated in the training phase. In prior work, Long-Van et al. (2012) set it to 20 after running some tests. Likewise, we performed tests with various cluster numbers, which showed that this parameter strongly impacts the performance of the algorithm.
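A simple way to explore this parameter is a grid search over candidate cluster counts; the candidate values and the train_and_score helper below are hypothetical:

```python
# Hypothetical grid search over the number of clusters Nc for one class.
# train_and_score is assumed to run the full training/recognition pipeline
# on the training set and return a performance metric such as the F1-score.
best_nc, best_f1 = None, -1.0
for nc in (5, 10, 20, 40, 80):
    f1 = train_and_score(training_set, n_clusters=nc)
    if f1 > best_f1:
        best_nc, best_f1 = nc, f1
```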
FINAL DECISION
The previous steps are performed independently for each gesture class. However, noise in the raw signals and high variability in gesture execution can lead to multiple simultaneous detections. Several methods are available to resolve such conflicts, such as the weighted decision described in (Banos, Damas, Pomares, & Rojas, 2012). In our system, we choose to employ the lightweight C4.5 classifier (Quinlan, 2014), which requires supervised training.
The training of C4.5 comes directly after the optimization step. It is performed using 10-fold cross-validation on a previously created data set. This data set may be regarded as an N × M matrix, where N is the number of samples in the template training data set and M is the number of recognition blocks. Each element ri,j of this matrix represents the result of the j-th recognition block for the i-th sample.
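As an illustration, the sketch below builds such a matrix and trains a decision tree on it; since C4.5 itself is not available in scikit-learn, its CART-based DecisionTreeClassifier stands in for it, and recognition_blocks, training_samples, and labels are assumed to exist.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Build the N x M decision matrix: one row per training sample,
# one column per recognition block (binary SearchMax outputs).
# recognition_blocks is assumed to be a list of per-class pipelines
# exposing a step(sample) -> 0/1 method, as sketched earlier.
R = np.array([[block.step(sample) for block in recognition_blocks]
              for sample in training_samples])

# CART stands in for C4.5 here; labels holds the ground-truth class of
# each sample. 10-fold cross-validation mirrors the procedure above.
tree = DecisionTreeClassifier()
scores = cross_val_score(tree, R, labels, cv=10)
tree.fit(R, labels)
```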