Computer Vision Framework

Computer vision, as defined by Gonzalez et al. (2004), is a branch of artificial intelligence "whose ultimate goal is to use computers to emulate human vision, including learning and being able to make inferences and take actions based on visual inputs." This includes the problem of texture recognition in natural scenes. A typical computer vision classification system can be broken down into a pipeline of successive algorithms, each dealing with one specific aspect of the problem. Because every problem is unique, the structure of the system may change significantly from one problem to another.
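The idea of a pipeline of successive algorithms can be sketched as a plain composition of functions. This is a minimal sketch, assuming every stage takes and returns a NumPy array; the stages `normalize`, `flatten` and `summarize` are hypothetical placeholders, not processing steps used in this work.

```python
from typing import Callable, Sequence

import numpy as np

Stage = Callable[[np.ndarray], np.ndarray]


def run_pipeline(data: np.ndarray, stages: Sequence[Stage]) -> np.ndarray:
    """Feed the output of each stage into the next one, in order."""
    for stage in stages:
        data = stage(data)
    return data


# Hypothetical placeholder stages, each dealing with one aspect of the problem.
def normalize(img: np.ndarray) -> np.ndarray:
    """Rescale intensities linearly to [0, 1] (all zeros for a constant image)."""
    span = img.max() - img.min()
    return (img - img.min()) / span if span > 0 else np.zeros_like(img, dtype=float)


def flatten(img: np.ndarray) -> np.ndarray:
    """Turn the 2-D image into a 1-D vector of pixel values."""
    return img.ravel()


def summarize(values: np.ndarray) -> np.ndarray:
    """Reduce the vector to a tiny feature vector: mean and standard deviation."""
    return np.array([values.mean(), values.std()])


image = np.arange(16, dtype=float).reshape(4, 4)
features = run_pipeline(image, [normalize, flatten, summarize])
```

Adapting the system to a different problem then amounts to editing the list of stages, for example dropping `normalize` or inserting an optional step, without touching the other stages.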

The processing steps of such a system can be separated into two large groups of subproblems. The first group is associated with the field of image processing and deals with images and pixels representing the real world as seen by humans. At this stage, operations are performed to acquire, enhance, restore, segment and describe the image. The output is an abstract, quantitative, yet highly descriptive representation of the observed object. The second group is associated with the field of pattern recognition and deals with these much more abstract entities. The steps of dimensionality reduction, classification and rejection aim to output a correct decision given an input signal describing an observation. The following list gives a short definition of each of these seven steps and introduces its role in the context of benthic image annotation:

a. Image acquisition aims to convert energy waves (or particles) into a digital image. While visible light is a popular option, this definition can extend to any wavelength, and even to non-electromagnetic sources such as ultrasound or beams of accelerated electrons. As is true for every step of this pipeline, high-quality processing using appropriate methods and parameters is important to support the subsequent steps. Bernd (2002) and Gonzalez et al. (2004) explain some of these parameters: selection of an appropriate light source, sensor technology, wavelength, lens, illumination strategy, etc. While several studies have demonstrated the performance of various acquisition systems specifically designed for benthic image annotation, the problem studied here does not allow any control over these parameters, as image acquisition has already been performed. The various challenges that come with benthic image datasets must therefore be addressed through other means.

b. Preprocessing aims to restore the image by correcting acquisition artifacts such as noise, or to enhance the image for further processing. This can include a wide array of image processing methods, such as intensity transformations, spatial or frequency filtering, geometric transformations, multi-resolution decomposition, lens distortion correction, etc. For simplicity, we will refer to these as image filtering methods. Depending on the dataset, benthic images have specific acquisition flaws that can be addressed at the preprocessing level.

c. Segmentation is the process of finding and isolating one or more regions of interest (ROI) in the image. This step allows the system to focus on important objects without considering irrelevant background information. Various segmentation methods exist, depending on the complexity of the problem. For difficult problems, the segmentation step may itself be an entire computer vision pipeline including some recognition steps, as is the case for face detection in biometric facial identification systems.

d. Representation & description is often referred to as descriptor extraction or feature extraction. This process takes a set of pixels representing an object of interest and attempts to extract a set of meaningful measures in the form of a feature vector, which allows mathematical models to manipulate the data and find patterns in the following steps. Because ROIs are defined by a large number of pixels, the sheer amount of information they contain usually makes them impractical to manipulate directly: the small pieces of meaningful information are diluted and hidden. While humans excel at inferring these patterns from few examples, it is very hard for a machine to find them, hence the need for feature extraction. Some modern methods, such as deep convolutional neural networks or sparse coding, can find these relevant patterns inside complex data, but these methods are considered beyond the scope of this work.

e. Dimensionality reduction is an optional step that takes a large feature vector and increases its level of abstraction by further reducing the amount of information it carries. Because the previous step of representation & description has a similar goal, both steps are sometimes considered to be the same. However, because they are not mutually exclusive, we consider dimensionality reduction to be a separate, optional step. In some problems where there is a strong spatial-intensity relationship between pixels, as in object recognition, simple dimensionality reduction methods can be applied directly to the region of interest to extract meaningful features. Because this is not the case for texture recognition in natural scenes, dimensionality reduction is considered a separate step.

f. Classification can be defined as the statistical inference of the class associated with a given observation (or instance). Classification uses a mathematical or heuristic model previously trained on many labeled examples of the expected output given a specific input. A large variety of classification schemes exist, each with its own capabilities and limitations. As stated by Wolpert and Macready (1997) in their well-known No Free Lunch Theorem, "[…] for any [classification] algorithm, any elevated performance over one class of problems is offset by performance over another class", as different classes of problems have mutually exclusive properties. We therefore focus our study on classification algorithms that have performed well in texture recognition problems as well as in benthic image annotation.

g. Rejection is another optional step that can further improve the reliability of the system. Some classifiers can be trained to output a score, or certainty measure, along with their class prediction. This score can be thresholded to decide whether to accept the prediction: when the score is low enough, the system can decline to label a particular sample, thereby reducing the misclassification rate at the cost of leaving some samples unclassified.
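The image processing steps b to d can be illustrated on a synthetic image. This is a minimal sketch, assuming a single-channel floating-point image and NumPy only: a percentile-based contrast stretch stands in for preprocessing, Otsu's global threshold for segmentation, and a basic 8-neighbour, radius-1 local binary pattern (LBP) histogram for description. The function names and parameters (`low_pct`, `high_pct`, `bins`) are hypothetical choices for illustration, not the methods or settings used later in this work.

```python
import numpy as np


def contrast_stretch(img, low_pct=2.0, high_pct=98.0):
    """Preprocessing: linear intensity stretch of the given percentiles to [0, 1]."""
    lo, hi = np.percentile(img, [low_pct, high_pct])
    return np.clip((img - lo) / max(hi - lo, 1e-12), 0.0, 1.0)


def otsu_threshold(img, bins=256):
    """Segmentation: global threshold maximizing the between-class variance (Otsu)."""
    hist, edges = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)                # weight of the dark class for each candidate split
    mu = np.cumsum(p * centers)      # cumulative mean up to each candidate split
    w1 = 1.0 - w0
    between = np.zeros_like(w0)
    valid = (w0 > 1e-12) & (w1 > 1e-12)  # both classes must be non-empty
    between[valid] = (mu[-1] * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    k = int(np.argmax(between))
    return edges[k + 1]              # pixels >= this value belong to the bright class


def lbp_histogram(img, mask=None):
    """Description: normalized 256-bin histogram of basic 3x3 LBP codes."""
    h, w = img.shape
    center = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int64) << bit  # one bit per neighbour
    if mask is not None:
        codes = codes[mask[1:-1, 1:-1]]  # keep only codes centred inside the ROI
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / max(hist.sum(), 1.0)


rng = np.random.default_rng(0)
image = rng.normal(0.3, 0.05, size=(64, 64))
image[16:48, 16:48] += 0.4              # brighter square acting as the object of interest
image = contrast_stretch(image)
roi = image >= otsu_threshold(image)    # boolean ROI mask
descriptor = lbp_histogram(image, roi)  # 256-bin texture descriptor of the ROI
```

The resulting `descriptor` is the kind of fixed-length feature vector that the pattern recognition steps consume, regardless of how many pixels the ROI contains.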
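The pattern recognition steps e to g can be sketched on synthetic feature vectors. This minimal sketch assumes NumPy only and uses principal component analysis (PCA) for dimensionality reduction and, as a deliberately simple stand-in for the classifiers discussed later, a nearest-centroid classifier. Its score is the margin between the distances to the two closest class centroids; predictions whose margin falls below a hypothetical `threshold` are rejected and labelled -1.

```python
import numpy as np


def fit_pca(X, n_components):
    """Dimensionality reduction: principal axes from the SVD of the centred data."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def transform_pca(X, mean, components):
    """Project feature vectors onto the retained principal axes."""
    return (X - mean) @ components.T


def fit_centroids(Z, y):
    """Classification stand-in: one mean vector (centroid) per class."""
    classes = np.unique(y)
    return classes, np.stack([Z[y == c].mean(axis=0) for c in classes])


def predict_with_rejection(Z, classes, centroids, threshold):
    """Assign the nearest class; reject (label -1) if the margin score is below threshold."""
    dist = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)
    rows = np.arange(len(Z))
    nearest = dist[rows, order[:, 0]]
    runner_up = dist[rows, order[:, 1]]
    score = runner_up - nearest          # how much closer the winning class is
    labels = classes[order[:, 0]]
    return np.where(score >= threshold, labels, -1), score


rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 20))  # 200 synthetic 20-D feature vectors
y = np.repeat([0, 1], 100)
X[y == 1, 0] += 6.0                       # class 1 is shifted along one feature
mean, components = fit_pca(X, n_components=2)
Z = transform_pca(X, mean, components)
classes, centroids = fit_centroids(Z, y)
pred, score = predict_with_rejection(Z, classes, centroids, threshold=1.0)
accepted = pred != -1
accuracy_on_accepted = (pred[accepted] == y[accepted]).mean()
```

Raising `threshold` rejects more samples and usually increases accuracy on the accepted ones; choosing that trade-off is the role of the rejection step.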

Table of Contents

INTRODUCTION
CHAPTER 1 LITERATURE REVIEW
1.1 Computer Vision Framework
1.2 Preprocessing
1.3 Segmentation
1.4 Representation & Description
1.4.1 Local Binary Patterns
1.4.2 Grey Level Cooccurrence matrix
1.4.3 Gabor Filter Response
1.4.4 Other Global Descriptors
1.4.5 Textons
1.4.6 Describable Texture Dataset SVM Scores
1.4.7 Deep Convolutional Activation Feature
1.5 Dimensionality Reduction
1.6 Classification
1.6.1 Multi-class SVMs
1.6.2 Kernel Trick
1.6.3 Model selection
1.6.4 Multiple classifier fusion
1.7 Rejection
CHAPTER 2 DATA
2.1 Challenges
2.2 AIMS Dataset
2.3 MLC Dataset
2.4 Other Datasets
2.4.1 Texture Dataset (Lazebnik)
2.4.2 Columbia-Utrecht Reflectance and Texture Database
CHAPTER 3 METHOD
3.1 Preprocessing
3.1.1 Chromatic aberration
3.1.2 Channel information lost
3.2 Feature Extraction
3.2.1 Proposed global feature vector
3.2.1.1 Intensity histogram features
3.2.1.2 Grey Level Cooccurrence matrix
3.2.1.3 Completed local binary patterns
3.2.1.4 Gabor filtering
3.2.1.5 Hue Histogram and opponent angle
3.2.1.6 Additional color information
3.2.1.7 Comparison to previous work
3.2.2 Combining features
3.2.2.1 Normalization
3.3 Classification, Fusion and Rejection
3.3.1 SVM Training and Testing Methodology
3.3.2 Multiple Classifier Fusion
3.3.3 Rejection
CHAPTER 4 ANALYSIS AND RESULTS
4.1 Features Comparison
4.1.1 Popular Texture Benchmarks
4.1.2 Coral Datasets
4.1.3 Proposed Global Feature Set
4.2 Rejection
CHAPTER 5 DISCUSSION AND CONCLUSION