Redundant wavelet decompositions for multiple description coding of still images

The increasing use of the Internet and other best-effort networks for diverse multimedia communications brings with it a stringent need for reliable transmission. For a long time, research efforts concentrated on enhancing existing error correction techniques, but over the last decades an alternative solution has emerged and gained popularity. This solution mainly addresses situations in which immediate data retransmission is either impossible (network congestion or broadcast applications) or undesirable (e.g. in conversational applications with very low delay requirements). We are referring to a specific joint source-channel coding technique known as Multiple Description Coding (MDC).

Multiple description coding builds several correlated but independently decodable bitstreams, called descriptions, preferably of equivalent quality, that are sent over as many independent channels. In the initial scenario these channels operate in a binary (on/off) manner: if an error occurs on one channel, that channel is considered entirely damaged and the bitstream it conveys is unusable at the so-called side decoder. As in other robust coding methods, some amount of redundancy has to be added to the source signal, so that an acceptable reconstruction can be achieved from any one of the bitstreams. Then, as in layered coding, the reconstruction quality improves with every bitstream received, and the maximal reconstruction quality is attained at the so-called central decoder, which receives all of them. The major difference from layered coding is that all “layers” have equal importance in MDC.

This thesis focuses on new approaches to Multiple Description Coding in low redundancy scenarios. We will present their application to the transmission of still images and video sequences. To this end, we have proposed new schemes based on wavelet frame decompositions, which, for computational convenience, are implemented in a lifting form.

We first study new methods of building two descriptions along the temporal axis of a "t+2D" video codec. The redundancy of the schemes is inherent to the wavelet frame transform, which is equivalent to an oversampled filter bank. However, keeping the whole set of coefficients given by this decomposition would yield a redundancy factor of 2, which could be highly inefficient if both paths were error-free. In our schemes we perform an additional subsampling of the detail subbands while keeping the approximation subbands entirely. The redundancy is thus reduced to the size of an approximation subband in a classical wavelet decomposition. This, however, raises a new problem, namely the perfect reconstruction of such a scheme. In this part we have proven perfect reconstruction for certain schemes and we have established criteria for choosing among them, based on the minimization of the quantization noise. We have compared the performance of several of the efficient schemes in a scalable video coding context provided by the MC-EZBC (Motion Compensated – Embedded Zero-trees Block Coding) codec. Two scenarios (losing a whole description versus losing only packets in each description) have been implemented and the results have been compared to the classical critically sampled decomposition.
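The idea of keeping every approximation coefficient while subsampling the details can be illustrated on a toy one-dimensional signal. The following is only a minimal sketch under stated assumptions, not the scheme proven in the thesis: it assumes a single decomposition level, Haar filters, periodic extension, and a hypothetical assignment of the detail coefficients to the two descriptions (the names `encode`, `central_decode` and `side_decode1` are illustrative). The central decoder shown is one valid inverse among several; the thesis selects among such inverses using a quantization-noise criterion that is not reproduced here.

```python
import math

S = math.sqrt(2.0)

def encode(x):
    """Build two descriptions from x (length must be a multiple of 4).

    Assumption: one-level undecimated Haar transform with periodic extension.
    """
    n = len(x)
    assert n % 4 == 0
    a = [(x[i] + x[(i + 1) % n]) / S for i in range(n)]  # redundant approximation
    d = [(x[i] - x[(i + 1) % n]) / S for i in range(n)]  # redundant detail
    # Each description keeps one full phase of the approximation subband,
    # but only every second coefficient of its detail phase.
    desc1 = {"a": a[0::2], "d": d[0::4]}
    desc2 = {"a": a[1::2], "d": d[1::4]}
    return desc1, desc2

def central_decode(desc1, desc2):
    """Exact reconstruction when both descriptions arrive (one possible inverse)."""
    n = 2 * len(desc1["a"])
    x = [0.0] * n
    for k in range(n // 4):
        a0 = desc1["a"][2 * k]      # a[4k]
        a2 = desc1["a"][2 * k + 1]  # a[4k+2]
        d0 = desc1["d"][k]          # d[4k]
        d1 = desc2["d"][k]          # d[4k+1]
        x[4 * k] = (a0 + d0) / S
        x[4 * k + 1] = (a0 - d0) / S
        x[4 * k + 2] = x[4 * k + 1] - S * d1
        x[4 * k + 3] = S * a2 - x[4 * k + 2]
    return x

def side_decode1(desc1):
    """Approximate reconstruction from description 1 alone."""
    n = 2 * len(desc1["a"])
    x = [0.0] * n
    for k in range(n // 4):
        a0 = desc1["a"][2 * k]
        a2 = desc1["a"][2 * k + 1]
        d0 = desc1["d"][k]
        x[4 * k] = (a0 + d0) / S
        x[4 * k + 1] = (a0 - d0) / S
        # No detail is available for this pair: use its mean for both samples.
        x[4 * k + 2] = x[4 * k + 3] = a2 / S
    return x
```

For a signal of N samples, the two descriptions together hold 1.5 N coefficients, so the redundancy equals the size of one approximation subband (N/2). In this particular inverse the approximation coefficients of the second description are not needed by the central decoder, which is exactly where the redundancy sits; they only matter when the first description is lost.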

A second direction that we have explored in this thesis concerns the MDC of still images, which is viewed as an extension of the temporal schemes developed earlier. The problem of structure invertibility is not trivial in the two-dimensional schemes, and an exhaustive study has been conducted in order to select the efficient schemes among all possible combinations based on the proposed subsamplings. Moreover, we have explored the possibility of improving the decoding by a post-processing step based on a priori information about the system. This information is given by the quantization steps, which can be viewed as convex constraints. The reconstruction problem has thus been formulated as the optimization of a quadratic function under convex constraints. The decoded image gains several dB in terms of Peak Signal to Noise Ratio (PSNR), both when a whole description is lost and when random pixels in each description are destroyed.
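To see why a quantization step acts as a convex constraint, consider the simplest case. The sketch below assumes a uniform mid-tread quantizer (an assumption, since the quantizer used in the thesis is not specified here) and uses hypothetical helper names. A received index confines the original coefficient to an interval, and the Euclidean projection of any estimate onto that interval is a simple clipping. In the thesis the constraints bear on frame coefficients of the image, which couples them and calls for an iterative projection algorithm; this example only shows the elementary projection step.

```python
import math

def quantize(value, step):
    """Uniform mid-tread quantizer: index of the nearest multiple of step."""
    return math.floor(value / step + 0.5)

def cell(index, step):
    """Interval of values that the quantizer maps to the given index."""
    return ((index - 0.5) * step, (index + 0.5) * step)

def project(estimate, index, step):
    """Closest point to `estimate` inside the quantization cell (clipping)."""
    lo, hi = cell(index, step)
    return min(max(estimate, lo), hi)
```

For instance, if a coefficient was received with index 2 and step 1.0, a decoder estimate of 2.9 is inconsistent with what the encoder observed and is pulled back to 2.5, whereas an estimate of 2.2 already lies in the cell and is left unchanged.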

Several strategies are available to tackle this problem. One is the retransmission of corrupted packets, but this introduces delays, which are not always acceptable. Another is to send a larger bitstream that additionally contains an error correction code; this can correct only a few wrong bits in each packet. Then there is the option of so-called layered coding. This strategy forms several bitstreams which are progressively refinable starting from a base layer. The base layer, however, is essential for reconstructing the data even at minimal quality, which means that if the base layer gets corrupted we are back to the initial problem. For this reason, protection techniques for the base layer have been employed.

One ingredient enabling the success of an MDC technique is path diversity, since its use balances the network load and reduces the probability of congestion. In wireless networks, for instance, a mobile receiver can benefit from multiple descriptions if these arrive independently, for example from two neighbouring access points; when moving between these access points it might capture one or the other, and in some cases both. Another way to take advantage of MDC in a wireless environment is to split the transmission of the two descriptions in frequency: for example, a laptop may be equipped with two wireless cards (e.g., 802.11a and g), each receiving a different description. Depending on the dynamic changes in the number of clients in each network, one of them may become overloaded and the corresponding description may not be transmitted.

In wired networks, the different descriptions can be routed to a receiver through different paths by incorporating this information into the packet header. In this situation the initial scenario of binary (on/off) channels might no longer be of interest, since for a typical CIF format video sequence one frame might be encoded into several packets. Therefore, the system should be designed to take into consideration individual or bursty packet losses rather than the loss of a whole description.

Table of contents

Introduction
1 Background on Multiple Description Coding
1.1 Information Theory framework
1.2 MD by quantization
1.2.1 Scalar quantization
1.2.2 Vector quantization
1.3 MD by correlating transforms
1.3.1 Statistical correlation
1.3.2 Frame expansions
1.3.3 Filter banks
1.4 Channel oriented methods
1.5 MDC in the world of multimedia applications
1.5.1 Image coding
1.5.2 Video coding
1.6 Conclusion
2 Temporal MDC schemes
2.1 Preliminaries
2.1.1 Signal Analysis
2.1.2 Decomposition Schemes
2.2 Wavelet Frame Considerations
2.3 Filter Bank Representations for Discrete Frames
2.3.1 Expression of the polyphase transfer function matrix for the MD schemes
2.4 Invertibility Using the Polyphase Transfer Matrix
2.4.1 Solution of the System Inversion
2.4.2 Optimality criteria for system inverse
2.4.3 Practical examples
2.4.4 Observations on the proposed MDC schemes
2.5 Lifting-based design of the Haar MD encoder
2.5.1 2-band lifting approach
2.5.2 Equivalent 4-band lifting implementation for the Haar filter bank
2.6 Encoder design for biorthogonal 5/3 filter banks
2.7 Decoder design
2.7.1 Haar decoders
2.7.2 Biorthogonal 5/3 decoders
2.8 Application to robust video coding
2.8.1 Temporal video descriptions
2.8.2 Central and side video decoders
2.8.3 Simulation experiments and results
2.9 Conclusions
3 Spatial MDC schemes
3.1 Multiple spatial representations in the wavelet domain
3.1.1 Forming low redundancy descriptions
3.1.2 Perfect reconstruction issues
3.1.3 Transform implementation considerations
3.2 Optimized MD reconstruction
3.2.1 A brief overview on convex set theoretic estimation
3.2.2 Optimized decoding problem formulation
3.2.3 Iterative projections algorithm
3.2.4 Reference image in the objective function
3.3 Case study
3.3.1 On/off channels – loss of an entire description
3.3.2 Random losses in each description
3.4 Further extensions of the MDC scheme
3.5 Simulation results
3.5.1 Choice between the schemes
3.5.2 Scheme performance versus a critically sampled decomposition
3.5.3 Random losses scenario
3.6 Conclusions
4 A complementary approach exploiting sparsity
4.1 Analysis vs. synthesis frames
4.2 Rate-distortion problem for the synthesis frame approach
4.3 Convex optimization
4.3.1 Proof of theoretical results
4.4 Example and numerical results
4.5 Synthesis frame approach versus the classical MDC approach
4.6 Conclusions
5 Conclusion