Encoder overview
Similar to H.264/MPEG-4 AVC, HEVC adopts a hybrid video coding architecture with several enhancements in each part (Katsigiannis et al., 2013). The HEVC coding process is based on removing redundant information in several stages. Each picture is partitioned into rectangular block-shaped regions, and each block is coded by a prediction mode together with a residual signal. There are two possible prediction modes in HEVC: intra-prediction and inter-prediction. Intra-prediction uses spatial information within the same frame to predict the content of the current block, whereas inter-prediction finds the best match for the current block in previously coded frames. Regardless of the prediction type, the difference between the original block and its prediction is transformed by a linear spatial transform, and the resulting transform coefficients are scaled and quantized. Finally, the residual data and the prediction information are coded by the entropy coder to form the encoded bitstream.
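As a rough illustration of these stages, the following Python sketch encodes and reconstructs a single block. The names (encode_block, reconstruct_block, q_step) are illustrative assumptions, and a floating-point DCT with a uniform quantizer stands in for HEVC's integer transforms, scaling and CABAC entropy coding.

```python
# Minimal sketch of the hybrid coding steps for one block (illustrative only;
# real HEVC uses integer transforms, scaling lists and CABAC).
import numpy as np
from scipy.fft import dctn, idctn

def encode_block(original, prediction, q_step=10.0):
    """Transform and quantize the prediction residual of one block."""
    residual = original.astype(float) - prediction.astype(float)
    coeffs = dctn(residual, norm='ortho')   # stand-in for the HEVC core transform
    levels = np.round(coeffs / q_step)      # scaling + quantization
    return levels                           # the levels would be entropy-coded

def reconstruct_block(levels, prediction, q_step=10.0):
    """Decoder-side reconstruction: dequantize, inverse transform, add prediction."""
    residual_hat = idctn(levels * q_step, norm='ortho')
    return prediction.astype(float) + residual_hat

# Toy 8x8 block and a flat, intra-like prediction
rng = np.random.default_rng(0)
block = rng.integers(0, 256, (8, 8))
pred = np.full((8, 8), int(block.mean()))
levels = encode_block(block, pred)
recon = reconstruct_block(levels, pred)
print("max reconstruction error:", np.abs(recon - block).max())
```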
However, a real encoder architecture is much more complex. The encoder also performs the decoding process in order to generate exactly the same results as the decoder. Inverse scaling and inverse transform of the quantized coefficients reproduce the approximated residual signal, which is added to the prediction to reconstruct the picture. Furthermore, a loop filter is applied to smooth out the artifacts induced by the block-wise processing of HEVC. The reconstructed picture is stored in a buffer to be used as a reference for subsequent predictions. Because this reconstructed picture is identical to the one produced at the decoder side, this architecture prevents any drift or error accumulation during coding.
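The following toy sketch, under strongly simplified assumptions (one-sample "frames", a single combined quantize/dequantize step, made-up names such as quantize and q_step), illustrates why predicting from reconstructed data avoids drift: the closed-loop variant tracks the signal, whereas an open-loop encoder that predicts from the originals lets quantization error accumulate.

```python
# Illustrative sketch (not HEVC-specific): predicting from reconstructed frames
# keeps encoder and decoder in sync; predicting from original frames causes drift.
import numpy as np

def quantize(residual, q_step=8.0):
    return np.round(residual / q_step) * q_step   # quantize + dequantize in one step

frames = np.cumsum(np.full(50, 3.0))              # a slowly changing "scene"

# Closed-loop: prediction uses the previous *reconstructed* frame (as in HEVC)
recon_closed = [0.0]
for f in frames:
    recon_closed.append(recon_closed[-1] + quantize(f - recon_closed[-1]))

# Open-loop: encoder predicts from originals, but the decoder only has reconstructions
residuals_open = [quantize(f - p) for f, p in zip(frames, np.r_[0.0, frames[:-1]])]
recon_open = [0.0]
for r in residuals_open:
    recon_open.append(recon_open[-1] + r)

print("closed-loop error:", abs(recon_closed[-1] - frames[-1]))
print("open-loop drift  :", abs(recon_open[-1] - frames[-1]))
```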
Furthermore, at each step the encoder has to make several decisions in order to provide the best rate-distortion performance. For instance, the block partitioning and the prediction parameters have a great influence on rate-distortion. These decisions are made by an exhaustive rate-distortion optimization (RDO): all possible choices are evaluated and the one with the lowest rate-distortion cost is selected.
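A hypothetical sketch of such a decision is shown below: each candidate is assigned a Lagrangian cost J = D + λ·R and the candidate with the smallest cost is chosen. The candidate list, the distortion and bit values, and the λ value are made up purely for illustration.

```python
# Hypothetical RDO sketch: evaluate each candidate and keep the one
# minimizing J = D + lambda * R. All numbers below are invented.
candidates = [
    {"mode": "intra_DC",    "distortion": 1450.0, "bits": 38},
    {"mode": "intra_ang26", "distortion": 1210.0, "bits": 55},
    {"mode": "inter_2Nx2N", "distortion":  640.0, "bits": 91},
]
lam = 12.0   # Lagrange multiplier; in practice derived from the quantization parameter

best = min(candidates, key=lambda c: c["distortion"] + lam * c["bits"])
for c in candidates:
    print(c["mode"], "J =", c["distortion"] + lam * c["bits"])
print("selected:", best["mode"])
```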
Frame partitioning
The high-level segmentation of a picture in HEVC is based on the slice concept. Using slices, the frame can be partitioned in such a way that each slice is decodable independently of the other slices (Sullivan et al., 2012). A slice may consist of a complete picture or of parts of it, and each slice contains an integer number of consecutive coding tree units (CTUs). The main advantages of slices in HEVC are:
1. Error robustness: Partitioning the picture into smaller independent parts improves error robustness. In case of data loss, the decoder is able to discard the erroneous parts of the stream and resume decoding from the next correctly received slice. Furthermore, slices are sent in separate network packets, so the loss of a transport packet results in the loss of only one slice.
2. Parallel processing: Partitioning the picture into units that can be processed in parallel, since there are no inter-dependencies between slices.
CTUs in a slice are processed in raster scan order, and each slice of a picture is independently decodable. This is achieved by terminating the context-adaptive binary arithmetic coding (CABAC) bitstream at the end of each slice and by breaking CTU dependencies across slice boundaries. As a consequence, the encoder cannot use spatial information from outside the slice boundaries, so the coding efficiency usually decreases quite substantially as the number of slices per picture increases.
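The sketch below, using an assumed picture size, CTU size and slice count, shows how the CTUs of a 1920×1080 picture could be grouped in raster-scan order into slices of consecutive CTUs.

```python
# Sketch of grouping consecutive CTUs (raster-scan order) into slices;
# picture size, CTU size and slice count are illustrative assumptions.
import math

width, height, ctu_size, num_slices = 1920, 1080, 64, 4
ctus_x = math.ceil(width / ctu_size)    # 30 CTU columns
ctus_y = math.ceil(height / ctu_size)   # 17 CTU rows (bottom row partially filled)
total_ctus = ctus_x * ctus_y            # 510 CTUs in raster-scan order

per_slice = math.ceil(total_ctus / num_slices)
slices = [list(range(s, min(s + per_slice, total_ctus)))
          for s in range(0, total_ctus, per_slice)]
for i, s in enumerate(slices):
    print(f"slice {i}: CTUs {s[0]}..{s[-1]} ({len(s)} CTUs)")
```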
Additionally, each slice can be coded using one of three slice types. The first type is the I slice, in which all CUs are coded using only intra-prediction. The second type is the P slice, in which, in addition to the coding types allowed in an I slice, some CUs can be coded using inter-prediction with one motion-compensated prediction signal per prediction block (PB). The third type is the B slice, which is similar to the P slice but allows up to two motion-compensated prediction signals per PB.
A tile is a partitioning mechanism similar to a slice, based on a flexible subdivision of the picture into rectangular regions of CTUs. Coding dependencies between CTUs of different tiles are prohibited. In contrast to slices, tiles are intended primarily to support parallel processing rather than error resilience (Zhou et al., 2012). Although non-uniform tiles are allowed in HEVC, each tile typically consists of an approximately equal number of CTUs (Misra et al., 2013).
Furthermore, using slices and tiles simultaneously is permitted, but either each tile must contain complete slices or each slice must contain complete tiles.
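As an illustration, the following sketch divides the CTU grid of a picture into a uniform tile grid with roughly equal CTU counts per tile, using an even-spacing rule similar to HEVC's uniform tile spacing; the grid dimensions are assumptions.

```python
# Sketch of a uniform tile grid: CTU columns/rows are divided as evenly as
# possible among tile columns/rows (grid dimensions here are assumptions).
def uniform_split(n_ctus, n_tiles):
    """Split n_ctus into n_tiles nearly equal consecutive runs."""
    return [(i * n_ctus) // n_tiles for i in range(n_tiles + 1)]

ctus_x, ctus_y = 30, 17          # e.g. 1920x1080 with 64x64 CTUs
tile_cols, tile_rows = 3, 2
col_bounds = uniform_split(ctus_x, tile_cols)
row_bounds = uniform_split(ctus_y, tile_rows)

for r in range(tile_rows):
    for c in range(tile_cols):
        w = col_bounds[c + 1] - col_bounds[c]
        h = row_bounds[r + 1] - row_bounds[r]
        print(f"tile ({r},{c}): {w}x{h} CTUs = {w * h}")
```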
Block partitioning
In contrast to the fixed macroblock (MB) size (16×16) of H.264/AVC, HEVC uses a more adaptive quadtree structure whose root is the coding tree unit (CTU). The quadtree structure consists of blocks and units with a maximum size of 64×64. A block covers a rectangular area of picture samples, while a unit is formed by a luma block and two chroma blocks together with the related syntax information. For instance, a CTU is formed by a luma coding tree block (CTB) and two chroma CTBs, with syntax determining further subdivisions. These subdivisions produce new units called CUs, and the encoder decides whether a CTU is coded as a single CU or divided into several CUs.
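The sketch below mimics this recursive subdivision on a toy 64×64 CTU; the variance threshold used to trigger a split is only a hypothetical stand-in for the rate-distortion based decision made by a real encoder.

```python
# Toy sketch of a coding quadtree: a 64x64 CTU is recursively split into CUs
# whenever a (hypothetical) decision function says so; real encoders decide
# by comparing RD costs rather than by the variance threshold used here.
import numpy as np

def split_ctu(block, x=0, y=0, min_cu=8, threshold=500.0):
    """Return a list of (x, y, size) CUs covering the block."""
    size = block.shape[0]
    if size > min_cu and np.var(block) > threshold:
        half = size // 2
        cus = []
        for dy in (0, half):
            for dx in (0, half):
                cus += split_ctu(block[dy:dy + half, dx:dx + half],
                                 x + dx, y + dy, min_cu, threshold)
        return cus
    return [(x, y, size)]

rng = np.random.default_rng(1)
ctu = np.zeros((64, 64))                       # flat area: large CUs suffice
ctu[32:, 32:] = rng.integers(0, 256, (32, 32)) # detailed area: splits further
cus = split_ctu(ctu)
print(len(cus), "CUs, sizes used:", sorted({s for _, _, s in cus}))
```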
A CU can be further divided into prediction units (PUs) and transform units (TUs), which carry the prediction and transform information, respectively. CUs, PUs and TUs consist of associated luma and chroma blocks called coding blocks (CBs), PBs and transform blocks (TBs), respectively.
Clearly, HEVC partitioning is more adaptive than the approach used in H.264/AVC, which is particularly useful for high-resolution videos. The quadtree used for dividing a CB into TBs has its root at the CB level and is called the residual quadtree (RQT), since it is built over the residual data.
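As a small illustration, the following sketch lists the TB sizes reachable from a CB at successive RQT depths; the maximum depth used here is an assumption, while the 32×32 to 4×4 range corresponds to HEVC's supported transform sizes.

```python
# Sketch of the residual quadtree: starting from a CB, each split halves the
# TB size down to the minimum; the depth limit is an illustrative assumption.
def rqt_tb_sizes(cb_size, max_depth=3, min_tb=4, max_tb=32):
    sizes = []
    size = min(cb_size, max_tb)   # TBs cannot exceed the 32x32 maximum transform
    for depth in range(max_depth + 1):
        if size < min_tb:
            break
        sizes.append((depth, size))
        size //= 2
    return sizes

for depth, size in rqt_tb_sizes(64):
    print(f"RQT depth {depth}: {size}x{size} TB")
```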