Keywords: SVC, WLAN, Video streaming, Background traffic, Objective video quality
With the increasing proliferation of multimedia content over the Internet and the emergence of mobile devices such as tablets, smartphones and laptops capable of streaming video content, wireless video communication has become more attractive than ever, receiving significant attention from both industry and academia. Wireless video transmission applications are easily deployed in homes, offices and transport vehicles.
Wireless Local Area Network (WLAN) technologies support applications such as video streaming, VoIP and many others, owing to their mobility, good throughput and low cost. Many WLAN standards are currently available, including IEEE 802.11a, IEEE 802.11b and IEEE 802.11g. The IEEE 802.11a/b/g standards rely on the contention-based channel access mechanism of Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA). Although this mechanism has become very common, it is considered inefficient for achieving reasonable video quality in scenarios with high background traffic, because it provides a best-effort service that restricts QoS for highly critical multimedia applications. Wireless video communication therefore faces many challenges. Delivery of real-time video over wireless networks imposes stringent requirements, especially in terms of bandwidth, delay and loss variation. As with other wireless technologies, channel impairments can affect the IEEE 802.11 physical transmission rate assigned to mobile users. The actual throughput achieved by a specific user can also vary, depending on the number of users and the nature of the applications sharing the same channel.
The Scalable Video Coding (SVC) extension of H.264/MPEG-4 AVC (Advanced Video Coding) facilitates efficient video transmission, especially over wireless networks, allowing a video sequence to be encoded and streamed over heterogeneous networks to a variety of end devices. With H.264/SVC, different scalability techniques can be used to deliver the most appropriate video bitstream based on network characteristics and mobile device capabilities.
A multitude of studies [1-2] have been carried out on video transmission over loss-prone wireless channels. The authors of [3-4] studied SVC streaming over IEEE 802.11 networks. In this paper, the quality of H.264/SVC video transmitted over IEEE 802.11 networks in the presence of background traffic is studied. In particular, we consider a scenario where a wireless multimedia server is transmitting single-layer encoded H.264/SVC and background traffic to one client and two sets of background traffic to another client. We objectively evaluate the quality of the streamed video for background traffic with varying bit rates and for contents with different spatio-temporal information encoded at different quantization parameter levels. All packets were given equal priority. Results indicate that received video quality deteriorates with increasing background traffic and high content bit rate, given no packet differentiation at the MAC (Medium Access Control) layer. Also, contents may be affected differently, depending on the scene complexity and coding efficiency.
The latest H.264/MPEG-4 AVC standard provides a scalable extension, called H.264/SVC, making it the first international standard that defines multi-dimensional scalability. H.264/SVC achieves significant compression efficiency and a reduction in processing complexity, as well as very good subjective quality ratings. The H.264/SVC scheme is known to be very valuable for video applications over the Internet and wireless video transmission, low-resolution video applications, multicast applications, delivery of a range of qualities suited to heterogeneous receiver capabilities, and resilience in scenarios with varying bandwidth. The bit rate adaptability that is native to the scalable codec design allows content to be adapted to changes in network conditions. H.264 scalable video coding reuses the key features of H.264/MPEG-4 Advanced Video Coding and also employs other techniques to provide scalability extensions and to improve coding gain.
Fig. 1: Diagrammatic representation of SVC scalabilities
In general, SVC can provide three types of scalability, namely temporal, spatial and SNR (quality) scalability. These allow multiple representations of the same video to be obtained by leaving out parts of the encoded bitstream, thereby adapting the bit rate and quality level during video transmission. A scalable bitstream is organized into a base layer and one or several enhancement layers. The base layer is considered more important than the enhancement layers. While the base layer needs less transmission bandwidth due to its coarser quality, an enhancement layer requires more transmission bandwidth due to its finer quality. Consequently, SNR, spatial and temporal scalability together achieve bandwidth scalability. Fig. 1 above shows a diagrammatic representation of the SVC scalabilities.
• Spatial scalability refers to the possibility of representing the same video in different spatial resolutions or sizes (e.g. QCIF, CIF and 4CIF). Generally, spatially scalable video is encoded by using spatially up-sampled pictures from a lower layer as a prediction in a higher layer. Inter-layer prediction techniques are used to further improve the coding efficiency.
• Temporal scalability refers to the possibility of representing the same video in different temporal resolutions or frame rates, i.e. the number of frames contained in one second of the video, allowing video to be played at different frame rates. It is typically implemented by making use of temporally up-sampled pictures from a lower layer as a prediction in a higher layer.
• Quality scalability, also called signal-to-noise ratio (SNR) scalability, refers to the possibility of representing the same video in different perceptual quality levels. SNR-scalable coding quantizes the DCT coefficients to different levels of accuracy by using different quantization parameters.
Scalable Video Coding, being an extension of H.264/AVC, maintains the concepts of the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL). The VCL acts as the interface between the encoder and the video frames, employing a block-based structure and supporting the different scalabilities. The NAL acts as the interface between the encoder and the actual network protocol, formatting the coded video for transmission over packet networks and providing the necessary header information. A NAL unit (NALU) consists of a header and a payload; the payload carries the actual encoded video data, while the header indicates its relevance in the decoding process. The NALU header defines different parameters, including the dependency id (DID), describing the spatial scalability; the temporal id (TID), indicating the temporal scalability hierarchy; the quality id (QID), which is used to define the quality scalability structure; and the priority id (PID), which assigns a priority to the stream. For more details, please consult 
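As an illustration of how these identifiers can be read from a NAL unit, the following is a minimal Python sketch. It assumes the three-byte SVC header extension layout described in the RTP payload format for SVC (RFC 6190), which follows the one-byte H.264 NAL header for NAL unit types 14 and 20. The class and function names are hypothetical, and the layer filter is a simplification of what a bitstream extractor does, not the JSVM implementation.

```python
from dataclasses import dataclass

# NAL unit types that carry the SVC header extension
# (14: prefix NAL unit, 20: coded slice in scalable extension).
SVC_NAL_TYPES = {14, 20}


@dataclass(frozen=True)
class SvcNalHeader:
    nal_unit_type: int
    priority_id: int    # PID
    dependency_id: int  # DID, spatial layer
    quality_id: int     # QID, quality (SNR) layer
    temporal_id: int    # TID, temporal layer


def parse_svc_nal_header(nalu: bytes) -> SvcNalHeader:
    """Parse the 1-byte NAL header and the 3-byte SVC extension.

    Assumed bit layout of the extension (most significant bit first):
      byte 1: R(1) I(1) PRID(6)
      byte 2: N(1) DID(3) QID(4)
      byte 3: TID(3) U(1) D(1) O(1) RR(2)
    """
    if len(nalu) < 4:
        raise ValueError("need at least 4 header bytes")
    nal_unit_type = nalu[0] & 0x1F
    if nal_unit_type not in SVC_NAL_TYPES:
        raise ValueError(f"NAL unit type {nal_unit_type} has no SVC extension")
    b1, b2, b3 = nalu[1], nalu[2], nalu[3]
    return SvcNalHeader(
        nal_unit_type=nal_unit_type,
        priority_id=b1 & 0x3F,
        dependency_id=(b2 >> 4) & 0x07,
        quality_id=b2 & 0x0F,
        temporal_id=(b3 >> 5) & 0x07,
    )


def keep_for_target(h: SvcNalHeader, max_did: int, max_tid: int, max_qid: int) -> bool:
    """Simplified layer filter: keep a NALU only if it is needed for the target."""
    return (h.dependency_id <= max_did
            and h.temporal_id <= max_tid
            and h.quality_id <= max_qid)


# Hand-built example header: type 20, PID 3, DID 1, QID 2, TID 2.
example = bytes([0x74, 0x83, 0x12, 0x47])
header = parse_svc_nal_header(example)
```

In the single-layer configuration used in this paper, every NALU would belong to the base layer, so such a filter would keep all of them; the sketch only shows where the scalability information lives.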
In this section, we describe the implementation steps, covering video sequence encoding, the simulation methodology and objective video quality evaluation. We consider a scenario where a wireless multimedia server is transmitting single-layer encoded H.264/SVC and background traffic to one client and two sets of background traffic to another client. We objectively evaluate the quality of the streamed video for background traffic with varying bit rates and for contents with different spatio-temporal information encoded at different quantization parameter levels.
Three sequences, each of 10 seconds duration, of different genres and with characteristics covering varying spatial and temporal complexity, namely Foreman, News and Coastguard, were selected.
Fig. 2: Snap shots of the video sequences
The diagram above shows the frames of the three sequences: Foreman, News and Coastguard, in that order.
Fig. 3: Spatial and temporal indicators of the three contents
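Fig. 3 above shows the spatial information (SI) and temporal information (TI) indices computed on the luminance component of the contents, respectively: Foreman: 59.38, 20.57; News: 75.41, 23.52; and Coastguard: 76.43, 23.50. The spatial perceptual information (SI) and temporal perceptual information (TI) measures of ITU-T Rec. P.910, the former based on the Sobel filter, were used to measure the complexity of the scene, as given in Eqs. (1) and (2):

$$SI = \max_{time} \left\{ \mathrm{std}_{space} \left[ \mathrm{Sobel}(F_n) \right] \right\} \qquad (1)$$

$$TI = \max_{time} \left\{ \mathrm{std}_{space} \left[ F_n(i,j) - F_{n-1}(i,j) \right] \right\} \qquad (2)$$

where $F_n$ represents the luminance plane of the video frame at time $n$, $\mathrm{std}_{space}$ is the standard deviation over the pixels of a frame and $\max_{time}$ is the maximum over all frames of the sequence. It is observed that Foreman has smaller SI and TI values than News and Coastguard. Detailed information regarding the three sequences and the encoder configuration is summarized in Table 1. The video sequences were sourced from publicly available video traffic traces.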
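The SI and TI measures of ITU-T Rec. P.910 referred to above can be sketched in a few lines of Python. This is a minimal illustration, not the tool used for Fig. 3: frames are assumed to be luminance planes given as nested lists indexed [row][column], border pixels are skipped when applying the Sobel filter, and the population standard deviation is used, all of which are assumptions of this sketch.

```python
import math
from statistics import pstdev


def sobel_magnitude(frame):
    """Sobel gradient magnitude at every interior pixel of a luminance plane."""
    out = []
    for r in range(1, len(frame) - 1):
        for c in range(1, len(frame[0]) - 1):
            # Horizontal gradient: right column minus left column.
            gx = ((frame[r - 1][c + 1] + 2 * frame[r][c + 1] + frame[r + 1][c + 1])
                  - (frame[r - 1][c - 1] + 2 * frame[r][c - 1] + frame[r + 1][c - 1]))
            # Vertical gradient: bottom row minus top row.
            gy = ((frame[r + 1][c - 1] + 2 * frame[r + 1][c] + frame[r + 1][c + 1])
                  - (frame[r - 1][c - 1] + 2 * frame[r - 1][c] + frame[r - 1][c + 1]))
            out.append(math.hypot(gx, gy))
    return out


def spatial_information(frames):
    """SI: maximum over time of the spatial std. dev. of the Sobel-filtered frame."""
    return max(pstdev(sobel_magnitude(f)) for f in frames)


def temporal_information(frames):
    """TI: maximum over time of the spatial std. dev. of the frame difference."""
    stds = []
    for prev, cur in zip(frames, frames[1:]):
        diff = [cur[r][c] - prev[r][c]
                for r in range(len(cur)) for c in range(len(cur[0]))]
        stds.append(pstdev(diff))
    return max(stds)


# Tiny synthetic example: a flat frame followed by a frame with a vertical edge.
flat = [[0] * 5 for _ in range(5)]
edge = [[0, 0, 0, 10, 10] for _ in range(5)]
si = spatial_information([flat, edge])
ti = temporal_information([flat, edge])
```

A sequence with more detail per frame yields a larger SI, while one with more motion between frames yields a larger TI, which is how the three contents are positioned in Fig. 3.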
Fig. 4: Implementation methodology
Table 1: Video sequences and encoder configuration
|Parameter|Value|
|Input YUV files|Foreman, News, Coastguard|
|Frame rate|30 fps|
|Number of frames|300|
|Number of layers|1|
|Base layer mode|0|
|Encode key pictures|1|
The implementation methodology is shown in Fig. 4 above. The three YUV video files were first encoded with the JSVM reference software, following the JSVM Software Manual, according to the configuration summarized in Table 1. A set of different QP scenarios was designed to cover a wide range of quality levels. We encoded each video in 7 scenarios, in which the QP value for the base layer was set to 44, 38, 32, 26, 20, 15 and 10, respectively. The coding efficiency of H.264/SVC depends on the quantization parameter of each layer. Packet traces (Network Abstraction Layer Units) of the H.264 bit streams are generated using BitStreamExtractor.
Fig. 5: Simulation topology
Table 2: Wireless channel configuration
|Parameter|Value|
|Antenna model|Antenna/OmniAntenna|
|Data rate|11 Mbit/s|
|Basic rate|1 Mbit/s|
|Number of mobile nodes|3|
The NALUs are then prepared for transmission over the IP network (hinting and packetization). The resulting H.264 video trace files are hinted using MP4Box, which emulates the streaming of the *.h264 video over the network based on the RTP/UDP/IP protocol stack. Large NALUs are thus split through IP layer fragmentation. The Real-time Transport Protocol (RTP) is used for the transfer of real-time data such as streamed video and runs on top of an existing transport protocol, here UDP (User Datagram Protocol). RTP provides real-time applications with end-to-end delivery services such as sequence numbering and timestamping (for packet loss and reordering detection and for end-to-end delay measurement), while the hinted trace also records the type and size of each video frame and the number of UDP packets used to transmit it.
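To illustrate why large NALUs are more exposed to loss, the following Python sketch estimates how many IP fragments one NALU occupies. It assumes one NALU per RTP packet, a 12-byte RTP header, an 8-byte UDP header, a 20-byte IPv4 header without options and a 1500-byte MTU; none of these values is taken from the simulation scripts, and the independent-loss model in the second function is likewise only an assumption.

```python
import math

IP_HEADER = 20   # bytes, IPv4 without options (assumed)
UDP_HEADER = 8   # bytes
RTP_HEADER = 12  # bytes, fixed RTP header without CSRC entries (assumed)


def ip_fragments(nalu_bytes: int, mtu: int = 1500) -> int:
    """Number of IP fragments needed to carry one NALU in one RTP/UDP datagram."""
    datagram_payload = nalu_bytes + RTP_HEADER + UDP_HEADER
    # Every fragment carries its own IP header; the payload of all but the
    # last fragment must be a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return max(1, math.ceil(datagram_payload / per_fragment))


def nalu_loss_probability(fragment_loss: float, fragments: int) -> float:
    """A NALU is lost if any of its fragments is lost (independent losses assumed)."""
    return 1.0 - (1.0 - fragment_loss) ** fragments


small = ip_fragments(1000)    # fits in a single packet
large = ip_fragments(10000)   # spread over several fragments
p_large = nalu_loss_probability(0.01, large)
```

Under these assumptions a 10000-byte NALU needs seven fragments, so a 1 % fragment loss rate already translates into a NALU loss rate of almost 7 %. This is one way to see why low-QP (high bit rate) streams are more sensitive to congestion.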
We conducted the simulations of H.264/SVC video transmission over IEEE 802.11 using NS-2. The wireless channel configuration is summarized in Table 2. The simulated scenario consists of three wireless nodes, one multimedia server and two clients, all within transmission range of each other. The multimedia server transmits H.264/SVC video and CBR traffic to Client 1, while Client 2 simultaneously receives FTP and CBR traffic from the server. Packet sizes were set to 1500 bytes. The network topology is depicted in Fig. 5. The background traffic generated at the server and received by the two clients while the video is being streamed increases the virtual collisions that occur at the server’s MAC layer. All packets were assigned equal priority and scheduled from the same access point of the multimedia server. The experiment is designed to study the impact of competing background traffic with different sending rates on the streamed video quality. In order to overload the wireless channel, the CBR flows for the two clients are set to 0.1, 0.5 and 1 Mbit/s each, while streaming the video sequences of different contents and different encoding QP values.
Ten different initial seeds for random number generation were chosen for the simulation, and the results were averaged over these ten runs. After each simulation, the received trace file is generated. The received and the original NALU trace files are then combined and processed to generate the received NALU trace. The maximum playout buffer delay at the video client is set to 5 seconds. After further processing, the received NALU trace is passed through BitStreamExtractor, which generates an H.264 video; this is in turn decoded with the JSVM H264Decoder to obtain an uncompressed YUV file. The reconstructed YUV file is compared with the original one using an objective video quality metric to compute the overall video quality.
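The experimental design described above (three sequences, seven QP values, three background rates and ten seeds) can be enumerated as in the following Python sketch. The seed values and the function names are hypothetical, and the actual simulation and decoding steps are not shown; the sketch only makes the size of the experiment and the averaging step explicit.

```python
from itertools import product

SEQUENCES = ("Foreman", "News", "Coastguard")
QP_VALUES = (44, 38, 32, 26, 20, 15, 10)
CBR_RATES_MBPS = (0.1, 0.5, 1.0)   # per-client CBR background rate
SEEDS = tuple(range(1, 11))        # hypothetical seed values, ten runs


def scenarios():
    """All (sequence, QP, CBR rate) combinations evaluated in the study."""
    return list(product(SEQUENCES, QP_VALUES, CBR_RATES_MBPS))


def runs():
    """Every scenario repeated once per random seed."""
    return [(seq, qp, rate, seed)
            for (seq, qp, rate) in scenarios() for seed in SEEDS]


def mean_over_seeds(psnr_by_seed):
    """Average the PSNR obtained for one scenario over all seeds."""
    values = list(psnr_by_seed.values())
    return sum(values) / len(values)


n_scenarios = len(scenarios())
n_runs = len(runs())
```

With these settings there are 63 scenarios and 630 individual simulation runs, each followed by trace processing, decoding and quality evaluation.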
Objective video quality algorithms are based on mathematical models that predict perceived image and video quality by comparing a distorted signal against a reference, typically by modeling the human visual system. Some existing objective metrics are the Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (SSIM) and Visual Information Fidelity (VIF). In this experiment, PSNR is adopted as the objective metric because it is the most widely used.
PSNR can be computed for both the luminance (Y-PSNR) and chrominance (U-PSNR and V-PSNR) components of the video. The human eye is considered more sensitive to luminance (brightness) than to chrominance (colour); therefore, the PSNR is usually evaluated only for the luminance (Y) component. The equation below gives the PSNR between the luminance component Y of the original image and that of the degraded image D:

$$PSNR = 20 \log_{10} \left( \frac{V_{peak}}{\sqrt{\dfrac{1}{N_{col} N_{row}} \sum_{i=1}^{N_{col}} \sum_{j=1}^{N_{row}} \left[ Y(i,j) - D(i,j) \right]^2}} \right)$$

where $V_{peak} = 2^k - 1$ and $k$ denotes the number of bits per pixel; $N_{col}$ is the number of columns and $N_{row}$ the number of rows in an image. PSNR measures the error between a reconstructed image and the original one. A larger PSNR value denotes better image quality.
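A per-frame Y-PSNR computation along these lines can be sketched in Python as follows. The function name is hypothetical, frames are assumed to be luminance planes given as nested lists, and returning infinity for identical frames is a convention chosen for this sketch, not necessarily what the evaluation tools do.

```python
import math


def psnr(reference, degraded, bits_per_pixel=8):
    """PSNR in dB between two luminance planes of equal size."""
    n_row = len(reference)
    n_col = len(reference[0])
    squared_error = sum(
        (reference[r][c] - degraded[r][c]) ** 2
        for r in range(n_row) for c in range(n_col)
    )
    mse = squared_error / (n_row * n_col)
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    v_peak = 2 ** bits_per_pixel - 1
    return 20 * math.log10(v_peak / math.sqrt(mse))


original = [[100, 100], [100, 100]]
received = [[101, 99], [100, 100]]   # two pixels off by one
value = psnr(original, received)
```

For 8-bit video the peak value is 255, so the PSNR depends only on the mean squared error between the original and the reconstructed luminance plane.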
Fig. 6 to Fig. 14 depict the results obtained from this experiment. Fig. 6 compares the quality of the encoded-only video sequences for the three contents, encoded at different quantization parameter values. The results indicate that lower quantization parameters lead to better quality, reflected in higher PSNR values. Fig. 7 plots PSNR against frame number for the Foreman sequence, encoded only at QP = 44 and QP = 10, and for Foreman encoded at QP = 10 and transmitted under a background traffic level of 1 Mbit/s.
Fig. 6: Impact of quantization parameter on video quality
Fig. 7: Quality comparison for encoded only and transmitted sequences
Fig. 8: Quality comparison for transmitted sequences, QP = 44
Fig. 9: Quality comparison for transmitted sequences, QP = 38
Fig. 10: Quality comparison for transmitted sequences, QP = 32
Fig. 11: Quality comparison for transmitted sequences, QP = 26
Fig. 12: Quality comparison for transmitted sequences, QP = 20
Fig. 13: Quality comparison for transmitted sequences, QP = 15
Fig. 14: Quality comparison for transmitted sequences, QP = 10
Analysis of the generated bit streams (Table 3) shows that the lower the quantization parameter, the larger the generated file and consequently the higher the bit rate (204 kbit/s for Foreman at QP = 44 and 6.53 Mbit/s at QP = 10; 122 kbit/s for News at QP = 44 and 2.63 Mbit/s at QP = 10; 296 kbit/s for Coastguard at QP = 44 and 8.30 Mbit/s at QP = 10). The QP value may, however, vary during the encoding process, depending on the position of each frame within the Group of Pictures.
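The relationship between file size, bit rate and channel load can be made concrete with a short Python sketch. It uses the bit rates quoted above, the 10-second sequence duration and the nominal 11 Mbit/s data rate of Table 2. The file size in the example is a hypothetical value chosen to reproduce the 204 kbit/s figure, and the load estimate deliberately ignores the FTP flow and all protocol and MAC overhead, so it is a lower bound rather than a measured value.

```python
NOMINAL_RATE_MBPS = 11.0  # nominal IEEE 802.11 data rate from Table 2


def mean_bit_rate_kbps(file_size_bytes: int, duration_s: float) -> float:
    """Mean bit rate of an encoded stream in kbit/s."""
    return file_size_bytes * 8 / duration_s / 1000


def offered_load_mbps(video_mbps: float, cbr_mbps: float, n_cbr_flows: int = 2) -> float:
    """Video rate plus the CBR background flows (FTP and overhead ignored)."""
    return video_mbps + n_cbr_flows * cbr_mbps


# Hypothetical file size that corresponds to 204 kbit/s over 10 s.
foreman_qp44_kbps = mean_bit_rate_kbps(255_000, 10.0)

# Lightest and heaviest cases, using bit rates quoted in the text.
light = offered_load_mbps(0.204, 0.1)   # Foreman, QP = 44, 0.1 Mbit/s CBR
heavy = offered_load_mbps(8.30, 1.0)    # Coastguard, QP = 10, 1 Mbit/s CBR
heavy_fraction = heavy / NOMINAL_RATE_MBPS
```

Even before the FTP flow and the per-packet overhead are counted, the heaviest case offers more than 90 % of the nominal channel rate, whereas the lightest case offers under 4 %. This is consistent with losses appearing mainly at low QP values and high background rates.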
Fig. 8 to Fig. 14 plot the PSNR values for the three video sequences encoded at seven different quantization levels and transmitted from the same multimedia server under varying background traffic bit rates. Encoded-only videos generally have higher PSNR values than their transmitted counterparts. At higher quantization levels (QP = 44 to 32) and lower background bit rates, the PSNR values of the streamed video sequences remain the same as those of their encoded-only counterparts, meaning that no video packets were lost during transmission. However, at lower quantization levels and higher background traffic levels, the PSNR values of the streamed video decline sharply. Content-based analysis reveals that the video sequences can react differently to the competition for channel bandwidth arising from background traffic of different bit rates. This could be attributed to the different spatio-temporal complexities of the sequences. Given no packet prioritization, contents with high bit rates (e.g. Coastguard) suffer greater PSNR degradation, caused by collision-induced video packet loss at the MAC layer of the streaming server, even at the same encoding quantization level.
This paper has presented a detailed video quality evaluation of the transmission of H.264/SVC video over IEEE 802.11 networks in the presence of background traffic. We considered a scenario where a wireless multimedia server is transmitting single-layer encoded H.264/SVC and background traffic to one client and two sets of background traffic to another client. We objectively evaluated the quality of the streamed video for background traffic with varying bit rates and for contents with different spatio-temporal information encoded at different quantization parameter levels. Results indicate that received video quality deteriorates with increasing background traffic and high content bit rate, given no packet differentiation at the MAC (Medium Access Control) layer. Also, contents may be affected differently, depending on the scene complexity and coding efficiency. In future work, we intend to extend the study to trade-offs in video quality optimization in the presence of background traffic, including packet prioritization and QoS mapping, and the use of IEEE 802.11e for SVC content streaming in IEEE 802.11 networks.
This work was supported by the COST IC1003 European Network on Quality of Experience in Multimedia Systems and Services – QUALINET; by the COST CZ LD12018 Modeling and verification of methods for Quality of Experience (QoE) assessment in multimedia systems – MOVERIQ; by the grant No. P102/10/1320 Research and modeling of advanced methods of image quality evaluation of the Grant Agency of the Czech Republic; and by the project of the Student grant agency of the Czech Technical University in Prague SGS12/077/OHK3/1T/13, “Cross-Layer Quality Optimization in New Generation Heterogeneous Wireless Mobile Networks.”
[1] Z. He, J. Cai, C.W. Chen, “Joint source channel rate-distortion analysis for adaptive mode selection and rate control in wireless video coding,” IEEE Trans. Circuits Syst. Video Technol. 12 (6), 2002.
[2] C.-M. Chen, C.-W. Lin, H.-C. Wei, Y.-C. Chen, “Robust video streaming over wireless LANs using multiple description transcoding and prioritized retransmission,” J. Visual Commun. Image Represent. 18 (3), 2007.
[3] C.H. Foh, Y. Zhang, Z. Ni, J. Cai, K.N. Ngan, “Optimized cross-layer design for scalable video transmission over the IEEE 802.11e networks,” IEEE Trans. Circuits Syst. Video Technol. 17 (12), 2007.
[4] A. Fiandrotti, D. Gallucci, E. Masala, E. Magli, “Traffic prioritization of H.264/SVC video over 802.11e ad hoc wireless networks,” Proceedings of 17th International Conference on Computer Communications and Networks, Virgin Islands, USA, 2008.
[5] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the scalable video coding extension of the H.264/AVC standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp. 1103-1120, 2007.
[6] J. Lee, F. De Simone, and E. Ebrahimi, “Subjective quality assessment of scalable video coding: A survey,” 2011 Third International Workshop on Quality of Multimedia Experience (QoMEX), pp. 25-30, 7-9 Sept. 2011.
[7] T. Schierl, T. Stockhammer, and T. Wiegand, “Mobile Video Transmission Using Scalable Video Coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 9, pp. 1204-1217, Sept. 2007.
[8] S. Wenger, Y.-K. Wang, T. Schierl, and A. Eleftheriadis, “RTP payload format for SVC video,” Internet Engineering Task Force (IETF), September 2009.
[9] Y.-K. Wang, M. Hannuksela, S. Pateux, A. Eleftheriadis, and S. Wenger, “System and transport interface of SVC,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 9, pp. 1149-1163, Sept. 2007.
[10] Video Trace Library, http://dbq.multimediatech.cz/ [online]
[11] ITU-T Rec. P.910, “Subjective video quality assessment methods for multimedia applications,” Geneva, Sep. 1999.
[12] JSVM Software Manual, http://evalsvc.googlecode.com/files/SoftwareManual.doc [online]
[13] MP4Box, http://www.videohelp.com/tools/mp4box [online]
[14] IEEE Standard 802.11-2007, “Local and metropolitan area networks - Specific requirements Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications,” June 2007.
[15] The Network Simulator - NS2, http://www.isi.edu/nsnam/ns/ [online]
[16] Z. Wang, L. Lu, and A. C. Bovik, “Video quality assessment based on structural distortion measurement,” Signal Processing: Image Communication, vol. 19, no. 2, pp. 121-132, 2004.