The impact of transmission errors on progressive 720-line HDTV coded with H.264

Kjell Brunnström1, Daniel Stålenbring1, Martin Pettersson2, and Jörgen Gustafsson2
1 IPTV, Video and Display Quality, NetLab, Acreo AB, Kista, Sweden
2 Ericsson Research, Sweden
Email: [email protected]

ABSTRACT
TV sent over networks based on the Internet Protocol, i.e. IPTV, is moving towards high definition (HDTV). There has been quite a lot of work on how HDTV is affected by different codecs and bitrates, but the impact of transmission errors over IP networks has been less studied. This study focused on H.264-encoded 1280x720 progressive HDTV and compared three different concealment methods for different packet loss rates. The first was included in a proprietary decoder, the second was part of FFMPEG and the third was freezing of different lengths. The target is to simulate what typical IPTV set-top boxes will do when encountering packet loss. Another aim was to study whether presenting the video upscaled on the full HDTV screen, or pixel-mapped in a smaller area in the center of the screen, would have an effect on the quality. The results show that there were differences between the packet loss concealment methods in FFMPEG and in the proprietary codec. Freezing seemed to have a similar effect as reported before. For low rates of transmission errors the coding impairments have an impact on the quality, but for higher degrees of transmission errors they do not affect the quality, since they become overshadowed by the transmission errors. An interesting effect was found in that the higher bitrate videos go from having higher quality at lower packet loss rates to having lower quality than the lower bitrate videos at higher packet loss rates. The different ways of presenting the video, i.e. upscaled or not upscaled, gave a difference that was significant at the 95% level, but only just. It was not significant when considering DMOS, i.e. the mean of the difference between the scores for the reference and the scores for the distorted sequences.

Keywords: Video Quality, H.264, Transmission errors, HDTV, 720p, Error concealment

1. INTRODUCTION
TV sent over networks based on the Internet Protocol, i.e. IPTV, is moving towards high definition (HDTV). There has been quite a lot of work on how HDTV is affected by different codecs and bitrates, but the impact of transmission errors over IP networks has been less studied. Studies by e.g. Haglund et al. (2002)1 have shown that the 1280 by 720 progressive (720p) format is a better broadcasting format than 1920 by 1080 interlaced (1080i) when it comes to coding efficiency, i.e. comparable quality at a lower number of bits. This study has, therefore, focused on 720p.

The Video Quality Experts Group (VQEG) is an independent group of experts, which brings together organizations in government laboratories, academia and industry to evaluate video quality metrics. The outcomes of the evaluations are then given as input to standards bodies such as the International Telecommunication Union (ITU). However, this study was not done for VQEG, but its purpose is partly complementary. VQEG is currently carrying out an HDTV test2 that is scheduled to be completed in the spring of 2010. In this test, the 720p format, i.e. 1280 by 720 progressive, was not included as an independent format. Instead it was considered as a Hypothetical Reference Circuit (HRC), which means that the source is 1080p (1920 by 1080 progressive).
The video is downsampled to 720p and at the end upsampled to 1080p again. In between, other HRC processing such as codecs or transmission errors might have been applied. This study was also intended to test whether this would have an effect on the quality scores collected from the viewers.

When transmission errors occur for a streaming video service in an IP network, e.g. IPTV, they come in the form of packet loss. This usually has a severe impact on the quality, but the end result depends on the concealment used by the decoder. The study compared three different concealment methods for different packet loss rates. The first was included in a proprietary decoder, the second was part of FFMPEG3 and the third was freezing of different lengths. The target is to simulate what typical IPTV set-top boxes might do when encountering packet loss.

2. METHOD
The test was conducted mainly in accordance with the VQEG HDTV project test plan2. However, the main difference was that in this test 720p was the main target. It was displayed in two different ways: for one group with no upscaling (the rest of the screen filled with a grey surround) and for the other group with upscaling in the TV set.

Subjective video quality was measured using the Absolute Category Rating (ACR)4 method. The test video sequences were presented one at a time and afterwards rated independently on the ACR scale, as seen in Figure 1. In the test, the ACR procedure included both the processed and the reference, i.e. unprocessed source, versions of each video sequence. The reference sequences were not identified as such to the viewers (hidden reference approach). In the data analysis the average of the scores taken over all users was calculated; these values are referred to as Mean Opinion Scores (MOS). Furthermore, the quality scores of the corresponding reference source (SRC) sequences were subtracted from the quality scores for the processed video sequences (PVS) to obtain a difference MOS referred to as DMOS. This procedure is known as "hidden reference removal".

Figure 1: The voting screen that was presented to viewers.

The viewing room, located in Acreo's multimedia lab, was prepared to conform to the specifications of ITU-T Rec. P.9104. Each viewer, one at a time, completed the test in a section of a room divided by grey drapery. Viewers were seated three screen heights (3H) from the screen, and instructed not to move their heads too much or lean forward. The viewing distance was not otherwise controlled. The ambient light was produced by high-frequency D65 fluorescent tubes located in the ceiling, generating a light level of 20 lux measured at the table.

The test video sequences were presented on a high-grade consumer HDTV LCD display (Samsung LE40A796). For one of the groups (Group A) the display was used in its full resolution, i.e. 1920x1080 with an update frequency of 50 Hz. The video was then positioned in the middle of the screen and the rest of the screen was filled with an even grey colour (grey value 128). For the other group (Group B) the screen resolution was set to 1280x720 at 50 Hz. In this case the video was upscaled by the display to cover the whole screen. This meant that the physical viewing distance was different in the two groups: 0.99 m for Group A and 1.47 m for Group B. The colour temperature was measured with a PhotoResearch 705. The maximum luminance of white was set to 248 cd/m2, measured with a Hagner Screenmaster.
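As a rough consistency check of the reported viewing distances, the following sketch recomputes 3H for the two groups, assuming a nominal 40-inch 16:9 panel (inferred from the LE40A796 model designation); since the visible picture area is slightly smaller than the nominal diagonal, a small deviation from the reported 1.47 m is expected.

```python
from math import hypot

# Nominal 40-inch 16:9 panel, inferred from the LE40A796 model designation.
diagonal_m = 40 * 0.0254
panel_height = diagonal_m * 9 / hypot(16, 9)      # roughly 0.50 m

# Group B: the video is upscaled to fill the whole screen.
print(f"Group B, 3H = {3 * panel_height:.2f} m")  # ~1.49 m (reported: 1.47 m)

# Group A: 720 lines pixel-mapped on the 1080-line panel.
video_height = panel_height * 720 / 1080
print(f"Group A, 3H = {3 * video_height:.2f} m")  # ~1.00 m (reported: 0.99 m)
```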
The maximum, mean and minimum Blur Edge Time (BET) of the display were 42 ms, 21 ms and 16 ms respectively, measured with the dynamic feature of the backlight turned off. It was measured as described in Tourancheau et al. (2009a)5. The longer BETs occurred for rather dark transitions, so they may be less disturbing for the viewers; otherwise most BETs were about 20 ms, which does not introduce more than acceptable blur according to Tourancheau et al. (2009b)6.

15 valid non-expert viewers per viewing condition participated. The viewers had a mean age of 36 and a median age of 33. The oldest participant was 65 years old, whereas the youngest was 19. The percentage of women was 40%. Viewers provided vote responses using a mouse, by clicking on the corresponding radio button shown in Figure 1. Both the mouse and the LCD screen were connected to a fairly silent PC located in the room. The PC was running 64-bit Windows Vista on an Intel Core 2 Quad processor (Q9550 at 2.83 GHz) with 8 GByte of primary memory. The graphics card was an ATI Radeon HD4870 X2 with 2 GByte of memory. The subjective results were stored directly on the PC, which was also used to store and play the video content, using software developed at Acreo7. Playback of a video clip was done by pre-loading the clips into the memory of the graphics card. This was done to make certain that the update of each played frame was performed in synchronization with the display update. Each frame was shown during one update period to obtain a frame rate of 50 fps. Sequences were loaded during playing and voting time, using multi-threading techniques, to minimize any waiting for the subjects. The software also randomized the presentation order of the PVSs per viewer.

The PVSs were generated offline, meaning that no live transmission or playback system was set up for the sequence creation. Instead the video was processed one step at a time with intermediate files saved to disk. Figure 2 shows the steps involved in the offline processing procedure.

Figure 2: Steps applied when creating the PVSs (preprocessing, encoding, packet loss simulation, decoding, postprocessing).

The preprocessing of the video involved scaling the source video from 1080i/50 to 720p/50. This was done in Avisynth using the Lanczos resize and bob de-interlacing methods. The bob de-interlacing method preserves the resolution while maintaining the rate of the fields of the input video. The video was encoded with the H.264 Main Profile at bitrates of 4 and 8 Mbps. The output of the encoding was a bitstream file encapsulated in MPEG-2 TS packets.

Simulation of packet losses was made directly on the encapsulated bitstream file. When sending MPEG-2 TS streams over IP, the 188-byte MPEG-2 TS packets are often packed in UDP packets in groups of seven. The reason for this is to minimize the extra bits needed for headers while still being able to fit the packets in a Maximum Transmission Unit (MTU). The loss simulation was therefore made at the UDP level, i.e. the loss of one UDP packet resulted in seven lost MPEG-2 TS packets. Packets were lost according to a bursty distribution determined by a two-state Markov chain. Packet loss rates of 0.05%, 0.2%, 0.5%, 1% and 5% were simulated.
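The paper does not give the transition probabilities of the Markov chain, but a minimal sketch of this kind of UDP-level loss simulation could look as follows; the average loss rate and mean burst length are illustrative parameters, and the file name in the usage example is hypothetical.

```python
import random

TS_PACKET = 188           # bytes per MPEG-2 TS packet
TS_PER_UDP = 7            # TS packets carried in one UDP payload (fits in an MTU)

def simulate_udp_loss(ts_bytes, loss_rate, mean_burst=2.0, seed=0):
    """Drop whole UDP payloads (7 TS packets each) from an MPEG-2 TS byte string
    using a two-state Markov (Gilbert-style) loss model.

    loss_rate and mean_burst are illustrative; the parameters actually used in
    the experiment are not given in the paper.
    """
    # Chosen so that the stationary probability of the bad state equals loss_rate
    # and the mean burst length (in UDP packets) equals mean_burst.
    p_good_to_bad = loss_rate / (mean_burst * (1.0 - loss_rate))
    p_bad_to_good = 1.0 / mean_burst

    rng = random.Random(seed)
    out = bytearray()
    in_bad_state = False
    chunk = TS_PACKET * TS_PER_UDP
    for i in range(0, len(ts_bytes), chunk):
        if in_bad_state:
            in_bad_state = rng.random() >= p_bad_to_good
        else:
            in_bad_state = rng.random() < p_good_to_bad
        if not in_bad_state:              # keep the UDP payload
            out += ts_bytes[i:i + chunk]
    return bytes(out)

# Example: 1% average loss on an encapsulated bitstream file.
# with open("src1_4mbps.ts", "rb") as f:          # hypothetical file name
#     lossy = simulate_udp_loss(f.read(), loss_rate=0.01)
```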
Decoding was done using the FFMPEG decoder and a proprietary decoder. In the case of packet losses the built-in concealment method of the decoder was used. A third concealment method was also tested: freezing the video for a certain amount of time. The freezing concealment method was not applied to erroneous bitstream files, but was simulated as a post-processing step directly on the decoded video file by repeating and dropping frames using Avisynth. Freezing was inserted at one random position in the video with a duration of 0.5 s, 1 s or 3 s.

The test matrix was based on eight 10-second source sequences, with the following content:
SRC 1: Sport scene, includes fast motion
SRC 2: Nature scene with birds, fast motion and many details
SRC 3: Animal scene, very detailed
SRC 4: Close-up on birds, slow motion
SRC 5: Interview situation
SRC 6: A turning table, much movement and many details
SRC 7: Panning landscape with running people in focus
SRC 8: Panning a city from a bird's-eye view

All SRC sequences were treated with the 25 Hypothetical Reference Circuits (HRC) listed in Table 1, where the reference is counted as one HRC. This gave in total 200 PVSs, which were presented in two sessions. The total time for a viewer was about one hour.

Table 1: Description of the HRCs used in the test.
HRC1: Reference
HRC2: 4 Mbps, 0.5 s freezing
HRC3: 4 Mbps, 1 s freezing
HRC4: 4 Mbps, 3 s freezing
HRC5: 4 Mbps, 0.05% packet loss, proprietary codec
HRC6: 4 Mbps, 0.05% packet loss, FFmpeg codec
HRC7: 4 Mbps, 0.2% packet loss, FFmpeg codec
HRC8: 4 Mbps, 0.5% packet loss, proprietary codec
HRC9: 4 Mbps, 0.5% packet loss, FFmpeg codec
HRC10: 4 Mbps, 1% packet loss, proprietary codec
HRC11: 4 Mbps, 1% packet loss, FFmpeg codec
HRC12: 4 Mbps, 5% packet loss, FFmpeg codec
HRC13: 4 Mbps
HRC14: 8 Mbps, 0.5 s freezing
HRC15: 8 Mbps, 1 s freezing
HRC16: 8 Mbps, 3 s freezing
HRC17: 8 Mbps, 0.05% packet loss, proprietary codec
HRC18: 8 Mbps, 0.05% packet loss, FFmpeg codec
HRC19: 8 Mbps, 0.2% packet loss, FFmpeg codec
HRC20: 8 Mbps, 0.5% packet loss, proprietary codec
HRC21: 8 Mbps, 0.5% packet loss, FFmpeg codec
HRC22: 8 Mbps, 1% packet loss, proprietary codec
HRC23: 8 Mbps, 1% packet loss, FFmpeg codec
HRC24: 8 Mbps, 5% packet loss, FFmpeg codec
HRC25: 8 Mbps

3. RESULTS
The MOS and the DMOS were calculated from the collected opinion scores, as described above. To get the DMOS into the same range as the MOS, the difference opinion scores were computed as score_PVS - score_REF + 5 (a minimal computational sketch is given below). We will present results where MOS and DMOS have been aggregated over either all the HRCs or all the SRCs. These will sometimes be referred to as MOS and DMOS for simplicity, even if they are means of MOS and DMOS.

The results show that all the quality levels were used by the subjects, but the lower quality levels were used more than the higher, as shown by the histogram in Figure 3. This can also be noted in the left part of Figure 4, where the MOS taken across the HRCs (as well as the users) is displayed. All the SRCs except one had an average MOS below 3. The scene that had the overall best quality was SRC 4 and the scene with the overall worst quality was SRC 7. In the right part of Figure 4, the DMOS taken over the HRCs is presented. The main difference here from the plot of the MOS, apart from the differences between the different SRCs being somewhat accentuated, is that SRC 1 gets relatively higher values. This is because the score of the reference is lowest for this SRC. The MOS of the references is shown in Table 2.
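As an illustration of the MOS and DMOS aggregation described in Section 2 and used here, the following is a minimal sketch assuming a table of raw votes with hypothetical column names (viewer, src, hrc, score), where "HRC1" denotes the hidden reference.

```python
import pandas as pd

# One row per opinion score, with hypothetical column names:
# viewer, src, hrc ("HRC1" is the hidden reference) and score (1-5 on the ACR scale).
votes = pd.read_csv("votes.csv")   # hypothetical file name

# MOS: mean opinion score per SRC/HRC combination, taken over all viewers.
mos = votes.groupby(["src", "hrc"])["score"].mean()

# Hidden reference removal on the individual opinion scores:
# diff score = score_PVS - score_REF + 5, then averaged over viewers to give DMOS.
ref = (votes[votes["hrc"] == "HRC1"]
       .rename(columns={"score": "ref_score"})[["viewer", "src", "ref_score"]])
pvs = votes[votes["hrc"] != "HRC1"].merge(ref, on=["viewer", "src"])
pvs["diff_score"] = pvs["score"] - pvs["ref_score"] + 5
dmos = pvs.groupby(["src", "hrc"])["diff_score"].mean()
```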
Figure 3: The distribution of votes used during the test (relative score usage in percent per ACR score value).

Figure 4: The MOS (left) and the DMOS (right) taken across the users and the different HRCs in the test (error bars indicate 95% confidence intervals).

Table 2: The MOS scores of the references.
SRC:  1    2    3    4    5    6    7    8
MOS:  4.3  4.7  4.6  4.4  4.4  4.8  4.7  4.6

A mixed model analysis of variance (ANOVA) was conducted on the raw opinion scores. The effects of scaling (F(1,12) = 4.99), SRC (F(7,159) = 6.22) and HRC (F(24,166) = 31.8) were all significant at the 95% confidence level, i.e. p < 0.05. The two-way interactions were all significant too, but the three-way interaction was not. This means that, according to this analysis, the presentation format, i.e. upscaled or not upscaled, has an impact on the quality; see further discussion below. The other significant effects merely say that the variation of the content, i.e. the SRCs, and the different HRCs have a significant effect on the quality, as do their interactions with each other.

A mixed model ANOVA was conducted on the difference opinion scores as well. In this case the effect of scaling (F(1,9) = 0.27) was not significant, whereas SRC (F(6,70) = 9.22) and HRC (F(25,159) = 27.7) were still significant (p < 0.05). The two-way interactions were all significant too, but the three-way interaction was not. The important difference here compared to the analysis of the raw opinion scores was that the difference between the upscaled and the not upscaled presentation was no longer significant.

The average MOS and the average DMOS taken across all the SRCs give results on the impact of the different HRCs. These are presented in Figure 5 to Figure 8. In Figure 5 and Figure 6 the MOS and DMOS of the freezing HRCs are shown, the MOS results to the left and the DMOS results to the right. In Figure 5 the MOS and DMOS values are plotted against the freezing lengths ordered by length, but not positioned on the x-axis according to freezing length in seconds, whereas in Figure 6 the MOS and DMOS are plotted against the freezing time in seconds.

Figure 5: The MOS (left) and DMOS (right) of the freezing HRCs.

Figure 6: The MOS (left) and DMOS (right) of the freezing HRCs plotted against the freezing time.

The MOS and DMOS of the packet loss HRCs are shown in Figure 7 and Figure 8 (MOS to the left and DMOS to the right). In Figure 7 the MOS and DMOS are plotted against packet loss ordered by loss rate, where the data points are positioned on the x-axis according to case, not according to rate, whereas in Figure 8 the MOS and DMOS are plotted against the packet loss rate in percent. Note that the PVSs encoded at the higher bitrate get lower quality at higher packet loss rates than the PVSs encoded at the lower bitrate.
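Returning to the statistical analysis above: the paper does not state which software or exact model specification was used for the mixed-model ANOVA, but one possible way to set up such an analysis, with the viewer as a random factor and hypothetical column names, is sketched below.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical columns: viewer, scaling (upscaled / pixel-mapped), src, hrc, score.
votes = pd.read_csv("votes.csv")   # hypothetical file name

# One possible specification: scaling, SRC and HRC (and their interactions)
# as fixed effects and the viewer as a random intercept.  The exact model
# behind the F-values reported above is not specified in the paper.
model = smf.mixedlm("score ~ C(scaling) * C(src) * C(hrc)",
                    data=votes, groups=votes["viewer"])
result = model.fit()
print(result.summary())
```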
Figure 7: The MOS (left) and DMOS (right) of the packet loss HRCs.

Figure 8: The mean MOS (left) and DMOS (right) of the packet loss HRCs plotted against the packet loss rate in percent.

4. DISCUSSION AND CONCLUSIONS
An experiment was conducted simulating the visual effect of H.264-based IPTV set-top boxes when they are faced with different transmission errors in the form of packet loss. A decoder is faced with the problem of somehow concealing the error, and the concealment methods studied here are freezing, the concealment used by the FFMPEG decoder and that of a proprietary decoder. Another purpose was to study whether presenting the video upscaled on a full HDTV 1080p display, or not upscaled in a rectangle in the center of the screen with a grey surround, had any impact.

The ANOVA showed that there was a significant quality difference between showing the video upscaled to the full screen and showing it pixel-mapped in the center of the screen. However, this effect was very close to the boundary of not being significant at the 95% level, i.e. p < 0.05; in this case p = 0.045. It has been reported before that the picture size is an important quality factor8, but in this case the quality increase was counterbalanced by a decrease in quality caused by the upscaling. The significance of the scaling factor goes away for the DMOS, which is expected, since ideally the effect of the presentation (display, scale, size) should affect the reference and the PVS in the same way and is then removed by the DMOS calculation. This supports that this is the case. The ANOVA also showed that the quality impact of the different SRCs and HRCs, as well as their interactions, was significant. The interaction effects show that, depending on the content, the different HRCs will have different impact on different SRCs.

The results show that the experiment was not completely balanced, since there was a larger number of scores on the lower qualities than on the higher. Care was taken to design an experiment where the votes would be spread as evenly as possible over the available categories, but the real distribution can only be obtained once the experiment has been completed.

For the freezing HRCs the quality drops below the level of good (< 3.5) already at 0.5 s. This is in line with other results that look at user reactions to waiting time9. Furthermore, freezing even up to 3 s, almost a third of the viewing time of the clip, was not experienced as severely as the highest packet loss rates.

The packet loss impact is interesting; see Figure 7 and Figure 8. For very low packet loss rates the quality is higher for the higher encoding bitrate, but when the packet loss rate increases the quality drops faster for the higher bitrate and becomes lower than for the lower bitrate. The likely explanation is that, since a higher bitrate video stream contains more packets than a lower bitrate one, more packets will be lost in absolute numbers, even if both video streams lose packets according to the same (in this case bursty) distribution. The losses in the higher bitrate stream will therefore be spread out over more positions in the video and will consequently most likely be more visible.
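A rough back-of-the-envelope calculation, ignoring IP/UDP/RTP header overhead, illustrates the difference in the absolute number of lost packets between the two bitrates.

```python
TS_PACKET_BITS = 188 * 8               # one MPEG-2 TS packet
UDP_PAYLOAD_BITS = 7 * TS_PACKET_BITS  # 10 528 bits of TS payload per UDP packet

for bitrate in (4e6, 8e6):             # the two encoding bitrates used in the test
    udp_per_second = bitrate / UDP_PAYLOAD_BITS
    lost_per_clip = 0.01 * udp_per_second * 10   # 1% loss over a 10 s clip
    print(f"{bitrate / 1e6:.0f} Mbps: ~{udp_per_second:.0f} UDP packets/s, "
          f"~{lost_per_clip:.0f} lost packets per clip at 1% loss")
# Roughly 380 packets/s and 38 losses per clip at 4 Mbps, versus
# roughly 760 packets/s and 76 losses per clip at 8 Mbps.
```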
5. ACKNOWLEDGEMENTS
This work has been supported by VINNOVA (the Swedish Governmental Agency for Innovation Systems), which is hereby gratefully acknowledged.

6. REFERENCES
1. Haglund, L., Guest, N., Einerman, S., Öster, H., Björkman, P., and Graf, H., "Overall-Quality Assessment When Targeting Wide-XGA Flat Panel Displays", Sveriges Television, Institut für Rundfunktechnik, (2002)
2. VQEG, "HDTV Group: Test Plan for Evaluation of Video Quality Models for Use With High Definition TV Content (Ver 3.0)", Video Quality Experts Group, www.vqeg.org, (2009)
3. FFMPEG, "FFMPEG: complete, cross-platform solution to record, convert and stream audio and video" [on-line], ffmpeg.org, Accessed: 21 July 2009
4. ITU-T, "Subjective Video Quality Assessment Methods for Multimedia Applications", ITU-T Rec. P.910, International Telecommunication Union, Telecommunication Standardization Sector, (1999)
5. Tourancheau, S., Brunnström, K., Andrén, B., and Le Callet, P., "LCD motion-blur estimation using different measurement methods", Journal of the Society for Information Display 17, 239-249 (2009)
6. Tourancheau, S., Andrén, B., Brunnström, K., and Le Callet, P., "61.3 - Visual Annoyance and User Acceptance of LCD Motion-Blur", SID Symposium Digest of Technical Papers XXXX, (2009)
7. Jonsson, J. and Brunnström, K., "Getting Started With ArcVQWin", acr022250, Acreo AB, Kista, Sweden, (2007)
8. Westerink, J. H. D. M. and Roufs, J. A. J., "Subjective image quality as a function of viewing distance, resolution and picture size", SMPTE Journal, 113-119 (1989)
9. Kooij, R., Nikolai, F., Ahmed, K., and Brunnström, K., "Model validation of channel zapping quality", Proc. of SPIE-IS&T Human Vision and Electronic Imaging XII, 7240, B. Rogowitz and T. N. Pappas, Eds., paper 31 (2009)