The impact of transmission errors on progressive 720 lines HDTV
coded with H.264
Kjell Brunnström (1), Daniel Stålenbring (1), Martin Pettersson (2), and Jörgen Gustafsson (2)
(1) IPTV, Video and Display Quality, NetLab, Acreo AB, Kista, Sweden
(2) Ericsson Research, Sweden
Email: [email protected]
ABSTRACT
TV sent over networks based on the Internet Protocol, i.e. IPTV, is moving towards high definition (HDTV). There
has been quite a lot of work on how HDTV is affected by different codecs and bitrates, but the impact of
transmission errors over IP networks has been less studied.
This study focused on H.264-encoded 1280x720 progressive HDTV and compared three different
concealment methods at different packet loss rates. The first was included in a proprietary decoder, the second was part of
FFMPEG and the third was freezing of different lengths. The aim was to simulate what typical IPTV set-top boxes
do when encountering packet loss. Another aim was to study whether presenting the video upscaled to the full HDTV screen
or pixel mapped in a smaller area in the center of the screen would have an effect on the quality.
The results show that there were differences between the packet loss concealment methods in FFMPEG and in the
proprietary codec. Freezing seemed to have a similar effect to that reported before. For low rates of transmission errors the
coding impairments have an impact on the quality, but for higher rates of transmission errors they do not affect the
quality, since they are overshadowed by the transmission errors. An interesting effect was that the higher bitrate
videos go from having higher quality at lower packet loss rates to having lower quality than the lower
bitrate videos at higher packet loss rates. The way of presenting the video, i.e. upscaled or not upscaled, was
significant at the 95% level, but only just. It was not significant when considering DMOS, i.e. the mean of the
difference between the scores for the reference and the scores for the distorted sequences.
Keywords: Video Quality, H.264, Transmission errors, HDTV, 720p, Error concealment
1. INTRODUCTION
TV sent over networks based on the Internet Protocol, i.e. IPTV, is moving towards high definition (HDTV). There has
been quite a lot of work on how HDTV is affected by different codecs and bitrates, but the impact of transmission
errors over IP networks has been less studied. Studies by e.g. Haglund et al. (2002) [1] have shown that the 1280 by 720
progressive (720p) format is a better broadcasting format than 1920 by 1080 interlaced (1080i) when it comes to
coding efficiency, i.e. comparable quality at a lower number of bits. This study has therefore focused on 720p.
The Video Quality Experts Group (VQEG) is an independent group of experts, which brings together organizations in
government laboratories, academia and industry to evaluate video quality metrics. The outcomes of the evaluations
are then given as input to standards bodies like the International Telecommunication Union (ITU). This study was
not done for VQEG, but its purpose is partly complementary. VQEG is currently carrying out an HDTV test [2] that is
scheduled to be completed in spring 2010. In this test, the 720p format, i.e. 1280 by 720 progressive, was not included
as an independent format. Instead it was considered as a Hypothetical Reference Circuit (HRC), which means that the source
is 1080p (1920 by 1080 progressive). The video is downsampled to 720p and at the end upsampled to 1080p again. In
between, other HRCs such as codecs or transmission errors might have been applied. This study was also intended to test
whether this would have an effect on the quality scores collected from the viewers.
When transmission errors occur for a streaming video service in an IP network, e.g. IPTV, they come in the form of packet
loss. This usually has a severe impact on the quality, but the end result depends on the concealment used by the
decoder. This study compared three different concealment methods at different packet loss rates. The first was
included in a proprietary decoder, the second was part of FFMPEG [3] and the third was freezing of different lengths. The aim was
to simulate what typical IPTV set-top boxes might do when encountering packet loss.
2. METHOD
The test was conducted mainly in accordance with the VQEG HDTV project test plan [2]. The main difference
was that in this test 720p was the main target. It was displayed in two different ways: for one group with no upscaling (the
rest of the screen filled with a grey surround) and for the other group with upscaling in the TV set.
Subjective video quality was measured using the Absolute Category Rating (ACR) [4] method. The test video sequences
were presented one at a time and afterwards rated independently on the ACR scale, as seen in Figure 1. In the test, the
ACR procedure included both the processed and the reference, i.e. unprocessed source, versions of each video sequence.
The reference sequences were not identified as such to the viewers (hidden reference approach). In the data analysis the
scores were averaged over all viewers; these averages are referred to as Mean Opinion Scores (MOS).
Furthermore, the quality score of the corresponding reference source (SRC) sequence was subtracted from the quality score
of each processed video sequence (PVS) to obtain a difference score, whose mean is referred to as DMOS. This procedure is
known as “hidden reference” removal.
Figure 1: The voting screen that was presented to viewers.
The viewing room, located in Acreo’s multimedia lab, was prepared to conform to the specifications of ITU-T Rec.
P.910 [4]. Each viewer, one at a time, completed the test in a section of a room divided by grey drapery. Viewers were
seated three screen heights (3H) from the screen and instructed not to move their heads too much or lean forward. The
viewing distance was not otherwise controlled. The ambient light was produced by high-frequency D65 fluorescent tubes
located in the ceiling, generating a light level of 20 lux, measured at the table.
The test video sequences were presented on a high-grade consumer HDTV LCD display (Samsung LE40A796). The
display was used at full resolution, i.e. 1920x1080 with an update frequency of 50 fps, for one of the groups (Group A). The
video was then positioned in the middle and the rest of the screen was filled with an even grey colour (grey value 128).
For the other group (Group B) the screen resolution was set to 1280x720 and 50 fps. In this case the video was upscaled
by the display to cover the whole screen. This meant that the physical viewing distance was different in the two groups: 0.99 m
for Group A and 1.47 m for Group B. The colour temperature was measured with a PhotoResearch 705. The maximum
luminance of white was set to 248 cd/m² and was measured with a Hagner Screenmaster. The maximum, mean and
minimum Blur Edge Time (BET) of the display were 42 ms, 21 ms and 16 ms respectively, measured with the dynamic
feature of the backlight turned off. It was measured as described in Tourancheau et al. (2009a) [5]. The longer BETs occurred for
rather dark transitions, so they may be less disturbing for the viewers; otherwise most BETs were about 20 ms, which
does not introduce more than acceptable blur according to Tourancheau et al. (2009b) [6].
Fifteen valid non-expert viewers per viewing condition participated. The viewers had a mean age of 36 and a median age of 33.
The oldest participant was 65 years old, whereas the youngest was 19. The percentage of women was 40%.
Viewers provided vote responses using a mouse, by clicking on the corresponding radio button shown in Figure 1. Both
the mouse and the LCD screen were connected to a fairly silent PC located in the room. The PC was running 64-bit
Windows Vista on an Intel Core 2 Quad processor (Q9550 at 2.83 GHz) with 8 GByte of primary memory. The graphics
card was an ATI Radeon HD4870 X2 with 2 GByte of memory. The subjective results were stored directly on the PC,
which was also used to store and play the video content, using software developed at Acreo [7]. Playback of a video
clip was done by pre-loading the clip into the memory of the graphics card. This was done to make certain that the update of
each played frame was synchronized with the update of the display. Each frame was shown
during one update period to obtain a frame rate of 50 fps. Sequences were loaded during playing and voting
time, using multi-threading techniques, to minimize any waiting for the subjects. The software also randomized the order of the PVSs
per viewer.
The PVSs were generated offline, meaning that no live transmission or playback system was set up for the sequence
creation. Instead the video was processed one step at a time with intermediate files saved to disk. Figure 2 shows the
steps involved in the offline processing procedure.
[Block diagram: Preprocessing -> Encoding -> Packet loss simulation -> Decoding -> Postprocessing]
Figure 2: Steps applied when creating the PVSs.
The preprocessing of the video involved scaling the source video from 1080i/50 to 720p/50. This was done in Avisynth
using the Lanczos resize and bob de-interlacing methods. The bob de-interlacing method preserves the resolution while
maintaining the rate of the fields of the input video. The video was encoded with H.264 main profile at bitrates of 4 and 8
Mbps. The output of the encoding was a bitstream file encapsulated in MPEG-2 TS packets.
Simulation of packet losses was done directly on the encapsulated bitstream file. When sending MPEG-2 TS streams
over IP, the 188-byte MPEG-2 TS packets are often packed in UDP packets in groups of seven. The reason for this
is to minimize the extra bits needed for headers while still being able to fit the packets in a Maximum Transmission Unit
(MTU). The loss simulation was therefore done at UDP level, i.e. the loss of one UDP packet resulted in seven lost MPEG-2
TS packets. Packets were lost according to a bursty distribution determined by a two-state Markov chain. Packet loss
rates of 0.05%, 0.2%, 0.5%, 1% and 5% were simulated.
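The loss process described above can be illustrated with a minimal Python sketch. It assumes a simple Gilbert model in which every UDP packet sent in the "bad" state is lost. The paper does not give the transition probabilities of its Markov chain, so the mean burst length used here (`mean_burst_len`), the function names and the seed are illustrative assumptions, not the parameters of the actual experiment.

```python
import random

TS_PACKET_BYTES = 188   # size of one MPEG-2 TS packet
TS_PER_UDP = 7          # TS packets carried in one UDP packet
UDP_PAYLOAD_BYTES = TS_PACKET_BYTES * TS_PER_UDP  # 1316 bytes


def gilbert_params(loss_rate, mean_burst_len):
    """Transition probabilities of a two-state (good/bad) Markov chain.

    Assumption: every packet sent in the bad state is lost, so the
    stationary probability of the bad state equals the loss rate.
    """
    p_bad_to_good = 1.0 / mean_burst_len
    p_good_to_bad = loss_rate * p_bad_to_good / (1.0 - loss_rate)
    return p_good_to_bad, p_bad_to_good


def simulate_udp_loss(n_udp, loss_rate, mean_burst_len=2.0, seed=0):
    """Return a list of booleans, True where a UDP packet is lost."""
    rng = random.Random(seed)
    p_gb, p_bg = gilbert_params(loss_rate, mean_burst_len)
    bad = False  # start in the good state
    lost = []
    for _ in range(n_udp):
        if bad:
            bad = rng.random() >= p_bg  # stay bad with probability 1 - p_bg
        else:
            bad = rng.random() < p_gb   # enter the bad state
        lost.append(bad)
    return lost


if __name__ == "__main__":
    pattern = simulate_udp_loss(n_udp=200_000, loss_rate=0.05)
    lost_udp = sum(pattern)
    print("lost UDP packets:", lost_udp)
    print("lost TS packets:", lost_udp * TS_PER_UDP)
```

Seven TS packets give a UDP payload of 7 x 188 = 1316 bytes, which fits within a typical 1500-byte Ethernet MTU once IP and UDP headers are added; an eighth TS packet would not fit.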
Decoding was done using the FFMPEG decoder and a proprietary decoder. In the case of packet losses the built-in
concealment method of the decoder was used. A third concealment method was also tested: freezing the video for a
certain amount of time. The freezing concealment method was not applied to erroneous bitstream files, but was
simulated as a post-processing step directly on the decoded video file by repeating and dropping frames using Avisynth.
Freezing was inserted at one random position in the video with a duration of 0.5 s, 1 s or 3 s.
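The freezing step can be sketched as a mapping from output frames to source frames. The sketch below is in Python rather than Avisynth and rests on an assumption the paper does not state explicitly: that the frozen frame is repeated for the freeze duration and the frames it replaces are dropped, so that the clip keeps its original length. The function name and arguments are hypothetical.

```python
def freeze_frame_map(total_frames, start_frame, freeze_seconds, fps=50):
    """Map each output frame index to the source frame that is shown.

    The frame at start_frame is held for the freeze duration; the source
    frames that would have been shown meanwhile are dropped, so the output
    has the same number of frames as the input (an assumption).
    """
    n_freeze = round(freeze_seconds * fps)
    if start_frame < 0 or start_frame + n_freeze > total_frames:
        raise ValueError("freeze does not fit inside the clip")
    frame_map = []
    for i in range(total_frames):
        if start_frame <= i < start_frame + n_freeze:
            frame_map.append(start_frame)  # repeat the frozen frame
        else:
            frame_map.append(i)            # normal playback
    return frame_map


if __name__ == "__main__":
    # A 10 s clip at 50 fps with a 0.5 s freeze starting at frame 100.
    mapping = freeze_frame_map(total_frames=500, start_frame=100, freeze_seconds=0.5)
    print(mapping[98:128])
```

With this mapping, a 3 s freeze in a 10 s clip at 50 fps holds one frame for 150 of the 500 output frames.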
The test matrix was based on eight 10 sec source sequences, with the following content:
SRC 1: Sport scene, includes fast motion
SRC 2: Nature scene with birds, fast motion and many details
SRC 3: Animal scene, very detailed
SRC 4: Close-up on birds, slow motion
SRC 5: Interview situation
SRC 6: A turning table, much movement and many details
SRC 7: Panning landscape with running people in focus
SRC 8: Panning a city from a bird's-eye view
All SRC sequences were processed with the 25 Hypothetical Reference Conditions (HRCs) listed in Table 1, where the
reference is counted as one HRC. This gave a total of 200 PVSs, which were presented in two sessions. The total
time for a viewer was about one hour.
Table 1: Description of the HRCs used in the test.
HRC      Description
HRC1     Reference
HRC2     4Mbps 0.5s freezing
HRC3     4Mbps 1s freezing
HRC4     4Mbps 3s freezing
HRC5     4Mbps 0.05% packet loss, proprietary codec
HRC6     4Mbps 0.05% packet loss, FFmpeg codec
HRC7     4Mbps 0.2% packet loss, FFmpeg codec
HRC8     4Mbps 0.5% packet loss, proprietary codec
HRC9     4Mbps 0.5% packet loss, FFmpeg codec
HRC10    4Mbps 1% packet loss, proprietary codec
HRC11    4Mbps 1% packet loss, FFmpeg codec
HRC12    4Mbps 5% packet loss, FFmpeg codec
HRC13    4Mbps
HRC14    8Mbps 0.5s freezing
HRC15    8Mbps 1s freezing
HRC16    8Mbps 3s freezing
HRC17    8Mbps 0.05% packet loss, proprietary codec
HRC18    8Mbps 0.05% packet loss, FFmpeg codec
HRC19    8Mbps 0.2% packet loss, FFmpeg codec
HRC20    8Mbps 0.5% packet loss, proprietary codec
HRC21    8Mbps 0.5% packet loss, FFmpeg codec
HRC22    8Mbps 1% packet loss, proprietary codec
HRC23    8Mbps 1% packet loss, FFmpeg codec
HRC24    8Mbps 5% packet loss, FFmpeg codec
HRC25    8Mbps
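The size of the test matrix can be checked with a short Python sketch that enumerates the HRCs of Table 1 and combines them with the eight SRCs. The sketch groups the packet loss conditions by decoder, so the order differs from the table, and the per-viewer shuffle with a seed is only an illustrative stand-in for the randomization done by the playback software; all names are hypothetical.

```python
import random

BITRATES_MBPS = (4, 8)
FREEZE_SECONDS = (0.5, 1, 3)
# Packet loss rates (%) tested with each decoder, as listed in Table 1.
LOSS_BY_DECODER = {
    "proprietary codec": (0.05, 0.5, 1),
    "FFmpeg codec": (0.05, 0.2, 0.5, 1, 5),
}


def build_hrcs():
    """Enumerate the HRC descriptions: the reference plus 12 per bitrate."""
    hrcs = ["Reference"]
    for rate in BITRATES_MBPS:
        for seconds in FREEZE_SECONDS:
            hrcs.append(f"{rate}Mbps {seconds}s freezing")
        for decoder, losses in LOSS_BY_DECODER.items():
            for loss in losses:
                hrcs.append(f"{rate}Mbps {loss}% packet loss, {decoder}")
        hrcs.append(f"{rate}Mbps")  # coding only, no transmission errors
    return hrcs


def build_playlist(viewer_seed, n_src=8):
    """Return all (SRC, HRC) pairs in a viewer-specific random order."""
    pvs = [(src, hrc) for src in range(1, n_src + 1) for hrc in build_hrcs()]
    random.Random(viewer_seed).shuffle(pvs)
    return pvs


if __name__ == "__main__":
    print(len(build_hrcs()), "HRCs")            # 25
    print(len(build_playlist(viewer_seed=1)), "PVSs per viewer")  # 200
```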
3. RESULTS
The MOS and the DMOS were calculated from the collected opinion scores, as described above. To bring the DMOS into
the same range as the MOS, the difference opinion scores were computed as: score_PVS - score_REF + 5.
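As a minimal illustration of these two quantities, the Python sketch below computes the MOS of one PVS and its DMOS from per-viewer scores. It assumes that the difference is taken viewer by viewer before averaging, which is the usual hidden reference removal procedure; the example scores are made up for illustration and are not data from the test.

```python
from statistics import mean


def mos(scores):
    """Mean Opinion Score: the average ACR score over all viewers."""
    return mean(scores)


def dmos(pvs_scores, ref_scores):
    """Difference MOS with the +5 offset used in this paper.

    pvs_scores[i] and ref_scores[i] must come from the same viewer.
    """
    if len(pvs_scores) != len(ref_scores):
        raise ValueError("need one reference score per PVS score")
    diffs = [p - r + 5 for p, r in zip(pvs_scores, ref_scores)]
    return mean(diffs)


if __name__ == "__main__":
    pvs = [3, 2, 4]   # hypothetical scores for a distorted sequence
    ref = [5, 4, 5]   # the same viewers' scores for its hidden reference
    print("MOS :", mos(pvs))
    print("DMOS:", dmos(pvs, ref))
```

A PVS rated the same as its reference by every viewer gets a DMOS of 5, regardless of how high the reference itself was rated.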
We will present results where MOS and DMOS have been aggregated over either all the HRCs or all the SRCs. These will
sometimes be referred to as MOS and DMOS for simplicity, even if they are means of MOS and DMOS.
The results show that all the quality levels were used by the subjects, but the lower quality levels were used more than
the higher, as shown by the histogram in Figure 3. This can also be seen in the left part of Figure 4, where the MOS
taken across the HRCs (as well as the viewers) is displayed. All the SRCs except one had an average MOS below 3. The
scene that had the overall best quality was SRC 4 and the scene with the overall worst quality was SRC 7. In the right
part of Figure 4, the DMOS taken over the HRCs is presented. The main difference here from the plot of the MOS,
apart from the differences between the different SRCs being somewhat accentuated, is that SRC 1 gets relatively higher
values. This is because the score of the reference is lowest for this SRC. The MOS of the references is shown in Table 2.
[Bar chart "Relative score usage": score usage (%) versus ACR score values 1 to 5]
Figure 3: The distribution of votes used during the test
[Two plots: MOS (left) and DMOS (right) versus SRC number]
Figure 4: The MOS (left) and the DMOS (right) taken across the users and different HRCs in the test (error bars
indicate 95% confidence intervals)
Table 2: The MOS scores of the references
SRC    1     2     3     4     5     6     7     8
MOS    4.3   4.7   4.6   4.4   4.4   4.8   4.7   4.6
A mixed model analysis of variance (ANOVA) was conducted on the raw opinion scores. The effects of scaling (F(1,12)
= 4.99), SRC (F(7,159) = 6.22) and HRC (F(24,166) = 31.8) were all significant (95% confidence level, i.e. p < 0.05). The
two-way interactions were all significant too, but the three-way interaction was not. This means that according to this analysis the
presentation format, i.e. upscaled or not upscaled, has an impact on the quality; see the further discussion in Section 4. The other
significant effects merely say that the variation of the content, i.e. the SRCs, and the different HRCs have a significant effect
on the quality, as do their interactions with each other.
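How close the scaling effect is to the 5% boundary can be checked from the reported statistic alone. The Python sketch below uses the identity that an F(1, n) variable is the square of a t variable with n degrees of freedom and integrates the t density numerically with Simpson's rule, using only the standard library. It simply plugs in the degrees of freedom as reported; it does not reproduce the mixed model itself, and the function names are illustrative.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def p_value_f_1_df(f_stat, df2, steps=2000):
    """Upper-tail p-value of F(1, df2), using F = t^2.

    P(F > f) = P(|T| > sqrt(f)) = 1 - 2 * P(0 < T < sqrt(f)),
    with the integral evaluated by Simpson's rule (steps must be even).
    """
    t = math.sqrt(f_stat)
    h = t / steps
    total = t_pdf(0.0, df2) + t_pdf(t, df2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df2)
    area = total * h / 3
    return 1 - 2 * area


if __name__ == "__main__":
    # Scaling effect on the raw opinion scores: F(1,12) = 4.99
    print("p =", round(p_value_f_1_df(4.99, 12), 3))
```

The result lies just under 0.05, in line with the p = 0.045 quoted in the discussion section.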
A mixed model analysis of variance (ANOVA) was conducted on the difference opinion scores as well. In this case the
effect of scaling (F(1,9) = 0.27) was not significant, whereas SRC (F(6,70) = 9.22) and HRC (F(25,159) = 27.7) were still
significant (p < 0.05). The two-way interactions were all significant too, but the three-way interaction was not. The important
difference here compared to the analysis of the raw opinion scores was that the difference between the upscaled presentation and the not
upscaled presentation was no longer significant.
The average MOS and the average DMOS taken across all the SRCs give results on the impact of the different HRCs.
These are presented in Figure 5 to Figure 8.
In Figure 5 and Figure 6 the MOS and DMOS of the freezing HRCs are shown, with the MOS results to the left and the
DMOS results to the right. In Figure 5 the MOS and DMOS values are plotted against the freezing HRCs ordered by
length, but not positioned on the x-axis according to freezing length in seconds, whereas in Figure 6 the MOS and DMOS are
plotted against freezing time in seconds.
[Two plots, curves for 4Mbit and 8Mbit: MOS (left) for Reference, No packet loss, 0.5 s, 1.0 s and 3.0 s; DMOS (right) for No packet loss, 0.5 s, 1.0 s and 3.0 s]
Figure 5: The MOS (left) and DMOS (right) of the freezing HRCs.
[Two plots, curves for 4Mbps and 8Mbps: MOS (left) and DMOS (right) versus Time (sec), 0 to 3]
Figure 6: The MOS (left) and DMOS (right) of the freezing HRCs plotted against the freezing time.
The MOS and DMOS of the packet loss HRCs are shown in Figure 7 and Figure 8 (MOS to the left and DMOS to the
right). In Figure 7 the MOS and DMOS are plotted against the packet loss cases ordered by loss rate, where the data points are
positioned on the x-axis according to case, not according to rate, whereas in Figure 8 the MOS and DMOS are plotted
against the packet loss rate in percent. Note that the higher bitrate PVSs get lower quality at higher
packet loss rates than the lower bitrate PVSs.
[Two plots, curves for 4Mbit and 8Mbit: MOS (left) for cases 1 to 10 and DMOS (right) for cases 2 to 10]
Cases on the x-axis: 1 - Reference; 2 - No packet loss; 3 - 0.05% proprietary codec; 4 - 0.05% FFmpeg; 5 - 0.2% FFmpeg;
6 - 0.5% proprietary codec; 7 - 0.5% FFmpeg; 8 - 1% proprietary codec; 9 - 1% FFmpeg; 10 - 5% FFmpeg
Figure 7: The MOS (left) and DMOS (right) of the packet loss HRCs.
[Two plots, curves for FFMPEG 4Mbps, Prop 4Mbps, FFMPEG 8Mbps and Prop 8Mbps: MOS (left) and DMOS (right) versus Packet loss (%), 0 to 5]
Figure 8: The mean MOS (left) and DMOS (right) of the packet loss HRCs plotted against packet loss rate in percent
4. DISCUSSION AND CONCLUSIONS
An experiment was conducted simulating the visual effect of H.264-based IPTV set-top boxes when they are faced with
transmission errors in the form of packet loss. A decoder is faced with the problem of somehow concealing the error,
and the concealment methods studied here are freezing and the concealment used by the FFMPEG decoder and by a
proprietary decoder. Another purpose was to study whether presenting the video upscaled on a full HDTV 1080p display or not
upscaled in a rectangle on the screen with a grey surround had any impact.
An ANOVA showed that there was a significant quality difference between showing the video upscaled to the
full screen and showing it pixel mapped in the center of the screen. However, this effect was very close to the
boundary of not being significant at the 95% level, i.e. p < 0.05; in this case p = 0.045. It has been reported before that
picture size is an important quality factor [8], but in this case the quality increase was counterbalanced by a decrease in
quality from the upscaling. The significance of the scaling disappears for the DMOS, which is expected, since ideally the
effect of presentation (display, scale, size) should affect the reference and the PVS in the same way and is then removed
by the DMOS calculation. The result supports that this is the case.
The ANOVA also showed that the quality impacts of the different SRCs and HRCs, as well as their interactions, were
significant. The interaction effect shows that the HRCs have different impact depending on the content, i.e. on the
SRC.
The results show that the experiment was not completely balanced, since there were a larger number of scores in the lower
quality categories than in the higher. Care was taken to design an experiment where the votes would be spread as evenly as possible over
the available categories, but the real distribution can only be obtained once the experiment has been completed.
For freezing, the quality drops below the level of good (< 3.5) already at 0.5 s. This is in line with other
results that look at user reactions to waiting time [9]. Furthermore, freezing even up to 3 s, almost a third of the viewing
time of the clip, was not experienced as being as severe as the highest packet loss rates.
The packet loss impact is interesting; see Figure 7 and Figure 8. At very low packet
loss rates, the quality is higher for the higher encoding bitrate, but when the packet loss rate increases the quality drops
faster for the higher bitrate and becomes lower than for the lower bitrate. The likely explanation is that since a higher
bitrate video stream contains more packets than a lower bitrate one, more packets will be lost in absolute
numbers, even if both video streams are losing packets according to the same distribution, a bursty distribution in this
case. The higher bitrate stream will have its losses spread out over more of the video, and they will therefore most likely also be more visible.
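The argument about absolute numbers can be made concrete with a rough calculation. The Python sketch below assumes that the whole nominal bitrate is carried as MPEG-2 TS payload at a constant rate, that 1 Mbps means 10^6 bits per second, and that each UDP packet carries seven 188-byte TS packets; these are simplifications for illustration, and the function name is hypothetical.

```python
UDP_PAYLOAD_BYTES = 7 * 188  # seven MPEG-2 TS packets per UDP packet


def expected_lost_udp(bitrate_mbps, seconds, loss_rate_percent):
    """Return (number of UDP packets in the clip, expected number lost)."""
    total_bytes = bitrate_mbps * 1_000_000 * seconds / 8
    n_udp = total_bytes / UDP_PAYLOAD_BYTES
    return n_udp, n_udp * loss_rate_percent / 100


if __name__ == "__main__":
    for rate in (4, 8):
        n_udp, lost = expected_lost_udp(rate, seconds=10, loss_rate_percent=1)
        print(f"{rate} Mbps: about {n_udp:.0f} UDP packets, about {lost:.0f} lost at 1%")
```

Under these assumptions a 10 s clip at 4 Mbps consists of roughly 3800 UDP packets and the same clip at 8 Mbps of roughly 7600, so at any given loss rate the 8 Mbps stream loses about twice as many packets in absolute terms.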
5. ACKNOWLEDGEMENTS
This work has been supported by VINNOVA (Swedish Governmental Agency for Innovation Systems), which is hereby
gratefully acknowledged.
6. REFERENCES
1. Haglund, L., Guest, N., Einerman, S., Öster, H., Björkman, P., and Graf, H., "Overall-Quality Assessment When
Targeting Wide-Xga Flat Panel Displays", Sveriges Television, Institut für Rundfunktechnik, (2002)
2. VQEG, "HDTV Group: Test Plan for Evaluation of Video Quality Models for Use With High Definition TV
Content (Ver 3.0)", Video Quality Experts Group, www.vqeg.org, (2009)
3. FFMPEG, FFMPEG: complete, cross-platform solution to record, convert and stream audio and video [on-line],
ffmpeg.org, Accessed: 21 July 2009
4. ITU-T, "Subjective Video Quality Assessment Methods for Multimedia Applications", ITU-T Rec. P.910,
International Telecommunication Union, Telecommunication standardization sector, (1999)
5. Tourancheau, S., Brunnström, K., Andrén, B., and Le Callet, P., "LCD motion-blur estimation using different
measurement methods", Journal of the Society for Information Display 17, 239-249 (2009)
6. Tourancheau, S., Andrén, B., Brunnström, K., and Le Callet, P., "61.3 - Visual Annoyance and User Acceptance of
LCD Motion-Blur", SID Symposium Digest of Technical Papers XXXX, (2009)
7. Jonsson, J. and Brunnström, K., "Getting Started With ArcVQWin", acr022250, Acreo AB, Kista, Sweden, (2007)
8. Westerink, J. H. D. M. and Roufs, J. A. J., "Subjective image quality as function of viewing distance, resolution and
picture size", SMPTE Journal 113-119 (1989)
9. Kooij, R., Nikolai, F., Ahmed, K., and Brunnström, K., "Model validation of channel zapping quality", Proc. of
SPIE-IS&T Human Vision and Electronic Imaging XII, 7240, B. Rogowitz and T. N. Pappas Eds., paper 31 (2009)