Modeling Video Traffic from Multiplexed H.264
Videoconference Streams
Aggelos Lazaris
Department of Electronic and Computer Engineering
Technical University of Crete
Chania, Greece
Email: alazaris@telecom.tuc.gr
Polychronis Koutsakis
Department of Electrical and Computer Engineering
McMaster University
Hamilton, ON, Canada
Email: polk@ece.mcmaster.ca
Abstract—Due to the burstiness of video traffic, video modeling
is very important in order to evaluate the performance of future
wired and wireless networks. In this paper, we investigate the
possibility of modeling H.264 videoconference traffic with well-
known distributions. Our results regarding the behavior of single
videoconference traces provide significant insight and help to
build a Discrete Autoregressive (DAR(1)) model to capture the
behavior of multiplexed H.264 videoconference movies from VBR
coders.
I. INTRODUCTION
As traffic from video services is expected to be a substantial
portion of the traffic carried by emerging wired and wireless
networks [7][13], statistical source models are needed for
Variable Bit Rate (VBR) coded video in order to design
networks which are able to guarantee the strict Quality of
Service (QoS) requirements of the video traffic. Video packet
delay requirements are strict, because delays are annoying to a
viewer; whenever the delay experienced by a video packet
exceeds the corresponding maximum delay, the packet is
dropped. The video packet dropping requirements are
equally strict.
Hence, the problem of modeling video traffic, in general,
and videoconferencing, in particular, has been extensively
studied in the literature. VBR video models which have been
proposed in the literature include first-order autoregressive
(AR) models [2], discrete AR (DAR) models [1][3], Markov
renewal processes (MRP) [4], MRP transform-expand-sample
(TES) [5], finite-state Markov chains [6][7], Gamma-beta-autoregression
(GBAR) models [8][9] (which capture the data-rate
dynamics of VBR video conferences well but were found in [9]
to be unsuitable for general MPEG video sources), discrete-time
Semi-Markov Processes (SMP) [10], wavelets [11], and
multifractal and fractal methods [12].
In [14][15], different approaches are proposed for MPEG-1
traffic, based on the log-normal, Gamma, and a hybrid
Gamma/lognormal distribution model, respectively.
H.264 is the latest video coding standard of the ITU-T
Video Coding Experts Group (VCEG) and the ISO/IEC
Moving Picture Experts Group (MPEG). It has recently become
the most widely accepted video coding standard since the
deployment of MPEG2 at the dawn of digital television, and it
may soon overtake MPEG2 in common use. It covers all
common video applications ranging from mobile services and
videoconferencing to IPTV, HDTV, and HD video storage [18].
Standard H.264 encoders generate three types of video
frames: I (intracoded), P (predictive) and B (bidirectionally
predictive); i.e., while I frames are intra-coded, the generation
of P and B frames involves, in addition to intra-coding, the use
of motion prediction and interpolation techniques. I frames are,
on average, the largest in size, followed by P and then by B
frames.
Similarly to our recent work on modeling H.263
videoconference traffic [17], our present work initially focuses
on the accurate fitting of the marginal (stationary) distribution
of video frame sizes of single H.264 video traces. More
specifically, our work follows the steps of the work presented in
[3], where Heyman et al. analyzed three videoconference
sequences coded with a modified version of the H.261 video
coding standard and two other coding schemes, similar to the
H.261. The authors in [3] found that the marginal distributions
for all the sequences could be described by a gamma (or
equivalently negative binomial) distribution and used this result
to build a Discrete Autoregressive (DAR) model of order one,
which works well when several sources are multiplexed.
An important feature of common H.264 encoders is the
manner in which frame types are generated. Typical encoders
use a fixed Group-of-Pictures (GOP) pattern when compressing
a video sequence; the GOP pattern specifies the number and
temporal order of P and B frames between two successive I
frames. A GOP pattern is defined by the distance N between I
frames and the distance M between P frames.
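The GOP structure described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name gop_pattern is hypothetical, the pattern is listed in display order, and M is read as the spacing between successive anchor (I or P) frames, so that each anchor is followed by M-1 B frames.

```python
def gop_pattern(n: int, m: int) -> str:
    """Return the frame types of one GOP, in display order.

    n: distance between successive I frames (GOP length).
    m: distance between successive anchor (I or P) frames (assumed reading of M).
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive")
    frames = []
    for pos in range(n):
        if pos == 0:
            frames.append("I")      # every GOP starts with an I frame
        elif pos % m == 0:
            frames.append("P")      # anchor positions inside the GOP
        else:
            frames.append("B")      # everything between two anchors
    return "".join(frames)


print(gop_pattern(16, 4))   # IBBBPBBBPBBBPBBB
print(gop_pattern(16, 16))  # IBBBBBBBBBBBBBBB
```

With N = M = 16 the sketch produces no P frames at all, since the only anchor position in the GOP is the I frame itself.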
In this work, we focus on the problem of modeling
videoconference traffic from H.264 encoders, which is a
relatively new and yet open issue in the relevant literature.
II. SINGLE-SOURCE H.264 TRAFFIC MODELING
A. Frame-size histograms
In our work, we have studied two different long sequences
of H.264 VBR encoded videos in eighteen formats, from the
publicly available Video Trace Library of [19]. The selected
videos are of low or moderate motion (i.e., traces with very
similar characteristics to the ones of actual videoconference
traffic), in order to derive a statistical model which fits well the
real data.
This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE "GLOBECOM" 2008 proceedings.
978-1-4244-2324-8/08/$25.00 © 2008 IEEE.
The two traces are, respectively:
1) A demo from the Sony Digital Video Camera
2) An excerpt of NBC News
The length of the videos is 10 and 30 minutes, respectively.
The data for each trace consists of a sequence of the number of
cells per video frame and the type of video frame, i.e., I, P, or
B. Without loss of generality, we use 48-byte packets
throughout this work, but our modeling mechanism can be used
equally well with packets of other sizes. Table I presents the
trace statistics for each trace. The interframe period is 33.3 ms.
We have investigated the possibility of modeling the
eighteen traces with quite a few well-known distributions and
our results show that the best fit among these distributions is
achieved for all the traces studied with the use of the Pearson
type V distribution. The Pearson type V distribution (also
known as the “inverted Gamma” distribution) is generally used
to model the time required to perform some tasks (e.g.,
customer service time in a bank); other distributions which
have the same general use are the exponential, gamma, Weibull
and lognormal distributions [20]. Since all of these distributions
have been often used for video traffic modeling in the literature,
they have been included in this work as fitting candidates, in
order to compare their modeling results in the case of H.264
videoconferencing.
The frame-size histogram based on the complete VBR
streams is shown, for all the sequences, to have the general
shape of a Pearson type V distribution. Fig. 1 presents
indicatively the histogram for the NBC News ([CIF, G16, B7,
F28]) sequence.
B. Statistical Tests and Autocorrelations
Our statistical tests were made with the use of Q-Q plots
[3][20], Kolmogorov-Smirnov tests [20] and Kullback-Leibler
divergence tests [21]. The Q-Q plot is a powerful goodness-of-fit
test, which graphically compares two data sets in order to
determine whether they come from populations with a
common distribution (if they do, the points of the plot should
fall approximately along a 45-degree reference line). More
specifically, a Q-Q plot is a plot of the quantiles of the data
versus the quantiles of the fitted distribution (a z-quantile of X
is any value x such that $P(X \le x) = z$). The Kolmogorov-Smirnov
test (KS-test) tries to determine if two datasets differ
significantly. The KS-test has the advantage of making no
assumption about the distribution of data, i.e., it is non-parametric
and distribution free. The KS-test uses the
maximum vertical deviation between the two cumulative
distribution curves as its statistic D. The Kullback-Leibler
divergence test (KL-test) is a
measure of the difference between two probability distributions.
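As a rough illustration of the two numerical tests described above, the following Python sketch computes the KS statistic D between a sample and a candidate CDF, and a discrete KL divergence between two binned distributions. It is not the procedure used to obtain the results of this paper: the function names ks_statistic and kl_divergence, the synthetic exponential data and the seed are all assumptions made for the example.

```python
import math
import random


def ks_statistic(sample, cdf):
    """Maximum vertical distance between the empirical CDF and a model CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d


def kl_divergence(p, q):
    """Discrete KL divergence D(p || q) in nats.

    Bins with p = 0 contribute nothing; q must be positive wherever p is.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


random.seed(1)  # arbitrary seed, only for reproducibility of the example
sample = [random.expovariate(1.0) for _ in range(2000)]
d_good = ks_statistic(sample, lambda x: 1 - math.exp(-x))      # correct model
d_bad = ks_statistic(sample, lambda x: 1 - math.exp(-x / 3))   # wrong mean
print(d_good < d_bad)
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```

A smaller D or a smaller divergence indicates a closer fit, which is how the values reported for the candidate distributions in this paper should be read.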
The Pearson V distribution fit was shown to be the best in
comparison to the gamma, Weibull, lognormal and exponential
distributions, which are presented here (comparisons were also
made with the negative binomial and Pareto distributions,
which were also worse fits than the Pearson V). However,
although the Pearson V was shown to be the
best fit among all distributions, the fit is not perfectly
accurate. This was expected, as the gross differences in the
number of bits required to represent I, P and B frames impose a
degree of periodicity on H.264-encoded streams, based on the
cyclic GoP formats (therefore, this case is different from the
case of H.263 traffic we studied in [17], where the number of I
frames was so small in each trace that the trace could be
modeled as a whole).
Hence, we proceeded to study the frame size distribution for
each of the three different video frame types (I, P, B), in the
same way we studied the frame size distribution for the whole
trace. This approach was also used in [9][22].
TABLE I. TRACE STATISTICS
Video Name   [RES, G, B, F]a       Mean (bits)   Peak (bits)   Variance (bits^2)
NBC News     [CIF, 16, 1, 28]      15816         181096        471117539
NBC News     [CIF, 16, 1, 48]      1197          28032         4925112
NBC News     [CIF, 16, 3, 28]      14632         182520        467920380
NBC News     [CIF, 16, 3, 48]      1084          28216         5007541
NBC News     [CIF, 16, 7, 28]      15081         186872        467131784
NBC News     [CIF, 16, 7, 48]      1054          29768         5179470
NBC News     [CIF, 16, 15, 28]     16624         192272        456464433
NBC News     [CIF, 16, 15, 48]     1059          31840         5246908
Sony Demo    [CIF, 16, 1, 28]      14067         221664        752947478
Sony Demo    [CIF, 16, 1, 48]      954           23096         4693753
Sony Demo    [CIF, 16, 3, 28]      12801         222888        770225078
Sony Demo    [CIF, 16, 3, 48]      887           23232         4856589
Sony Demo    [CIF, 16, 7, 28]      13129         227680        787021301
Sony Demo    [CIF, 16, 7, 48]      898           25480         5243752
Sony Demo    [CIF, 16, 15b, 28]    14861         233296        803054805
Sony Demo    [CIF, 16, 15b, 48]    933           28224         5818976
Sony Demo    [HD, 12, 2, 48]       22513         398544        2684852964
Sony Demo    [HD, 12, 2, 38]       7618          143408        327728805
a. RES: Resolution, G: GoP Size, B: Number of B Frames, F: Quantization Parameters
b. When B=15 and G=16 there are no P frames in the trace sequence
[Figure 1 plot: "Frame Size Histogram"; x-axis: Frame Sizes (KB); y-axis: Frequency (Percentile of total frames)]
Figure 1. Frame size histogram for the NBC News trace with parameters: [CIF, G16, B7, F28].
Another approach, similar to the above, was proposed in
[14]. This scheme uses again lognormal distributions and
assumes that the change of a scene alters the average size of I
frames, but not the sizes of P and B frames. However, it is
shown in [4][15] that the average sizes of P and B frames can
vary by 20% and 30% (often more than that), respectively, in
subsequent scenes, therefore the size changes are statistically
significant.
The mean, peak and variance of the video frame sizes for
each video frame type (I, P and B) of each movie were taken
again from [19] and the Pearson type V parameters are
calculated based on the following formulas for the mean and
variance of Pearson V (the parameters for the other fitting
distributions are similarly obtained based on their respective
formulas).
The Probability Density Function (PDF) of a Pearson V
distribution with parameters $(\alpha, \beta)$ is
$f(x) = \frac{x^{-(\alpha+1)} e^{-\beta/x}}{\beta^{-\alpha}\,\Gamma(\alpha)}$, for all $x > 0$, and zero otherwise.
The mean and variance are given by the following
equations: $\text{Mean} = \frac{\beta}{\alpha-1}$, $\text{Variance} = \frac{\beta^2}{(\alpha-1)^2(\alpha-2)}$
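The two moment equations above can be inverted in closed form: dividing the variance by the squared mean leaves 1/(shape - 2), so the shape parameter equals 2 + Mean^2/Variance and the scale parameter equals Mean times (shape - 1). The Python sketch below illustrates this moment matching; the function names and the choice of the whole-trace NBC News [CIF, 16, 1, 28] values from Table I as sample inputs are our own assumptions for the example, not the per-frame-type values used in the study.

```python
import math


def pearson5_from_moments(mean, variance):
    """Invert Mean = b/(a-1) and Variance = b^2 / ((a-1)^2 (a-2)) for (a, b)."""
    if mean <= 0 or variance <= 0:
        raise ValueError("mean and variance must be positive")
    alpha = 2.0 + mean * mean / variance   # shape, always > 2 here
    beta = mean * (alpha - 1.0)            # scale
    return alpha, beta


def pearson5_pdf(x, alpha, beta):
    """Pearson V density, evaluated in log space for numerical stability."""
    if x <= 0:
        return 0.0
    log_f = (alpha * math.log(beta) - math.lgamma(alpha)
             - (alpha + 1.0) * math.log(x) - beta / x)
    return math.exp(log_f)


# Sample inputs (bits, bits^2): whole-trace values taken from Table I.
alpha, beta = pearson5_from_moments(15816.0, 471117539.0)
print(alpha, beta)
print(pearson5_pdf(10000.0, alpha, beta))
```

Because the matched shape is always larger than 2, both the mean and the variance of the fitted distribution are finite.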
The autocorrelation coefficient of lag-1 was also calculated
for all types of video frames of the eighteen movies, as it shows
the very high degree of correlation between successive frames
of the same type. The autocorrelation coefficient of lag-1 will
be used in the following Sections of this work, in order to build
a Discrete Autoregressive Model for each video frame type.
From the five distributions examined (Pearson V,
exponential, gamma, lognormal, Weibull) the Pearson V
distribution once again provided the best fitting results for the
54 cases (18 movies, 3 types of frames per movie) studied.
In order to further verify the validity of our results, we
performed Kolmogorov-Smirnov and Kullback-Leibler tests for
all the 54 fitting attempts. The results of our tests confirm our
respective conclusions based on the Q-Q plots (i.e., the Pearson
V distribution is the best fit). Fig. 2 presents indicative results
from the KS-test. Regarding the KL-test, the results for the {I,
P, B} frames of the Sony Demo ([CIF, G16, B3, F48]) trace are
respectively, for the Pearson V distribution {0.364, 0.721,
0.432}, for the Lognormal distribution {0.378, 0.864, 0.479},
for the Gamma distribution {0.387, 1.027, 0.543} and for the
Weibull distribution {0.453, 1.024, 0.533}.
Although controversy persists regarding the prevalence of
Long Range Dependence (LRD) in VBR video traffic
([25][26][27]), in the specific case of H.264-encoded video, we
have found that LRD is important. The autocorrelation function
for the NBC News ([CIF, G16, B7, F28]) trace is shown in Fig.
3 (the respective Figures for the other traces are similar).
Three apparent periodic components are observed, one
containing lags with low autocorrelation, one with medium
autocorrelation and one with high autocorrelation. We
observe that autocorrelation remains high even for large
numbers of lags and that all components decay very slowly;
both these facts are a clear indication of the importance of
LRD. The existence of strong autocorrelation coefficients is due
to the periodic recurrence of I, B and P frames.
Although the fitting results when modeling each video
frame type separately with the use of the Pearson V distribution
are clearly better than the results produced by modeling the
whole sequence uniformly, the high autocorrelation shown in
Fig. 3 can never be perfectly “captured” by a
distribution generating frame sizes independently, according to
a declared mean and standard deviation, and therefore none of
the fitting attempts (including the Pearson V), as good as they
might be, can achieve perfect accuracy. However, these results
lead us to extend our work in order to build a DAR model,
which inherently uses the autocorrelation coefficient of lag-1 in
its estimation. The model will be shown to accurately capture
the behavior of multiplexed H.264 videoconference movies, by
generating frame sizes independently for I, P and B frames.
Finally, it should be noted that in [16] we have successfully
modeled High Definition (HD) H.264 traces as a whole (i.e.,
with a similar approach to that of [17] for H.263 traces) and
used the result to propose an efficient MAC protocol for GEO
satellite networks. The Weibull distribution was shown to
provide the best results when modeling the traces as a whole,
slightly outperforming the Pearson V distribution. However, in
the case of the “Main Profile” traces from [19] (which consume
significantly smaller amounts of bandwidth than the HD ones)
the Pearson V distribution clearly excels as a fit both for the
whole trace and for the separate modeling of I, P, B frames.
[Figure 2 plot: "Kolmogorov-Smirnov Tests for B-sonyCIF-G16B3F48"; x-axis: Bytes; y-axis: Percentile. Legend: Actual Trace; Exponential, D = 0.48555; Gamma, D = 0.35528; Lognormal, D = 0.34261; Pearson V, D = 0.30829; Weibull, D = 0.32609]
Figure 2. KS-test (Comparison Percentile Plot) for the Sony Demo B frames ([CIF, G16, B3, F48]).
[Figure 3 plot: "Frame Size Autocorrelation"; x-axis: Lag [frames]; y-axis: ACC]
Figure 3. Autocorrelation Coefficients of the NBC News trace ([CIF, G16, B7, F28]).
III. THE DAR(1) MODEL – RESULTS AND DISCUSSION
A Discrete Autoregressive model of order p, denoted as
DAR(p) [23], generates a stationary sequence of discrete
random variables with an arbitrary probability distribution and
with an autocorrelation structure similar to that of an
Autoregressive model. DAR(1) is a special case of a DAR(p)
process and it is defined as follows: let $\{V_n\}$ and $\{Y_n\}$ be two
sequences of independent random variables. The random
variable $V_n$ can take two values, 0 and 1, with probabilities $1-\rho$
and $\rho$, respectively. The random variable $Y_n$ has a discrete state
space S and $P\{Y_n = i\} = \pi(i)$. The sequence of random variables
$\{X_n\}$ which is formed according to the linear model:
$X_n = V_n X_{n-1} + (1 - V_n) Y_n$    (1)
is a DAR(1) process.
A DAR(1) process is a Markov chain with discrete state
space S and a transition matrix:
$P = \rho I + (1-\rho) Q$    (2)
where $\rho$ is the autocorrelation coefficient, I is the identity
matrix and Q is a matrix with $Q_{ij} = \pi(j)$ for $i, j \in S$.
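The recursion that defines the DAR(1) process translates almost directly into code. The following Python sketch is a minimal illustration, not the simulator used for the results of this paper: the function name simulate_dar1, the representation of the marginal distribution as a dictionary, and the toy four-value marginal are assumptions made for the example.

```python
import random


def simulate_dar1(rho, marginal, n, rng):
    """Generate n values of a DAR(1) process.

    rho: probability of repeating the previous value (lag-1 autocorrelation).
    marginal: dict mapping each state to its stationary probability.
    rng: a random.Random instance, passed in for reproducibility.
    """
    if n <= 0:
        return []
    states = list(marginal)
    weights = [marginal[s] for s in states]
    x = rng.choices(states, weights)[0]      # start from the marginal
    out = [x]
    for _ in range(n - 1):
        if rng.random() >= rho:              # V_n = 0: draw a fresh Y_n
            x = rng.choices(states, weights)[0]
        out.append(x)                        # V_n = 1: keep X_{n-1}
    return out


rng = random.Random(0)                       # arbitrary seed
toy_marginal = {10: 0.25, 20: 0.25, 30: 0.25, 40: 0.25}
seq = simulate_dar1(0.9, toy_marginal, 20000, rng)
repeats = sum(a == b for a, b in zip(seq, seq[1:])) / (len(seq) - 1)
print(repeats)
```

In this toy case the fraction of repeated values should be close to 0.9 + 0.1 * 0.25 = 0.925, because a fresh draw can also land on the previous value.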
Autocorrelations are usually plotted for a range W of lags.
The autocorrelation can be calculated by the formula:
$\rho(W) = E[(X_i - \mu)(X_{i+W} - \mu)] / \sigma^2$    (3)
where $\mu$ is the mean and $\sigma^2$ the variance of the frame size for a
specific video trace.
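A sample version of this autocorrelation formula can be sketched as follows. The function name autocorrelation and the synthetic frame-size sequence (one large frame every 16 frames, mimicking a GoP of size 16) are assumptions made for the example; the estimator averages the lagged products over the available pairs, which is one of several common conventions.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(x)")
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    if var == 0:
        raise ValueError("constant sequence has undefined autocorrelation")
    # Average of the lagged products over the n - lag available pairs.
    cov = sum((x[i] - mu) * (x[i + lag] - mu) for i in range(n - lag)) / (n - lag)
    return cov / var


# Synthetic sizes: a large frame every 16 frames, small frames in between.
sizes = [100 if i % 16 == 0 else 10 for i in range(1600)]
print(autocorrelation(sizes, 1))
print(autocorrelation(sizes, 16))
```

For such a strictly periodic sequence the coefficient at lag 16 is essentially 1 while the coefficient at lag 1 is slightly negative, which mirrors the periodic components produced by the recurrence of frame types.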
As in [3], where a DAR(1) model with negative binomial
distribution was used to model the number of cells per frame of
VBR teleconferencing video, we want to build a model based
only on parameters which are either known at call set-up time
or can be measured without introducing much complexity in the
network. DAR(1) provides an easy and practical method to
compute the transition matrix and gives us a model based only
on four physically meaningful parameters, i.e., the mean, peak,
variance and the lag-1 autocorrelation coefficient $\rho$ of the
offered traffic (these correlations, as already explained, are
typically very high for videoconference sources). The DAR(1)
model can be used with any marginal distribution [24].
As already explained, the lag-1 autocorrelation coefficient
for the I, P and B frames of each trace is very high in all the
studied cases. Therefore, we proceeded to build a DAR(1)
model for each video frame type for each one of the eighteen
traces under study. More specifically, in our model the rows of
the Q matrix consist of the Pearson type V probabilities
$(f_0, f_1, \ldots, f_k, F_K)$, where $F_K = \sum_{k>K} f_k$, and K is the peak rate. Each k, for
k<K, corresponds to possible source rates less than the peak
rate of K.
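One possible way to build such a row, and from it a row of the transition matrix, is sketched below in Python. This is an illustration under stated assumptions rather than our exact procedure: the function names are hypothetical, the mass of k cells is approximated by integrating the Pearson V density over (k, k+1] with a single Simpson panel, all remaining probability is lumped at the peak, and the shape, scale and peak values in the usage lines are arbitrary example numbers.

```python
import math


def pearson5_pdf(x, alpha, beta):
    """Pearson V density, evaluated in log space."""
    if x <= 0:
        return 0.0
    log_f = (alpha * math.log(beta) - math.lgamma(alpha)
             - (alpha + 1.0) * math.log(x) - beta / x)
    return math.exp(log_f)


def q_row(alpha, beta, peak):
    """Return [f_0, ..., f_{peak-1}, F_peak], one row of the matrix Q.

    f_k approximates the density mass on (k, k+1] (assumed discretization);
    F_peak collects whatever probability is left, so the row sums to one.
    """
    row = []
    for k in range(peak):
        row.append((pearson5_pdf(k, alpha, beta)
                    + 4.0 * pearson5_pdf(k + 0.5, alpha, beta)
                    + pearson5_pdf(k + 1.0, alpha, beta)) / 6.0)
    row.append(max(0.0, 1.0 - sum(row)))
    return row


def transition_row(i, rho, row):
    """Row i of P = rho * I + (1 - rho) * Q, where every row of Q equals row."""
    return [(rho if j == i else 0.0) + (1.0 - rho) * q for j, q in enumerate(row)]


row = q_row(2.53, 63.0, 472)          # example parameters, in cells
p10 = transition_row(10, 0.9, row)
print(len(row), row[-1])
print(p10[10])
```

Since all rows of Q are identical, only one row needs to be stored; the full transition matrix differs from it only on the diagonal.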
From the transition matrix in (2) it is evident that if the
current frame has, for example, i cells, then the next frame will
have i cells with probability $\rho + (1-\rho) f_i$, and will have k cells,
$k \neq i$, with probability $(1-\rho) f_k$. Therefore the number of cells
per video frame stays constant from one (I, P or B) video frame
to the next (I, P or B) video frame, respectively, in our model
with a probability slightly larger than $\rho$. This is evident in Fig.
4, where we compare the actual B frames sequence of the NBC
News ([CIF, G16, B15, F28]) trace and its respective DAR(1)
model; the DAR(1) model’s data produce a
“pseudo-trace” with a periodically constant number of cells for
a number of video frames. This causes a significant difference
when comparing a segment of the sequence of I, P, or B frames
of the actual NBC News video trace and a sequence of the same
length produced by our DAR(1) model. The same large
differences also appeared when we plotted the DAR(1) models
versus the actual I, P and B video frames of the other traces
under study.
[Figure 4 plot: Actual Trace vs. DAR Model; x-axis: Frames; y-axis: Cells]
Figure 4. Comparison for a single trace between a 10000 frame sequence of
the actual B frames sequence of the NBC News ([CIF, G16, B15, F28]) trace
and the respective DAR(1) model in number of cells/frame (Y-axis).
[Figure 5 plot: Actual Trace vs. DAR Model; x-axis: Frames; y-axis: Cells]
Figure 5. Comparison for 30 superposed sources between a 3000 I frame
sequence of the actual NBC News ([CIF, G16, B1, F28]) trace and the
respective DAR(1) model in number of cells/frame (Y-axis).
[Figure 6 plot: Actual Trace vs. DAR Model; x-axis: Frames; y-axis: Cells]
Figure 6. Comparison for 30 superposed sources between a 10000 P frame
sequence of the actual NBC News ([CIF, G16, B1, F28]) trace and the
respective DAR(1) model in number of cells/frame (Y-axis).
[Figure 7 plot: Actual Trace vs. DAR Model; x-axis: Frames; y-axis: Cells]
Figure 7. Comparison for 30 superposed sources between a 10000 B frame
sequence of the actual NBC News ([CIF, G16, B1, F28]) trace and the
respective DAR(1) model in number of cells/frame (Y-axis).
However, our results have shown that the differences
presented above become small for all types of video frames and
for all the examined traces for a superposition of 5 or more
sources, and are almost completely smoothed out in most cases,
as the number of sources increases (the authors in [3] have
reached similar conclusions for their own DAR(1) model and
they present results for a superposition of 20 traces). This is
clear in Figs. 5-7, which present the comparison between our
DAR(1) model and the actual I, P, B frames’ sequences of the
NBC News ([CIF, G16, B1, F28]) trace, for a superposition of 30
traces (the results were very similar for all video frame
types of the other traces; we have used the initial trace
sequences to generate traffic for 30 sources, by using different
starting points in the trace). The common property of all these
results (derived by using a queue to model multiplexing and
processing frames in a FIFO manner) is that the DAR(1) model
seems to provide very accurate fitting results for P and B
frames, and relatively accurate results for I frames.
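The way a single trace is reused to emulate several sources can be sketched as follows. This Python fragment is only an illustration: the function name superpose is hypothetical, and drawing the starting points uniformly at random and wrapping around at the end of the trace are assumptions of the sketch, since the text above only states that different starting points are used.

```python
import random


def superpose(trace, n_sources, length, rng):
    """Sum n_sources copies of one trace, each read from its own start offset.

    trace: frame sizes (e.g. cells per frame) of a single source.
    length: number of aggregate frames to produce.
    rng: a random.Random instance used to pick the offsets.
    """
    size = len(trace)
    offsets = [rng.randrange(size) for _ in range(n_sources)]
    # Each source wraps around when it reaches the end of the trace.
    return [sum(trace[(o + t) % size] for o in offsets) for t in range(length)]


rng = random.Random(42)                      # arbitrary seed
toy_trace = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]   # made-up frame sizes
aggregate = superpose(toy_trace, 30, len(toy_trace), rng)
print(aggregate)
```

The same function can be applied to a model-generated sequence, so that the aggregate of the actual trace and the aggregate of the model output are built in the same way before they are compared.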
However, although Figs. 5-7 suggest that the DAR(1)
model captures very well the behavior of the multiplexed actual
traces, they do not suffice as a result. Therefore, we proceeded
again with testing our model statistically in order to study
whether it produces a good fit for the I, P, B frames for the
trace superposition. For this reason we have used again Q-Q
plots, and we present indicatively some of these results in Figs.
8-9, where we have plotted the 0.01-, 0.02-, 0.03-,… quantiles
of the actual P and I video frame types of the NBC News
trace versus the respective quantiles of the respective DAR(1)
models, for a superposition of 30 traces.
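The points of such a Q-Q plot can be computed as in the following Python sketch. The function name qq_points and the use of the empirical quantile "smallest value whose empirical CDF is at least k/levels" are assumptions of the example; other quantile conventions would give slightly different points.

```python
def qq_points(actual, model, levels=100):
    """Return the (actual, model) quantile pairs at k/levels, k = 1..levels-1."""
    def quantile(sorted_xs, k):
        n = len(sorted_xs)
        # Index of the smallest value whose empirical CDF is >= k/levels,
        # computed with integer arithmetic to avoid rounding surprises.
        idx = -(-k * n // levels) - 1
        return sorted_xs[idx]

    if not actual or not model:
        raise ValueError("both samples must be non-empty")
    a, m = sorted(actual), sorted(model)
    return [(quantile(a, k), quantile(m, k)) for k in range(1, levels)]


points = qq_points(list(range(1, 101)), list(range(1, 101)))
print(points[0], points[-1])   # (1, 1) (99, 99)
```

When the two samples follow the same distribution, the pairs lie close to the 45-degree line, as in the identical toy samples used here.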
As shown in Fig. 8, which presents the comparison of actual
P frames with the respective DAR(1) models for the NBC
News ([CIF, G16, B3, F48]) trace, the points of the Q-Q plot
fall almost completely along the 45-degree reference line, with
the exception of the first and last 3% quantiles (left- and right-
hand tail), for which the DAR(1) model underestimates the
probability of frames with a very small and very large,
respectively, number of cells. The very good fit shows that the
superposition of the P frames of the actual traces can be
modeled very well by a respective superposition of data
produced by the DAR(1) model (similar results were derived
for the superposition of B frames), as it was suggested in Figs.
6, 7. Fig. 9 presents the comparison of actual I frames with the
respective DAR(1) model, for the NBC News ([CIF, G16, B7,
F48]) trace. Again, the result suggested from Fig. 5, i.e., that
our method for modeling I frames of multiplexed H.264
videoconference streams provides only relative accuracy, is
shown to be valid with the use of the Q-Q plots. The results for
all the other cases which are not presented in Figs. 8-9 are
similar in nature to the ones shown in the Figures.
One problem which could arise with the use of DAR(1)
models is that such models take into account only short range
dependence, while, as shown earlier, H.264 videoconference
streams show LRD. This problem is overcome by our choice of
modeling I, P and B frames separately. This is shown in Fig.
10. It is clear from the Figure that, even for a small number of
lags, (e.g., larger than 10) the autocorrelation of the
superposition of frames decreases quickly, for all the traces.
Therefore, although in some cases the DAR(1) model exhibits a
slower decrease than that of the actual traces’ video frames
sequence, this has minimal impact on the fitting quality of the
DAR(1) model. This result further supports our choice of using
a first-order model.
IV. CONCLUSIONS
In this paper, we have proposed and tested a new model for
traffic originating from VBR H.264 videoconferencing
sources. Models of video traffic will prove very important in
the immediate future, as networks will need to competently
handle video traffic (i.e., to guarantee its strict QoS
requirements despite its burstiness). The Discrete
Autoregressive model built in this work is shown to be highly
accurate and, to the best of our knowledge, is one of the first
works in the relevant literature to address the specific problem.
Based on the very good results of our study in modeling P- and
[Figure 8 plot: Q-Q plot, DAR(1) model quantiles versus actual trace quantiles]
Figure 8. Q-Q plot of the DAR(1) model versus the actual video for the P
frames of NBC News ([CIF, G16, B3, F48]), for 30 superposed sources.
[Figure 9 plot: Q-Q plot, DAR(1) model quantiles versus actual trace quantiles]
Figure 9. Q-Q plot of the DAR(1) model versus the actual video for the I
frames of NBC News ([CIF, G16, B7, F48]), for 30 superposed sources.
[Figure 10 plot: Actual Trace vs. DAR Model; x-axis: Lag [frames]; y-axis: ACC]
Figure 10. Autocorrelation vs. number of lags for the I frames of the actual
NBC News ([CIF, G16, B15, F28]) trace and the DAR(1) model, for 30
superposed sources.