A Framework For Generation Of Testsets
For Recent Multimedia Workflows
Robert Manthey, Steve Conrad, and Marc Ritter
Technische Universität Chemnitz, Department of Computer Science,
Junior Professorship Media Computing, Straße der Nationen 62, 09111 Chemnitz,
Germany
{robert.manthey,steve.conrad,marc.ritter}@informatik.tu-chemnitz.de
http://www.tu-chemnitz.de/informatik/mc
Abstract. Testing of recent multimedia workflows is commonly performed with small, outdated media samples; our framework offers approaches to overcome this inadequacy. An abstract description defines each test case, its transformation to the designated target platforms, and the operations and parameters to be processed within the evaluation, in such a way that it is independent of any platform. The control of the automated workflows of our framework is based on Python and Apache Ant, which trigger the execution of the described definitions, so that different tools can be used flexibly and purpose-dependently.
As a demonstrator, we conduct a visual error detection evaluation of FFmpeg, Telestream Episode, and Adobe Media Encoder. It consists of the definition of the test case sequences with POV-Ray and the subsequent creation of single uncompressed images from those definitions. These images are then merged into uncompressed video samples, which form the platform-dependent instances of the test cases. All of these videos are processed with different codecs and encoding qualities during the evaluation, and the results are compared with their uncompressed raw material or with other test cases.
This shows that the identical test case video file leads to visually strongly different outcomes after encoding. Furthermore, some of the created test cases cause complete loss of the raw image information, ringing artefacts at contrast edges, and flicker effects.
Keywords: Framework, Multimedia, Quality Analysis, Testing
1 Introduction
Today, a massive amount of video and multimedia data is processed. Cameras observe systems in manufacturing, food production, and car traffic; they provide information in autonomous cars and advanced driver assistance systems as well as in entertainment systems. In a similar way, audio and further data are used and sometimes combined into one file or a group of files to form multimedia data. The amount of such data grows as rapidly as its complexity: resolutions increase to HD and beyond, content gains 5.1 to 22.2 surround sound as well as 3D or 360-degree views, and the field of application expands from TV and computer screens to huge projectors and small smartwatch-like devices. But commonly, the examination of accessibility, correctness, performance, and especially quality is done with old, small single-media samples like Fig. 1a¹, 1b², and 1c³ from the last century, reaching SD resolution with stereo sound at most.
Fig. 1: Commonly used test images and test video. (a) RCA Indian-head test image; (b) Lena test image; (c) frame of the Flower test video.
In principle, the data thereby pass through the different steps of a processing chain (Fig. 2) in order to be improved, stored, or displayed. Each step has its own characteristics and adds errors, which can be noticed, e.g., as picture artefacts (Fig. 3). The type of artefacts and their frequency of occurrence depend heavily on numerous parameters like the transcoding system, its implementation and settings, as well as the input data. Different test patterns exist for different types of image artefacts; due to the innumerable variety of artefacts, a testset should provoke as many artefact types as possible and make them detectable.
Fig. 2: Processing chain for images: "C, input from camera; G, grab image (digitize and store); P, preprocess; R, recognize (i, image data; a, abstract data)." [1]
Many artefacts are nevertheless not detectable this way, since they do not appear in a single test pattern: they are results of movements, quick image switching, or other conditional image transformations which appear only in image sequences like videos.
¹ http://sipi.usc.edu/database/download.php?vol=misc&img=4.2.04
² http://www.forensicgenealogy.info/contest_206_results.html
³ http://media.xiph.org/video/derf/y4m/flower_cif.y4m
Fig. 3: Common artefacts in digital images. (a) Ringing artefact; (b) blocking artefact [4].
Testsets should not only cause the expected errors, but also make them clearly visible and detectable. In the case of single images or natural movies it is, for example, difficult to see slight color differences or single-pixel errors. Furthermore, in some areas like image understanding [3], image retrieval, or digital archiving [2], the testsets have to be as compact as possible, since otherwise extensive tests would hardly be feasible with such an amount of data. Facing these problems, we conceptualized a highly flexible synthetic testset, adapted to these specific purposes.
2 Framework structure
To generate a synthetic, versatile, and flexible testset which is able to reveal designated picture artefacts, it is necessary to use a highly adaptable framework from first to last. As the foundation we use a vectorized description of the different test patterns, namely the Scene Description Language (SDL) of the open source raytracing software POV-Ray. This abstract scene description is based on parameters and coordinates and is therefore independent of the desired resolution, aspect ratio, and file format. Due to this, it is easily possible to change the testset or add further test cases in order to adapt the test patterns to changed purposes. In the next step, we define the test cases through the scene description parameters within a Python script which generates the workflow control file. This control file uses Apache Ant to call POV-Ray and to pass the parameters to it. The file is constructed in such a way that all test cases, a selected subset, or even a single test case can be created. These test cases serve as the input data for the programs under test and as the original material for the comparison with the transcoded results.
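To illustrate this control flow, the following minimal sketch shows how such a workflow control file could be generated. All names (TEST_CASES, build.xml, the *.pov scene files) are hypothetical stand-ins that merely mirror the structure described above; the authors' actual script is not part of the paper.

# Minimal sketch, assuming hypothetical test case and file names:
# a Python script turns abstract test case definitions into an
# Apache Ant control file which invokes POV-Ray per test case.

TEST_CASES = [  # abstract, platform-independent test case definitions
    {"name": "siemens_star_static", "scene": "siemens_star.pov",
     "width": 1920, "height": 1080, "frames": 360},
    {"name": "grid_4px_rotating", "scene": "grid.pov",
     "width": 1024, "height": 768, "frames": 360},
]

def ant_target(case):
    """One Ant <target> per test case, so that all cases, a subset,
    or a single case can be rendered independently and in parallel."""
    args = ("+I{scene} +W{width} +H{height} "
            "+KFI1 +KFF{frames} +O{name}_.png").format(**case)
    return ('  <target name="{n}">\n'
            '    <exec executable="povray" failonerror="true">\n'
            '      <arg line="{a}"/>\n'
            '    </exec>\n'
            '  </target>').format(n=case["name"], a=args)

with open("build.xml", "w") as f:
    targets = "\n".join(ant_target(c) for c in TEST_CASES)
    names = ",".join(c["name"] for c in TEST_CASES)
    f.write('<project name="testset" default="all">\n'
            + targets + '\n'
            + '  <target name="all" depends="' + names + '"/>\n'
            + '</project>\n')

Calling ant would then build all test cases, while ant siemens_star_static would render a single one, matching the selective execution described above.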
Fig. 4: Schema of the framework to generate the testsets
3 Testset Generation
We used the description language of the raytracing renderer POV-Ray to define a set of test patterns in an abstract, target- and size-independent way, as shown in Fig. 5a. At the same time, a set of descriptions is defined with the Python programming language to form the test sequences (Fig. 5b) as well as the way to handle their execution through the planned programs. This forms an Ant-based control⁴ file to allow parallel as well as independent execution of each test (Fig. 5c).
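The following sketch is illustrative only: the paper defines its patterns in POV-Ray SDL (Fig. 5a), whereas this Python/Pillow version merely shows the kind of parametric, resolution-independent description involved; the beam count and file name are assumptions.

# Illustrative sketch (not the authors' SDL code): a parametric
# siemens star drawn as alternating white/black circular sectors.
from PIL import Image, ImageDraw

def siemens_star(width, height, beams=32, rotation=0.0):
    """`rotation` (degrees) shifts the whole star, e.g. per frame."""
    img = Image.new("RGB", (width, height), "black")
    draw = ImageDraw.Draw(img)
    cx, cy = width / 2, height / 2
    r = min(cx, cy)
    step = 360.0 / (2 * beams)  # angular width of one sector
    for k in range(beams):
        start = rotation + 2 * k * step
        draw.pieslice([cx - r, cy - r, cx + r, cy + r],
                      start, start + step, fill="white")
    return img

# Render one frame of a motionless star in full HD.
siemens_star(1920, 1080).save("siemens_star_0001.png")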
The test patterns can be divided into four groups of pattern designs. The first is a cartesian grid structure of square blocks, provided in edge lengths of 1, 4, 5, 8, and 10 pixels; except for the 1-pixel design, these pattern sequences also exist in a rotating version. The images of the second design are composed of 1, 2, or 4 rectangular sections; additionally, the 2-section sequences are translated perpendicular to their separation line, and the 4-section sequences also rotate. The third group contains images with stripes in widths of 1, 4, 5, 8, and 10 pixels, which are also rotated and translated in various ways. The last category of test patterns shows a siemens star in different sizes and with different numbers of beams; these are available in a rotating version, too.
⁴ https://ant.apache.org/
Fig. 5: Samples of the control elements to generate a test case with rotation and its execution. (a) Generic POV-Ray code to generate the siemens star; (b) definition of a test case showing a siemens star without rotation; (c) definition of the execution of a test case with 400 kBit and 1.5 MBit with FFmpeg.
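In the spirit of Fig. 5b and 5c, a minimal sketch of such a test case definition and its execution might look as follows; the dictionary keys and file names are hypothetical, and only the two bit rates are taken from the caption.

# Sketch with hypothetical names: a test case definition and its
# execution with FFmpeg at 400 kBit and 1.5 MBit (cf. Fig. 5b/5c).
import subprocess

testcase = {
    "name": "siemens_star",
    "rotation": 0.0,   # degrees per frame; 0 = without rotation (Fig. 5b)
    "frames": 360,
}

for bitrate in ("400k", "1500k"):
    subprocess.run(
        ["ffmpeg", "-y",
         "-framerate", "25",
         "-i", "{}_%04d.png".format(testcase["name"]),  # rendered frames
         "-c:v", "libx264",
         "-b:v", bitrate,
         "{}_{}.mp4".format(testcase["name"], bitrate)],
        check=True)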
Every test pattern sequence exists in four different resolutions, two commonly used and two unusual ones. On the one hand, 1920 × 1080 is a frequently used resolution for videos and movies as well as for displays and video projectors; on the other hand, 1024 × 768 is an older but still used resolution, e.g., in smaller displays, netbooks, and mobile devices. Besides these, we constructed two resolutions out of prime numbers: 1009 × 631 with a usual aspect ratio of about 16:10, and 997 × 13 with an uncommon aspect ratio of about 77:1. Moreover, each test pattern was generated in five different color sets: a grayscale set, a color set consisting of the six complementary colors green, yellow, red, blue, cyan, and magenta, a set in which all 16,777,216 colors of the RGB color space change randomly, and two green color sets. These permutations finally result in over 900 test cases with different structures, transformations, colors, and resolutions.
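The permutation space can be sketched as a cross product; the concrete pattern variants below are illustrative (e.g. the beam counts), not the paper's exact enumeration.

# Back-of-the-envelope sketch of the permutation space with
# illustrative pattern variants (not the paper's exact list).
from itertools import product

patterns = (["grid_1px"]
            + ["grid_{}px{}".format(s, r)
               for s in (4, 5, 8, 10) for r in ("", "_rot")]
            + ["sections_1", "sections_2", "sections_2_trans",
               "sections_4", "sections_4_rot"]
            + ["stripes_{}px{}".format(w, t)
               for w in (1, 4, 5, 8, 10) for t in ("", "_rot", "_trans")]
            + ["star_{}beams{}".format(b, r)
               for b in (8, 16, 32) for r in ("", "_rot")])

resolutions = [(1920, 1080), (1024, 768), (1009, 631), (997, 13)]
colorsets = ["gray", "complementary6", "random_rgb", "green_a", "green_b"]

cases = list(product(patterns, resolutions, colorsets))
print(len(cases))  # 700 here; the paper's full variant set exceeds 900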
Every test sequence is made of 360 images of each of these test cases, whereby the respective transformations and colors change from frame to frame. These 360 frames are combined and rendered into different video formats and qualities. We used FFmpeg 2015, Telestream Episode 6.4.6, and Adobe Media Encoder CC 2015.0.1 (7.2) to transcode the test sequences into H.264/x264 and MPEG-2 videos with six different video bit rates. Refresh rate and resolution were not changed during the transcoding process.
MPEG-2 serves as the video format, e.g., for DVDs and digital TV broadcasts. H.264 is employed on Blu-ray Discs, in the digital satellite TV broadcast standard DVB-S2, as well as in internet movies and MP4 files for mobile devices.
Table 1: Usability of image sequences as direct input data

  Encoder              | Image sequences usable
  ---------------------+-----------------------
  FFmpeg               | yes
  Episode Engine       | no
  Adobe Media Encoder  | no
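Since only FFmpeg accepts image sequences directly (Table 1), the rendered frames first have to be merged into an uncompressed video that all three encoders can ingest. A minimal sketch, assuming hypothetical file names and a 25 Hz frame rate:

# Merge rendered frames into an uncompressed intermediate video for
# encoders that cannot read image sequences directly (Table 1).
import subprocess

subprocess.run(["ffmpeg", "-y",
                "-framerate", "25",
                "-i", "siemens_star_%04d.png",   # rendered frame sequence
                "-c:v", "rawvideo",              # keep the video uncompressed
                "-pix_fmt", "yuv422p",
                "siemens_star_raw.avi"],
               check=True)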
4 Experimental Results and Discussion
The generated test sequences are processed by the video encoders and empirically examined for remarkable events. The results show that large single-colored structures as well as fence-like vertical or horizontal structures can be encoded well. In contrast, the stripe-like content as well as the siemens star pattern create clearly visible artefacts, as shown in Fig. 6 and Fig. 7. Some are merely disturbing, as in Fig. 6c, whereas others impair the whole image, as in Fig. 6b.
5 Future work
In this paper we proposed a new framework for the generation of testsets for multimedia systems. Since the description of the tests is kept separate, the framework is very flexible in creating arbitrary testsets and executing them in different environments. We showed that the generated testsets can be used to search for effects that adversely influence performance and quality.
The next steps incorporate further, more complex test patterns and composite sequences to address more artefacts. Test patterns with sound as well as 3D and embedded metadata are to be added. An automatic preliminary investigation of the results could be used to find candidates for problematic test cases. An application to other fields of image processing, like robustness research in the field of pedestrian detection, may be possible.
Fig. 6: The stripe sample 6a, rotating around the picture center, leads with FFmpeg, H.264, and 1 MBit to the results in 6b and 6c: unevenly arising spots of varying intensity, with complete dissolution of the original pattern in the first case. (a) Original stripe pattern; (b) pattern dissolution with spots; (c) substantial spots in the pattern.
Fig. 7: The motionless siemens star sample 7a leads with Episode Engine, MPEG-2, and 1.5 MBit to the results in 7b, 7c, and 7d. After substantial artefact formation in the first frame, a frame-wise quality improvement follows, leading to 7c. The next frame, 7d, shows substantial artefacts again and starts a new sequence of improvements that ends in relatively good quality. (a) Original siemens star with cutout marking; (b) central part of result frame 1; (c) central part of result frame 7; (d) central part of result frame 15.
Acknowledgments. This work was partially accomplished within the project localizeIT (funding code 03IPT608X) funded by the Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung, Germany) in the program of Entrepreneurial Regions InnoProfile-Transfer.
References
1. Davies, E.: Machine Vision. Morgan Kaufmann (2005)
2. Manthey, R., Herms, R., Ritter, M., Storz, M., Eibl, M.: A Support Framework for Automated Video and Multimedia Workflows for Production and Archive. In: Human Interface and the Management of Information: Information and Interaction for Learning, Culture, Collaboration and Business, 15th International Conference, HCI International 2013, Las Vegas, NV, USA, July 21-26, 2013, Proceedings, Part III, pp. 336–341. Springer Berlin Heidelberg (2013), http://dx.doi.org/10.1007/978-3-642-39226-9_37
3. Ritter, M.: Optimization of Algorithms for Video Analysis: A Framework to Fit the Demands of Local Television Stations. In: Eibl, M. (ed.) Wissenschaftliche Schriftenreihe Dissertationen der Medieninformatik, vol. 3, pp. i–xlii, 1–336. Universitätsverlag der Technischen Universität Chemnitz, Germany (2014), http://nbn-resolving.de/urn:nbn:de:bsz:ch1-qucosa-133517
4. Wiegand, T., Sullivan, G.J., Bjøntegaard, G., Luthra, A.: Overview of the H.264/AVC Video Coding Standard. IEEE Transactions on Circuits and Systems for Video Technology 13(7), 560–576 (July 2003)