Yiannis Andreopoulos

University College London, London, England, United Kingdom

Publications (101) · 140.87 Total Impact

  • Chong-Wah Ngo · Klaus Schoeffmann · Yiannis Andreopoulos · Christian Breiteneder
    ABSTRACT: Multimedia modeling aims to study computational models for addressing real-world multimedia problems from various perspectives, including information fusion, perceptual understanding, performance evaluation and social media. The topic becomes increasingly important with the massive amount of data available over the Internet, representing different pieces of information in heterogeneous forms that need to be consolidated before being used for multimedia problems. At the same time, advances in technologies such as mobile and sensing devices drive the need to revisit existing models, not only to deal with audio-visual cues but also to incorporate various sensory modalities that can potentially provide cheaper and simpler solutions. The selected papers in this special issue were extended by their authors to a journal version and then went through a rigorous review process that included at least three anonymous referees. The first paper, entitled “Multimedia Classificat ...
    No preview · Article · Jul 2013 · Multimedia Tools and Applications
  • Yiannis Andreopoulos
    ABSTRACT: There is a growing realization that the expected fault rates and energy dissipation stemming from increases in CMOS integration will lead to the abandonment of traditional system reliability in favor of approaches that offer resilience to hardware-induced errors across the application, runtime support, architecture, device and integrated-circuit (IC) layers. Commercial stakeholders of multimedia stream processing (MSP) applications, such as information retrieval, stream mining systems, and high-throughput image and video processing systems, already feel the strain of inadequate system-level scaling and robustness under ever-increasing user demand. While such applications can tolerate certain imprecision in their results, today's MSP systems do not support a systematic way to exploit this aspect for cross-layer system resilience. However, research is currently emerging that attempts to utilize the error-tolerant nature of MSP applications for this purpose. This is achieved by modifications to all layers of the system stack, from algorithms and software to the architecture and device layer, and even the IC digital logic synthesis itself. Unlike conventional processing that aims for worst-case performance and accuracy guarantees, error-tolerant MSP attempts to provide guarantees for the expected performance and accuracy. In this paper, we review recent advances in this field from an MSP and a system (layer-by-layer) perspective, and attempt to foresee some of the components of future cross-layer error-tolerant system design that may influence the multimedia and the general computing landscape within the next ten years.
    No preview · Article · Feb 2013 · IEEE Transactions on Multimedia
  • Yiannis Andreopoulos · L.-G. Chen · Brian L. Evans · Hong Jiang · Rakesh Kumar
    ABSTRACT: The five papers in this special section are devoted to the topic of new software and hardware programs and services used for error-tolerant multimedia applications.
    Preview · Article · Feb 2013 · IEEE Transactions on Multimedia
  • Mohammad Ashraful Anam · Paul N. Whatmough · Yiannis Andreopoulos
    ABSTRACT: Generic matrix multiplication (GEMM) and one-dimensional discrete convolution/cross-correlation (CONV) kernels perform the bulk of the compute- and memory-intensive processing within image/audio recognition and matching systems. We propose a novel method to scale the energy and processing throughput of GEMM and CONV kernels for such error-tolerant multimedia applications by adjusting the precision of computation. Our technique employs linear projections to the input matrix or signal data during the top-level GEMM and CONV blocking and reordering. The GEMM and CONV kernel processing then uses the projected inputs and the results are accumulated to form the final outputs. Throughput and energy scaling takes place by decreasing the number of projections computed by each kernel, which in turn produces approximate results, i.e., lowers the precision of the performed computation. Existing realizations of error-tolerant multimedia applications can opt to utilize a small number of the input projections (typically just one) in order to save energy and processing cycles, while all error-intolerant systems can compute all input projections and obtain full-precision outputs. Results derived from a voltage- and frequency-scaled ARM Cortex A15 processor running face recognition demonstrate that the proposed approach allows for a 5-fold to 10-fold increase in processing throughput and a more than 80% decrease in energy consumption against optimized GEMM and CONV kernels, without any impact on the expected recognition and matching precision.
    Full-text · Conference Paper · Jan 2013
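    A minimal numpy sketch of the general idea of trading precision for throughput via linear projections of the inner (shared) dimension of the product; the projection construction below (top right-singular vectors of the input) is illustrative and not necessarily the one used in the paper. Computing fewer projections costs fewer operations and yields a coarser approximation.

    ```python
    import numpy as np

    def approx_gemm(A, B, num_projections):
        """Approximate C = A @ B by projecting the shared (inner) dimension onto
        `num_projections` orthonormal directions; fewer projections mean fewer
        multiply-accumulates and a coarser (approximate) result."""
        # Illustrative projection basis: top right-singular vectors of A.
        # In a real kernel the projection would be folded into the blocking step.
        _, _, Vt = np.linalg.svd(A, full_matrices=False)
        P = Vt[:num_projections].T            # k x p, orthonormal columns
        return (A @ P) @ (P.T @ B)            # (m x p)(p x n): cost scales with p

    rng = np.random.default_rng(0)
    A = rng.standard_normal((256, 128))
    B = rng.standard_normal((128, 64))
    C_full = A @ B
    for p in (8, 32, 128):                    # 128 projections == exact product
        err = np.linalg.norm(C_full - approx_gemm(A, B, p)) / np.linalg.norm(C_full)
        print(f"projections={p:3d}  relative error={err:.3e}")
    ```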
  • I. Anarado · M.A. Anam · D. Anastasia · F. Verdicchio · Y. Andreopoulos
    ABSTRACT: The generic matrix multiply (GEMM) routine comprises the compute and memory-intensive part of many information retrieval, relevance ranking and object recognition systems. Because of the prevalence of GEMM in these applications, ensuring its robustness to transient hardware faults is of paramount importance for highly-efficient/highly-reliable systems. This is currently accomplished via error control coding (ECC) or via dual modular redundancy (DMR) approaches that produce a separate set of “parity” results to allow for fault detection in GEMM. We introduce a third family of methods for fault detection in integer matrix products based on the concept of numerical packing. The key difference of the new approach against ECC and DMR approaches is the production of redundant results within the numerical representation of the inputs rather than as a separate set of parity results. In this way, high reliability is ensured within integer matrix products while allowing for: (i) in-place storage; (ii) usage of any off-the-shelf 64-bit floating-point GEMM routine; (iii) computational overhead that is independent of the GEMM inner dimension. The only detriment against a conventional (i.e. fault-intolerant) integer matrix multiplication based on 32-bit floating-point GEMM is the sacrifice of approximately 30.6% of the bitwidth of the numerical representation. However, unlike ECC methods that can reliably detect only up to a few faults per GEMM computation (typically two), the proposed method attains more than “12 nines” reliability, i.e. it will only fail to detect 1 fault out of more than 1 trillion arbitrary faults in the GEMM operations. As such, it achieves reliability that approaches that of DMR, at a very small fraction of its cost. Specifically, a single-threaded software realization of our proposal on an Intel i7-3632QM 2.2GHz processor (Ivy Bridge architecture with AVX support) incurs, on average, only 19% increase of execution time against an optimized, fault-intolerant, 32-bit GEMM routine over a range of matrix sizes and it remains more than 80% more efficient than a DMR-based GEMM.
    No preview · Conference Paper · Jan 2013
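    The paper's exact packing construction is not reproduced here; the numpy sketch below only illustrates the underlying primitive of numerical packing: two integer matrix products travel inside the mantissa of a single 64-bit floating-point GEMM (here, the original operand and a row-permuted copy of it), and a mismatch between the two unpacked copies flags a fault. The shift value and input ranges are illustrative assumptions chosen so that neither mantissa segment overflows.

    ```python
    import numpy as np

    SHIFT = 2 ** 26   # split the 53-bit mantissa into two segments (illustrative)

    def packed_checked_gemm(A, B, rng):
        """Compute C = A @ B twice inside one 64-bit floating-point GEMM by
        packing A together with a row-permuted copy of itself, then cross-check
        the two unpacked results. Assumes small non-negative integers so that
        neither segment overflows its share of the mantissa."""
        perm = rng.permutation(A.shape[0])
        A_packed = A + SHIFT * A[perm]             # low segment: A, high: A[perm]
        C_packed = A_packed.astype(np.float64) @ B.astype(np.float64)
        C_hi = np.floor(C_packed / SHIFT)
        C_lo = C_packed - SHIFT * C_hi             # unpack the two segments
        # The two copies of any given output row live in different packed words,
        # so a fault corrupting one word shows up as a mismatch in the cross-check.
        fault_detected = not np.array_equal(C_lo[perm], C_hi)
        return C_lo.astype(np.int64), fault_detected

    rng = np.random.default_rng(1)
    A = rng.integers(0, 16, size=(64, 32))
    B = rng.integers(0, 16, size=(32, 48))
    C, fault = packed_checked_gemm(A, B, rng)
    print(np.array_equal(C, A @ B), fault)         # True False when no fault occurs
    ```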
  • Mohammad Ashraful Anam · Yiannis Andreopoulos
    ABSTRACT: Convolution and cross-correlation are the basis of filtering and pattern or template matching in multimedia signal processing. We propose two throughput scaling options for any one-dimensional convolution kernel in programmable processors by adjusting the imprecision (distortion) of computation. Our approach is based on scalar quantization, followed by two forms of tight packing in floating-point (one of which is proposed in this paper) that allow for concurrent calculation of multiple results. We illustrate how our approach can operate as an optional pre- and post-processing layer for off-the-shelf optimized convolution routines. This is useful for multimedia applications that are tolerant to processing imprecision and for cases where the input signals are inherently noisy (error tolerant multimedia applications). Indicative experimental results with a digital music matching system and an MPEG-7 audio descriptor system demonstrate that the proposed approach offers up to 175% increase in processing throughput against optimized (full-precision) convolution with virtually no effect on the accuracy of the results. Based on marginal statistics of the input data, it is also shown how the throughput and distortion can be adjusted per input block of samples under constraints on the signal-to-noise ratio against the full-precision convolution.
    Full-text · Article · Jun 2012 · IEEE Transactions on Multimedia
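    A numpy sketch of the packing idea summarized above, under illustrative assumptions (non-negative data, a non-negative kernel, and ranges that keep both mantissa segments from overflowing): two quantized signal blocks are packed into one float64 stream so that a single convolution call produces both results, which are separated afterwards. The paper's second packing form and its per-block throughput/distortion control are not shown.

    ```python
    import numpy as np

    def packed_conv(x1, x2, h, qstep, shift=2 ** 26):
        """Quantize two non-negative signal blocks, pack them into one float64
        stream and run a single convolution; by linearity the two results can be
        unpacked afterwards. Illustrative parameters; assumes non-negative data
        and no overflow of either mantissa segment."""
        q1 = np.round(x1 / qstep)              # coarse scalar quantization
        q2 = np.round(x2 / qstep)
        packed = q1 + shift * q2               # two blocks in one float64 word each
        y = np.convolve(packed, h)             # one convolution instead of two
        y2 = np.floor(y / shift)
        y1 = y - shift * y2                    # unpack the two convolution results
        return y1 * qstep, y2 * qstep          # rescale back to the input range

    rng = np.random.default_rng(2)
    x1, x2 = rng.uniform(0, 255, 4096), rng.uniform(0, 255, 4096)
    h = np.ones(16)                            # non-negative, integer-valued kernel
    y1, y2 = packed_conv(x1, x2, h, qstep=4.0)
    print(np.max(np.abs(y1 - np.convolve(x1, h))))   # small quantization-induced error
    ```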
  • Dujdow Buranapanichkit · Yiannis Andreopoulos
    ABSTRACT: It is well known that biology-inspired self-maintaining algorithms in wireless sensor nodes achieve near-optimum time division multiple access (TDMA) characteristics in a decentralized manner and with very low complexity. We extend such distributed TDMA approaches to multiple channels (frequencies). This is achieved by extending the concept of collaborative reactive listening in order to balance the number of nodes in all available channels. We prove the stability of the new protocol and estimate the delay until the balanced system state is reached. Our approach is benchmarked against single-channel distributed TDMA and channel hopping approaches using TinyOS Imote2 wireless sensors.
    Preview · Article · May 2012 · IEEE Wireless Communication Letters
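    The multichannel balancing mechanism itself is not reproduced here; the numpy sketch below only illustrates the single-channel reactive-listening (DESYNC-style) update that such protocols build on: each node moves its firing phase part-way toward the midpoint of its two phase-neighbours, which drives the nodes toward evenly spaced, TDMA-like firing times. The jump parameter and round structure are illustrative.

    ```python
    import numpy as np

    def desync_round(phases, alpha=0.6):
        """One round of a DESYNC-style reactive-listening update on one channel:
        every node moves its firing phase a fraction alpha toward the midpoint of
        its two phase-neighbours on the unit circle (simplified, synchronous)."""
        order = np.argsort(phases)
        p = phases[order]
        n = len(p)
        new = np.empty(n)
        for i in range(n):
            prev = p[(i - 1) % n] - (1.0 if i == 0 else 0.0)   # wrap around the circle
            nxt = p[(i + 1) % n] + (1.0 if i == n - 1 else 0.0)
            new[i] = (1 - alpha) * p[i] + alpha * 0.5 * (prev + nxt)
        phases[order] = new % 1.0
        return phases

    rng = np.random.default_rng(3)
    phases = rng.uniform(0, 1, 8)              # 8 nodes with random initial firing phases
    for _ in range(200):
        phases = desync_round(phases)
    print(np.sort(np.diff(np.sort(phases))))   # gaps converge toward 1/8 = 0.125
    ```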
  • Davide Anastasia · Yiannis Andreopoulos
    ABSTRACT: The generic matrix multiply (GEMM) function is the core element of high-performance linear algebra libraries used in many computationally-demanding digital signal processing (DSP) systems. We propose an acceleration technique for GEMM based on dynamically adjusting the imprecision (distortion) of computation. Our technique employs adaptive scalar companding and rounding to input matrix blocks followed by two forms of packing in floating-point that allow for concurrent calculation of multiple results. Since the adaptive companding process controls the increase of concurrency (via packing), the increase in processing throughput (and the corresponding increase in distortion) depends on the input data statistics. To demonstrate this, we derive the optimal throughput-distortion control framework for GEMM for the broad class of zero-mean, independent identically distributed, input sources. Our approach converts matrix multiplication in programmable processors into a computation channel: when increasing the processing throughput, the output noise (error) increases due to (i) coarser quantization and (ii) computational errors caused by exceeding the machine-precision limitations. We show that, under certain distortion in the GEMM computation, the proposed framework can significantly surpass 100% of the peak performance of a given processor. The practical benefits of our proposal are shown in a face recognition system and a multi-layer perceptron system trained for metadata learning from a large music feature database.
    Preview · Article · Oct 2011 · IEEE Transactions on Signal Processing
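    A compact numpy illustration of the packing step that creates the throughput/distortion trade-off described above; the adaptive companding and the optimal throughput-distortion control framework are omitted, and the quantization step and shift below are illustrative assumptions. Pairs of columns of the right-hand matrix are packed into single float64 columns, so one GEMM of half the width yields two output columns per packed column.

    ```python
    import numpy as np

    def packed_gemm_2x(A, B, qstep, shift=2 ** 25):
        """Round/quantize the operands, pack pairs of columns of B into single
        float64 columns, run one GEMM of half the width, then unpack two output
        columns from every packed column. Illustrative parameters only; assumes
        non-negative data and an even number of columns in B."""
        Aq = np.round(A / qstep)
        Bq = np.round(B / qstep)
        B_packed = Bq[:, 0::2] + shift * Bq[:, 1::2]     # half as many columns
        C_packed = Aq @ B_packed                          # one GEMM at half width
        C_hi = np.floor(C_packed / shift)
        C_lo = C_packed - shift * C_hi
        C = np.empty((A.shape[0], B.shape[1]))
        C[:, 0::2], C[:, 1::2] = C_lo, C_hi
        return C * qstep * qstep                          # undo the quantization scaling

    rng = np.random.default_rng(4)
    A = rng.uniform(0, 255, (128, 64))
    B = rng.uniform(0, 255, (64, 128))
    C_hat = packed_gemm_2x(A, B, qstep=8.0)
    err = np.linalg.norm(C_hat - A @ B) / np.linalg.norm(A @ B)
    print(f"relative error due to quantization: {err:.3e}")
    ```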
  • Fabio Verdicchio · Yiannis Andreopoulos
    ABSTRACT: Multimedia analysis, enhancement and coding methods often resort to adaptive transforms that exploit local characteristics of the input source. Following the signal decomposition stage, the produced transform coefficients and the adaptive transform parameters can be subject to quantization and/or data corruption (e.g. due to transmission or storage limitations). As a result, mismatches between the analysis- and synthesis-side transform coefficients and adaptive parameters may occur, severely impacting the reconstructed signal and therefore affecting the quality of the subsequent analysis, processing and display task. Hence, a thorough understanding of the quality degradation ensuing from such mismatches is essential for multimedia applications that rely on adaptive signal decompositions. This paper focuses on lifting-based adaptive transforms that represent a broad class of adaptive decompositions. By viewing the mismatches in the transform coefficients and the adaptive parameters as perturbations in the synthesis system, we derive analytic expressions for the expected reconstruction distortion. Our theoretical results are experimentally assessed using 1D adaptive decompositions and motion-adaptive temporal decompositions of video signals.
    No preview · Article · Oct 2011 · Image and Vision Computing
  • Fabio Verdicchio · Yiannis Andreopoulos
    ABSTRACT: In video communication systems, due to quantization and transmission errors, mismatches between the transmitter- and receiver-side information may occur, severely impacting the reconstructed video. Theoretical understanding of the quality degradation ensuing from such mismatches is essential when targeting quality-of-service for video communications. In this paper, by viewing the mismatches in the transform coefficients and the adaptive parameters of the temporal analysis of a video coding system as perturbations in the synthesis system, we derive analytic approximations for the expected reconstruction distortion. Our theoretical results are experimentally assessed using adaptive temporal decompositions within a video coding system based on motion-adaptive temporal lifting decomposition. Since we focus on the generic case of adaptive lifting transforms, our results can provide useful insights for estimation-theoretic resiliency mechanisms to be considered within standardized transform-based codecs.
    No preview · Conference Paper · Sep 2011
  • D. Anastasia · Y. Andreopoulos
    ABSTRACT: The generic matrix multiply (GEMM) subprogram is the core element of high-performance linear algebra software used in computationally-demanding digital signal processing (DSP) systems. We propose an acceleration technique for GEMM based on dynamically adjusting the precision of computation. Our technique employs DSP methods (such as scalar companding and rounding), followed by a new form of tight packing in floating-point that allows for concurrent calculation of multiple results. Since the companding process controls the increase of concurrency (via packing), the increase in processing throughput (and the corresponding loss in precision) depends on the input data statistics: low-variance parts of the matrix multiplication are computed faster than high-variance parts and the error is controlled in a stochastic and not in a worst-case sense. This can convert high-performance numerical DSP libraries into a computation channel where the output error increases when higher throughput is requested. Potential DSP applications that can benefit from the proposed approach are highlighted.
    No preview · Conference Paper · Aug 2011
  • Yiannis Andreopoulos · Dai Jiang · Andreas Demosthenous
    ABSTRACT: It was proposed recently that quantized representations of the input source (e.g., images, video) can be used for the computation of the two-dimensional discrete wavelet transform (2D DWT) incrementally. The coarsely quantized input source is used for the initial computation of the forward or inverse DWT, and the result is successively refined with each new refinement of the source description via an embedded quantizer. This computation is based on the direct two-dimensional factorization of the DWT using the generalized spatial combinative lifting algorithm. In this correspondence, we investigate the use of prediction for the computation of the results, i.e., exploiting the correlation of neighboring input samples (or transform coefficients) in order to reduce the dynamic range of the required computations, and thereby reduce the circuit activity required for the arithmetic operations of the forward or inverse transform. We focus on binomial factorizations of DWTs that include (amongst others) the popular 9/7 filter pair. Based on an FPGA arithmetic co-processor testbed, we present energy-consumption results for the arithmetic operations of incremental refinement and prediction-based incremental refinement in comparison to the conventional (nonrefinable) computation. Our tests with combinations of intra and error frames of video sequences show that the former can be 70% more energy efficient than the latter for computing to half precision and remains 15% more efficient for full-precision computation.
    No preview · Article · Sep 2010 · IEEE Transactions on Signal Processing
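    The refinement principle can be illustrated in a few lines (the paper's binomial 9/7 factorization, the prediction mechanism and the energy accounting are not reproduced): because the lifting transform below is linear, the DWT of a coarsely quantized source can be refined by transforming only the successive source refinements and accumulating, rather than recomputing from scratch. The 5/3 lifting pair and the quantization step are illustrative choices.

    ```python
    import numpy as np

    def lifting_dwt53(x):
        """One level of the (linear, un-rounded) 5/3 lifting DWT with periodic
        boundary handling. Linearity is what makes incremental refinement work;
        the paper itself targets binomial factorizations such as the 9/7 pair."""
        s, d = x[0::2].astype(float), x[1::2].astype(float)
        d = d - 0.5 * (s + np.roll(s, -1))      # predict step
        s = s + 0.25 * (d + np.roll(d, 1))      # update step
        return s, d                              # lowpass, highpass subbands

    # Embedded description of the source: a coarse part plus a refinement.
    rng = np.random.default_rng(5)
    x = rng.integers(0, 256, 1024)
    x_coarse = (x // 16) * 16                   # coarsely quantized description
    x_refine = x - x_coarse                     # refinement received later

    L_c, H_c = lifting_dwt53(x_coarse)          # computed from the coarse description
    L_r, H_r = lifting_dwt53(x_refine)          # incremental update only
    L, H = lifting_dwt53(x)                     # reference: full-precision transform
    print(np.allclose(L_c + L_r, L), np.allclose(H_c + H_r, H))   # True True
    ```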
  • Davide Anastasia · Yiannis Andreopoulos
    ABSTRACT: Computer hardware with native support for large-bitwidth operations can be used for the concurrent calculation of multiple independent linear image processing operations when these operations map integers to integers. This is achieved by packing multiple input samples in one large-bitwidth number, performing a single operation with that number and unpacking the results. We propose an operational framework for tight packing, i.e., achieve the maximum packing possible by a certain implementation. We validate our framework on floating-point units natively supported in mainstream programmable processors. For image processing tasks where operational tight packing leads to increased packing in comparison to previously-known operational packing, the processing throughput is increased by up to 25%.
    Preview · Article · May 2010 · IEEE Signal Processing Letters
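    A small sketch of the capacity reasoning behind tight packing, with illustrative numbers: given the worst-case dynamic range of each output, the 53-bit significand of an IEEE-754 double determines how many independent integer results can share one operation. The paper's operational framework additionally handles signed data and the specific operation performed.

    ```python
    import math

    def max_packing(output_max, mantissa_bits=53, guard_bits=1):
        """How many independent non-negative integer results, each bounded by
        `output_max`, fit side by side in one double-precision significand.
        Illustrative capacity calculation only."""
        bits_per_result = math.ceil(math.log2(output_max + 1)) + guard_bits
        return mantissa_bits // bits_per_result

    # Example: 8-bit images filtered by a kernel whose coefficients sum to 16,
    # so each output is bounded by 255 * 16 = 4080.
    print(max_packing(255 * 16))    # -> 4 results per packed double
    ```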
  • Davide Anastasia · Yiannis Andreopoulos
    ABSTRACT: Software realizations of computationally-demanding image processing tasks (e.g., image transforms and convolution) do not currently provide graceful degradation when their clock-cycle budgets are reduced, e.g., when delay deadlines are imposed in a multitasking environment to meet throughput requirements. This is an important obstacle in the quest for full utilization of modern programmable platforms' capabilities, since worst-case considerations must be in place for reasonable quality of results. In this paper, we propose (and make available online) platform-independent software designs performing bitplane-based computation combined with an incremental packing framework in order to realize block transforms, 2-D convolution and frame-by-frame block matching. The proposed framework realizes incremental computation: progressive processing of input-source increments improves the output quality monotonically. Comparisons with the equivalent nonincremental software realization of each algorithm reveal that, for the same precision of the result, the proposed approach can lead to comparable or faster execution, while it can be arbitrarily terminated and provide the result up to the computed precision. Application examples with region-of-interest based incremental computation, task scheduling per frame, and energy-distortion scalability verify that our proposal provides significant performance scalability with graceful degradation.
    Preview · Article · Mar 2010 · IEEE Transactions on Image Processing
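    A compact sketch of bitplane-based incremental computation for 2-D convolution (the incremental packing framework, scheduling and the block-matching designs of the paper are omitted): bitplanes of the input are processed most-significant-bit first and the partial outputs accumulate, so terminating early still returns a usable result whose quality improves monotonically.

    ```python
    import numpy as np
    from scipy.signal import convolve2d

    def incremental_conv2d(image, kernel, num_bitplanes=8):
        """Process the input image bitplane by bitplane (MSB first), accumulating
        the convolution of each bitplane. Terminating after any number of
        bitplanes returns the result at the precision computed so far."""
        result = np.zeros((image.shape[0] + kernel.shape[0] - 1,
                           image.shape[1] + kernel.shape[1] - 1))
        for b in range(num_bitplanes - 1, -1, -1):         # MSB first
            plane = (image >> b) & 1                       # extract one bitplane
            result += (2 ** b) * convolve2d(plane, kernel) # refine the output
            yield b, result

    rng = np.random.default_rng(6)
    img = rng.integers(0, 256, (64, 64))
    k = np.ones((3, 3)) / 9.0
    exact = convolve2d(img, k)
    for bit, partial in incremental_conv2d(img, k):
        err = np.linalg.norm(partial - exact) / np.linalg.norm(exact)
        print(f"after bitplane {bit}: relative error {err:.3e}")   # decreases monotonically
    ```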
  • Davide Anastasia · Yiannis Andreopoulos
    ABSTRACT: Ubiquitous image processing tasks (such as transform decompositions, filtering and motion estimation) do not currently provide graceful degradation when their clock-cycle budgets are reduced, e.g. when delay deadlines are imposed in a multi-tasking environment to meet throughput requirements. This is an important obstacle in the quest for full utilization of modern programmable platforms' capabilities, since: (i) worst-case considerations must be in place for reasonable quality of results; (ii) throughput-distortion tradeoffs are not possible for distortion-tolerant image processing applications without cumbersome (and potentially costly) system customization. In this paper, we extend the functionality of the recently-proposed software framework for operational refinement of image processing (ORIP) and demonstrate its inherent throughput-distortion and energy-distortion scalability. Importantly, our extensions allow for such scalabilities at the software level, without needing hardware-specific customization. Extensive tests on a mainstream notebook computer and on OLPC's subnotebook ("xo-laptop") verify that the proposed designs provide for: (i) seamless quality-complexity scalability per video frame; (ii) up to 60% increase in processing throughput with graceful degradation in output quality; (iii) up to 20% more images captured and filtered for the same power-level reduction on the xo-laptop.
    Preview · Conference Paper · Jan 2010
  • Davide Anastasia · Yiannis Andreopoulos
    ABSTRACT: We propose software designs that perform incremental computation with monotonic distortion reduction for two-dimensional convolution and frame-by-frame block-matching tasks. In order to reduce the run time of the proposed designs, we combine bitplane-based computation with a packing technique proposed recently. In the case of block matching, we also utilize previously-computed motion vectors to perform localized search when incrementing the precision of the input video frames. The applicability of the proposed approach is demonstrated by execution time measurements on the xo-laptop (the “$100 laptop”) and on a mainstream laptop; our software is also made available online. In comparison to the conventional (non-incremental) software realization, the proposed approach leads to scalable computation per input frame while producing identical (or comparable) precision for the output results of each operating point. In addition, the execution of the proposed designs can be arbitrarily terminated for each frame with the output being available at the already-computed precision.
    Preview · Conference Paper · Nov 2009
  • Yiannis Andreopoulos
    ABSTRACT: Typographical errors occurred in equations (2) and (3) of the above-named work. The correct forms of these equations are presented.
    Preview · Article · Oct 2009 · IEEE Transactions on Image Processing
  • P. Schelkens · Y. Andreopoulos · J. Barbarien
    No preview · Article · Sep 2009
  • Yiannis Andreopoulos
    ABSTRACT: In their recent paper (see ibid., vol. 17, no. 7, pp. 1061-1068, 2008), Alnasser and Foroosh derive a wavelet-domain (in-band) method for phase-shifting of 2-D “nonseparable” Haar transform coefficients. Their approach is parameterized by the (a priori known) image translation. In this correspondence, we show that the utilized transform is in fact the separable Haar discrete wavelet transform (DWT). As such, wavelet-domain phase shifting can be performed using previously-proposed phase-shifting approaches that utilize the overcomplete DWT (ODWT), if the given image translation is mapped to the phase component and in-band position within the ODWT.
    Preview · Article · Sep 2009 · IEEE Transactions on Image Processing
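    A 1-D numpy sketch of the relationship exploited for wavelet-domain phase shifting (the separable 2-D Haar case follows by applying the same reasoning along rows and columns): the critically sampled DWT of a translated signal coincides with another phase of the overcomplete (undecimated) DWT of the original signal. Periodic extension is assumed for brevity.

    ```python
    import numpy as np

    def haar_dwt(x):
        """Single-level 1-D Haar DWT (critically sampled)."""
        return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

    def haar_odwt(x):
        """Single-level overcomplete (undecimated) Haar transform."""
        xn = np.roll(x, -1)                       # periodic extension for simplicity
        return (x + xn) / np.sqrt(2), (x - xn) / np.sqrt(2)

    rng = np.random.default_rng(7)
    x = rng.standard_normal(256)
    L1, H1 = haar_dwt(np.roll(x, -1))             # DWT of the signal shifted by one sample
    Lu, Hu = haar_odwt(x)
    # The shifted signal's DWT is just the odd phase of the ODWT of the original:
    print(np.allclose(L1, Lu[1::2]), np.allclose(H1, Hu[1::2]))   # True True
    ```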
  • Nikolaos Kontorinis · Yiannis Andreopoulos · Mihaela van der Schaar
    ABSTRACT: Video decoding complexity modeling and prediction is an increasingly important issue for efficient resource utilization in a variety of applications, including task scheduling, receiver-driven complexity shaping, and adaptive dynamic voltage scaling. In this paper, we present a novel view of this problem based on a statistical framework. We explore the statistical structure (clustering) of the execution time required by each video decoder module (entropy decoding, motion compensation, etc.) in conjunction with complexity features that are easily extractable at encoding time (representing the properties of each module's input source data). For this purpose, we employ Gaussian mixture models (GMMs) and an expectation-maximization algorithm to estimate the joint execution-time-feature probability density function (PDF). A training set of typical video sequences is used for this purpose in an offline estimation process. The obtained GMM representation is used in conjunction with the complexity features of new video sequences to predict the execution time required for the decoding of these sequences. Several prediction approaches are discussed and compared. The potential mismatch between the training set and new video content is addressed by adaptive online joint-PDF re-estimation. An experimental comparison is performed to evaluate the different approaches and compare the proposed prediction scheme with related resource prediction schemes from the literature. The usefulness of the proposed complexity-prediction approaches is demonstrated in an application of rate-distortion-complexity optimized decoding.
    Preview · Article · Aug 2009 · IEEE Transactions on Circuits and Systems for Video Technology
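    A condensed sketch of the prediction step, using scikit-learn's GaussianMixture in place of the paper's offline EM training; the feature and timing data below are synthetic stand-ins, and a single scalar complexity feature is assumed. Decoding time for new content is predicted as the conditional mean of time given the observed feature under the fitted joint GMM.

    ```python
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def predict_time(gmm, f):
        """Conditional mean E[time | feature = f] under a 2-D (feature, time) GMM."""
        w = np.empty(gmm.n_components)
        cond = np.empty(gmm.n_components)
        for k in range(gmm.n_components):
            mu_f, mu_t = gmm.means_[k]
            c = gmm.covariances_[k]                     # 2x2 covariance of component k
            # responsibility of component k for the observed feature
            # (constant 1/sqrt(2*pi) dropped; it cancels after normalization)
            w[k] = gmm.weights_[k] * np.exp(-0.5 * (f - mu_f) ** 2 / c[0, 0]) / np.sqrt(c[0, 0])
            cond[k] = mu_t + c[1, 0] / c[0, 0] * (f - mu_f)   # per-component regression
        w /= w.sum()
        return float(w @ cond)

    # Synthetic training data standing in for (complexity feature, decode time) pairs.
    rng = np.random.default_rng(8)
    feat = rng.uniform(0, 1, 2000)
    time = 5.0 + 12.0 * feat + rng.normal(0, 1.0, 2000)       # ms, illustrative
    gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
    gmm.fit(np.column_stack([feat, time]))
    print(predict_time(gmm, 0.7))                             # roughly 5 + 12*0.7 = 13.4 ms
    ```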

Publication Stats

1k Citations
140.87 Total Impact Points

Institutions

  • 2009-2015
    • University College London
      • Department of Electronic and Electrical Engineering
      London, England, United Kingdom
  • 2006-2009
    • University of California, Los Angeles
      • Department of Electrical Engineering
      Los Angeles, California, United States
  • 2008
    • University of London
      London, England, United Kingdom
  • 2007-2008
    • Queen Mary, University of London
      London, England, United Kingdom
  • 2001-2005
    • Vrije Universiteit Brussel
      • Electronics and Informatics (ETRO)
      Brussels, Brussels-Capital Region, Belgium
  • 2003
    • Philips
      Eindhoven, North Brabant, Netherlands