Yao Wang

Polytechnic Institute of New York University, Brooklyn, New York, United States

Publications (141), 112.16 Total impact

  • ABSTRACT: In networked video applications, the frame rate (FR) and quantization stepsize (QS) of a compressed video are often adapted in response to changes in the available bandwidth. It is important to understand how the variation of FR and QS, and the pattern of that variation, affect the video quality. In this paper, we investigate the impact of temporal variation of FR and QS on the perceptual video quality. Among all possible variation patterns, we focus on videos in which two FR's (or QS's) alternate over a fixed interval. We explore the human responses to such variation by conducting subjective evaluation of test videos with different variation magnitudes and frequencies. We further analyze the statistical significance of the impact of variation magnitude, variation frequency, video content, and their interactions. By analyzing the subjective ratings, we propose two models for predicting the quality of video with alternating FR and QS, respectively. The proposed models have simple mathematical forms with a few content-dependent parameters. The models fit the measured data very well using parameters determined by least-squares fitting. We further propose some guidelines for adaptation of FR and QS based on trends observed from subjective test results.
    06/2014;
  • Yen-Fu Ou, Yuanyi Xue, Yao Wang
    ABSTRACT: In this paper, we investigate the impact of spatial, temporal, and amplitude resolution on the perceptual quality of a compressed video. Subjective quality tests were carried out on a mobile device; a total of 189 processed video sequences, with 10 source sequences, were included in the test. Subjective data reveal that the impact of spatial resolution (SR), temporal resolution (TR), and quantization stepsize (QS) can each be captured by a function with a single content-dependent parameter, which indicates the decay rate of the quality with each resolution factor. The joint impact of SR, TR, and QS can be accurately modeled by the product of these three functions with only three parameters. The impacts of SR and QS on the quality are independent of that of TR, but there are significant interactions between SR and QS. Furthermore, the model parameters can be predicted accurately from a few content features derived from the original video. The proposed model correlates well with the subjective ratings, with a Pearson correlation coefficient of 0.985 when the model parameters are predicted from content features. The quality model is further validated on six other subjective rating data sets with very high accuracy and outperforms several well-known quality models.
    IEEE Transactions on Image Processing 06/2014; 23(6):2473-86. · 3.20 Impact Factor
  • ABSTRACT: In this paper, a cooperative multicast scheme that uses Randomized Distributed Space Time Codes (R-DSTC), along with packet level Forward Error Correction (FEC), is studied. Instead of sending source packets and/or parity packets through two hops using R-DSTC as proposed in our prior work, the new scheme delivers both source packets and parity packets using only one hop. After the source station (access point, AP) first sends all the source packets, the AP as well as all nodes that have received all source packets together send the parity packets using R-DSTC. As more parity packets are transmitted, more nodes can recover all source packets and join the parity packet transmission. The process continues until all nodes acknowledge the receipt of enough packets for recovering the source packets. For each given node distribution, the optimum transmission rates for source and parity packets are determined such that the video rate that can be sustained at all nodes is maximized. This new scheme can support significantly higher video rates, and correspondingly higher PSNR of decoded video, than the prior approaches. Three suboptimal approaches, which do not require full information about user distribution or the feedback and hence are more feasible in practice, are also presented. The proposed suboptimal scheme with only the node count information and without feedback still outperforms our prior approach that assumes full channel information and no feedback.
    01/2014;
  • Zhan Ma, Yao Wang
    ABSTRACT: A rate-control (RC) algorithm is highly desirable for networked video applications. Almost all existing RC methods adapt only the quantization stepsize (QS) to meet the target bit rate at a fixed video frame size (FS) and frame rate (FR), using the rate-quantization (R-Q) model. Recent mobile video applications demand more advanced rate adaptation with different FS, FR and QS, rather than merely quantization adjustment, to cope with rapid changes in wireless network bandwidth. This requires an accurate rate model with respect to the FS, FR and QS. Hence, we investigate the impacts of spatial, temporal and amplitude resolution (STAR) on the bit rate of a compressed video. We propose a rate model as the product of power functions of the FS, FR and QS, respectively. The proposed rate model is analytically tractable, requiring only four content-dependent parameters. The same model works for different coding scenarios (including scalable and non-scalable video, temporal prediction using either hierarchical B or IPPP structure, etc.) with very high accuracy using both H.264/AVC and HEVC. Using the proposed rate model and a quality model, we show how to optimize the STAR for a given rate constraint, which is important for both encoder rate control and network video adaptation.
    IEEE ICME; 07/2013
  • ABSTRACT: In this paper, a cooperative multicast scheme that uses Randomized Distributed Space Time Codes (R-DSTC), along with packet level Forward Error Correction (FEC), is studied. Instead of sending source packets and parity packets through two hops using R-DSTC as proposed in our prior work, the new scheme delivers both source packets and parity packets using only one hop. The source station (access point, AP) first sends all the source packets, then the source as well as all nodes that have received all source packets together send the parity packets using R-DSTC. As more parity packets are transmitted, more nodes can decode all source packets and join the parity packet transmission. The process continues until all nodes acknowledge (through feedback) the receipt of enough packets for recovering the source packets. For each given node distribution, the optimum transmission rates for source and parity packets are determined such that the video rate that can be sustained at all nodes is maximized. This new scheme can support significantly higher video rates (and correspondingly higher PSNR of decoded video) than the prior approaches. We further present two suboptimal approaches, which do not require full information about user distribution and feedback, and hence are more feasible in practice. The new scheme using only the node count information and without feedback still outperforms our prior approach that assumes full channel information and no feedback, when the node density is sufficiently high.
    Proceedings of the 5th Workshop on Mobile Video; 02/2013
  • ABSTRACT: Despite growing maturity in broadband mobile networks, wireless video streaming remains a challenging task, especially in highly dynamic environments. Rapidly changing wireless link qualities, highly variable round trip delays, and unpredictable traffic contention patterns often hamper the performance of conventional end-to-end rate adaptation techniques such as TCP-friendly rate control (TFRC). Furthermore, existing approaches tend to treat all flows leaving the network edge equally, without accounting for heterogeneity in the underlying wireless link qualities or the different rate utilities of the video streams. In this paper, we present a proxy-based solution for adapting the scalable video streams at the edge of a wireless network, which can respond quickly to highly dynamic wireless links. Our design adopts the recently standardized scalable video coding (SVC) technique for lightweight rate adaptation at the edge. Leveraging previously developed rate and quality models of scalable video with both temporal and amplitude scalability, we derive the rate-quality model that relates the maximum quality under a given rate by choosing the optimal frame rate and quantization stepsize. The proxy iteratively allocates rates of different video streams to maximize a weighted sum of video qualities associated with different streams, based on the periodically observed link throughputs and the sending buffer status. The temporal and amplitude layers included in each video are determined to optimize the quality while satisfying the rate assignment. Simulation studies show that our scheme consistently outperforms TFRC in terms of agility to track link qualities and overall subjective quality of all streams. In addition, the proposed scheme supports differential services for different streams, and competes fairly with TCP flows.
    IEEE Transactions on Multimedia 01/2013; 15(7):1638-1652. · 1.75 Impact Factor
  • ABSTRACT: Video-conferencing has recently gained momentum and is widely adopted by end-consumers. But there have been very few studies on the network impacts of video calls and the user Quality-of-Experience (QoE) under different network conditions. In this paper, we study the rate control and video quality of Skype video calls, and analyze the network impacts in large-scale networks. We first measure the behaviors of Skype video calls on a controlled network testbed. By varying packet loss rate, propagation delay and available network bandwidth, we observe how Skype adjusts its sending rate, FEC redundancy, video rate and frame rate. It is found that Skype is robust against mild packet losses and propagation delays, and can efficiently utilize the available network bandwidth. We also find that it employs an overly aggressive FEC protection strategy. Based on the measurement results, we develop a rate control model, an FEC model, and a video quality model for Skype video calls. Extrapolating from the models, we conduct numerical analysis to study the network impacts. We demonstrate that user back-offs upon quality degradation serve as an effective user-level rate control scheme. We also show that Skype video calls are indeed TCP-friendly and respond to congestion quickly when the network is overloaded. Through a case study of a 4G wireless network, we demonstrate that the proposed models can be used in user-QoE-aware network provisioning.
    IEEE Transactions on Multimedia 01/2013; 15(6):1446-1457. · 1.75 Impact Factor
  • Meng Xu, Yao Wang
    ABSTRACT: This paper presents a novel one-pass mode decision algorithm for encoding multiple quality layers, following the coarse grain scalability coding structure in the H.264/SVC standard. In our solution, motion estimation (ME) is carried out only once at the base layer using the reconstructed picture from the highest enhancement layer. The same motion vectors are used for all layers, not only to avoid multiple ME processes at different layers, but also to save the overhead bits. In addition, early SKIP/DIRECT mode decision is introduced to further boost the encoding speed. The encoder produces fully compliant SVC bit streams. Although the method is applicable to both coarse grain scalability (CGS) and medium grain scalability (MGS), we have examined its performance over CGS only. We demonstrate more than 2x speedup for three-layer coding against the conventional H.264/SVC encoding using the reference software over 7 test sequences. Significantly, this complexity saving is achieved simultaneously with an increase in the coding efficiency. Although the base layer requires a slightly higher bit rate (2.5% in terms of the BD-Rate), the enhancement layers enjoy lower rates (5.7% and 2.2% reduction for the total of two and three layers, respectively), averaged over the 7 test sequences.
    Multimedia and Expo Workshops (ICMEW), 2013 IEEE International Conference on; 01/2013
  • ABSTRACT: This paper introduces a general framework to fuse noisy point clouds from multiview images of the same object. We solve this classical vision problem using a newly emerging signal processing technique known as matrix completion. With this framework, we construct the initial incomplete matrix from the point clouds observed by all the cameras, with the points invisible to a camera denoted as unknown entries. The observed points corresponding to the same object point are put into the same row. When properly completed, the recovered matrix should have rank one, since all the columns describe the same object. Therefore, an intuitive approach to completing the matrix is to minimize its rank subject to consistency with the observed entries. In order to improve the fusion accuracy, we propose a general noisy matrix completion method called log-sum penalty completion (LPC), which is particularly effective in removing outliers. Based on the majorization–minimization algorithm (MM), the non-convex LPC problem is effectively solved by a sequence of convex optimizations. Experimental results on both point cloud fusion and MVS reconstructions verify the effectiveness of the proposed framework and the LPC algorithm.
    IEEE Journal of Selected Topics in Signal Processing 09/2012; 6(5):566-582. · 3.30 Impact Factor
  • Zhan Ma, Hao Hu, Meng Xu, Yao Wang
    ABSTRACT: In this paper, we investigate the impacts of spatial, temporal and amplitude resolution (STAR) on the bit rate of a compressed video. We propose an analytical rate model in terms of the quantization stepsize, frame size and frame rate. Experimental results reveal that the increase of the video rate as the individual resolution increases follows a power function. Hence, the proposed model expresses the rate as the product of power functions of the quantization stepsize, frame size and frame rate, respectively. The proposed rate model is analytically tractable, requiring only four content dependent parameters. We also propose methods for predicting the model parameters from content features that can be computed from original video. Simulation results show that model predicted rates fit the measured data very well with high Pearson correlation (PC) and small relative root mean square error (RRMSE). The same model function works for different coding scenarios (including scalable and non-scalable video, temporal prediction using either hierarchical B or IPPP structure, etc.) with very high accuracy (average PC > 0.99), but the values of model parameters differ. Using the proposed rate model and the quality model introduced in a separate work, we show how to optimize the STAR for a given rate constraint, which is important for both encoder rate control and scalable video adaptation. Furthermore, we demonstrate how to order the spatial, temporal and amplitude layers of a scalable video in a rate-quality optimized way.
    06/2012;
  • Yuanyi Xue, Yao Wang
    ABSTRACT: In this paper, the perceptual quality difference between scalable and single-layer videos coded at the same spatial, temporal and amplitude resolution (STAR) is investigated through a subjective test using a mobile platform. Three source videos are considered, and for each source video single-layer and scalable videos are compared at 9 different STARs. We utilize paired comparison methods with and without a tie option. Results collected from 10 subjects without the tie option and 6 subjects with the tie option show that there is no significant quality difference between scalable and single-layer video when coded at the same STAR. An analysis of variance (ANOVA) test is also performed to further confirm the finding.
    06/2012;
  • ABSTRACT: Two-way real-time video communication in wireless networks requires high bandwidth, low delay and error resiliency. This paper addresses these demands by proposing a system that integrates Network Coding (NC), user cooperation using Randomized Distributed Space-time Coding (R-DSTC) and packet level Forward Error Correction (FEC) under a one-way delay constraint. Simulation results show that the proposed scheme significantly outperforms both conventional direct transmission and R-DSTC based two-way cooperative transmission, and is most effective when the distance between the users is large.
    05/2012;
  • ABSTRACT: Numerous applications in signal processing have benefited from the theory of compressed sensing, which shows that it is possible to reconstruct signals sampled below the Nyquist rate when certain conditions are satisfied. One of these conditions is that there exists a known transform that represents the signal with a sufficiently small number of non-zero coefficients. However, when the signal to be reconstructed is composed of moving images or volumes, it is challenging to form such regularization constraints with traditional transforms such as wavelets. In this paper, we present a motion compensating prior for such signals that is derived directly from the optical flow constraint and can utilize the motion information during compressed sensing reconstruction. The proposed regularization method can be used in a wide variety of applications involving compressed sensing and images or volumes of moving and deforming objects. It is also shown that it is possible to estimate the signal and the motion jointly or separately. Practical examples from magnetic resonance imaging are presented to demonstrate the benefit of the proposed method.
    03/2012;
  • ABSTRACT: Magnetic Resonance Imaging (MRI) is one of the fields in which compressed sensing theory is well utilized to reduce the scan time significantly, leading to faster imaging or higher resolution images. It has been shown that a small fraction of the overall measurements is sufficient to reconstruct images with the combination of compressed sensing and parallel imaging. Various reconstruction algorithms have been proposed for compressed sensing, among which Augmented Lagrangian based methods have been shown to often perform better than others for many different applications. In this paper, we propose new Augmented Lagrangian based solutions to the compressed sensing reconstruction problem with analysis and synthesis prior formulations. We also propose a computational method which makes use of properties of the sampling pattern to significantly improve the speed of the reconstruction for the proposed algorithms in Cartesian sampled MRI. The proposed algorithms are shown to outperform earlier methods, especially for the case of dynamic MRI, for which the transfer function tends to be a very large and significantly ill-conditioned matrix. It is also demonstrated that the proposed algorithm can be accelerated much further than other methods in the case of a parallel implementation with graphics processing units (GPUs).
    03/2012;
  • ABSTRACT: In this paper, we present a proxy-based solution for adapting the scalable video streams at the edge of a wireless network, which can respond quickly to highly dynamic wireless links. Our design adopts the scalable video coding (SVC) technique for lightweight rate adaptation at the edge. We derive a QoE model, i.e., a rate-quality tradeoff model, that relates the maximum subjective quality under a given rate by choosing the optimal frame rate and quantization stepsize. The proxy iteratively allocates rates of different video streams to maximize a weighted sum of video qualities associated with different streams, based on the periodically observed link throughputs and the sending buffer status. Simulation studies show that our scheme consistently outperforms TFRC in terms of agility to track link qualities and overall quality of all streams. In addition, the proposed scheme supports differential services for different streams, and competes fairly with TCP flows.
    Communications (ICC), 2012 IEEE International Conference on; 01/2012
  • ABSTRACT: In this paper, we investigate power-rate constrained scalable video adaptation for popular wireless video streaming applications, where the wireless access network bandwidth and the remaining battery capacity of the mobile device are usually limited. Towards this goal, we have developed a scalable video decoding complexity model with a focus on joint temporal and amplitude scalability, which can be easily translated to a power consumption model for a mobile processor. Overall, there are three parameters for our proposed decoding complexity model. Currently, we propose to embed these parameters in the header field. We have validated our complexity model using various videos with different contents, resolutions, and bit rates. Results show that our proposed model can estimate the scalable video decoding complexity accurately with small root mean square error (RMSE) and high Pearson correlation (PC). Together with our rate and perceptual quality models for scalable video, we have made the power-rate constrained scalable video adaptation analytically tractable without requiring exhaustive search.
    Information Sciences and Systems (CISS), 2012 46th Annual Conference on; 01/2012
  • Yen-Fu Ou, Huiqi Zeng, Yao Wang
    ABSTRACT: This work investigates the impact of temporal variation of quantization stepsize (QS) on perceptual video quality. Among many dimensions of QS variation, as a first step we focus on videos in which two QS's alternate over fixed intervals. We present subjective test results, and analyze the influence of several factors (including the QS difference, QS ratio, changing intervals, and video content). Based on the observations and data analysis, we propose analytical models that relate the perceived quality to the two QS's. Such quality assessment and modeling are essential in making video adaptation decisions when delivering video over dynamically changing wireless links.
    Image Processing (ICIP), 2012 19th IEEE International Conference on; 01/2012
  • ABSTRACT: Video telephony has recently gained momentum and is widely adopted by end-consumers. But there have been very few studies on the network impacts of video calls and the user Quality-of-Experience (QoE) under different network conditions. In this paper, we study the rate control and video quality of Skype video calls. We first measure the behaviors of Skype video calls on a controlled network testbed. By varying packet loss rate, propagation delay and bandwidth, we observe how Skype adjusts its rates, FEC redundancy and video quality. We find that Skype is robust against mild packet losses and propagation delays, and can efficiently utilize the available network bandwidth. We also find that Skype employs an overly aggressive FEC protection strategy. Based on the measurement results, we develop a rate control model, an FEC model, and a video quality model for Skype. Extrapolating from the models, we conduct numerical analysis to study the network impacts of Skype. We demonstrate that user back-offs upon quality degradation serve as an effective user-level rate control scheme. We also show that Skype video calls are indeed TCP-friendly and respond to congestion quickly when the network is overloaded.
    Proceedings - IEEE INFOCOM 01/2012;
  • Hao Hu, Zhan Ma, Yao Wang
    ABSTRACT: This paper considers how to choose the frame size, frame rate, and quantization stepsize to optimize the perceptual quality for a given rate constraint. The proposed solution leverages previously developed quality and rate models that explicitly consider the impact of spatial, temporal, and amplitude resolution (STAR) on the quality and rate. Using these models we further propose algorithms for ordering the STAR layers to form a rate-quality optimized stream, which can greatly facilitate scalable video adaptation.
    Image Processing (ICIP), 2012 19th IEEE International Conference on; 01/2012
  • Zhan Ma, Meng Xu, Yen-Fu Ou, Yao Wang
    ABSTRACT: This paper first investigates the impact of frame rate and quantization on the bit rate and perceptual quality of compressed video. We propose a rate model and a quality model, both in terms of the quantization stepsize and frame rate. Both models are expressed as the product of separate functions of quantization stepsize and frame rate. The proposed models are analytically tractable, each requiring only a few content-dependent parameters. The rate model is validated over videos coded using both scalable and nonscalable encoders, under a variety of encoder settings. The quality model is validated only for a scalable video, although it is expected to be applicable to a single-layer video as well. We further investigate how to predict the model parameters using the content features extracted from original videos. Results show accurate bit rate and quality prediction (average Pearson correlation > 0.99) can be achieved with model parameters predicted using three features. Finally, we apply rate and quality models for rate-constrained scalable bitstream adaptation and frame rate adaptive rate control. Simulations show that our model-based solutions produce better video quality compared with conventional video adaptation and rate control.
    IEEE Transactions on Circuits and Systems for Video Technology 01/2012; 22(5):671-682. · 1.82 Impact Factor

Publication Stats

2k Citations
112.16 Total Impact Points

Institutions

  • 2009–2013
    • Polytechnic Institute of New York University
      • Department of Electrical and Computer Engineering
      Brooklyn, New York, United States
  • 2005–2011
    • CUNY Graduate Center
      New York City, New York, United States
  • 2007–2009
    • Tsinghua University
      • Department of Automation
      Beijing, Beijing Shi, China
  • 2001–2003
    • Mitsubishi Electric Research Laboratories
      Cambridge, Massachusetts, United States