Computer-aided diagnosis and simultaneous visualization based on independent component analysis and clustering are integrated in an intelligent system for the evaluation of small mammographic lesions in breast MRI. These techniques are tested on biomedical time-series representing breast MRI scans and extract spatial and temporal features of dynamic MRI data from patients with a confirmed lesion diagnosis. By revealing regional properties of contrast-agent uptake characterized by subtle differences in signal amplitude and dynamics, the methods provide both a set of prototypical time-series and a corresponding set of cluster assignment maps, which in turn yield a segmentation for the identification and regional subclassification of pathological breast tissue lesions. Both approaches increase the diagnostic accuracy of MRI mammography by improving sensitivity without reducing specificity.
In this paper, we propose a variational framework which combines top-down and bottom-up information to address the challenge of partially occluded image segmentation. The algorithm applies shape priors and divides shape learning into shape-mode clustering and non-rigid transformation estimation to handle intraclass and interclass coarse-to-fine variations. A semi-parametric density approximation using adaptive mean-shift and L2E robust estimation is used to model the likelihood. Experiments on a set of real images demonstrate the good performance of the algorithm.
There is a growing concern about chronic diseases and other health problems related to diet, including obesity and cancer. Dietary intake provides valuable insights for mounting intervention programs for prevention of chronic diseases. Measuring accurate dietary intake is considered to be an open research problem in the nutrition and health fields. In this paper, we describe a novel mobile telephone food record that provides a measure of daily food and nutrient intake. Our approach includes the use of image analysis tools for identification and quantification of food that is consumed at a meal. Images obtained before and after foods are eaten are used to estimate the amount and type of food consumed. The mobile device provides a unique vehicle for collecting dietary information that reduces the respondent burden associated with more classical approaches to dietary assessment. We describe our approach to image analysis, which includes the segmentation of food items, features used to identify foods, a method for automatic portion estimation, and our overall system architecture for collecting the food intake information.
In this paper, a new method for motion flow estimation that considers errors in all the derivative measurements is presented. Based on the total least squares (TLS) model, we accurately estimate the motion flow in the general noise case by combining a noise model (in the form of a covariance matrix) with a parametric motion model. The proposed algorithm is tested on two different types of biological motion, a growing plant root and a gastrulating embryo, with sequences obtained microscopically. The local, instantaneous velocity field estimated by the algorithm reveals the behavior of the underlying cellular elements.
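The core TLS step can be illustrated, under the standard brightness-constancy assumption, by taking the flow from the null direction of the stacked derivative matrix. This is a minimal sketch of the idea, not the authors' full covariance-weighted, parametric-model algorithm; function names are illustrative:

```python
import numpy as np

def tls_flow(Ix, Iy, It):
    """Total-least-squares flow for one neighborhood: with errors in
    all derivative measurements, (u, v, 1) is proportional to the
    right singular vector of [Ix Iy It] with the smallest singular
    value (brightness constancy: Ix*u + Iy*v + It = 0)."""
    A = np.column_stack([np.ravel(Ix), np.ravel(Iy), np.ravel(It)])
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    v = Vt[-1]                      # null direction of the data matrix
    return v[0] / v[2], v[1] / v[2]
```

Unlike ordinary least squares, which treats only `It` as noisy, this SVD solution minimizes perpendicular distance to the constraint plane, which is appropriate when all derivatives carry noise.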
Transmission electron microscopy (TEM) is an important modality for the analysis of cellular structures in neurobiology. The computational analysis of neurons entails their segmentation and reconstruction from TEM images. This problem is complicated by the heavily textured nature of cellular TEM images and typically low signal-to-noise ratios. In this paper, we propose a new partial differential equation for enhancing the contrast and continuity of cell membranes in TEM images.
The rapidly growing collection of fruit fly embryo images makes automated image segmentation and classification an indispensable requirement for large-scale analysis of in situ hybridization (ISH) gene expression patterns (GEPs). We present here such an automated processing flow for segmenting, classifying, and clustering large-scale sets of Drosophila melanogaster GEPs that is capable of dealing with most of the complications present in the images.
The large amount of data produced by biological live cell imaging studies of cell behavior requires accurate automated cell segmentation algorithms for rapid, unbiased and reproducible scientific analysis. This paper presents a new approach to obtain precise boundaries of cells with complex shapes using ridge measures for initial detection and a modified geodesic active contour for curve evolution that exploits the halo effect present in phase-contrast microscopy. The level set contour evolution is controlled by a novel spatially adaptive stopping function based on the intensity profile perpendicular to the evolving front. The proposed approach is tested on human cancer cell images from LSDCAS and achieves high accuracy even in complex environments.
This paper presents an image denoising algorithm that uses principal component analysis (PCA) in conjunction with the non-local means image denoising. Image neighborhood vectors used in the non-local means algorithm are first projected onto a lower-dimensional subspace using PCA. Consequently, neighborhood similarity weights for denoising are computed using distances in this subspace rather than the full space. This modification to the non-local means algorithm results in improved accuracy and computational performance. We present an analysis of the proposed method's accuracy as a function of the dimensionality of the projection subspace and demonstrate that denoising accuracy peaks at a relatively low number of dimensions.
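The subspace-distance idea can be sketched as follows: patch vectors are projected onto the leading principal components, and the non-local-means weights are computed from distances in that low-dimensional space. This is a minimal illustration with assumed variable names, not the paper's full implementation:

```python
import numpy as np

def pca_nlm_weights(patches, d, h):
    """Non-local-means similarity weights in a PCA subspace: patch
    vectors (one per row) are projected onto the top-d principal
    components, and the Gaussian weights use distances in that
    subspace instead of the full patch space."""
    X = patches - patches.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Z = X @ Vt[:d].T                       # d-dim patch coordinates
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (h * h))
```

Because pairwise distances are computed in d dimensions rather than the full patch dimensionality, both the cost per weight and the noise in the distance estimates drop, which is consistent with the accuracy peak at low d reported in the abstract.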
Complex diffusion was introduced in the image processing literature as a means to achieve simultaneous denoising and enhancement of scalar-valued images. In this paper, we present a novel geometric framework for achieving complex diffusion on color images expressed as image graphs. In this framework, we develop a new variational formulation for achieving complex diffusion. This formulation involves a modified harmonic map functional and is quite distinct from the Polyakov action described in earlier work by Sochen et al. Our formulation provides a framework for simultaneous (feature-preserving) denoising and enhancement. We present results comparing complex diffusion and the Beltrami flow, both in the image graph framework.
We propose an approach for non-rigid tracking that represents objects by their set of distribution parameters. Compared to joint histogram representations, a set of parameters such as mixed moments provides a significantly reduced size representation. The discriminating power is comparable to that of the corresponding full high-dimensional histogram yet at far less spatial and computational complexity. The proposed method is robust in the presence of noise and illumination changes, and provides a natural extension to the use of mixture models. Experiments demonstrate that the proposed method outperforms both full color mean-shift and global covariance searches.
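As a concrete example of a reduced-size distribution descriptor, per-channel means together with the mixed second-order (covariance) moments can stand in for a full joint color histogram. The exact moment set below is an assumption for illustration, not necessarily the one used by the authors:

```python
import numpy as np

def moment_descriptor(pixels):
    """Compact target descriptor from an (N, C) array of color
    samples: channel means plus the upper triangle of the mixed
    second-order (covariance) moments, instead of a joint histogram."""
    mu = pixels.mean(axis=0)
    cov = np.cov(pixels, rowvar=False)
    iu = np.triu_indices(pixels.shape[1])
    return np.concatenate([mu, cov[iu]])
```

For 3-channel pixels this is 9 numbers, versus 4096 bins for even a coarse 16x16x16 joint histogram, which is the size reduction the abstract refers to.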
We present a novel edge-preserving interpolation scheme for fast upsampling of natural images. The proposed piecewise hyperbolic operator uses a slope-limiter function that conveniently lends itself to higher-order approximations and is responsible for restricting spatial oscillations arising from edges and sharp details in the image. As a consequence, the upsampled image not only exhibits enhanced edges and discontinuities across boundaries, but also preserves smoothly varying features. Experimental results show an improvement in PSNR compared to typical cubic and spline-based interpolation approaches.
We propose a particle filtering framework for rigid registration of a model image to a time-series of partially observed images. The method incorporates a model-based segmentation technique in order to track the pose dynamics of an underlying observed object with time. An applicable algorithm is derived by employing the proposed framework for registration of a 3D model of an anatomical structure, which was segmented from preoperative images, to consecutive axial 2D slices of a magnetic resonance imaging (MRI) scan, which are acquired intraoperatively over time. The process is fast and robust with respect to image noise and clutter, variations of illumination, and different imaging modalities.
In the analysis of microscopy-based images, a major challenge lies in splitting apart cells that appear to overlap because they are too densely packed. This task is complicated by the physics of the image acquisition, which causes large variations in pixel intensities. Each image typically contains thousands of cells, with each cell having a different orientation, size, and intensity histogram. In this paper, a spatial intensity model of the nucleus is incorporated to aid cell segmentation from microscopy datasets. An energy functional is defined, and with it the spatial intensity distribution of a nucleus is modeled as a Gaussian distribution over a constant-intensity background. Experimental results on a variety of microscopic data validate its effectiveness.
Object recognition is a fundamental problem in computer vision. Part-based models offer a sparse, flexible representation of objects, but suffer from difficulties in training and often use standard kernels. In this paper, we propose a positive definite kernel called the "structure kernel", which measures the similarity of two part-based represented objects. The structure kernel has three terms: 1) the global term, which measures the global visual similarity of two objects; 2) the part term, which measures the visual similarity of corresponding parts; 3) the spatial term, which measures the spatial similarity of the geometric configuration of parts. The contribution of this paper is to generalize the discriminant capability of local kernels to complex part-based object models. Experimental results show that the proposed kernel exhibits higher accuracy than state-of-the-art approaches using standard kernels.
We introduce a framework for modeling spatial patterns of shapes formed by multiple objects in an image. Our approach is graph-based, where each node denotes an object and the attributes of a node consist of that object's shape, position, orientation, and scale. Neighboring nodes are connected by edges and are allowed to interact in terms of their attributes/features. Similar to a Markov random field, but now applied to a more sophisticated feature space, the interactions are governed by energy functionals that can be internal or external. The internal energies, composed entirely of interactions between nodes, may include similarity between shapes and pose. The external energies, composed of outside influences, may include the data-likelihood term and a priori information about the shapes and the locations of the objects.
We propose a tracking system that is especially well-suited to tracking targets which change drastically in size or appearance. To accomplish this, we employ a fast, two phase template matching algorithm along with a periodic template update method. The template matching step ensures accurate localization while the template update scheme allows the target model to change over time along with the appearance of the target. Furthermore, the algorithm can deliver real-time results even when targets are very large. We demonstrate the proposed method with good results on several sequences showing targets which exhibit large changes in size, shape, and appearance.
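The two-phase matching and periodic update can be sketched as below. This is a simplified stand-in (plain SSD, coarse stride then local refinement, linear blending update); the paper's actual matching criterion and update schedule may differ:

```python
import numpy as np

def match_template(frame, tmpl, stride=4):
    """Two-phase SSD matching: a coarse scan on a stride grid, then
    exhaustive refinement in a small window around the coarse best."""
    th, tw = tmpl.shape
    H, W = frame.shape[0] - th + 1, frame.shape[1] - tw + 1
    ssd = lambda y, x: ((frame[y:y+th, x:x+tw] - tmpl) ** 2).sum()
    cy, cx = min(((y, x) for y in range(0, H, stride)
                         for x in range(0, W, stride)),
                 key=lambda p: ssd(*p))
    return min(((y, x)
                for y in range(max(0, cy - stride), min(H, cy + stride + 1))
                for x in range(max(0, cx - stride), min(W, cx + stride + 1))),
               key=lambda p: ssd(*p))

def update_template(tmpl, best_patch, alpha=0.1):
    """Periodic template update: blend the current best-match patch
    into the model so it can follow appearance change over time."""
    return (1 - alpha) * tmpl + alpha * best_patch
```

The coarse pass keeps the cost manageable for large targets, while the update lets the model drift with the target's appearance, at the usual risk of template drift if alpha is too large.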
In this paper, we present a novel face detection architecture based on the boosted cascade algorithm. A reduced two-field feature extraction scheme for integral image calculation is proposed. Based on this scheme, the required memory for storing integral images is reduced from 400 Kbits to 2.016 Kbits for a 160×120 gray-scale image. The range of the feature size and location is also reduced, so the learning time of the classifier decreases by around 10%. In addition, input data are mapped into parallel memories to enhance processing speed in classifier evaluations. This boosted cascade face detection hardware consumes only 0.992 mm² in UMC 90 nm technology and runs at 100 MHz. The experimental results show this face detector can achieve a 91% face detection rate when processing 160×120 gray-scale images at 190 fps.
A JPEG XR chip for HD Photo is implemented with a 25 mm² area in TSMC 0.18 µm CMOS 1P6M technology at 100 MHz. According to the simulation results, 4:4:4 1920×1080 HD Photo video can be encoded smoothly at 20 frames/s.
To achieve high visual quality of intra-frame coding in order to minimize the visual quality degradation caused by color loss, the authors previously presented an RGB-domain inter-color compensation algorithm using strong correlation between RGB color components. Based on that inter-color compensation algorithm, this paper presents a 1080p 60 Hz CODEC system architecture designed to process a bit-rate of up to approximately 100 Mbps in real time. Both the encoding and decoding processes are pipelined on a macroblock level. Since syntax processing is a bottleneck to supporting speeds of up to 100 Mbps, a high performance context-adaptive variable length coding architecture exploiting the look-ahead technique is included in the proposed design. The final chip implementation can achieve real-time encoding and decoding of 1080p 60 Hz videos with reasonable hardware cost and operating clock frequency.
2-D electrophoresis is a technique for studying the expression of proteins. It produces images that contain protein spots, and one of the tasks in the analysis of these images is matching protein spots across two corresponding images for differential expression studies. In this paper, we propose an algorithm that integrates a hierarchy-based and an energy-based method. The hierarchy-based method initially finds corresponding pairs of spots. We formulate a new matching energy, which consists of local spot structure similarity, image similarity, and a spatial constraint. The proposed energy is minimized by a greedy optimization algorithm to find corresponding pairs of spots. We extensively tested our method with synthetic images and real 2-D gel images from different biological experiments.
This paper presents an overview of the latest transform and quantization designs for H.26L. Unlike the popular discrete cosine transform (DCT) used in previous standards, the transforms in H.26L can be computed exactly in integer arithmetic, thus avoiding inverse transform mismatch problems. The new transforms can also be computed without multiplications, just additions and shifts, in 16-bit arithmetic, thus minimizing computational complexity, especially for low-end processors. By using short tables, the new quantization formulas use multiplications but avoid divisions.
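The 4x4 integer core transform from the H.26L design (later adopted in H.264) can be sketched as follows. The rational inverse here is only to verify that the round trip is exact, i.e. that no forward/inverse mismatch can occur; the normative inverse instead folds the equivalent scaling into quantization and uses only adds and shifts:

```python
import numpy as np

# 4x4 integer core transform from the H.26L/H.264 design: the entries
# are small integers, so the forward transform needs only additions
# and shifts and is exact in 16-bit integer arithmetic.
H = np.array([[1,  1,  1,  1],
              [2,  1, -1, -2],
              [1, -1, -1,  1],
              [1, -2,  2, -1]], dtype=np.int64)

def forward_transform(block):
    """Exact integer 2-D transform of a 4x4 residual block."""
    return H @ block @ H.T

def inverse_check(coeffs):
    """Invert over the rationals to confirm perfect reconstruction;
    the standard folds the needed scaling into the quantizer."""
    Hi = np.linalg.inv(H.astype(np.float64))
    return Hi @ coeffs @ Hi.T
```

Because the rows of H are mutually orthogonal integer vectors, the transform is an integer approximation of the 4x4 DCT that remains exactly invertible, which is the mismatch-avoidance property the overview describes.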
This paper describes an original approach for structuring video documents into scenes by grouping video shots. The method is based on the construction of 1D mosaics. 1D mosaics are built based on X-ray projections of color video frames representing integration along vertical and horizontal axes. The mosaicing is realized by motion compensation in a 1D domain. Grouping of shots in a scene is done by local and global matching of mosaics, based on piecewise linear approximation and hierarchical clustering. The results obtained on feature documentaries are promising.
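The 1D "X-ray" signatures the method builds can be illustrated as simple axis integrals of each frame; this is a minimal sketch of the projection step only, not the full 1D mosaicing and matching pipeline:

```python
import numpy as np

def xray_projections(frame):
    """1D X-ray signatures of an (H, W) or (H, W, 3) frame: intensity
    integrated along the vertical and horizontal axes.  Motion
    compensation and mosaic matching then operate on these 1D
    signals instead of full 2D frames."""
    gray = frame.mean(axis=2) if frame.ndim == 3 else frame
    return gray.sum(axis=0), gray.sum(axis=1)
```

Working with two length-W and length-H signals per frame instead of H*W pixels is what makes the subsequent 1D motion compensation and hierarchical matching cheap.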
This paper presents a technique for high-accuracy correspondence search between two rectified images using 1D phase-only correlation (POC). The correspondence search between stereo images can be reduced to 1D search through image rectification. However, we usually employ block matching with 2D rectangular image blocks for finding the best matching point in the 1D search. We propose the use of 1D POC (instead of 2D block matching) for stereo correspondence search. The use of 1D POC makes possible significant reduction in computational cost without sacrificing reconstruction accuracy compared with the 2D POC-based approach. Also, the resulting reconstruction accuracy is much higher than those of conventional stereo matching techniques using SAD (sum of absolute differences) and SSD (sum of squared differences) combined with sub-pixel disparity estimation.
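The 1D POC core can be sketched as the inverse FFT of the normalized cross-power spectrum of two corresponding image rows; its peak location gives the disparity. This minimal version returns integer shifts only, whereas the paper fits the correlation peak for sub-pixel accuracy:

```python
import numpy as np

def poc_shift(f, g):
    """1D phase-only correlation: the inverse FFT of the normalized
    cross-power spectrum peaks at the displacement of g relative to
    f.  Amplitude is discarded, so only phase (i.e. position)
    information drives the match."""
    R = np.conj(np.fft.fft(f)) * np.fft.fft(g)
    R /= np.abs(R) + 1e-12          # keep phase, drop magnitude
    corr = np.real(np.fft.ifft(R))
    k = int(np.argmax(corr))
    return k if k <= len(f) // 2 else k - len(f)
```

On rectified stereo pairs this replaces 2D block matching along each scanline, which is the source of the computational saving the abstract reports.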
Aligning a tilt series in 3D TEM tomography is not only a shift-compensation problem among tilt slices, but rather a parameter registration problem in which a tilt axis must be registered on each slice with its position and orientation. A new method for 3D tilt alignment is proposed in this paper, based on analyzing 1D signals obtained by projecting individual 2D slices in various directions (the 2D Radon transform). The axis orientation and position are identified by techniques such as 1D shift compensation and 2D frequency analysis. Because no information is lost during the computation, the accuracy and robustness of axis registration are ensured; moreover, working essentially on 1D data provides a solid basis for computational performance. With this method, no extra work of adding fiducial markers to samples or detecting corners/edges in slice images is required; both experimental and computational effort can therefore be greatly reduced.
This paper addresses the problem of calibrating a pin-hole camera from images of 1D objects. Assuming a unit aspect ratio and zero skew, we introduce a novel and simple approach that uses four observations of a 1D object and requires no information about the distances between the points on the object. This is in contrast to existing methods that use two images, but impose more restrictive configurations that require measured distances on the calibrating object. The key features of the proposed technique are its simplicity and ease of use due to the lack of need for any metric information. To demonstrate the effectiveness of the algorithm, we present the processing results on synthetic and real images.
We describe a method for filtering an object category from a large number of noisy images. This problem is particularly difficult due to the great variation within object categories and the lack of labeled object images. Our method addresses it by combining a co-training algorithm, CoBoost, with two features, 1st- and 2nd-order features, which define a bag-of-words representation and the spatial relationships between local features, respectively. We iteratively train two boosting classifiers based on the 1st- and 2nd-order features, during which each classifier provides labeled data for the other. This is effective because the 1st- and 2nd-order features form an independent and redundant feature split. We evaluate our method on the Berg dataset and demonstrate precision comparable to the state of the art.
We present a new face detection algorithm based on the 1st-order reduced Coulomb energy (RCE) classifier. The algorithm locates frontal views of human faces at any degree of rotation and scale in complex scenes. The face candidates and their orientations are first determined by computing the Hausdorff distance between a simple face abstraction model and binarized test windows in an image pyramid. Then, after normalizing the energy, each face candidate is verified by two subsequent classifiers: a binary image classifier and the 1st-order RCE classifier. While the binary image classifier is employed as a pre-classifier to discard non-faces with minimum computational complexity, the 1st-order RCE classifier is used as the main face classifier for final verification. An optimal training method to construct the representative face model database is also presented. Experimental results show that the proposed algorithm yields a high detection rate while yielding no false alarms.
Compared to only a few years ago, today there is an abundance of annotated image data available on the Internet. For researchers on image retrieval, this is an unforeseen but welcome consequence of the rise of Web 2.0 technologies. Popular social networking and content sharing services seem to hold the key to the integration of context and semantics into retrieval. However, at least for now, it appears that this promise has to be taken with a grain of salt. In this paper, we present preliminary empirical results on the tagging behavior of power users of content sharing and social bookmarking services. Our findings suggest different promising research directions for image retrieval, and we briefly discuss some of them.
A multimodal face verification process is presented for standard 2D color images, 2.5D range images, and 3D meshes. A normalization in orientation and position is essential for 2.5D and 3D images to obtain a corrected frontal image. This is achieved using the spin images of the nose tip and both eyes, which feed an SVM classifier. First, a traditional principal component analysis followed by an SVM classifier is applied to both 2D and 2.5D images. Second, an iterative closest point algorithm is used to match 3D meshes. In all cases, the equal error rate is computed for different kinds of images in the training and test phases. In general, 2.5D range images show the best results (0.1% EER for frontal images). A special improvement in success rate for turned faces is obtained for normalized 2.5D and 3D images compared to standard 2D images.
In this paper, a novel error resilient MQ coder for reliable JPEG 2000 image delivery is designed. The proposed coder uses a forbidden symbol in order to force a given amount of redundancy in the codestream. At the decoder side, the presence of the forbidden symbol allows for powerful error correction. Moreover, the added redundancy can be easily controlled, and the proposed coder is kept backward compatible with MQ. In this work, excellent improvements in the case of image transmission across both BSC and AWGN channels are obtained by means of a maximum a posteriori estimation technique.
The JPEG-2000 compression standard codes images into data units, referred to as packets, such that images can be successfully decoded, while incurring some distortion, with a subset of these packets. This paper examines how to optimally select subsets of packets (referred to as schedules) that minimize distortion subject to varying rate constraints. We solve for the optimal schedule at a single rate by solving the precedence constraint knapsack problem (PCKP) via dynamic programming, and show that with modifications that consider the specific dependencies of JPEG-2000 packets, we can compute the optimal schedules for all rates through a single execution of our algorithm. We then analyze important properties of the optimal schedule. Using these properties, we look at a fused-greedy algorithm, similar to the recently proposed convex hull algorithm, to generate embedded schedules of JPEG-2000 packets. These embedded schedules have the property that all the JPEG-2000 packets in lower rate schedules are included in higher rate schedules; embedded schedules enable low-complexity, adaptive streaming. We demonstrate the algorithm's near-optimal performance through comparisons to the optimal performance for JPEG-2000 coded data.
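The flavor of embedded, precedence-constrained scheduling can be illustrated with a plain greedy that picks the best distortion reduction per bit among packets whose prerequisites are already scheduled. This is a simplification for illustration, not the paper's fused-greedy or the DP-based PCKP solution, and the packet fields are hypothetical:

```python
def embedded_schedule(packets, deps):
    """Precedence-constrained greedy: repeatedly pick, among packets
    whose dependencies are already scheduled, the one with the best
    distortion reduction per bit.  'packets' maps id -> (dD, rate);
    'deps' maps id -> list of prerequisite packet ids."""
    scheduled, order = set(), []
    while len(scheduled) < len(packets):
        avail = [p for p in packets if p not in scheduled
                 and all(d in scheduled for d in deps.get(p, []))]
        if not avail:
            break  # remaining packets have unsatisfiable dependencies
        best = max(avail, key=lambda p: packets[p][0] / packets[p][1])
        scheduled.add(best)
        order.append(best)
    return order
```

Any prefix of the returned order is itself a valid schedule, which is exactly the embedding property: lower-rate schedules are prefixes of higher-rate ones, enabling truncation-based adaptive streaming.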
In this paper, minimization of the maximum absolute error (MAE) is achieved under a JPEG 2000 framework when a compression ratio is specified. The process is founded on our recent JPEG 2000-based algorithm for minimizing bit rate for a desired MAE using a residual coding approach [Lucero, A., et al., 2006; Lucero, A., et al., 2005; Yee, Y., et al., 2007] that uses the EBCOT coder separately. This type of algorithm uses the lossy and lossless capabilities of JPEG 2000 to achieve a desired MAE or a desired total bit rate. The technique to achieve the lowest MAE possible for a specified average bit rate is developed here for application to 3-D scientific data sets. Lossy compression is applied to the original data, while lossless compression is employed on the quantized residuals (the difference between the original and the lossy decompressed data). The lowest MAE is achieved by optimizing the allocation of the desired total bit rate between the two contributing rates corresponding to the lossy and the lossless compression steps. The methodology for achieving minimum MAE and results for 3-D meteorological data are presented here.
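The MAE guarantee behind such residual coding can be sketched for integer data: quantizing the lossy residual to the nearest multiple of 2E+1 bounds the reconstruction error of every sample by E. This is a minimal illustration of the principle, not the paper's JPEG 2000 rate-allocation pipeline:

```python
import numpy as np

def mae_bounded_reconstruction(original, lossy, E):
    """Quantize the residual (original - lossy) to the nearest
    multiple of 2E+1; adding the dequantized residual back to the
    lossy data leaves every sample within E of the original.  In the
    paper, the quantized residuals are then coded losslessly."""
    step = 2 * E + 1
    residual = original.astype(np.int64) - lossy.astype(np.int64)
    q = np.round(residual / step).astype(np.int64)  # entropy-code q
    return lossy.astype(np.int64) + q * step
```

The rate split the paper optimizes is the trade-off hidden here: a coarser lossy layer shrinks its own bit rate but enlarges the residuals that the lossless layer must then encode.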
In the multiple description paradigm, a controllable amount of redundancy is inserted among descriptions in order to help estimate those that are lost due to network congestion. This redundancy can also be exploited to correct errors at the bit level. In this paper, we propose a novel technique to generate multiple descriptions of video encoded with Motion JPEG 2000, which exploits the inserted extra redundancy also to guarantee error protection in the case where all descriptions are received but are possibly affected by bit errors. This method yields excellent performance, since it guarantees not only protection of video information transmitted over non-prioritized networks subject to independent packet erasure processes, but also resilience to corruption at the bit level. Moreover, the generated streams are fully compatible with Part 3 of the JPEG 2000 standard.
This paper proposes a collusion attack-resilient method of encryption for access control of JPEG 2000 codestreams with hierarchical scalabilities. The proposed method generates encryption keys from a single key by multi-dimensional scanning, so that encryption preserves the scalability of the codestreams. To avoid collusion attacks, in which multiple users generate an illegal key from their own keys to defeat the access control, sufficient conditions are considered in this method. Moreover, a skip encryption is introduced to decrease the computational complexity and the key management and delivery cost of encryption. Simulation results show the effectiveness of the proposed method.
JPEG 2000, the new ISO/ITU-T standard for still image coding, is about to be finished. Other new standards have been recently introduced, namely JPEG-LS and MPEG-4 VTC. This paper compares the set of features offered by JPEG 2000, and how well they are fulfilled, versus JPEG-LS and MPEG-4 VTC, as well as the older but widely used JPEG and PNG. The study concentrates on the set of supported features, although lossless and lossy progressive compression efficiency results are also reported. Each standard, and the principles of the algorithms behind it, is also described. As the results show, JPEG 2000 supports the widest set of features among the evaluated standards, while providing superior rate-distortion performance.
This paper proposes an encryption method that uses short keys to enable hierarchical access control for JPEG 2000 codestreams. The proposed method provides images at various quality levels that may differ from the quality at encoding, though it uses a single codestream and a single managed key (master key). Only one key generated from the master key is delivered to a user authorized to access a reserved-quality image. The method also prevents users from colluding to access superior-quality images. Some conventional methods provide the above features, but their keys are much longer than those of the proposed method, and the proposed method uses fewer partial keys than the conventional methods.
In this research, the issue of integrated data hiding and JPEG 2000 [Mar 2000, Dec 2000] image compression is investigated, and a data-hiding scheme is proposed to embed covert messages into JPEG 2000 codestreams. With the introduction of visual masking measurement and CSF weighting, visual distortion is very slight even when a large amount of data is embedded. The extraction of hidden data can be performed progressively at the decoder. Finally, experimental results are given to show the performance of the proposed scheme.
JPEG 2000 will soon be an international standard for still image compression. This paper describes that standard at a high level, indicates the component pieces which empower the standard, and gives example applications which highlight differences between JPEG 2000 and prior image compression standards.
Secure scalable streaming (SSS) enables low-complexity, high-quality transcoding at intermediate, possibly untrusted, network nodes without compromising the end-to-end security of the system (Wee and Apostolopoulos, 2001). SSS encodes, encrypts, and packetizes video into secure scalable packets in a manner that allows downstream transcoders to perform transcoding operations such as bitrate reduction and spatial downsampling by simply truncating or discarding packets, and without decrypting the data. Secure scalable packets have unencrypted headers that provide hints such as optimal truncation points to downstream transcoders. Using these hints, downstream transcoders can perform near-optimal secure transcoding. This paper presents a secure scalable streaming system based on Motion JPEG-2000 coding with AES or triple-DES encryption. The operational rate-distortion (R-D) performance for transcoding to various resolutions and quality levels is evaluated, and results indicate that end-to-end security and secure transcoding can be achieved with near R-D optimal performance. The average overhead is 4.5% for triple-DES encryption and 7% for AES, as compared to the original media coding rate, and only 2-2.5% overhead as compared to end-to-end encryption, which does not allow secure transcoding.
A bit-stream-level method for identifying encrypted JPEG 2000 images without having to decrypt them is described. It is well known that editing the scenes of the movie is often necessary. In the editing process, image identification plays an important role in finding a frame that should be re-encoded. Thus, identification of encrypted images is very useful in digital cinema because all frames in digital cinema are encoded and encrypted. The proposed method directly uses encrypted JPEG 2000 images so that decryption-free identification without JPEG 2000 decoding is possible. The proposed method is both accurate and fast. In principle, identification based on the proposed method does not produce false negative matches regardless of the compression ratio. Moreover, since there is no need to decode and decrypt the images, the average processing time for identification is very short and independent of the encoded image size.
One difficulty in image compression research is designing meaningful performance metrics. Purely numerical measures such as PSNR are unsatisfactory because they do not correlate well with human assessment. We introduce a method of subjective image evaluation for image compression called calibrated rank ordering (CRO). CRO is attractive because it produces substantial numerical results without excessive burden on the observers. Using CRO we compare traditional JPEG with JPEG 2000 in a variety of modes. We also consider the effect of differing image sources, i.e., digital still camera vs. film scan. Finally, we compare and contrast the artifacts of both JPEG and JPEG 2000.
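For reference, the purely numerical PSNR measure that the abstract argues correlates poorly with human judgment is simply, for 8-bit images:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB: a mean-squared-error measure
    that is blind to how errors are distributed perceptually, which
    is why subjective protocols such as CRO are needed."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak * peak / mse)
```

Two compressed images with identical PSNR can look very different, e.g. when one concentrates its error in blocking artifacts and the other spreads it as mild ringing, which is the gap CRO is designed to measure.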
We review the various tools in JPEG 2000 that allow users to take advantage of properties of the human visual system such as spatial frequency sensitivity and the visual masking effect. We show that the visual tool sets in JPEG 2000 are much richer than what was available in JPEG, where only locally invariant frequency weighting can be exploited.
Real-time multi-object extraction at 2000 fps was realized by designing a cell-based labeling algorithm. The algorithm can label the divided cells in an image by scanning the image only once to obtain their moment features, and the computational complexity required for labeling can be remarkably reduced. The cell-based labeling algorithm for 8 × 8 pixel cells was implemented on a high-speed vision platform, and multiple objects in an image of 512 × 512 pixels could be extracted at 2000 fps. An experiment was performed using a quickly rotating object to verify the performance of our multi-object extraction system.
As the resolution and pixel fidelity of digital imagery grow, there is a greater need for more efficient compression and extraction of images and sub-images. The ability to handle many types of image data, to extract images at different resolutions and quality levels (lossless and lossy), to zoom and pan, and to extract regions of interest has become the new measure of image compression system performance. JPEG 2000 is designed to address the needs of high-quality imagery. This paper describes how the JPEG 2000 syntax and file format support these features. The decomposition of the image into the codestream is described along with the associated syntax markers. Examples of how the syntax enables some of the features of JPEG 2000 are offered.
This paper describes methods to recover the useful data in JPEG and JPEG 2000 compressed images and to estimate data for those portions of the image where correct data cannot be recovered. These techniques are designed to handle the loss of hundreds of bytes in the file. No use is made of restart markers or other optional error detection features of JPEG and JPEG 2000, but an uncorrupted low resolution version of the image, such as an icon, is assumed to be available. These icons are typically present in Exif or JFIF format JPEG files.
JPEG-2000 coding introduces, at high compression ratios, some perceptual impairments (blurring and ringing effects), which can be exploited by a no-reference (NR) quality metric. Even though these distortions are present throughout the processed image, the human visual system judges perceptual quality by identifying and selecting certain regions of interest. In this paper, to assess the perceptual quality of JPEG-2000 compressed images, we propose to measure both ringing and blurring distortions, locally weighted by an importance map generated on the one hand by a modified Osberger model and on the other hand by a simple attention-model algorithm. The predicted scores have been compared with subjective quality scores. With a comparative study on the contribution of each importance map, we demonstrate the significant benefit of these weights in an NR quality metric.
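The pooling step described above can be illustrated with a minimal sketch, assuming per-pixel blur and ringing maps and a saliency-style importance map are already available. The function name and the simple additive combination of the two distortions are illustrative assumptions; the paper's actual maps and weighting differ.

```python
import numpy as np

def nr_quality_score(blur_map, ring_map, importance):
    """Importance-weighted pooling of local distortion measures (sketch).

    blur_map, ring_map: per-pixel distortion estimates (same shape).
    importance: non-negative importance/saliency map (same shape).
    Returns a scalar distortion score; higher means worse quality.
    """
    w = importance / importance.sum()       # normalize weights to sum to 1
    return float((w * (blur_map + ring_map)).sum())
```

With a uniform importance map this reduces to the plain mean of the local distortions; a non-uniform map shifts the score toward the distortions located in salient regions, which is the point of the weighting.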
Part 7 of MPEG-21, entitled Digital Item Adaptation (DIA), is an emerging metadata standard defining protocols and descriptions that enable content adaptation for a wide variety of networks and terminals, with emphasis on format-independent mechanisms. The DIA descriptions provide a standardized interface not only to a variety of format-specific adaptation engines, but also to a fully format-independent adaptation engine for scalable bit-streams. A format-independent engine contains a decision-taking module operating in a semantics-independent manner, cascaded with a bit-stream adaptation module that uses an XML transformation to model the bit-stream adaptation process using parameters derived from the decisions made. In this paper, we describe the DIA descriptions that enable such fully format-independent bit-stream adaptation. Universal adaptation engines substantially reduce adoption costs because the same infrastructure can be used for different types of scalable media, including proprietary and encrypted formats.
The new MPEG-21 standard defines a multimedia framework to enable transparent and augmented use of multimedia resources across the heterogeneous networks and devices used by different communities. In this paper, we incorporate the perceived motion energy (PME) model into the proposed MPEG-21 digital item adaptation framework for frame dropping in H.264 encoded video adaptation. This work has two advantages: first, the PME model reduces the viewer's perceived motion jitter due to frame dropping to a minimum; second, adaptation nodes can easily apply frame-dropping operations without knowledge of the detailed encoding syntax of H.264 video.
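The motion-aware selection step can be sketched as follows, under the assumption that a per-frame motion-energy value has already been computed. Dropping the lowest-energy frames first is a simplification of the PME model's behavior, used here only to illustrate the idea; `select_frames_to_drop` is a hypothetical name, not part of the paper or of MPEG-21.

```python
def select_frames_to_drop(motion_energy, n_drop):
    """Sketch of motion-aware frame dropping.

    motion_energy: per-frame motion-energy values (list of numbers).
    n_drop: how many frames the adaptation must remove.
    Drops the n_drop frames with the lowest motion energy, so the
    temporal discontinuity is introduced where motion is least
    perceptible. Returns the sorted indices of frames to drop.
    """
    order = sorted(range(len(motion_energy)),
                   key=motion_energy.__getitem__)  # ascending by energy
    return sorted(order[:n_drop])
```

An adaptation node only needs this index list and the scalable bit-stream description to perform the drop, which is consistent with the abstract's point that no H.264 syntax knowledge is required at the node.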
This paper deals with visual content adaptation, in the context of the MPEG-21 standard, to help low-vision users gain better access to content. The proposed adaptation targets two low-vision symptoms: loss of fine detail and lack of contrast. Specifically, we present an adaptation framework describing the problem space and then a systematic contrast-enhancement method to improve content visibility for low-vision users. The experimental results show that the proposed framework and method are effective for low-vision users.