Histological characterization is used in clinical and research contexts as a highly sensitive method for detecting the morphological features of disease and abnormal gene function. Histology has recently been accepted as a phenotyping method for the forthcoming Zebrafish Phenome Project, a large-scale community effort to characterize the morphological, physiological, and behavioral phenotypes resulting from the mutations in all known genes in the zebrafish genome. In support of this project, we present a novel content-based image retrieval system for the automated annotation of images containing histological abnormalities in the developing eye of the larval zebrafish.
The automated annotation of conversational video with semantic miscommunication labels is a challenging topic. Although miscommunications are often obvious to the speakers as well as to observers, it is difficult for machines to detect them from low-level features. In this paper, we investigate the utility of gestural cues among various non-verbal features. Compared with gesture recognition tasks in human-computer interaction, this task is difficult because it is not well understood which cues contribute to miscommunication, and because the relevant gestures are often implicit. Nine simple gestural features are extracted from gesture data, and both simple and complex classifiers are constructed using machine learning. The experimental results suggest that there is no single gestural feature that can predict or explain the occurrence of semantic miscommunication in our setting.
In continuous media servers, disk load can be reduced by using a buffer cache. To exploit the disk bandwidth saved by caching, a continuous media server must employ an admission control scheme that decides whether a new client can be admitted for service without violating the requirements of the clients already being serviced. Since deterministic admission control is based on worst-case assumptions, it wastes system resources. In this paper, we propose a statistical admission control scheme for continuous media servers in which caching is used to reduce disk load. This scheme improves disk utilization and allows more streams to be serviced with reasonable computational overhead, while maintaining near-deterministic service. The scheme, called short-sighted prediction admission control (SPAC), combines exact prediction through on-line simulation with statistical estimation based on a probabilistic model of future disk load in order to reduce computation overhead.
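To make the idea concrete, here is a minimal Python sketch of a short-sighted admission test: disk load over a short look-ahead window is obtained by exact simulation, and the load beyond that window is bounded with a simple statistical model. The Gaussian tail approximation, the parameter names, and the acceptance threshold are assumptions of this sketch, not details taken from SPAC itself.

```python
import math

def admit_new_stream(simulated_load, mean_tail_load, var_tail_load,
                     disk_capacity, overload_prob_target=1e-3):
    """Short-sighted admission test (illustrative sketch, not the paper's exact SPAC).

    simulated_load : peak disk load (blocks/round) obtained by exact on-line
                     simulation over a short look-ahead window, with the new
                     stream included and cache hits already accounted for.
    mean_tail_load, var_tail_load : statistical model of the disk load beyond
                     the simulated window (assumed Gaussian here).
    """
    # Reject immediately if the exactly simulated near-term load already
    # exceeds what the disk can deliver per round.
    if simulated_load > disk_capacity:
        return False

    # For the un-simulated future, bound the overload probability with a
    # Gaussian tail approximation (an assumption made for this sketch).
    if var_tail_load == 0:
        return mean_tail_load <= disk_capacity
    z = (disk_capacity - mean_tail_load) / math.sqrt(var_tail_load)
    overload_prob = 0.5 * math.erfc(z / math.sqrt(2))  # P(load > capacity)
    return overload_prob <= overload_prob_target


# Example: the near-term load fits, and the modelled tail overloads the disk
# with probability well below the target, so the client is admitted.
print(admit_new_stream(simulated_load=80, mean_tail_load=70,
                       var_tail_load=25, disk_capacity=100))
```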
Recommendation systems have been investigated and implemented in many ways. In collaborative filtering systems in particular, an important issue is how to present personalized recommendation results so that users can understand and be satisfied with them. A collaborative filtering system predicts items of interest for users based on predictive relationships discovered between each item and the others. In this paper, a categorization scheme that groups associative items is proposed for the purpose of improving accuracy and performance in item-based collaborative filtering. When an associative item occurs in several groups at once, the proposed method regroups it into the most relevant group. In addition, the proposed method improves predictive performance under the sparse-data and cold-start conditions in which collaborative filtering starts with only a few items. It also increases prediction accuracy and scalability by removing the noise generated by ratings on items of dissimilar content or interest. The approach is empirically evaluated against k-means, average-link, and robust clustering on the MovieLens dataset, and it is found to significantly outperform these previous methods.
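The item-based prediction step restricted to a single item group can be illustrated with a small Python sketch. The grouping itself (the paper's categorization of associative items) is supplied as an input here, and all names and the cosine-based similarity are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def predict_rating(ratings, user, target_item, item_groups):
    """Item-based CF prediction restricted to the target item's group.

    ratings     : 2-D array (users x items), 0 means 'not rated'.
    item_groups : dict item -> group id, e.g. produced by an
                  associative-item categorization step (stubbed here).
    """
    rated = np.nonzero(ratings[user])[0]
    # Keep only neighbours from the same group, mimicking the idea of
    # removing noise from items of dissimilar content or interest.
    neighbours = [j for j in rated
                  if j != target_item and item_groups[j] == item_groups[target_item]]
    if not neighbours:
        return None

    def cosine(a, b):
        mask = (ratings[:, a] > 0) & (ratings[:, b] > 0)
        if not mask.any():
            return 0.0
        va, vb = ratings[mask, a], ratings[mask, b]
        denom = np.linalg.norm(va) * np.linalg.norm(vb)
        return float(va @ vb / denom) if denom else 0.0

    sims = np.array([cosine(target_item, j) for j in neighbours])
    if sims.sum() == 0:
        return None
    return float(sims @ ratings[user, neighbours] / sims.sum())


R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4]])
groups = {0: 'A', 1: 'A', 2: 'B', 3: 'B'}
print(predict_rating(R, user=0, target_item=2, item_groups=groups))
```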
Wireless sensor networks are used in many applications in military, ecological, health, and other areas. These applications often involve the monitoring of sensitive information, making security one of the most important aspects to consider in this field. However, most protocols are optimized for the limited capabilities of sensor nodes and the application-specific nature of these networks, and they remain vulnerable to serious security attacks. In this paper, a Secure Energy and Reliability Aware data gathering protocol (SERA) is proposed, which provides energy efficiency and data delivery reliability as well as a security scheme that protects against the most common network layer attacks, such as spoofed, altered, or replayed routing information, selective forwarding, sinkhole attacks, Sybil attacks, wormhole attacks, HELLO flood attacks, and acknowledgment spoofing.
The development of collaborative multimedia applications today follows a vertical development approach, a major inhibitor that drives up development cost and slows the pace of innovation for new generations of collaborative applications. In this paper, we propose a network communication broker (NCB) that provides a unified higher-level abstraction encapsulating the complexity of network-level communication control and media delivery for the class of multimedia collaborative applications. NCB expedites the development of next-generation applications with diverse communication logics. Furthermore, NCB-based applications can be easily ported to new network environments. In addition, the self-managing design of NCB supports dynamic adaptation in response to changes in network conditions and user requirements.
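A purely illustrative sketch of what such a unified abstraction could look like from the application's side is given below; the method names are hypothetical and are not taken from the NCB API.

```python
from abc import ABC, abstractmethod

class NetworkCommunicationBroker(ABC):
    """Illustrative interface for an NCB-like abstraction layer:
    collaborative applications talk to these high-level calls while the
    broker hides session signalling, transport selection and media
    delivery underneath. Method names are assumptions, not the paper's API."""

    @abstractmethod
    def create_session(self, participants):
        """Set up a collaborative session among the given participants."""

    @abstractmethod
    def add_media_stream(self, session_id, media_type, quality_hint):
        """Attach an audio/video/data stream without exposing transport details."""

    @abstractmethod
    def send(self, session_id, payload):
        """Deliver application data within the session."""

    @abstractmethod
    def on_network_change(self, callback):
        """Hook through which the self-managing layer reports adaptation
        (e.g. a transport or codec switch) back to the application."""
```

A concrete broker would implement these calls on top of whatever transport the environment provides, so applications written against the interface would not change when the underlying network environment does.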
One of the central problems in media search is the semantic gap between the low-level features computed automatically from media data and the human interpretation of that data. This is because the notion of similarity is usually based on high-level abstractions, whereas low-level features do not always reflect human perception. In this paper, we assume that the semantics of a media item are determined by its contextual relationships within a dataset, and we introduce a method to capture this contextual information from a large media (especially image) dataset for effective search. Similarity search in an image database based on this contextual information shows encouraging experimental results.
Based on a fifteen-month investigation of post-production facilities for both the entertainment and broadcast industries, we identified a number of challenges that arise when designing a server to support multiple editing stations. These include how to: share the same content among multiple editors; support continuous display of the edited content for each editor; support complex editing operations; compute the set of changes (deltas) proposed by an editor; and compare the deltas proposed by multiple editors. It is beyond the scope of this paper to report on each challenge and its related solutions. Instead, we focus on one challenge and the physical design of the system that addresses it.
Advances in high speed networking technologies and video compression techniques have made video-on-demand (VOD) services feasible. A large-scale VOD system imposes a large demand on bandwidth and storage resources, and therefore, parallel disks are typically used for providing VOD service. Although striping of movie data across a large number of disks can balance the utilization among these disks, such a striping technique can exhibit additional complexity, for instance, in data management, such as synchronization among disks during data delivery, as well as in supporting fault-tolerant behavior. Therefore, it is more practical to limit the extent of data striping, for example, by arranging the disks in groups (or nodes) and then allowing intra-group (or intra-node) data striping only. With multiple striping groups, however, we may need to assign a movie to multiple nodes so as to satisfy the total demand of requests for that movie. Such an approach gives rise to several design issues, including: what is the right number of copies of each movie needed to satisfy the demand without wasting storage capacity; how to assign these movies to different nodes in the system; and what are efficient approaches to altering the number of copies of each movie (and their placement) when the need arises. We study an approach to dynamically reconfiguring the VOD system so as to alter the number of copies of each movie maintained on the server as the access demand for these movies fluctuates. We propose various approaches to addressing the above stated issues, which result in a VOD design that is adaptive to changes in data access patterns. Performance evaluation is carried out to quantify the costs and the performance gains of these techniques.
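As a back-of-the-envelope illustration of the copy-count question only (not the paper's reconfiguration policy), the following sketch sizes the number of replicas of a movie from its expected demand and a per-node streaming capacity.

```python
import math

def copies_needed(expected_requests, node_stream_capacity):
    """Naive illustration: enough replicas so that the aggregate streaming
    capacity of the nodes holding a movie covers its expected demand.
    (Not the paper's reconfiguration policy, just the sizing intuition.)"""
    return max(1, math.ceil(expected_requests / node_stream_capacity))

# A movie expected to attract 130 concurrent streams on nodes that can each
# serve 50 streams needs 3 copies; a cold movie keeps a single copy.
print(copies_needed(130, 50), copies_needed(8, 50))
```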
Multimedia systems combine a variety of information sources, such as voice, graphics, animation, images, music, and full-motion video, into a wide range of applications. The paper initially categorizes existing multimedia applications into three classes: non-interactive-oriented, interactive-oriented client-server-based, and interactive-oriented peer-party-based applications. In particular, the paper examines interactive-oriented applications and provides an in-depth survey of the media synchronization problem for the design of these applications. The paper then presents our Distributed Multimedia Teleworking System (DMTS) prototype, which allows two or more remote systems in collaboration to access and modify multimedia data through a network in a fully synchronous fashion. The system was developed over TCP/IP on an FDDI network, using an XVideo D/A card. The media supported by DMTS include text, graphics, voice, and video. For text and graphics, to maintain the coherence of the data being simultaneously modified, DMTS employs a master-slave collaboration model between the two remote systems. Moreover, DMTS also adopts effective mechanisms to reduce skew (asynchrony) and jitter delays between video and voice data streams. Finally, the paper demonstrates that DMTS achieves a maximum throughput of 13 frames per second and that the throughput bottleneck resides in the hardware capture and the D/A processing of video frames.
Technical manuals are very diverse, ranging from manuals on software to manuals on commodities, general instructions, and technical manuals that deal with specific domains such as mechanical maintenance. Because of the vast amount of manual information, finding the necessary information is quite difficult. With the electronic maintenance manuals currently used by companies, mechanics must search for the information related to their tasks, and it is difficult to grasp the relationships among the contents of the manuals; this search process is time-consuming and laborious. Many researchers have adopted ontologies to address these problems and to represent the contents of manuals semantically. However, when an ontology becomes very large and complex, it is not easy to work with, and visualization has proven an effective way to grasp and manipulate it. In this research, we model a new ontology to represent and retrieve the contents of manuals and design a visualization system based on the proposed ontology. To model the ontology, we analyzed the aircraft maintenance process, extracted concepts, and defined relationships between the concepts. After modeling, we created instances of each class using technical manuals. Our system visualizes related information so that mechanics can grasp it intuitively. This allows workers to easily obtain the information needed for a given task and reduces the time spent searching for related information, which can also be understood at a glance through visualization.
Multimedia applications involving video data require smoothing of video playout to prevent potential playout discontinuity. In this paper, we propose a dynamic video playout smoothing method, called the video smoother, which dynamically adopts different playout rates in an attempt to compensate for the high delay variance of networks. Specifically, if the number of frames in the buffer exceeds a given threshold (TH), the smoother employs the maximum playout rate. Otherwise, the smoother uses proportionally reduced rates in an effort to eliminate playout pauses caused by an empty playout buffer. To determine THs under various loads, we present an analytic model that assumes Poisson arrivals, corresponding to a network with a traffic shaper. Based on the analytic results, we establish a paradigm for determining THs and playout rates that achieve different playout qualities under various network loads. Finally, to demonstrate the viability of the video smoother, we have implemented a prototype system comprising a multimedia teleconferencing application and the video smoother operating as part of the transport layer. The prototyping results show that the video smoother achieves smooth playout while incurring only unnoticeable delays.
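A minimal sketch of the threshold rule described above follows: the smoother plays at the maximum rate when the buffer holds at least TH frames and otherwise slows down proportionally. The linear scaling law and the example numbers are assumptions of this sketch.

```python
def playout_rate(frames_in_buffer, threshold, max_rate):
    """Threshold-based smoothing rule described in the abstract:
    full speed at or above TH, proportionally slower below it (the
    exact scaling law is an assumption of this sketch)."""
    if frames_in_buffer >= threshold:
        return max_rate
    # Scale down linearly so the buffer is less likely to underflow.
    return max_rate * frames_in_buffer / threshold

# With TH = 10 frames and a 30 fps nominal rate:
for frames in (15, 10, 5, 1):
    print(frames, round(playout_rate(frames, threshold=10, max_rate=30.0), 1))
```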
Multimedia end-user terminals are expected to perform advanced user interface related tasks. These tasks are carried out by user interface runtime tools and include, among others, the visualization of complex graphics and the efficient handling of user input. In addition, the terminal's graphical system is expected, for example, to be able to synchronize audio and video and to control different contexts on the same screen. Finally, the availability of high-level tools to simplify user interface implementation and the adaptiveness of the user interfaces to a diversity of configurations are desirable features as well. We present a layered model that meets these requirements. The architecture is divided into five different layers: hardware abstraction layer, multimedia cross-platform libraries, graphical environment, GUI toolkit, and high-level languages. Moreover, a prototype system based on the architecture, targeted at digital television receivers, was implemented for the purpose of testing the validity of these concepts. In order to evaluate the prototype, some already developed MHP-compliant digital television applications were tested. In addition, the prototype was extended with a high-level profile (i.e., SMIL support) and a low-level one (i.e., access to the frame buffer memory).
Visual media data such as images are the raw data representation for many important applications. The biggest challenge in using visual media data comes from their extremely high dimensionality. We present a comparative study of spatial interest pixels (SIPs), including eight-way (a novel SIP miner), Harris, and Lucas-Kanade, whose extraction is considered an important step in reducing the dimensionality of visual media data. With extensive case studies, we show the usefulness of SIPs as low-level features of visual media data. A class-preserving dimension reduction algorithm (using GSVD) is applied to further reduce the dimension of feature vectors based on SIPs. The experiments showed its superiority over PCA.
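Of the three SIP extractors compared, the Harris detector is standard enough to sketch; the following Python fragment computes the Harris corner response R = det(M) - k * trace(M)^2 using simple finite differences and a box window (the eight-way miner and the GSVD-based reduction from the study are not reproduced here).

```python
import numpy as np

def harris_response(image, k=0.04):
    """Harris corner response R = det(M) - k * trace(M)^2, one standard
    way to pick spatial interest pixels (only Harris is sketched here;
    window choice and parameters are assumptions of this sketch)."""
    img = image.astype(float)
    # Image gradients via simple finite differences.
    dy, dx = np.gradient(img)

    # Elements of the structure tensor, smoothed with a box filter here
    # (a Gaussian window is more common; a box is used for brevity).
    def box(a, r=1):
        out = np.zeros_like(a)
        for sy in range(-r, r + 1):
            for sx in range(-r, r + 1):
                out += np.roll(np.roll(a, sy, axis=0), sx, axis=1)
        return out / (2 * r + 1) ** 2

    ixx, iyy, ixy = box(dx * dx), box(dy * dy), box(dx * dy)
    det = ixx * iyy - ixy * ixy
    trace = ixx + iyy
    return det - k * trace ** 2

# Toy example: a bright square on a dark background; the strongest
# responses should appear near the square's corners.
img = np.zeros((16, 16))
img[4:12, 4:12] = 1.0
r = harris_response(img)
print(np.unravel_index(np.argmax(r), r.shape))
```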
The development of the Web has encouraged different organizations and individuals to publish their multimedia documents on the Internet. Additionally, the migration to Web 2.0 has offered users the chance to comment on and annotate the contents of these multimedia documents. Museums and libraries are particularly interested in users' feedback and work, because many collections, such as handwritten manuscripts, are still puzzles for archivists; any feedback concerning these contents is therefore welcome. This article focuses on the design and implementation of a web archive. The main objective is to enable users to annotate manuscript documents easily and remotely using a Web 2.0 application. User annotations are important for enriching the archive contents with essential information; nevertheless, not all users are experts in the manuscript domain. Accordingly, users need some assistance during the search and annotation processes. The proposed assistant in our archive is a recommender system; it relies on registered traces of the user's interaction with the documents to generate suggestions.
Keywords: Web 2.0 archive, Manuscript annotation, Collaboration, User traces, Recommender system
We report results for audio copy detection on the TRECVID 2009 copy detection task. This task involves searching for transformed audio queries in over 385 hours of test audio. The queries were transformed in seven different ways, three of which involved mixing unrelated speech into the original query, making the task much more difficult. We give results with two different audio fingerprints and show that mapping each test frame to the nearest query frame (the nearest-neighbor fingerprint) results in robust audio copy detection. The most difficult task in TRECVID 2009 was to detect audio copies using predetermined thresholds computed from 2008 data. We show that the nearest-neighbor fingerprints were robust even on this task and gave an actual minimal normalized detection cost rate (NDCR) of around 0.06 for all the transformations. These results are close to those obtained by using the optimal threshold for each transform, which demonstrates the robustness of the nearest-neighbor fingerprints. These fingerprints can be computed efficiently on a graphics processing unit, leading to a very fast search.
Keywords: Audio copy detection, Copy detection, Energy difference, Nearest neighbor
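The nearest-neighbor matching idea can be illustrated with a short sketch: each test frame is mapped to its closest query frame and the average distance serves as the copy-detection score. The frame features (e.g. the energy-difference fingerprints mentioned in the keywords) are stubbed with generic vectors, and the threshold value is purely illustrative.

```python
import numpy as np

def nn_copy_score(query_frames, test_frames):
    """Map every test frame to its nearest query frame and use the mean
    distance as a copy-detection score (lower = more likely a copy).
    Frame features are assumed to be precomputed row vectors."""
    # Pairwise squared Euclidean distances, test x query.
    d2 = ((test_frames[:, None, :] - query_frames[None, :, :]) ** 2).sum(axis=2)
    return float(np.sqrt(d2.min(axis=1)).mean())

def is_copy(query_frames, test_frames, threshold):
    # The threshold would be fixed in advance (as with the TRECVID
    # 2008-derived thresholds mentioned above); any value here is illustrative.
    return nn_copy_score(query_frames, test_frames) <= threshold

rng = np.random.default_rng(0)
query = rng.normal(size=(200, 16))
copy = query[50:150] + 0.05 * rng.normal(size=(100, 16))   # transformed copy
unrelated = rng.normal(size=(100, 16))
print(nn_copy_score(query, copy), nn_copy_score(query, unrelated))
```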
H.264/AVC achieves higher compression efficiency than previous video coding standards. However, the process of selecting the optimal coding mode for each macroblock (MB) results in extremely high computational complexity, which makes it difficult for practical use. In this paper, an efficient algorithm is proposed to reduce the complexity of MB mode selection. The proposed algorithm first identifies the interior region of a moving object by using motion vector information. For the interior region, which is surrounded by identical motion vectors, we skip mode selection for the MBs and code them directly with large block-size modes; we also discuss specific examples in this region. For the boundary region, we classify MBs into different types according to the coded mode information and then process the different regions with distinct mode sets. Experimental results show that the proposed algorithm can save up to 46% of the encoding time on average compared to the conventional method in the JVT JM8.6 reference encoder, with only 0.12 dB performance degradation.
Keywords: Mode selection, Fast algorithm, Motion region classification, H.264, Video coding
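A simplified reading of the region-classification step is sketched below: a macroblock whose motion vector matches all four of its neighbours is labelled interior and could be coded directly with a large block-size mode, while the rest are labelled boundary. The exact neighbourhood and mode sets used in the paper may differ.

```python
def classify_macroblocks(mv_field):
    """Label each MB as 'interior' (same motion vector as all 4-neighbours)
    or 'boundary'. Interior MBs can skip mode selection and be coded with a
    large block-size mode directly; boundary MBs keep a fuller mode set.
    (A simplified reading of the paper's region classification.)"""
    h, w = len(mv_field), len(mv_field[0])
    labels = [['boundary'] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            mv = mv_field[y][x]
            neighbours = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
            if all(0 <= ny < h and 0 <= nx < w and mv_field[ny][nx] == mv
                   for ny, nx in neighbours):
                labels[y][x] = 'interior'
    return labels

# In a uniform 5x5 motion-vector field, the inner 3x3 MBs come out
# 'interior' while the border MBs stay 'boundary'.
mv = [[(2, 1)] * 5 for _ in range(5)]
labels = classify_macroblocks(mv)
print(labels[2][2], labels[0][0])
```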
In this paper, we implement a MAC-based RTL module for the inverse DCT in H.264/AVC to improve applicability, reduce processing time, and utilize resources efficiently. The paper highlights the design of the FU (functional unit) architecture, its interconnection topology, the regular formulation of the inverse DCT, and the array processor mapping, as well as the construction of the MAC-based RTL module. A multi-directional FUA (functional unit array) and an FPGA implementation are presented along with evaluated performance and simulation results. The design of a single FU was verified with performance tests at a maximum frequency of 200 MHz, and the designed 4-by-4 FUA operates above 100 MHz. The proposed multi-directional FU can be extended to an n-by-n FUA, whose functionality can be extended to the next video coding standard (H.265/HEVC).
Keywords: Functional Unit, FPGA, H.264/AVC, Inverse DCT
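For reference, the core 4x4 inverse integer transform of H.264/AVC (the operation such a functional-unit array implements) can be written as two butterfly passes using only additions and shifts. Dequantization and the final (x + 32) >> 6 rounding of a real decoder are omitted, so this sketch shows only the MAC-friendly structure of the transform.

```python
def idct4_h264(block):
    """Core 4x4 inverse integer transform of H.264/AVC in butterfly form,
    using shifts instead of multiplications. Dequantization and the final
    (x + 32) >> 6 rounding that follow in a real decoder are omitted."""
    def butterfly(w0, w1, w2, w3):
        e0, e1 = w0 + w2, w0 - w2
        e2, e3 = (w1 >> 1) - w3, w1 + (w3 >> 1)
        return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]

    # Horizontal pass on each row, then vertical pass on each column.
    rows = [butterfly(*row) for row in block]
    cols = [butterfly(*[rows[r][c] for r in range(4)]) for c in range(4)]
    return [[cols[c][r] for c in range(4)] for r in range(4)]

# A DC-only block: every output sample gets the same (unscaled) value.
print(idct4_h264([[16, 0, 0, 0]] + [[0, 0, 0, 0]] * 3))
```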
Decoding an H.264 video stream is a computationally demanding multimedia application that poses serious challenges for current processor architectures. For processors with strongly limited computational resources, a natural way to tackle this problem is to use multi-core systems. The contribution of this paper is a systematic overview and performance evaluation of parallel video decoding approaches. We focus on decoder splittings for the strongly resource-restricted environments inherent to mobile devices. For the evaluation, we introduce a high-level methodology that can estimate the runtime behaviour of multi-core decoding architectures. We use this methodology to investigate six methods for data-parallel splitting of an H.264 decoder. These methods are compared against each other in terms of runtime complexity, core usage, inter-core communication, and bus transfers. We present benchmark results using different numbers of processor cores. Our results should aid in finding the splitting strategy best suited for the targeted hardware architecture.
Keywords: Video decoding, H.264/AVC, Multimedia, Multi-core, Embedded architectures
We propose an algorithm for the multivariate Markovian characterisation of H.264/SVC scalable video traces at the sub-GoP (Group of Pictures) level. A genetic algorithm yields Markov models with a limited state space that accurately capture temporal and inter-layer correlation. Key to our approach is the covariance-based fitness function. In comparison with the classical Expectation Maximisation algorithm, ours matches the second-order statistics more accurately, at the cost of less accuracy in matching the histograms of the trace. Moreover, a simulation study shows that our approach outperforms Expectation Maximisation in predicting the performance of video streaming in various networking scenarios.
Keywords: H.264/SVC, Traffic characterisation, Markovian arrival process
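The covariance-based fitness idea can be sketched as follows: a candidate model is scored by how closely its lagged covariances match the empirical covariances of the trace. The squared-error form, the lag range, and the sign convention are assumptions of this sketch, not the paper's exact fitness function.

```python
import numpy as np

def empirical_autocovariance(trace, max_lag):
    """Lagged autocovariances of a (e.g. sub-GoP level) frame-size trace."""
    x = np.asarray(trace, dtype=float) - np.mean(trace)
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / n for k in range(max_lag + 1)])

def covariance_fitness(model_cov, trace, max_lag):
    """Covariance-based fitness in the spirit of the paper: a candidate
    Markov model, represented here only by the lagged covariances it
    implies (however they are computed), scores higher the closer those
    covariances are to the empirical ones of the trace."""
    target = empirical_autocovariance(trace, max_lag)
    return -float(np.sum((np.asarray(model_cov) - target) ** 2))

rng = np.random.default_rng(1)
trace = rng.gamma(shape=2.0, scale=500.0, size=2000)        # synthetic frame sizes
target = empirical_autocovariance(trace, max_lag=3)
print(covariance_fitness(target, trace, 3),                 # perfect match: 0
      covariance_fitness(np.zeros(4), trace, 3))            # poor candidate
```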
In H.264/AVC, the discrete cosine transform (DCT) is performed on the residual blocks after prediction. However, the mismatch between variable block sizes and the fixed transform matrix not only degrades decorrelation performance but also causes severe blocky artifacts inside the blocks. In previous work, an M-channel filter bank system (MCFBS) was proposed to overcome these defects. However, the increase in encoding time caused by MCFBS is very high, especially for intra coding. More seriously, the constructed M-channel filter bank with floating-point coefficients is an obstacle to hardware implementation. In this work, a hybrid M-channel filter bank and DCT (HMD) framework is proposed for intra coding. In addition, an integer transform of a newly constructed M-channel filter bank is implemented for HMD. Experimental results demonstrate that HMD reduces 64–69% of the complexity of MCFBS with negligible quality degradation.
Keywords: M-channel filter bank, Discrete cosine transform, Intra coding, H.264/AVC
To investigate the benefits of scalable codecs for the rate adaptation problem, a streaming system for scalable H.264 video has been implemented. The system considers the congestion level in the network and the buffer status at the client during the adaptation process. The rate adaptation algorithm is content-adaptive: it selects an appropriate substream from the video file by taking into account the motion dynamics of the video. The performance of the system has been tested under congestion-free and congested scenarios. The results indicate that the system reacts to congestion properly and can be used for Internet video streaming where losses occur unpredictably.
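A rough sketch of such a buffer- and congestion-aware selection rule is given below; the budget formula, parameter names, and default values are assumptions and only approximate the content-adaptive algorithm described above.

```python
def select_substream(substream_bitrates, est_bandwidth_kbps, buffer_sec,
                     target_buffer_sec=4.0, motion_factor=1.0):
    """Pick the highest substream whose bitrate fits the current budget.

    The budget shrinks when the client buffer is low and when the content
    is high-motion (motion_factor > 1), loosely mirroring a congestion-,
    buffer-, and content-aware selection. All names, defaults, and the
    exact budget rule are assumptions of this sketch."""
    fill = min(1.0, buffer_sec / target_buffer_sec)          # 0 = empty, 1 = healthy
    budget = est_bandwidth_kbps * (0.5 + 0.5 * fill) / motion_factor
    feasible = [b for b in sorted(substream_bitrates) if b <= budget]
    return feasible[-1] if feasible else min(substream_bitrates)

layers = [400, 800, 1600, 3200]                              # kbit/s extraction points
print(select_substream(layers, est_bandwidth_kbps=2000, buffer_sec=4))   # healthy buffer
print(select_substream(layers, est_bandwidth_kbps=2000, buffer_sec=1,
                       motion_factor=1.5))                   # low buffer, high motion
```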
Recent literature indicates multiple description coding (MDC) as a promising coding approach for handling video transmission over unreliable networks with different quality and bandwidth constraints. We introduce an approach that starts from the concept of spatial MDC and adds algorithms that exploit some form of scalability to obtain more efficient sub-streams. In the algorithm, we first generate four subsequences by sub-sampling; two of these subsequences are then jointly used to form each of the two descriptions. Within each description, one of the original subsequences is predicted from the other via scalable algorithms, focusing on the inter-layer prediction scheme. The proposed algorithm has been implemented as pre- and post-processing around the standard H.264/SVC coder, and the experimental results show that it performs very well.
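The splitting step can be sketched as follows: each frame is polyphase-subsampled into four subsequences and two of them are paired per description. The diagonal pairing chosen here and everything downstream (the SVC-based inter-layer prediction and coding) are assumptions or omissions of this sketch.

```python
import numpy as np

def split_into_descriptions(frame):
    """Polyphase 2x2 subsampling into four subsequences, then pairing two
    subsequences per description, in the spirit of the spatial-MDC scheme
    described above. The diagonal pairing is an assumption; inter-layer
    prediction and coding within each description are not reproduced."""
    s00 = frame[0::2, 0::2]
    s01 = frame[0::2, 1::2]
    s10 = frame[1::2, 0::2]
    s11 = frame[1::2, 1::2]
    description_1 = (s00, s11)     # within a description, one subsequence
    description_2 = (s01, s10)     # can be predicted from the other
    return description_1, description_2

frame = np.arange(16).reshape(4, 4)
d1, d2 = split_into_descriptions(frame)
print(d1[0], d1[1], sep="\n")      # the two 2x2 subsequences of description 1
```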
The use of declarative languages in digital TV systems, as well as IPTV systems, facilitates the creation of interactive applications.
However, when an application becomes more complex, with many user interactions, for example, the hypermedia document that
describes that application becomes bigger, having many lines of XML code. Thus, specification reuse is crucial for an efficient
application development process. This paper proposes the XTemplate 3.0 language, which allows the creation of NCL hypermedia
composite templates. Templates define generic structures of nodes and links to be added to a document composition, providing
spatio-temporal synchronization semantics to it. The use of hypermedia composite templates aims at facilitating the authoring
work, allowing the reuse of common specifications across hypermedia documents. Using composite templates, hypermedia documents become simpler and easier to create. Version 3.0 of XTemplate adds new facilities to the language, such as the ability to specify presentation information, the assignment of values to variables and connector parameters at template processing time, and the ability of a template to extend other templates. As an application of XTemplate, this work extends
the NCL 3.0 declarative language with XTemplate, adding semantics to NCL contexts and providing document structure reuse.
In addition, this paper presents two authoring tools: a template processor and a wizard for creating NCL documents using
templates. The wizard tool allows the author to choose a template included in a template base and create an NCL document using
that template. The template processor transforms an NCL document using templates into a standard NCL 3.0 document according
to digital TV and IPTV standards.
Keywords: Hypermedia composite templates, Reuse, Spatio-temporal semantics, XTemplate, Interactive TV, NCL, Hypermedia authoring