Multimedia Tools and Applications

Published by Springer Nature

Online ISSN: 1573-7721 · Print ISSN: 1380-7501

Articles


Fig. 3 An overview of the high-throughput histology workflow (modified from Sabaliauskas et al. [32])  
Fig. 4 Example of histological images of wild-type (normal) and mutant zebrafish eyes at age 5 dpf (days post fertilization)  
Fig. 14 Illustration describing how the feature matrix extracted from each frieze-expanded image is reshaped into a series of sorted nine-element feature vectors, one for each feature-neighborhood correspondence
Fig. 15 Typical example images spanning the range of histological phenotypes and artifacts that SHIRAZ is trained to recognize  
Fig. 17 Screenshot of entry point to SHIRAZ Web-based demo site  


SHIRAZ: an automated histology image annotation system for zebrafish phenomics
  • Article
  • Full-text available

January 2011 · 556 Reads · James Z Wang
Histological characterization is used in clinical and research contexts as a highly sensitive method for detecting the morphological features of disease and abnormal gene function. Histology has recently been accepted as a phenotyping method for the forthcoming Zebrafish Phenome Project, a large-scale community effort to characterize the morphological, physiological, and behavioral phenotypes resulting from mutations in all known genes in the zebrafish genome. In support of this project, we present a novel content-based image retrieval system for the automated annotation of images containing histological abnormalities in the developing eye of the larval zebrafish.

Fig. 1 Schematic illustration of segmentation and coding of segments for different window sizes. The root circles indicate the starting points of gestures and miscommunications. The arrows represent the duration of the gestures and miscommunications  
Table 1 Gestural features calculated on segment s
Table 3 F-scores short-term, 5-sec window
Gestural cue analysis in automated semantic miscommunication annotation

November 2012 · 130 Reads

The automated annotation of conversational video with semantic miscommunication labels is a challenging problem. Although miscommunications are often obvious to the speakers as well as to observers, it is difficult for machines to detect them from low-level features. In this paper, we investigate the utility of gestural cues among various non-verbal features. Compared with gesture recognition tasks in human-computer interaction, this task is difficult because of the implicitness of gestures and the limited understanding of which cues contribute to miscommunications. Nine simple gestural features are extracted from the gesture data, and both simple and complex classifiers are constructed using machine learning. The experimental results suggest that no single gestural feature can predict or explain the occurrence of semantic miscommunication in our setting.
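As a rough illustration of the experimental setup the abstract describes, the sketch below trains one simple and one more complex classifier on nine per-segment gestural features; the feature values and labels are synthetic placeholders, not the paper's data.

```python
# Minimal sketch (not the authors' code): train simple and more complex
# classifiers on per-segment gestural features to predict a binary
# miscommunication label. Features and labels are synthetic placeholders
# standing in for the nine features computed per segment.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 9))          # 200 segments x 9 gestural features
y = rng.integers(0, 2, size=200)       # miscommunication label per segment

for name, clf in [("logistic regression", LogisticRegression(max_iter=1000)),
                  ("random forest", RandomForestClassifier(n_estimators=100))]:
    f1 = cross_val_score(clf, X, y, cv=5, scoring="f1")
    print(f"{name}: mean F-score = {f1.mean():.2f}")
```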

An admission control scheme for continuous media servers using caching

March 2000 · 85 Reads

In continuous media servers, disk load can be reduced by using a buffer cache. To exploit the disk bandwidth saved by caching, a continuous media server must employ an admission control scheme to decide whether a new client can be admitted without violating the requirements of clients already being serviced. Because deterministic admission control is based on worst-case assumptions, it wastes system resources. In this paper, we propose a statistical admission control scheme for continuous media servers in which caching is used to reduce disk load. The scheme improves disk utilization and allows more streams to be serviced with reasonable computational overhead, while maintaining near-deterministic service. The scheme, called short-sighted prediction admission control (SPAC), combines exact prediction through on-line simulation with statistical estimation using a probabilistic model of future disk load in order to reduce computation overhead.
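The following sketch illustrates the general shape of a statistical admission test of the kind SPAC describes: a short-horizon load prediction combined with a probabilistic bound on the remaining load. All function bodies, rates, and thresholds are hypothetical stand-ins, not the paper's model.

```python
# Illustrative sketch only: admit a new stream if the disk load predicted
# for a short horizon (stubbed here) plus a statistical bound for the
# un-simulated future stays below the disk bandwidth.
import math

DISK_BANDWIDTH = 100.0      # MB/s the disk can sustain (hypothetical)
CONFIDENCE_Z = 2.33         # ~99% one-sided bound on the load estimate

def simulate_short_term_load(streams):
    """Stand-in for on-line simulation: worst-case load over the horizon."""
    return sum(s["peak_rate"] for s in streams)  # placeholder model

def statistical_load_bound(streams):
    """Mean plus z * std of the per-stream disk load beyond the horizon."""
    mean = sum(s["mean_rate"] for s in streams)
    var = sum(s["rate_var"] for s in streams)
    return mean + CONFIDENCE_Z * math.sqrt(var)

def admit(new_stream, active_streams):
    candidate = active_streams + [new_stream]
    short_term = simulate_short_term_load(candidate)
    long_term = statistical_load_bound(candidate)
    return max(short_term, long_term) <= DISK_BANDWIDTH

active = [{"peak_rate": 8.0, "mean_rate": 5.0, "rate_var": 1.0} for _ in range(9)]
print(admit({"peak_rate": 8.0, "mean_rate": 5.0, "rate_var": 1.0}, active))
```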

Categorization for Grouping Associative Items Mining in Item-Based Collaborative Filtering

May 2011 · 47 Reads

Recommendation systems have been investigated and implemented from many perspectives. In collaborative filtering systems in particular, an important issue is how to present personalized recommendation results for better user understandability and satisfaction. A collaborative filtering system predicts items of interest to users based on predictive relationships discovered between an item and the others. In this paper, categorization for grouping associative items is proposed, with the aim of improving accuracy and performance in item-based collaborative filtering. When an associative item would otherwise have to be regrouped simultaneously into all other groups in which it occurs, the proposed method regroups it into the most relevant group. The method also improves predictive performance under sparse-data and cold-start conditions, which begin with only a small number of items in the collaborative filtering system, and it increases prediction accuracy and scalability by removing the noise generated by ratings on items of dissimilar content or interest. The approach is evaluated empirically against k-means, average-link, and robust clustering on the MovieLens dataset, and is found to significantly outperform the previous methods.

Figure 4. Different radio ranges used in SERA
Figure 5. Finite state machine used in SERA
Figs. 6, 7, and 8 show how paths to the base station change depending on the values of α, β, γ, and δ.
Fig. 7 SERA's data gathering path when α=1
SERA: A Secure Energy and Reliability Aware Data Gathering for Sensor Networks
Wireless sensor networks are used in many military, ecological, health, and other applications. These applications often involve the monitoring of sensitive information, making security one of the most important aspects to consider in this field. However, most protocols are optimized for the limited capabilities of sensor nodes and the application-specific nature of the networks, and remain vulnerable to serious security attacks. In this paper, a Secure Energy and Reliability Aware data gathering protocol (SERA) is proposed, which provides energy efficiency and data delivery reliability as well as a security scheme protecting against the most common network-layer attacks, such as spoofed, altered, or replayed routing information, selective forwarding, sinkhole attacks, Sybil attacks, wormhole attacks, HELLO flood attacks, and acknowledgment spoofing.

Figure 2. Example self-optimization policy specification. 
Figure 5: QoS and Self-Management internal architecture. 
Figure 9: Frame rate changes with network bandwidth change. 
A User-Centric Network Communication Broker for Multimedia Collaborative Computing

December 2006 · 149 Reads · Weixiang Sun · [...] · Yi Deng
The development of collaborative multimedia applications today follows a vertical development approach, which is a major inhibitor that drives up the cost of development and slows the pace of innovation for new generations of collaborative applications. In this paper, we propose a network communication broker (NCB) that provides a unified higher-level abstraction encapsulating the complexity of network-level communication control and media delivery for the class of multimedia collaborative applications. NCB expedites the development of next-generation applications with diverse communication logics. Furthermore, NCB-based applications can be easily ported to new network environments. In addition, the self-managing design of NCB supports dynamic adaptation in response to changes in network conditions and user requirements.

Capturing Contextual Relationship for Effective Media Search

January 2010 · 46 Reads

One of the central problems in media search is the semantic gap between the low-level features computed automatically from media data and the human interpretation of that data. The notion of similarity is usually based on high-level abstractions, and the low-level features do not always reflect human perception. In this paper, we assume that the semantics of a media item are determined by its contextual relationships within a dataset, and we introduce a method to capture this contextual information from a large media (especially image) dataset for effective search. Similarity search in an image database based on this contextual information shows encouraging experimental results.

Design of multi-user editing servers for continuous media

March 1998 · 34 Reads

Based on a fifteen-month investigation of post-production facilities for both the entertainment and broadcast industries, we identified a number of challenges that arise when designing a server in support of multiple editing stations. These include how to: share the same content among multiple editors; support continuous display of the edited content for each editor; support complex editing operations; compute the set of changes (deltas) proposed by an editor; and compare the deltas proposed by multiple editors. It is beyond the scope of this paper to report on each challenge and its related solutions. Instead, we focus on one challenge and the physical design of the system that addresses it.

Threshold-based dynamic replication in large-scale video-on-demand systems

March 1998 · 18 Reads

Advances in high-speed networking technologies and video compression techniques have made video-on-demand (VOD) services feasible. A large-scale VOD system imposes a large demand on bandwidth and storage resources, and therefore parallel disks are typically used to provide VOD service. Although striping movie data across a large number of disks can balance the utilization among these disks, such a striping technique introduces additional complexity, for instance in data management, such as synchronization among disks during data delivery, as well as in supporting fault-tolerant behavior. Therefore, it is more practical to limit the extent of data striping, for example by arranging the disks in groups (or nodes) and then allowing intra-group (or intra-node) data striping only. With multiple striping groups, however, we may need to assign a movie to multiple nodes so as to satisfy the total demand of requests for that movie. Such an approach gives rise to several design issues, including: what is the right number of copies of each movie needed to satisfy the demand without wasting storage capacity; how to assign these movies to different nodes in the system; and what are efficient approaches to altering the number of copies of each movie (and their placement) when the need arises. We study an approach to dynamically reconfiguring the VOD system so as to alter the number of copies of each movie maintained on the server as the access demand for these movies fluctuates. We propose various approaches to addressing the above issues, which result in a VOD design that is adaptive to changes in data access patterns. Performance evaluation is carried out to quantify the costs and the performance gains of these techniques.

DMTS: A Distributed Multimedia Teleworking System

November 1995 · 30 Reads

Multimedia systems combine a variety of information sources, such as voice, graphics, animation, images, music, and full-motion video, into a wide range of applications. The paper initially categorizes existing multimedia applications into three classes: non-interactive-oriented, interactive-oriented client-server-based, and interactive-oriented peer-party-based applications. In particular, the paper examines interactive-oriented applications and provides an in-depth survey of the media synchronization problem for the design of these applications. The paper then presents our Distributed Multimedia Teleworking System (DMTS) prototype, which allows two or more remote systems in collaboration to access and modify multimedia data through a network in a fully synchronous fashion. The system was developed over TCP/IP on an FDDI network, using an XVideo D/A card. The media supported by DMTS include text, graphics, voice, and video. For text and graphics, to maintain the coherence of data being simultaneously modified, DMTS employs a master-slave collaboration model between the two remote systems. Moreover, DMTS adopts effective mechanisms to reduce skew (asynchrony) and jitter delays between video and voice data streams. Finally, the paper demonstrates that DMTS achieves a maximum throughput of 13 frames per second and that the throughput bottleneck resides in the hardware capture and D/A processing of video frames.

Ontology-Driven Visualization System for Semantic Search

May 2011 · 63 Reads

Technical manuals are very diverse, ranging from manuals on software to manuals on commodities, general instructions, and technical manuals that deal with specific domains such as mechanical maintenance. Because of the vast amount of manual information, finding the necessary information is difficult. With the electronic maintenance manuals currently used by companies, mechanics must search for the information relevant to their tasks, and it is difficult to grasp the relationships among the contents of the manuals; the search process is time-consuming and laborious. Many researchers have adopted ontologies to address these problems and to represent the contents of manuals semantically. However, when an ontology becomes very large and complex, it is not easy to work with. Visualization has proven to be an effective way to grasp and manipulate ontologies. In this research, we model a new ontology to represent and retrieve the contents of manuals and design a visualization system based on the proposed ontology. To model the ontology, we analyzed the aircraft maintenance process, extracted concepts, and defined relationships between them. After modeling, we created instances of each class from technical manuals. Our system visualizes related information so that mechanics can grasp it intuitively, which allows workers to obtain the information needed for a given task easily, reduces the time spent searching for related information, and lets related information be understood at a glance through visualization.

Dynamic Video Playout Smoothing Method for Multimedia Applications

July 1996 · 38 Reads

Multimedia applications involving video data require the smoothing of video playout to prevent potential playout discontinuity. In this paper, we propose a dynamic video playout smoothing method, called the video smoother, which dynamically adopts various playout rates in an attempt to compensate for the high delay variance of networks. Specifically, if the number of frames in the buffer exceeds a given threshold (TH), the smoother employs the maximum playout rate. Otherwise, the smoother uses proportionally reduced rates in an effort to eliminate playout pauses resulting from an empty playout buffer. To determine THs under various loads, we present an analytic model that assumes Poisson arrivals, corresponding to a network with a traffic shaper. Based on the analytic results, we establish a paradigm for determining THs and playout rates to achieve different playout qualities under various network loads. Finally, to demonstrate the viability of the video smoother, we have implemented a prototype system including a multimedia teleconferencing application and the video smoother operating as part of the transport layer. The prototyping results show that the video smoother achieves smooth playout while incurring only unnoticeable delays.
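A minimal sketch of the threshold rule described above, assuming a nominal 30 fps playout rate and an arbitrary TH of 15 frames (both hypothetical values, not those derived from the paper's analytic model):

```python
# Sketch of the smoother's rate rule (not the paper's implementation):
# play at the maximum rate while the buffer holds at least TH frames,
# otherwise scale the rate down in proportion to buffer occupancy so the
# buffer is less likely to drain completely.
MAX_RATE_FPS = 30.0   # nominal playout rate (assumed)
TH = 15               # buffer threshold in frames (hypothetical value)

def playout_rate(frames_in_buffer: int) -> float:
    if frames_in_buffer >= TH:
        return MAX_RATE_FPS
    # proportionally reduced rate as the buffer approaches empty
    return MAX_RATE_FPS * frames_in_buffer / TH

for occupancy in (30, 15, 10, 5, 0):
    print(occupancy, "frames ->", playout_rate(occupancy), "fps")
```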


Open graphical framework for interactive TV

January 2004 · 16 Reads

Multimedia end-user terminals are expected to perform advanced user-interface-related tasks. These tasks are carried out by user interface runtime tools and include, among others, the visualization of complex graphics and the efficient handling of user input. In addition, the terminal's graphical system is expected, for example, to be able to synchronize audio and video and to control different contexts on the same screen. Finally, the availability of high-level tools to simplify user interface implementation and the adaptiveness of the user interfaces to a diversity of configurations are also desirable features. We present a layered model that meets these requirements. The architecture is divided into five layers: hardware abstraction layer, multimedia cross-platform libraries, graphical environment, GUI toolkit, and high-level languages. Moreover, a prototype system based on the architecture, targeted at digital television receivers, was implemented for the purpose of testing the validity of the concepts. In order to evaluate the prototype, some already developed MHP-compliant digital television applications were tested. In addition, the prototype was extended with a high-level profile (i.e., SMIL support) and a low-level one (i.e., access to the frame buffer memory).

Spatial interest pixels (SIPs): Useful low-level features of visual media data

December 2003 · 221 Reads

Visual media data such as images are the raw data representation for many important applications. The biggest challenge in using visual media data comes from their extremely high dimensionality. We present a comparative study of spatial interest pixels (SIPs), including eight-way (a novel SIP miner), Harris, and Lucas-Kanade detectors, whose extraction is an important step in reducing the dimensionality of visual media data. Through extensive case studies, we show the usefulness of SIPs as low-level features of visual media data. A class-preserving dimension reduction algorithm (using GSVD) is applied to further reduce the dimension of feature vectors based on SIPs. The experiments show its superiority over PCA.
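As an illustration of how interest-pixel extraction sparsifies an image, the sketch below applies the Harris detector (one of the detectors compared in the paper) with OpenCV to a synthetic image; the image and the response threshold are placeholders, and the paper's eight-way miner is not shown.

```python
# Sketch: extract spatial interest pixels with the Harris detector.
# The synthetic square image and the 1% threshold are placeholders.
import cv2
import numpy as np

img = np.zeros((64, 64), dtype=np.float32)
img[16:48, 16:48] = 1.0                                # a bright square -> 4 corners
response = cv2.cornerHarris(img, blockSize=2, ksize=3, k=0.04)
sips = np.argwhere(response > 0.01 * response.max())   # (row, col) interest pixels
print(f"kept {len(sips)} interest pixels out of {img.size} pixels")
```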

A web 2.0 archive to access, annotate and retrieve manuscripts

May 2011 · 46 Reads

The development of the Web has encouraged organizations and individuals to expose their multimedia documents on the Internet. Additionally, the migration to web 2.0 has offered users the chance to comment on and annotate the contents of these multimedia documents. Museums and libraries are particularly interested in users' feedback and work, because many collections, such as handwritten manuscripts, are still puzzles for archivists; any feedback concerning these contents is therefore welcome. This article focuses on the design and implementation of a web archive. The main objective is to enable users to annotate manuscript documents easily and remotely using a web 2.0 application. User annotations are important for enriching the archive contents with essential information; nevertheless, not all users are experts in the manuscript domain. Accordingly, users need assistance during the search and annotation processes. The proposed assistant in our archive is a recommender system; it relies on registered traces of user interaction with the documents to generate suggestions. Keywords: Web 2.0 archive, Manuscript annotation, Collaboration, User traces, Recommender system

Table 1 Query audio transformations used in TRECVID 2008/2009.
Table 16 Comparison of averaged min NDCR across all transforms for the different 2009 audio query detection submissions.
and actual NDCR for balanced case for copy detection with fusion of energy difference and NN-based fingerprints for 2009 queries
CRIM's content-based audio copy detection system for TRECVID 2009

June 2010 · 282 Reads

We report results on audio copy detection for the TRECVID 2009 copy detection task. This task involves searching for transformed audio queries in over 385 h of test audio. The queries were transformed in seven different ways, three of which involved mixing unrelated speech into the original query, making the task much more difficult. We give results with two different audio fingerprints and show that mapping each test frame to the nearest query frame (the nearest-neighbor fingerprint) results in robust audio copy detection. The most difficult task in TRECVID 2009 was to detect audio copies using predetermined thresholds computed from 2008 data. We show that the nearest-neighbor fingerprints were robust even to this task and gave an actual minimal normalized detection cost rate (NDCR) of around 0.06 for all the transformations. These results are close to those obtained by using the optimal threshold for each transform, which shows the robustness of the nearest-neighbor fingerprints. These fingerprints can be computed efficiently on a graphics processing unit, leading to a very fast search. Keywords: Audio copy detection, Copy detection, Energy difference, Nearest neighbor
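A minimal sketch of the nearest-neighbor fingerprint idea: each test frame is mapped to the index of its closest query frame by brute-force search. The random feature vectors below stand in for the actual energy-difference features, and no detection thresholding or GPU acceleration is shown.

```python
# Illustrative sketch: map every test frame's feature vector to its
# nearest query frame; the resulting index sequence is the
# "nearest-neighbour fingerprint" used for matching. Features are random
# placeholders, not the actual fingerprints.
import numpy as np

rng = np.random.default_rng(1)
query = rng.normal(size=(50, 12))     # 50 query frames x 12-dim features
test = rng.normal(size=(500, 12))     # 500 test frames

# nearest query frame for every test frame (brute-force L2 search)
dists = np.linalg.norm(test[:, None, :] - query[None, :, :], axis=2)
nn_index = dists.argmin(axis=1)       # mapped query-frame index per test frame
nn_dist = dists.min(axis=1)

print("first ten mapped query-frame indices:", nn_index[:10])
print("mean distance to nearest query frame:", round(float(nn_dist.mean()), 3))
```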

Fig. 1 The relationship of the video motion objects (a), the MVs (b), and the selected modes (c)
Table 1 The coding modes for the different regions
Fig. 2 Skipping the detection of the interior pixels of the motion object: (a) all-surrounded method; (b) semi-surrounded method
Table 4 The effects of the threshold on the PSNR, bit rate and encoding time
Table 5 Performance comparisons of the proposed algorithm and Wang et al.'s
Fast mode selection for H.264 video coding standard based on motion region classification

June 2012 · 80 Reads

H.264/AVC achieves higher compression efficiency than previous video coding standards. However, the process of selecting the optimal coding mode for each macroblock (MB) results in extremely high computational complexity, which makes it difficult to use in practice. In this paper, an efficient algorithm is proposed to reduce the complexity of MB mode selection. The proposed algorithm first identifies the interior region of a motion object using motion vector information. For interior regions surrounded by identical motion vectors, we skip mode selection for the MBs and code them directly with large block-size modes; specific cases within this region are also discussed. Boundary regions are classified into different types according to the coded mode information, and each type is then processed with a distinct mode set. Experimental results show that the proposed algorithm saves up to 46% of encoding time on average compared to the conventional method in the JVT JM8.6 reference encoder, with only 0.12 dB performance degradation. Keywords: Mode selection, Fast algorithm, Motion region classification, H.264, Video coding
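The following sketch illustrates the interior-region rule described above in simplified form: a macroblock whose motion vector matches all eight neighbours is marked as interior and could skip the exhaustive mode search. The toy motion-vector field and the neighbourhood test are illustrative assumptions, not the JM reference implementation.

```python
# Simplified sketch of the interior-region test: an MB surrounded by
# identical motion vectors is treated as part of a motion object's
# interior and coded with a large block-size mode directly.
import numpy as np

def interior_macroblocks(mv_field: np.ndarray) -> np.ndarray:
    """mv_field: (rows, cols, 2) motion vectors, one per macroblock.
    Returns a boolean mask of MBs whose 3x3 neighbourhood has identical MVs."""
    rows, cols, _ = mv_field.shape
    interior = np.zeros((rows, cols), dtype=bool)
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            neighbourhood = mv_field[r - 1:r + 2, c - 1:c + 2].reshape(-1, 2)
            interior[r, c] = bool((neighbourhood == mv_field[r, c]).all())
    return interior

mvs = np.zeros((6, 8, 2), dtype=int)   # toy field: static background...
mvs[2:4, 3:6] = [4, 0]                 # ...with a small moving region
mask = interior_macroblocks(mvs)
print("MBs that can skip full mode selection:", int(mask.sum()), "of", mask.size)
```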

Implementation of MAC-based RTL module for Inverse DCT in H.264/AVC

November 2011 · 25 Reads

In this paper, we implement a MAC-based RTL module for the inverse DCT in H.264/AVC to improve applicability, reduce processing time, and utilize resources efficiently. The paper covers the design of the FU (functional unit) architecture, its interconnection topology, the regular formulation of the inverse DCT, the array-processor mapping, and the construction of the MAC-based RTL module. A multi-directional FU array (FUA) and an FPGA implementation are presented along with a performance evaluation and simulation results. The design of a single FU was verified at a maximum frequency of 200 MHz, and the designed 4-by-4 FUA operates at over 100 MHz. The proposed multi-directional FU can be extended to an n-by-n FUA whose functionality can be extended to the next video coding standard (H.265/HEVC). Keywords: Functional Unit, FPGA, H.264/AVC, Inverse DCT

Evaluation of data-parallel H.264 decoding approaches for strongly resource-restricted architectures

June 2011 · 936 Reads

Decoding an H.264 video stream is a computationally demanding multimedia application which poses serious challenges for current processor architectures. For processors with strongly limited computational resources, a natural way to tackle this problem is the use of multi-core systems. The contribution of this paper lies in a systematic overview and performance evaluation of parallel video decoding approaches. We focus on decoder splittings for the strongly resource-restricted environments inherent to mobile devices. For the evaluation, we introduce a high-level methodology which can estimate the runtime behaviour of multi-core decoding architectures. We use this methodology to investigate six methods for accomplishing data-parallel splitting of an H.264 decoder. These methods are compared against each other in terms of runtime complexity, core usage, inter-core communication, and bus transfers. We present benchmark results using different numbers of processor cores. Our results should aid in finding the splitting strategy best suited to the targeted hardware architecture. Keywords: Video decoding, H.264/AVC, Multimedia, Multi-core, Embedded architectures

A genetic approach to Markovian characterisation of H.264 scalable video

May 2012 · 85 Reads

We propose an algorithm for multivariate Markovian characterisation of H.264/SVC scalable video traces at the sub-GoP (Group of Pictures) level. A genetic algorithm yields Markov models with a limited state space that accurately capture temporal and inter-layer correlation. Key to our approach is the covariance-based fitness function. In comparison with the classical Expectation Maximisation algorithm, ours matches the second-order statistics more accurately at the cost of less accuracy in matching the histograms of the trace. Moreover, a simulation study shows that our approach outperforms Expectation Maximisation in predicting the performance of video streaming in various networking scenarios. Keywords: H.264/SVC, Traffic characterisation, Markovian arrival process
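As a sketch of what a covariance-based fitness function can look like, the code below scores a candidate model by how closely the lag covariances of a trace it generates match those of the real trace; the gamma-distributed traces are synthetic placeholders, and the genetic search itself is not shown.

```python
# Illustrative covariance-based fitness: smaller mismatch between the lag
# covariances of the real trace and of a model-generated trace gives a
# higher fitness score. Both traces are synthetic stand-ins.
import numpy as np

def lag_covariances(x: np.ndarray, max_lag: int = 5) -> np.ndarray:
    x = x - x.mean()
    return np.array([np.mean(x[:len(x) - k] * x[k:]) for k in range(1, max_lag + 1)])

def fitness(real_trace: np.ndarray, model_trace: np.ndarray) -> float:
    err = np.sum((lag_covariances(real_trace) - lag_covariances(model_trace)) ** 2)
    return 1.0 / (1.0 + err)   # higher is better

rng = np.random.default_rng(2)
real = rng.gamma(shape=2.0, scale=1000.0, size=5000)    # stand-in frame sizes
model = rng.gamma(shape=2.0, scale=1000.0, size=5000)   # stand-in model output
print("fitness:", round(fitness(real, model), 6))
```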

A hybrid M-channel filter bank and DCT framework for H.264/AVC intra coding

April 2010 · 33 Reads

In H.264/AVC, the discrete cosine transform (DCT) is performed on the residual blocks after prediction. However, the mismatch between variable block sizes and the fixed transform matrix not only degrades decorrelation performance but also causes severe blocky artifacts inside the blocks. In previous work, the M-channel filter bank system (MCFBS) was proposed to overcome these defects. However, the increase in encoding time from using MCFBS is very high, especially for intra coding. More seriously, the constructed M-channel filter bank with floating-point coefficients is an obstacle to hardware implementation. In this work, a hybrid M-channel filter bank and DCT (HMD) framework is proposed for intra coding. In addition, the integer transform of a newly constructed M-channel filter bank is implemented for HMD. Experimental results demonstrate that HMD can reduce the complexity of MCFBS by 64–69% with negligible quality degradation. Keywords: M-channel filter bank, Discrete cosine transform, Intra coding, H.264/AVC

Streaming of scalable H.264 videos over the Internet

February 2008 · 19 Reads

To investigate the benefits of scalable codecs for the rate adaptation problem, a streaming system for scalable H.264 videos has been implemented. The system considers the congestion level in the network and the buffer status at the client during the adaptation process. The rate adaptation algorithm is content adaptive: it selects an appropriate substream from the video file by taking into account the motion dynamics of the video. The performance of the system has been tested under congestion-free and congestion scenarios. The results indicate that the system reacts to congestion properly and can be used for Internet video streaming where losses occur unpredictably.

ILPS: A Scalable Multiple Description Coding Scheme for H.264

June 2009 · 25 Reads

The most recent literature identifies multiple description coding (MDC) as a promising coding approach for handling video transmission over unreliable networks with different quality and bandwidth constraints. We introduce an approach that starts from the concept of spatial MDC and introduces algorithms to obtain more efficient sub-streams by exploiting some form of scalability. In the algorithm, we first generate four subsequences by sub-sampling; two of these subsequences are then jointly used to form each of the two descriptions. For each description, one of the original subsequences is predicted from the other via scalable algorithms, focusing on the inter-layer prediction scheme. The proposed algorithm has been implemented as pre- and post-processing around the standard H.264/SVC coder. Experimental results are presented and show that the approach performs very well.
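A minimal sketch of the spatial splitting step described in the abstract: each frame is polyphase sub-sampled into four subsequences, and two of them are paired to form each description. The particular pairing and the inter-layer prediction step are illustrative assumptions, not the paper's exact scheme.

```python
# Sketch of spatial MDC splitting: polyphase sub-sampling of a frame into
# four subsequences, paired two-by-two into two descriptions. The
# prediction of one subsequence from the other is not shown.
import numpy as np

def split_descriptions(frame: np.ndarray):
    s00 = frame[0::2, 0::2]      # the four polyphase subsequences
    s01 = frame[0::2, 1::2]
    s10 = frame[1::2, 0::2]
    s11 = frame[1::2, 1::2]
    description_1 = (s00, s11)   # pairing is an illustrative choice
    description_2 = (s01, s10)
    return description_1, description_2

frame = np.arange(16 * 16).reshape(16, 16)
d1, d2 = split_descriptions(frame)
print([a.shape for a in d1], [a.shape for a in d2])
```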

XTemplate 3.0: Spatio-temporal semantics and structure reuse for hypermedia compositions

December 2012 · 203 Reads

The use of declarative languages in digital TV systems, as well as in IPTV systems, facilitates the creation of interactive applications. However, when an application becomes more complex, with many user interactions, for example, the hypermedia document that describes the application grows large, containing many lines of XML code. Thus, specification reuse is crucial for an efficient application development process. This paper proposes the XTemplate 3.0 language, which allows the creation of NCL hypermedia composite templates. Templates define generic structures of nodes and links to be added to a document composition, providing spatio-temporal synchronization semantics to it. The use of hypermedia composite templates aims to facilitate authoring by allowing the reuse of specifications common to hypermedia documents. Using composite templates, hypermedia documents become simpler and easier to create. Version 3.0 of XTemplate adds new facilities to the language, such as the possibility of specifying presentation information, the assignment of values to variables and connector parameters at template processing time, and the ability of a template to extend other templates. As an application of XTemplate, this work extends the NCL 3.0 declarative language with XTemplate, adding semantics to NCL contexts and providing document structure reuse. In addition, this paper presents two authoring tools: the template processor and the wizard for creating NCL documents using templates. The wizard tool allows the author to choose a template from a template base and create an NCL document using that template. The template processor transforms an NCL document using templates into a standard NCL 3.0 document according to digital TV and IPTV standards. Keywords: Hypermedia composite templates, Reuse, Spatio-temporal semantics, XTemplate, Interactive TV, NCL, Hypermedia authoring
