ABSTRACT: Most existing approaches to complex event processing over streaming data assume that matches to the queries are rare and that the goal of the system is to identify these few matches within the incoming deluge of data. In many applications, however, such as monitoring users' credit card purchase patterns, the matches to the user queries are in fact plentiful, and the system has to sift efficiently through these many matches to locate only the few most preferable ones. In this paper, we propose a complex pattern ranking (CPR) framework for specifying top-k pattern queries over streaming data, present new algorithms to support top-k pattern queries in data streaming environments, and verify the effectiveness and efficiency of the proposed algorithms. The algorithms we develop identify the top-k matching results that satisfy both the patterns and additional criteria. To support real-time processing of the data streams, instead of computing top-k results from scratch for each time window, we maintain the top-k results dynamically as new events arrive and old ones expire. We also develop new top-k join execution strategies that adapt to changing conditions (e.g., sorted and random access costs, join rates) without having to assume the a priori availability of distributed stream statistics. Experiments show significant improvements over existing approaches.
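The idea of maintaining top-k results as events arrive and expire, rather than recomputing them for each window, can be illustrated with a minimal Python sketch. This is not the paper's CPR algorithm: it assumes each match already carries a numeric score, that timestamps arrive in non-decreasing order, and that match identifiers are unique. The class name `SlidingTopK` and its parameters are hypothetical.

```python
import bisect
from collections import deque


class SlidingTopK:
    """Keep the k best-scoring matches among those still inside a time window.

    Illustrative only: assumes non-decreasing timestamps and unique match ids.
    """

    def __init__(self, k, window):
        self.k = k
        self.window = window
        self.arrivals = deque()  # (timestamp, score, match_id) in arrival order
        self.ranked = []         # (-score, timestamp, match_id), kept sorted

    def add(self, timestamp, score, match_id):
        # Drop expired matches first, then insert the new one in rank order.
        self._expire(timestamp)
        self.arrivals.append((timestamp, score, match_id))
        bisect.insort(self.ranked, (-score, timestamp, match_id))

    def _expire(self, now):
        # A match expires once its timestamp is at least `window` old.
        while self.arrivals and self.arrivals[0][0] <= now - self.window:
            ts, score, match_id = self.arrivals.popleft()
            idx = bisect.bisect_left(self.ranked, (-score, ts, match_id))
            del self.ranked[idx]

    def top_k(self):
        # Best score first; no re-sorting is needed at query time.
        return [(match_id, -neg) for neg, _, match_id in self.ranked[: self.k]]
```

Each arrival costs one ordered insertion plus the removal of whatever has expired, so the ranking is updated incrementally instead of being rebuilt per window.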
ABSTRACT: Complex media fusion operations can be costly in terms of the time they need to process input objects. If data arrive at fusion nodes faster than the nodes can consume them, some input objects will go unprocessed. In this paper, we develop load shedding mechanisms that take into consideration both data quality and the expensive nature of media fusion operators. In particular, we present quality assessment models for objects and multistream fusion operators and highlight that such quality assessments may impose partial orders on objects. We show that the most effective load control approach for fusion operators involves shedding not individual input objects but combinations of objects. Yet identifying suitable combinations of objects in real time is not possible without efficient combination selection algorithms. We develop efficient combination selection schemes for scenarios with different quality assessment and target characteristics. We first develop efficient combination-based load shedding for the case where the fusion operator has unambiguously monotone semantics. We then extend this to the more general ambiguously monotone case and present experimental results that show the performance gains of quality-aware, combination-based load shedding strategies under the various fusion scenarios.
Article · Feb 2010 · IEEE Transactions on Knowledge and Data Engineering
ABSTRACT: In most existing work on shape contour matching, shape contours are considered and matched as a whole. When searching for contour snippets, however, techniques that match whole contours are not directly applicable. In particular, a relevant snippet can be anywhere on a shape contour; moreover, the relevance of a shape snippet is a function not only of the shape of the snippet itself, but also of its neighborhood on the contour. In this paper, we propose an HMM-based solution to shape snippet extraction. Relying on a general-purpose symbolic representation (such as SAX), we first convert the shape contour into a representation suitable for the snippet marking and extraction processes. We then show that, given a set of samples, we can train an HMM capable of detecting relevant snippets in new shape images. Next, we show that the HMM performance can be boosted significantly if the similarities between the symbolic representations are used to create new sibling training sequences from the input sequences. The experimental results show that adding just one sibling per input training sequence can improve the diversity of the training set enough to substantially boost the overlap between actual and detected snippets. While a naive application of this metadata-driven training technique can increase the training costs significantly, we show that a novel metadata-driven HMM (mHMM) scheme can significantly improve the HMM-based snippet detection performance at negligible cost.
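The symbolic conversion step can be illustrated with a minimal Python sketch of standard SAX (z-normalization, piecewise aggregate approximation, then discretization against Gaussian breakpoints). It is not the paper's pipeline: it assumes the contour has already been reduced to a one-dimensional sequence (for example, the distance of each contour point from the shape centroid), uses a fixed four-letter alphabet, and requires at least one sample per segment. The function name `sax` is hypothetical.

```python
import bisect
from statistics import mean, pstdev

# Breakpoints splitting a standard normal into 4 equiprobable regions.
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def sax(series, segments):
    """Convert a 1-D numeric series into a SAX word of `segments` symbols.

    Assumes len(series) >= segments so that every segment is non-empty.
    """
    mu, sigma = mean(series), pstdev(series)
    # Z-normalize; a constant series maps to all zeros.
    if sigma == 0:
        normalized = [0.0] * len(series)
    else:
        normalized = [(x - mu) / sigma for x in series]

    n = len(normalized)
    word = []
    for i in range(segments):
        # Piecewise aggregate approximation: average each segment.
        start, end = i * n // segments, (i + 1) * n // segments
        avg = mean(normalized[start:end])
        # The symbol index is the number of breakpoints below the average.
        word.append(ALPHABET[bisect.bisect_left(BREAKPOINTS, avg)])
    return "".join(word)
```

For instance, `sax([0, 0, 0, 0, 10, 10, 10, 10], 2)` yields `"ad"`: the low half falls below the first breakpoint and the high half above the last. A symbol sequence of this kind is what an HMM would then consume as its observation sequence.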
ABSTRACT: Most Web interfaces are designed primarily for people with sight, with visually rich features that make effective use of tools to enhance visual usability but, in the process, make the interfaces impossible to use for people who are blind or visually impaired. In this work, our goal is to improve participation in NSF's National Science Digital Library (NSDL) by teachers, librarians, and learners who are blind. The middleware for accessible information spaces on NSDL (MAISON) enhances the accessibility of NSDL, its internal and external resources, and its existing services (such as strand maps of educational benchmarks). Relying on cutting-edge, context-aware graph segmentation, filtering and summarization, and concept propagation techniques, the middleware provides information space adaptation, reduction, and preview services through open Web-based service APIs. These services enable the implementation of informative navigation interfaces that reduce the complexity of the information space and provide previews to prevent user disorientation.