Preprint

VidCEP: Complex Event Processing Framework to Detect Spatiotemporal Patterns in Video Streams

Abstract

Video data is highly expressive and has traditionally been very difficult for a machine to interpret. Querying event patterns from video streams is challenging due to their unstructured representation. Middleware systems such as Complex Event Processing (CEP) mine patterns from data streams and notify users in a timely fashion. Current CEP systems have inherent limitations in querying video streams due to their unstructured data model and the lack of an expressive query language. In this work, we focus on a CEP framework where users can define high-level expressive queries over videos to detect a range of spatiotemporal event patterns. In this context, we propose: i) VidCEP, an in-memory, on-the-fly, near real-time complex event matching framework for video streams. The system uses a graph-based event representation for video streams, which enables the detection of high-level semantic concepts from video using cascades of Deep Neural Network models; ii) a Video Event Query Language (VEQL) to express high-level user queries over video streams in CEP; iii) a complex event matcher to detect spatiotemporal video event patterns by matching expressive user queries over video data. The proposed approach detects spatiotemporal video event patterns with an F-score ranging from 0.66 to 0.89. VidCEP maintains near real-time performance with an average throughput of 70 frames per second for 5 parallel videos, with sub-second matching latency.
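The abstract does not reproduce VEQL syntax, but the kind of matching VidCEP performs can be illustrated with a toy example. The sketch below (all names, the detection stub, and the pattern are invented for illustration) raises a complex event when two object classes co-occur for a minimum number of consecutive frames, which is the flavour of spatiotemporal pattern described above:

```python
# Toy sketch of CEP-style matching over video detections (illustrative only;
# not the actual VidCEP API or VEQL syntax).

def detections_stream():
    """Stand-in for per-frame DNN output: (frame_id, set of object labels)."""
    frames = [
        (1, {"car"}), (2, {"car", "person"}), (3, {"car", "person"}),
        (4, {"car", "person"}), (5, {"car"}),
    ]
    yield from frames

def match_cooccurrence(stream, labels, min_frames):
    """Emit a complex event when all labels co-occur for min_frames in a row."""
    run = []
    for frame_id, seen in stream:
        if labels <= seen:           # all queried labels present in this frame
            run.append(frame_id)
            if len(run) >= min_frames:
                yield ("co-occurrence", run[0], frame_id)
        else:
            run = []                 # pattern broken: reset the window

for event in match_cooccurrence(detections_stream(), {"person", "car"}, 3):
    print(event)                     # ('co-occurrence', 2, 4)
```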


References
Conference Paper
Complex Event Processing (CEP) is a paradigm to detect event patterns over streaming data in a timely manner. Presently, CEP systems have inherent limitations in detecting event patterns over video streams due to the data's complexity and lack of a structured data model. Modelling complex events in unstructured data like video requires detecting not only objects but also the spatiotemporal relationships among them. This work introduces a novel video representation technique where an input video stream is converted to a stream of graphs. We propose the Video Event Knowledge Graph (VEKG), a knowledge-graph-driven representation of video data. VEKG models video objects as nodes and their relational interactions as edges over time and space. It creates a semantic knowledge representation of video data derived from the detection of high-level semantic concepts using an ensemble of deep learning models. To optimize run-time system performance, we introduce a graph aggregation method, VEKG-TAG, which provides an aggregated view of VEKG for a given time length. We define a set of operators using event rules which can be used as queries and applied over VEKG graphs to discover complex video patterns. The system achieves an F-score ranging between 0.75 and 0.86 for different patterns when queried over VEKG. In the given experiments, pattern search over VEKG-TAG was 2.3X faster than the baseline.
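As a loose illustration of the graph-per-frame idea (not the actual VEKG schema), one frame's detections can be encoded as a labeled graph with objects as nodes and a spatial relation as edges; the labels, coordinates, and distance threshold below are invented:

```python
# Loose sketch of a per-frame object graph in the spirit of VEKG
# (hypothetical schema; requires `pip install networkx`).
import math
import networkx as nx

# Invented detections for one frame: (label, bounding-box centre).
detections = [("person", (120, 200)), ("car", (150, 210)), ("dog", (400, 50))]

G = nx.Graph(frame_id=42)
for i, (label, centre) in enumerate(detections):
    G.add_node(i, label=label, centre=centre)

# Add a 'near' edge whenever two centres fall within an arbitrary threshold.
for i, (_, c1) in enumerate(detections):
    for j, (_, c2) in enumerate(detections):
        if i < j and math.dist(c1, c2) < 100:
            G.add_edge(i, j, relation="near")

print(list(G.edges(data=True)))      # [(0, 1, {'relation': 'near'})]
```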
Article
Event processing systems serve as a middleware between the Internet of Things (IoT) and the application layer by allowing users to subscribe to events of interest. Due to the increase of multimedia IoT devices (e.g., traffic cameras), the types of events created are shifting towards unstructured (multimedia) data. There is therefore a growing demand for efficient processing of streams of both structured events (e.g., sensor readings) and unstructured multimedia events (e.g., images, video, audio). However, current event processing engines have limited or no support for unstructured event types. In this paper, we describe a generalized approach that can handle Internet of Multimedia Things (IoMT) events as a native event type in event processing engines with high efficiency. The proposed system extends event processing languages with operators for multimedia analysis of unstructured events and leverages a deep convolutional neural network based event matcher to extract features from image events. Furthermore, we show that neural network based object detection models can be further optimized by leveraging subscription constraints to reduce time complexity while maintaining competitive accuracy. Our initial results demonstrate the feasibility of a generalized approach towards IoMT-based event processing. Application areas include traffic management, security, parking, supervision activities, and enhancing the quality of life within smart cities.
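The summary does not detail how subscription constraints optimize detection, but one plausible reading, sketched below with invented names, is to restrict matching to the classes that active subscriptions actually mention:

```python
# Hypothetical sketch: notify subscribers only about detection classes they
# subscribed to, ignoring everything else the detector emits.
subscriptions = {"sub-1": {"car", "bus"}, "sub-2": {"person"}}
wanted = set().union(*subscriptions.values())      # {'car', 'bus', 'person'}

def notify(frame_id, detections):
    """detections: (class_label, confidence) pairs from an object detector."""
    relevant = [(c, s) for c, s in detections if c in wanted]
    for sub_id, classes in subscriptions.items():
        hits = [d for d in relevant if d[0] in classes]
        if hits:
            print(f"{sub_id}: frame {frame_id} matched {hits}")

notify(7, [("car", 0.91), ("tree", 0.55), ("person", 0.84)])
# sub-1: frame 7 matched [('car', 0.91)]
# sub-2: frame 7 matched [('person', 0.84)]
```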
Article
Multimedia Sensor Networks (MSNs) have gained much attention in recent years with the emergence of the Internet of Things (IoT). They can be found in different scenarios in our everyday life (e.g., smart homes, smart buildings). Sensors in MSNs can have different capacities, produce multiple kinds of outputs, and use different output encoding formats. Thus, detecting complex events, which requires the aggregation of several sensor readings, can be difficult due to the lack of a generic model that can describe: (i) the sensor network infrastructure, (ii) individual sensor specificities, and (iii) multimedia data, while allowing alignment with the application domain knowledge. In this study, we propose the Multimedia Semantic Sensor Network Ontology (MSSN-Onto) to ensure MSN modeling and provide both syntactic and semantic data interoperability for defining and detecting events in various domains. To show the readiness of MSSN-Onto, we used it as the core ontology of a dedicated framework (briefly defined here). We also adopted MSSN-Onto in the HIT2GAP European Project. A prototype has been implemented to conduct a set of tests. Experimental results show that MSSN-Onto can be used to: (i) effectively model MSNs and multimedia data; (ii) define complex events; and (iii) build an efficient event querying engine for MSNs.
Conference Paper
IMGpedia is a large-scale linked dataset that incorporates visual information of the images from the Wikimedia Commons dataset: it brings together descriptors of the visual content of 15 million images, 450 million visual-similarity relations between those images, links to image metadata from DBpedia Commons, and links to the DBpedia resources associated with individual images. In this paper we describe the creation of the IMGpedia dataset, provide an overview of its schema and statistics of its contents, offer example queries that combine semantic and visual information of images, and discuss other envisaged use-cases for the dataset.
Article
Nowadays, the diversity and large deployment of video recorders result in a large volume of video data, whose effective use requires a video indexing process. However, this process suffers from a major problem: the semantic gap between extracted low-level features and the ground truth. The ontology paradigm provides a promising solution to overcome this problem; however, no naming syntax convention has been followed in the concept creation step, which constitutes another problem. In this paper, we consider these two issues and develop a full video surveillance ontology following a formal naming syntax convention and semantics that addresses queries of both academic research and industrial applications. In addition, we propose an ontology-based video surveillance indexing and retrieval system (OVIS) using a set of Semantic Web Rule Language (SWRL) rules that bridges the semantic gap problem. Existing indexing systems are essentially based on low-level features, with the ontology paradigm used only to support this process by representing the surveillance domain. We developed the OVIS system based on SWRL rules, and experiments show that our approach leads to promising results on the top video evaluation benchmarks while suggesting new directions for future development.
Conference Paper
Event processing systems involve the processing of high-volume and high-variety data which has inherent uncertainties, such as incomplete event streams and imprecise event recognition. With the emergence of crowdsourcing platforms, the performance of event processing systems can be enhanced by including a 'human-in-the-loop' to leverage human cognitive ability. The resulting crowd-sourced event processing can address event uncertainty and veracity by using humans to verify results. This paper introduces the first hybrid crowd-enabled event processing engine. The paper proposes five event crowd operators that are domain and language independent and can be used by any event processing framework. These operators encapsulate the complexities of dealing with crowd workers and allow developers to define an event-crowd hybrid workflow. The operators are: Annotate, Rank, Verify, Rate, and Match. The paper presents a proof of concept of event crowd operators, schedulers, poolers, and aggregators in an event processing system, demonstrates the implementation of these operators, and simulates the system with various performance metrics. The experimental evaluation shows a throughput of 7.86 events per second with an average latency of 7.16 seconds for 100 crowd workers. Finally, the paper concludes with avenues for future research in crowd-enabled event processing.
Article
A large number of distributed applications require continuous and timely processing of information as it flows from the periphery to the center of the system. Examples include intrusion detection systems which analyze network traffic in real time to identify possible attacks; environmental monitoring applications which process raw data coming from sensor networks to identify critical situations; and applications performing online analysis of stock prices to identify trends and forecast future values. Traditional DBMSs, which need to store and index data before processing it, can hardly fulfill the timeliness requirements of such domains. Accordingly, during the last decade, different research communities developed a number of tools, which we collectively call information flow processing (IFP) systems, to support these scenarios. They differ in their system architecture, data model, rule model, and rule language. In this article, we survey these systems to help researchers, who often come from different backgrounds, understand how the various approaches they adopt may complement each other. In particular, we propose a general, unifying model to capture the different aspects of an IFP system and use it to provide a complete and precise classification of the systems and mechanisms proposed so far.
Article
With the advances in information technology, the amount of multimedia data captured, produced, and stored is increasing rapidly. As a consequence, multimedia content is widely used for many applications in today's world, and hence a need for organizing this data and accessing it from repositories with vast amounts of information has been a driving stimulus both commercially and academically. In compliance with this trend, first image and later especially video database management systems have attracted a great deal of attention, since traditional database systems are designed to deal with alphanumeric information only and are thereby not suitable for multimedia data. In this paper, a prototype video database management system, which we call BilVideo, is introduced. The system architecture of BilVideo is original in that it provides full support for spatio-temporal queries that contain any combination of spatial, temporal, object-appearance, external-predicate, trajectory-projection, and similarity-based object-trajectory conditions by a rule-based system built on a knowledge base, while utilizing an object-relational database to respond to semantic (keyword, event/activity, and category-based), color, shape, and texture queries. The parts of BilVideo (the Fact-Extractor, the Video-Annotator, its Web-based visual query interface, and its SQL-like textual query language) are presented as well. Moreover, our query processing strategy is also briefly explained.
Conference Paper
This paper presents CVQL (Content-based Video Query Language) for video databases. Spatial and temporal relationships of content objects are used for the specification of query predicates. Realistic queries are illustrated to show the power of CVQL. Macro definitions are supported to simplify query specification. Index structures and query processing for CVQL are considered, and a prototype video database system is implemented, consisting of a GUI and a CVQL processor. Users can sketch a query and its corresponding predicate by using the GUI, and the query can then be converted to CVQL for processing.
Conference Paper
In this paper, we propose a new graph-based data structure and indexing to organize and retrieve video data. Several studies have shown that a graph can be a better candidate for modeling semantically rich and complicated multimedia data. However, few methods consider the temporal feature of video data, which is a distinguishing and representative characteristic when compared with other multimedia (e.g., images). In order to consider the temporal feature effectively and efficiently, we propose a new graph-based data structure called the Spatio-Temporal Region Graph (STRG). Unlike existing graph-based data structures which provide only spatial features, the proposed STRG further provides temporal features, which represent temporal relationships among spatial objects. The STRG is decomposed into its subgraphs, in which redundant subgraphs are eliminated to reduce the index size and search time, because the computational complexity of graph matching (subgraph isomorphism) is NP-complete. In addition, a new distance measure, called Extended Graph Edit Distance (EGED), is introduced in both non-metric and metric spaces, for matching and indexing respectively. Based on STRG and EGED, we propose a new indexing method, STRG-Index, which is faster and more accurate since it uses a tree structure and a clustering algorithm. We compare the STRG-Index with the M-tree, a popular tree-based indexing method for multimedia data. The STRG-Index outperforms the M-tree for various query loads in terms of cost and speed.
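EGED itself is not defined in this summary; as a rough stand-in, plain graph edit distance (available in networkx) conveys the matching cost involved, and its worst-case complexity is what motivates index structures like STRG-Index:

```python
# Plain graph edit distance as a stand-in for EGED (which extends it with
# temporal information). Requires `pip install networkx`.
import networkx as nx

g1 = nx.Graph([("person", "car"), ("car", "road")])
g2 = nx.Graph([("person", "car"), ("car", "road"), ("person", "dog")])

# Exact computation is NP-hard in general, hence the need for indexing.
print(nx.graph_edit_distance(g1, g2))   # 2.0: one node plus one edge insertion
```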
Conference Paper
Automated visual surveillance has emerged as a trendy application domain in recent years. Many approaches have been developed on video processing and understanding. Content-based access to surveillance video has become a challenging research area. The results of a considerable amount of work dealing with automated access to visual surveillance have appeared in the literature. However, the event models and the content-based querying and retrieval components have significant gaps remaining unfilled. To narrow these gaps, we propose a database model for querying surveillance videos by integrating semantic and low-level features. In this paper, the initial design of the database model, the query types, and the specifications of its query language are presented.
Article
Well adapted to the loosely coupled nature of distributed interaction in large-scale applications, the publish/subscribe communication paradigm has recently received increasing attention. With systems based on the publish/subscribe interaction scheme, subscribers register their interest in an event, or a pattern of events, and are subsequently asynchronously notified of events generated by publishers. Many variants of the paradigm have recently been proposed, each variant being specifically adapted to some given application or network model. This paper factors out the common denominator underlying these variants: full decoupling of the communicating entities in time, space, and synchronization. We use these three decoupling dimensions to better identify commonalities and divergences with traditional interaction paradigms. The many variations on the theme of publish/subscribe are classified and synthesized. In particular, their respective benefits and shortcomings are discussed both in terms of interfaces and implementations.
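A minimal topic-based sketch of the interaction scheme (this toy version delivers synchronously; real systems add queues and asynchrony to obtain the time and synchronization decoupling discussed above):

```python
# Minimal topic-based publish/subscribe: publishers and subscribers never
# reference each other directly (space decoupling).
from collections import defaultdict

class Broker:
    def __init__(self):
        self.subs = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subs[topic].append(callback)

    def publish(self, topic, event):
        for cb in self.subs[topic]:      # notify every matching subscriber
            cb(event)

broker = Broker()
broker.subscribe("traffic", lambda e: print("subscriber got:", e))
broker.publish("traffic", {"type": "congestion", "camera": 12})
```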
Conference Paper
In this paper, we propose a novel query language for video indexing and retrieval that (1) enables queries at both the image level and the semantic level, (2) enables users to define their own scenarios based on semantic events, and (3) retrieves videos with both exact matching and similarity matching. For a query language, four main issues must be addressed: data modeling, query formulation, query parsing, and query matching. In this paper we focus on and contribute to data modeling, query formulation, and query matching. We currently use color histograms and SIFT features at the image level and 10 types of events at the semantic level. We have tested the proposed query language for the retrieval of surveillance videos of a metro station. In our experiments the database contains more than 200 indexed physical objects and 48 semantic events. The results using different types of queries are promising.
Article
This paper describes VISUAL, a graphical icon-based query language with a user-friendly graphical user interface for scientific databases, and its query processing techniques. VISUAL is suitable for domains where visualization of the relationships is important for the domain scientist to express queries. In VISUAL, graphical objects are not tied to the underlying formalism; instead, they represent the relationships of the application domain. VISUAL supports relational, nested, and object-oriented models naturally and has a formal basis. For ease of understanding and for efficiency reasons, two VISUAL semantics are introduced, namely the interpretation and execution semantics. Translations from VISUAL to the Object Query Language (for portability considerations) and to an object algebra (for query processing purposes) are presented. Concepts of external and internal queries are developed as modularization tools.
Article
CQL, a continuous query language, is supported by the STREAM prototype data stream management system (DSMS) at Stanford. CQL is an expressive SQL-based declarative language for registering continuous queries against streams and stored relations. We begin by presenting an abstract semantics that relies only on "black-box" mappings among streams and relations. From these mappings we define a precise and general interpretation for continuous queries. CQL is an instantiation of our abstract semantics using SQL to map from relations to relations, window specifications derived from SQL-99 to map from streams to relations, and three new operators to map from relations to streams. Most of the CQL language is operational in the STREAM system. We present the structure of CQL's query execution plans as well as details of the most important components: operators, interoperator queues, synopses, and sharing of components among multiple operators and queries. Examples throughout the paper are drawn from the Linear Road benchmark recently proposed for DSMSs. We also curate a public repository of data stream applications that includes a wide variety of queries expressed in CQL. The relative ease of capturing these applications in CQL is one indicator that the language contains an appropriate set of constructs for data stream processing.
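To give a flavour of the stream-to-relation mapping: a CQL query such as `SELECT AVG(speed) FROM Vehicles [Range 30 Seconds]` turns the stream into a time-based sliding-window relation and aggregates it. A toy Python equivalent (the stream name and contents are invented):

```python
# Toy equivalent of: SELECT AVG(speed) FROM Vehicles [Range 30 Seconds]
# The window maps the stream to a relation; the aggregate maps it back.
from collections import deque

window, RANGE = deque(), 30.0            # holds (timestamp, speed) pairs

def on_tuple(ts, speed):
    window.append((ts, speed))
    while window and window[0][0] < ts - RANGE:   # expire tuples older than 30 s
        window.popleft()
    return sum(s for _, s in window) / len(window)

for ts, speed in [(0, 60), (10, 80), (45, 40)]:
    print(ts, on_tuple(ts, speed))
# 0 60.0 / 10 70.0 / 45 40.0 (the tuples at t=0 and t=10 have expired)
```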
Article
Large volumes of videos are continuously recorded from cameras deployed for traffic control and surveillance with the goal of answering "after the fact" queries: identify video frames with objects of certain classes (cars, bags) from many days of recorded video. While advancements in convolutional neural networks (CNNs) have enabled answering such queries with high accuracy, they are too expensive and slow. We build Focus, a system for low-latency and low-cost querying on large video datasets. Focus uses cheap ingestion techniques to index the videos by the objects occurring in them. At ingest-time, it uses compression and video-specific specialization of CNNs. Focus handles the lower accuracy of the cheap CNNs by judiciously leveraging expensive CNNs at query-time. To reduce query time latency, we cluster similar objects and hence avoid redundant processing. Using experiments on video streams from traffic, surveillance and news channels, we see that Focus uses 58X fewer GPU cycles than running expensive ingest processors and is 37X faster than processing all the video at query time.
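The ingest/query split can be sketched schematically; the models below are stubs standing in for the cheap and expensive CNNs (everything here is invented for illustration):

```python
# Schematic Focus-style split: index frames with a cheap model at ingest,
# verify candidates with an expensive model at query time. All stubs invented.
frames = {1: "img1", 2: "img2", 3: "img3"}

def cheap_cnn(frame):        # fast, lower-accuracy stand-in
    return {"img1": {"car"}, "img2": {"car", "bag"}, "img3": set()}[frame]

def expensive_cnn(frame):    # slow, high-accuracy stand-in
    return {"img1": {"car"}, "img2": {"bag"}, "img3": set()}[frame]

index = {}                   # ingest time: class label -> candidate frame ids
for fid, frame in frames.items():
    for label in cheap_cnn(frame):
        index.setdefault(label, []).append(fid)

def query(label):            # query time: expensive model runs on candidates only
    return [f for f in index.get(label, []) if label in expensive_cnn(frames[f])]

print(query("car"))          # [1] -- frame 2 was a cheap-model false positive
```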
Conference Paper
As a bridge to connect vision and language, visual relations between objects in the form of relation triplets ⟨subject, predicate, object⟩, such as "person-touch-dog" and "cat-above-sofa", provide a more comprehensive visual content understanding beyond objects. In this paper, we propose a novel vision task named Video Visual Relation Detection (VidVRD) to perform visual relation detection in videos instead of still images (ImgVRD). As compared to still images, videos provide a more natural set of features for detecting visual relations, such as dynamic relations like "A-follow-B" and "A-towards-B", and temporally changing relations like "A-chase-B" followed by "A-hold-B". However, VidVRD is technically more challenging than ImgVRD due to the difficulties in accurate object tracking and the diverse appearance of relations in the video domain. To this end, we propose a VidVRD method which consists of object tracklet proposal, short-term relation prediction, and greedy relational association. Moreover, we contribute the first dataset for VidVRD evaluation, which contains 1,000 videos with manually labeled visual relations, to validate our proposed method. On this dataset, our method achieves the best performance in comparison with state-of-the-art baselines.
Article
Recent advances in computer vision, in the form of deep neural networks, have made it possible to query increasing volumes of video data with high accuracy. However, neural network inference is computationally expensive at scale: applying a state-of-the-art object detector in real time (i.e., 30+ frames per second) to a single video requires a $4000 GPU. In response, we present NoScope, a system for querying videos that can reduce the cost of neural network video analysis by up to three orders of magnitude via inference-optimized model search. Given a target video, an object to detect, and a reference neural network, NoScope automatically searches for and trains a sequence, or cascade, of models that preserves the accuracy of the reference network but is specialized to the target video and is therefore far less computationally expensive. NoScope cascades two types of models: specialized models that forego the full generality of the reference model but faithfully mimic its behavior for the target video and object; and difference detectors that highlight temporal differences across frames. We show that the optimal cascade architecture differs across videos and objects, so NoScope uses an efficient cost-based optimizer to search across models and cascades. With this approach, NoScope achieves two to three orders of magnitude speed-ups (265-15,500x real-time) on binary classification tasks over fixed-angle webcam and surveillance video while maintaining accuracy within 1-5% of state-of-the-art neural networks.
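The cascade idea can be sketched as follows; the difference metric, the specialized model, and the reference model are all invented stubs, and the real system learns and tunes these per video:

```python
# Toy NoScope-style cascade: skip near-identical frames, answer with a cheap
# specialized model when confident, fall back to the reference model otherwise.
def difference(prev, cur):
    return abs(prev - cur)               # stand-in for a frame-difference metric

def specialized(frame):                  # cheap, video-specific stub
    return ("car", 0.97) if frame % 2 == 0 else ("car", 0.55)

def reference(frame):                    # expensive, high-accuracy stub
    return ("car", 1.0)

def classify(frames, diff_thresh=2, conf_thresh=0.9):
    last, last_label = None, None
    for f in frames:
        if last is not None and difference(last, f) < diff_thresh:
            yield f, last_label          # frame barely changed: reuse the answer
            continue
        label, conf = specialized(f)
        if conf < conf_thresh:
            label, _ = reference(f)      # low confidence: fall back
        last, last_label = f, label
        yield f, label

print(list(classify([0, 1, 10, 11])))    # every frame labelled, few model calls
```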
Chapter
Spatial and temporal data is plentiful on the Web, and Semantic Web technologies have the potential to make this data more accessible and more useful. Semantic Web researchers have consequently made progress towards better handling of spatial and temporal data. SPARQL, the W3C-recommended query language for RDF, does not adequately support complex spatial and temporal queries. In this work, we present the SPARQL-ST query language, an extension of SPARQL for complex spatiotemporal queries. We present a formal syntax and semantics for SPARQL-ST. In addition, we describe a prototype implementation of SPARQL-ST and demonstrate the scalability of this implementation with a performance study using large real-world and synthetic RDF datasets.
Article
With the rapid increase of video data, video queries are becoming increasingly important. To better describe users' video query requirements, developing a functional video query language has become a promising and interesting task. In this paper, we present a novel query language called SVQL for video databases, developed as an extension of the traditional database query language SQL. In SVQL, we retain the clear and concise grammatical framework of SQL, making SVQL easy for traditional users to learn and use, i.e., giving it a user-friendly interface. Moreover, we extend the WHERE clause of SQL with new conditional expressions, such as variable declaration, structure specification, feature specification, and spatial-temporal specification, thereby giving SVQL powerful expressiveness. We first present the formal definitions of SVQL and illustrate its basic query capabilities using examples. Then, we discuss SVQL query processing techniques. Finally, we evaluate SVQL through comparison with other existing video query languages, and the evaluation results demonstrate the practicality and effectiveness of our proposed query language for video databases.
Article
We have described a system for reasoning about temporal intervals that is both expressive and computationally effective. The representation captures the temporal hierarchy implicit in many domains by using a hierarchy of reference intervals, which precisely control the amount of deduction performed automatically by the system. This approach is particularly useful in domains where temporal information is imprecise and relative, and techniques such as dating are not possible.
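This is Allen's interval algebra; each of the thirteen basic relations between two intervals reduces to endpoint comparisons, as a few examples show:

```python
# A few of Allen's thirteen interval relations via endpoint comparisons.
# Intervals are (start, end) pairs with start < end.
def before(a, b):   return a[1] < b[0]        # a ends before b starts
def meets(a, b):    return a[1] == b[0]       # a ends exactly when b starts
def overlaps(a, b): return a[0] < b[0] < a[1] < b[1]
def during(a, b):   return b[0] < a[0] and a[1] < b[1]
def equal(a, b):    return a == b

x, y = (1, 4), (3, 8)
print(overlaps(x, y), before(x, y), meets(x, y))   # True False False
```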
Article
The increasing need for video-based applications raises the importance of parsing and organizing the content in videos. However, accurate understanding and management of video content at the semantic level is still insufficient. The semantic gap between low-level features and high-level semantics cannot be bridged by manual or semi-automatic methods. In this paper, a semantic-based model named Video Structural Description (VSD) for representing and organizing the content in videos is proposed. Video structural description aims at parsing video content into text information, using spatiotemporal segmentation, feature selection, object recognition, and Semantic Web technology. The proposed model uses predefined ontologies including concepts and their semantic relations to represent the contents in videos. The defined ontologies can be used to retrieve and organize videos unambiguously. In addition, the semantic relations between videos are mined, and the video resources are linked and organized by their related semantic relations.
Article
Event models obtained automatically from video can be used in applications ranging from abnormal event detection to content-based video retrieval. When multiple agents are involved in the events, characterizing events naturally suggests encoding interactions as relations. Learning event models from this kind of relational spatio-temporal data using relational learning techniques such as Inductive Logic Programming (ILP) holds promise, but such techniques have not been successfully applied to the very large datasets which result from video data. In this paper, we present a novel framework, REMIND (Relational Event Model INDuction), for supervised relational learning of event models from large video datasets using ILP. Efficiency is achieved through the learning-from-interpretations setting and a typing system that exploits the type hierarchy of objects in a domain. The use of types also helps prevent overgeneralization. Furthermore, we also present a type-refining operator and prove that it is optimal. The learned models can be used for recognizing events from previously unseen videos. We also present an extension to the framework that integrates an abduction step, which improves learning performance when there is noise in the input data. The experimental results on several hours of video data from two challenging real-world domains (an airport domain and a physical action verbs domain) suggest that the techniques are suitable for real-world scenarios.
Conference Paper
The amount of audio, video and image data on the web is growing immensely, which leads to data management problems stemming from the hidden character of multimedia. Therefore, interlinking semantic concepts and media data with the aim of bridging the gap between the document web and the Web of Data has become common practice and is known as Linked Media. However, the value of connecting media to its semantic metadata is limited due to a lack of access methods specialized for media assets and fragments, as well as the variety of description models in use. With SPARQL-MM we extend SPARQL, the standard query language for the Semantic Web, with media-specific concepts and functions to unify access to Linked Media. In this paper we describe the motivation for SPARQL-MM, present the state of the art of Linked Media description formats and multimedia query languages, and outline the specification and implementation of the SPARQL-MM function set.
Conference Paper
In this paper, we study the problem of detecting and tracking multiple objects of various types in outdoor urban traffic scenes. This problem is especially challenging due to the large variation in road user appearances. To handle that variation, our system uses background subtraction to detect moving objects. In order to build the object tracks, an object model is built and updated through time inside a state machine using feature points and spatial information. When an occlusion occurs between multiple objects, the positions of feature points at previous observations are used to estimate the positions and sizes of the individual occluded objects. Our Urban Tracker algorithm is validated on four outdoor urban videos involving mixed traffic that includes pedestrians, cars, large vehicles, etc. Our method compares favorably to a current state-of-the-art feature-based tracker for urban traffic scenes on pedestrians and mixed traffic.
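The detection step described above (background subtraction producing moving-object blobs for the tracker) looks roughly like the following OpenCV sketch; the video path and area threshold are placeholders, and the paper's tracker adds feature points, a state machine, and occlusion handling on top:

```python
# Minimal background-subtraction detection loop with OpenCV
# (`pip install opencv-python`); 'traffic.mp4' is a placeholder path.
import cv2

cap = cv2.VideoCapture("traffic.mp4")
bg = cv2.createBackgroundSubtractorMOG2(detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = bg.apply(frame)                       # foreground mask
    mask = cv2.medianBlur(mask, 5)               # suppress speckle noise
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    blobs = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 500]
    # 'blobs' are the candidate moving objects handed to the tracking stage.
cap.release()
```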
Conference Paper
Reactive Web systems, Web services, and Web-based publish/subscribe systems communicate events as XML messages, and in many cases require composite event detection: it is not sufficient to react to single event messages, but events have to be considered in relation to other events that are received over time. Emphasizing language design and formal semantics, we describe the rule-based query language XChangeEQ for detecting composite events. XChangeEQ is designed to completely cover and integrate the four complementary querying dimensions: event data, event composition, temporal relationships, and event accumulation. Semantics are provided as model and fixpoint theories; while this is an established approach for rule languages, it has not been applied to event queries before.
Conference Paper
We describe a general multimedia query language, called MOQL, based on ODMG's Object Query Language (OQL). In contrast to previous multimedia query languages, which are either designed for one particular medium (e.g., images) or specialized for a particular application (e.g., medical imaging), MOQL is general in its treatment of multiple media and different applications. The language includes constructs to capture the temporal and spatial relationships in multimedia data as well as functions for query presentation. We illustrate the language features by query examples. The language is implemented for a multimedia database built on top of ObjectStore.
Article
Making a database system active to meet the requirements of a wide range of applications entails developing an expressive event specification language and its implementation. Extant systems support mostly database events and, in some cases, a few predefined events. This paper discusses an event specification language (termed Snoop) for active databases. We define an event, distinguish between events and conditions, classify events into a class hierarchy, identify primitive events, and introduce a small number of event operators for constructing composite (or complex) events. Snoop supports temporal, explicit, and composite events in addition to the traditional database events. The novel aspect of our work lies not only in supporting a rich set of events and event expressions, but also in the notion of parameter contexts. Essentially, parameter contexts augment the semantics of composite events for computing their parameters. For concreteness, we present parameter computation for the relational model. Finally, we show how a contingency plan that includes time constraints can be supported without stepping outside of the framework proposed in this paper.
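As an illustration of what a composite-event operator computes, a toy sequence operator (event B occurring after event A) over a timestamped stream might look like the sketch below; Snoop's parameter contexts, which govern exactly which occurrences pair up, are deliberately ignored here:

```python
# Toy sequence operator (A followed by B), illustrating one kind of composite
# event; not Snoop's actual semantics, which depend on parameter contexts.
def seq(stream, a_type, b_type):
    pending = []                       # open A occurrences awaiting a B
    for etype, ts in stream:
        if etype == a_type:
            pending.append(ts)
        elif etype == b_type and pending:
            yield (a_type, pending.pop(0), b_type, ts)

events = [("deposit", 1), ("deposit", 2), ("withdraw", 5)]
print(list(seq(events, "deposit", "withdraw")))
# [('deposit', 1, 'withdraw', 5)]
```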
Conference Paper
In this paper, we present the design, implementation, and evaluation of a system that executes complex event queries over real-time streams of RFID readings encoded as events. These complex event queries filter and correlate events to match specific patterns, and transform the relevant events into new composite events for the use of external monitoring applications. Stream-based execution of these queries enables time-critical actions to be taken in environments such as supply chain management, surveillance and facility management, healthcare, etc. We first propose a complex event language that significantly extends existing event languages to meet the needs of a range of RFID-enabled monitoring applications. We then describe a query plan-based approach to efficiently implementing this language. Our approach uses native operators to efficiently handle query-defined sequences, which are a key component of complex event processing, and pipelines such sequences to subsequent operators that are built by leveraging relational techniques. We also develop a large suite of optimization techniques to address challenges such as large sliding windows and intermediate result sizes. We demonstrate the effectiveness of our approach through a detailed performance analysis of our prototype implementation as well as through a comparison to a state-of-the-art stream processor.
Conference Paper
Streams of events appear increasingly today in various Web applications such as blogs, feeds, sensor data streams, geospatial information, on-line financial data, etc. Event Processing (EP) is concerned with timely detection of compound events within streams of simple events. State-of-the-art EP provides on-the-fly analysis of event streams, but cannot combine streams with background knowledge and cannot perform reasoning tasks. On the other hand, semantic tools can effectively handle background knowledge and perform reasoning thereon, but cannot deal with rapidly changing data provided by event streams. To bridge the gap, we propose Event Processing SPARQL (EP-SPARQL) as a new language for complex events and Stream Reasoning. We provide syntax and formal semantics of the language and devise an effective execution model for the proposed formalism. The execution model is grounded on logic programming, and features effective event processing and inferencing capabilities over temporal and static knowledge. We provide an open-source prototype implementation and present a set of tests to show the usefulness and effectiveness of our approach.
Conference Paper
We describe the design and implementation of the Cornell Cayuga System for scalable event processing. We present a query language based on Cayuga Algebra for naturally expressing complex event patterns. We also describe several novel system design and implementation issues, focusing on Cayuga's query processor, its indexing approach, how Cayuga handles simultaneous events, and its specialized garbage collector.
Conference Paper
Event processing will play an increasingly important role in constructing distributed applications that can immediately react to critical events. In this paper we describe the CEDR language for expressing complex event queries that filter and correlate events to match specific patterns, and transform the relevant events into new composite events for the use of monitoring applications. Stream-based execution of these standing queries offers instant insight for users to see what is occurring in their systems and to take time-critical actions.
Book
Contents: Qualitativeness; A cognitive perspective on knowledge representation; Qualitative representation of positions in 2-D; Reasoning with qualitative representations; Applications; Extensions of the basic model; Relevant related work; Conclusion.
Article
In this paper a novel approach for recognizing actions in video sequences is presented, where the information obtained from segmentation and tracking algorithms is used as input data. First of all, the input data is fuzzified; this process makes it possible to manage the uncertainty inherent in information obtained from low-level and medium-level vision tasks, to unify the information obtained from different vision algorithms into a homogeneous representation, and to aggregate the characteristics of the analyzed scenario and the objects in motion. Another contribution is the novelty of representing actions by means of an automaton and generating input symbols for the finite automaton based on a comparison process between objects and actions; i.e., the main reasoning process is based on the operation of automata capable of handling fuzzy representations of all video data. The experiments on several real traffic video sequences demonstrate encouraging results, especially since no training algorithms are required to obtain the predefined actions to be identified.
Conference Paper
Pattern matching over event streams is increasingly being employed in many areas including financial services, RFID-based inventory management, click stream analysis, and electronic health systems. While regular expression matching is well studied, pattern matching over streams presents two new challenges: languages for pattern matching over streams are significantly richer than languages for regular expression matching, and efficient evaluation of these pattern queries over streams requires new algorithms and optimizations, since the conventional wisdom for stream query processing (i.e., using selection-join-aggregation) is inadequate. In this paper, we present a formal evaluation model that offers precise semantics for this new class of queries and a query evaluation framework permitting optimizations in a principled way. We further analyze the runtime complexity of query evaluation using this model and develop a suite of techniques that improve runtime efficiency by exploiting sharing in storage and processing. Our experimental results provide insights into the various factors affecting runtime performance and demonstrate the significant performance gains of our sharing techniques.
Article
Although events are ubiquitous in multimedia, no common notion of events has emerged. Events appear in multimedia presentation formats, programming frameworks, and databases, as well as in next-generation multimedia applications such as eChronicles, life logs, or the Event Web. A common event model for multimedia could serve as a unifying foundation for all of these applications.
Article
There is now growing interest in organizing and querying large bodies of video data. In this paper, we develop a simple SQL-like video query language which can be used not only to identify videos in the library that are of interest to the user, but also to extract from such a video the relevant segments that satisfy the specified query condition. We investigate various types of user requests and show how they are expressed using our query language. We also develop polynomial-time algorithms to process such queries. Furthermore, we show how video presentations may be synthesized in response to a user query. We show how a standard relational database system can be extended in order to handle queries such as those expressed in our language. Based on these principles, we have built a prototype video retrieval system called VIQS. We describe the design and implementation of VIQS and show some sample interactions with it.
Article
An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in the inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest-neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low-residual least-squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered, partially occluded images with a computation time of under 2 seconds.
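These features are SIFT; extracting them today is a one-liner in OpenCV (SIFT has shipped patent-free in the main module since OpenCV 4.4), with 'query.png' below as a placeholder image path:

```python
# SIFT keypoints and 128-D descriptors with OpenCV (`pip install opencv-python`).
import cv2

img = cv2.imread("query.png", cv2.IMREAD_GRAYSCALE)   # placeholder path
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
print(len(keypoints), descriptors.shape)   # e.g. 812 keypoints, (812, 128)
```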
H. Zhang et al., "Live Video Analytics at Scale with Approximation and Delay-Tolerance," in USENIX NSDI, 2017.
D. Kang, P. Bailis, and M. Zaharia, "BlazeIt: Fast Exploratory Video Queries using Neural Networks," arXiv preprint arXiv:1805.01046, 2018.
G. Medioni, I. Cohen, F. Brémond, S. Hongeng, and R. Nevatia, "Event detection and analysis from video streams," IEEE TPAMI, 2001.
K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," in ICCV, 2017.
G. Cugola and A. Margara, "TESLA: a formally defined event specification language," in ACM DEBS, 2010.
O. Etzion and P. Niblett, Event Processing in Action. Manning Publications, 2010. ISBN 9781935182214.
G. Awad et al., "TRECVID 2016: Evaluating Video Search, Video Event Detection, Localization and Hyperlinking," 2016.