Article

Abstract

Massive radio frequency identification (RFID) data sets are expected to become commonplace in supply chain management systems. Warehousing and mining this data is an essential problem with great potential benefits for inventory management, object tracking, and product procurement processes. Since RFID tags can be used to identify each individual item, enormous amounts of location-tracking data are generated. With such data, object movements can be modeled by movement graphs, where nodes correspond to locations and edges record the history of item transitions between locations. In this study, we develop a movement graph model as a compact representation of RFID data sets. Since spatiotemporal as well as item information can be associated with the objects in such a model, the movement graph can be huge, complex, and multidimensional in nature. We show that such a graph can be better organized around gateway nodes, which serve as bridges connecting different regions of the movement graph. A graph-based object movement cube can be constructed by merging and collapsing nodes and edges according to an application-oriented topological structure. Moreover, we propose an efficient cubing algorithm that performs simultaneous aggregation of both spatiotemporal and item dimensions on a partitioned movement graph, guided by such a topological structure.
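As a sketch of how such a movement graph can be derived from RFID stay records, the snippet below builds transition-count edges from (EPC, location, t_in, t_out) tuples. All names and the record format are illustrative, not the paper's exact implementation:

```python
from collections import defaultdict

def build_movement_graph(stay_records):
    """Build movement-graph edges from RFID stay records.

    stay_records: (epc, loc, t_in, t_out) tuples. Nodes are locations;
    a directed edge (a, b) counts how many items moved from a to b.
    """
    per_item = defaultdict(list)
    for epc, loc, t_in, t_out in stay_records:
        per_item[epc].append((t_in, loc))
    edges = defaultdict(int)
    for stays in per_item.values():
        stays.sort()  # order each item's stays by entry time
        for (_, a), (_, b) in zip(stays, stays[1:]):
            edges[(a, b)] += 1
    return dict(edges)

records = [
    ("e1", "factory", 1, 2), ("e1", "dist_ctr", 3, 5), ("e1", "store", 6, 9),
    ("e2", "factory", 1, 2), ("e2", "dist_ctr", 4, 6),
]
graph = build_movement_graph(records)
```

Two items traversing factory to dist_ctr yield one edge with count 2; aggregating such counts per edge is the starting point for the gateway-based organization the abstract describes.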


... On the other hand, some studies are limited to the modeling part of the tracking data, overlooking the potential impact on decision making (e.g., Gonzalez et al., 2010; Kang and Yong, 2010; Zhong et al., 2015). Further, other relevant studies focus on privacy, technical, and data quality issues of tracking systems (Bardaki et al., 2011; Lee and Kwon, 2015; Xu et al., 2010). ...
... These moving objects could be shoppers moving in a store, visitors moving in a museum, medical patients in hospitals, products moving in a fashion store, delivery vehicles and public transit buses, and so on (Chang et al., 2014). At the same time, due to this popularization of tracking equipment, such as GPS receivers, RFID and BLE tracking devices, a vast amount of moving object data originating in supply chain operations, road network monitoring, geo-positioning, and other RFID applications has been gathered (Civilis et al., 2005; Giannotti et al., 2007; Gonzalez et al., 2010). These data gathered by sensor-based devices are opening up exciting new streams of innovative applications (Chen and Storey, 2012). ...
... Nonetheless, they focus purely on the algorithmic part of extracting these common patterns, setting aside how this modeling could be utilized for decision-making purposes. The most relevant work is that of Gonzalez et al. (2010), which proposes a graph-based object movement cube and an algorithm that performs simultaneous aggregation of both spatiotemporal and item dimensions on a movement graph. However, apart from the modeling of the RFID events into graph-based cubes, this paper focuses mainly on technical issues and overlooks the decision-making part. ...
Conference Paper
Full-text available
Recent technological developments have facilitated the continuous identification and tracking of individual objects/things moving in space. Business analytics tools can handle the resulting vast amount of object tracking data. Thus, tracking technologies could be viewed as information facilitators that can directly improve decision-making. This research suggests a data-driven approach that transforms the simple object movement events captured by tracking devices in a monitored area into objects' flows composing a network. In addition, we devise two new metrics, the volume and the mobility of the moving objects per flow, to characterize the objects' movement patterns. The proposed approach offers a structured way to transform massive tracking data into valuable, new knowledge of the moving behavior of objects that can support a wealth of business decisions. We demonstrate the utility of the proposed approach with real Radio Frequency Identification (RFID) data representing garments' movements in a retail store of a fashion retailer.
... Third, most path-oriented queries are inefficient because they perform multiple self-joins of the table involving many related positions under the traditional data model [3,4]. Some works [25][26][27] compress certain RFID data so that path-oriented queries can efficiently obtain historical path information of object movements. However, these methods are ineffective because an object's movement may follow a cyclic or long path in supply chains, which further complicates path-oriented queries. ...
... Gonzalez et al. [25] focus on groups of objects and use compression to preserve object transition relationships, reducing the join cost of processing path selection queries. Their work in [26] extends the idea to account for a more realistic object movement model and further develops a gateway-based movement graph model as a compact representation of RFID data sets. Lee and Chung [27] focus on individual objects and propose a movement path encoding for EPC-tagged objects, in which reader positions and their order are coded as a series of unique prime number pairs. ...
... According to features of the supply chain, EPC-tagged objects may be transferred in groups or individually. Gonzalez et al. [25,26] focus only on groups of objects and use compression to preserve object transition relationships as a compact representation of RFID data sets. Lee and Chung [27] focus only on individual objects and propose a movement path encoding for an EPC-tagged object, coded as a series of unique prime number pairs (Pin, Pout). ...
Article
Full-text available
Radio Frequency Identification (RFID) is widely used to track and trace objects in traceability supply chains. However, the massive volumes of uncertain data produced by RFID readers cannot be used effectively or efficiently in RFID application systems. Following an analysis of the key features of RFID objects, this paper proposes a new framework for effectively and efficiently processing uncertain RFID data and supporting a variety of queries for tracking and tracing RFID objects. We adjust different smoothing windows according to different rates of uncertain data, employ different strategies to process uncertain readings, and distinguish ghost, missing, and incomplete data according to their apparent positions. We propose a comprehensive data model which is suitable for different application scenarios. In addition, a path coding scheme is proposed to significantly compress massive data by aggregating the path sequence, the position, and the time intervals. The scheme is suitable for cyclic or long paths. Moreover, we further propose a processing algorithm for grouped and independent objects. Experimental evaluations show that our approach is effective and efficient in terms of compression and traceability queries.
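To make the idea of path coding concrete, here is a toy prime-based encoding in the spirit of the schemes cited in the surrounding snippets: each reader gets a prime, and the prime's exponent records its position in the path. This sketch assumes acyclic paths and illustrative names; the published schemes are more elaborate and also handle cycles and time intervals.

```python
def encode_path(path, reader_prime):
    """Encode an acyclic path as one integer: each visited reader's
    prime is raised to its (1-based) position in the path."""
    code = 1
    for order, loc in enumerate(path, start=1):
        code *= reader_prime[loc] ** order
    return code

def decode_path(code, reader_prime):
    """Recover the ordered path by trial division on each reader's prime."""
    slots = {}
    for loc, p in reader_prime.items():
        order = 0
        while code % p == 0:
            code //= p
            order += 1
        if order:
            slots[order] = loc   # exponent = position in the path
    return [slots[i] for i in sorted(slots)]

reader_prime = {"A": 2, "B": 3, "O": 5, "H": 7}
code = encode_path(["A", "O", "H"], reader_prime)   # 2**1 * 5**2 * 7**3
```

A path query such as "at which position was reader O visited?" then reduces to arithmetic on the code rather than a multi-way self-join.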
... However, obtaining real data sets of trajectories of RFID-tagged objects moving through, for example, supply chains is very difficult, since these data are generally kept by private companies that are quite reluctant to share them. Hence, the use of synthetic data obtained by means of simulation is a common practice [92][93][94]. However, a synthetic data set might fall short of capturing the real complexity of the motion of objects. ...
... A synthetic data set was generated by simulating the movement of tagged objects in supply chains, using techniques proposed in previous articles [92][93][94] that deal with moving objects in supply chains. ...
... Upon reception of a set of items by a distribution centre, these items are processed according to the distribution centre policy. As in previous models [92][93][94], the distribution centre policy is defined by a graph. Locations where items arrive and depart are the nodes of the graph, whilst the edges represent the possibility of moving between locations. ...
Article
Full-text available
Radio Frequency Identification (RFID) is a technology aimed at efficiently identifying and tracking goods and assets. Such identification may be performed without requiring line-of-sight alignment or physical contact between the RFID tag and the RFID reader, whilst tracking is naturally achieved due to the short interrogation field of RFID readers. That is why the reduction in price of RFID tags has been accompanied by increasing attention paid to this technology. However, since tags are resource-constrained devices sending identification data wirelessly, designing secure and private RFID identification protocols is a challenging task. This scenario is even more complex when scalability must be met by those protocols. Assuming the existence of a lightweight, secure, private and scalable RFID identification protocol, there exist other concerns surrounding the RFID technology. Some of them arise from the technology itself, such as distance checking, but others are related to the potential of RFID systems to gather huge amounts of tracking data. Publishing and mining such moving object data is essential to improve the efficiency of supervisory control, asset management and localisation, transportation, etc. However, obvious privacy threats arise if an individual can be linked with some of those published trajectories. The present dissertation contributes to the design of algorithms and protocols aimed at dealing with the issues explained above. First, we propose a set of protocols and heuristics based on a distributed architecture that improve the efficiency of the identification process without compromising privacy or security. Moreover, we present a novel distance-bounding protocol based on graphs that is extremely low-resource consuming. Finally, we present two trajectory anonymisation methods aimed at preserving the individuals' privacy when their trajectories are released.
... With the development of RFID technology, more and more research on RFID data management has been done recently, such as RFID data warehousing and duplicate elimination [8][9][10], RFID data querying [7,[11][12][13], RFID data cleaning [14][15][16], and so on. In this section, we will review the existing RFID data compression and processing approaches that are related to our work. ...
... However, the approach has the limitation that the entire filtering process targets raw data at the reader level rather than the path records of tags. Gonzalez et al. [9] proposed a movement graph model as a compact representation of RFID data sets. It provides a clean and concise representation of large RFID data sets. ...
... The topology similar to Figure 1 is very common in RFID technology based applications, such as supply chain management, logistics management, and so on. The accompanying example lists each tag's stay sequence in the form location[t_in, t_out]:

Tag 1: …[8,9] → H[11,12] → I[14,16]
Tag 2: C[3,4] → Q[5,7] → O[8,9] → H[11,12] → J[13,15]
Tag 3: D[1,3] → Q[5,7] → O[8,9] → H[11,12] → J[13,15]
Tag 4: A[2,3] → B[5,6] → O[8,9] → H[11,12] → J[13,15]
Tag 5: …[8,9] → H[11,12] → J[13,15]
Tag 6: C[3,4] → Q[5,7] → O[8,9] → K[10,11] → M[12,14]
Tag 7: D[1,3] → Q[5,7] → O[8,9] → H[11,12] → I[14,16]
Tag 8: A[2,3] → B[5,6] → O[8,9] → K[10,11] → N[13,15]
Tag 9: D[1,3] → Q[5,7] → O[8,9] → K[10,11] → N[13,15]
Tag 10: A[2,3] → B[5,6] → O[8,9] → K[10,11] → M[12,14]
Tag 11: C[3,4] → Q[5,7] → O[8,9] → K[10,11] → N[13,15]
Tag 12: F[3,6] → O[8,9] → H[11,12] → I[14,16]
Tag 13: D[1,3] → Q[5,7] → O[8,9] → K[10,11] → M[12,14]
Tag 14: F[3,6] → O[8,9] → K[10,11] → N[13,15]
Tag 15: C[3,4] → Q[5,7] → O[8,9] → H[11,12] → I[14,16]
Tag 16: F[3,6] → O[8,9] → K[10,11] → M[12,14]
Tag 17: D[1,3] → Q[5,7] → O[8,9] → K[10,11] → N[14,16]
Tag 18: C[1,2] → Q[4,5] → O[8,9] → H[11,12] → I[14,16] ...
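Per-tag paths like those listed above are aggregated into movement-graph edge counts before any cubing; a minimal sketch over the location sequences of Tags 2-4 (times dropped, location names as in the example):

```python
from collections import Counter

def transition_counts(paths):
    """Count how many tags traversed each location-to-location edge."""
    counts = Counter()
    for path in paths:
        counts.update(zip(path, path[1:]))
    return counts

paths = [
    ["C", "Q", "O", "H", "J"],   # Tag 2
    ["D", "Q", "O", "H", "J"],   # Tag 3
    ["A", "B", "O", "H", "J"],   # Tag 4
]
counts = transition_counts(paths)
```

Node O accumulates high fan-in and fan-out in such counts, which is exactly what marks it as a gateway candidate in this topology.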
Article
Full-text available
In modern supply chain management systems, Radio Frequency IDentification (RFID) technology has become an indispensable sensor technology, and massive RFID data sets are expected to become commonplace. More and more space and time are needed to store and process such huge amounts of RFID data, and there is an increasing realization that existing approaches cannot satisfy the requirements of RFID data management. In this paper, we present a split-path schema-based RFID data storage model. With a data separation mechanism, the massive RFID data produced in supply chain management systems can be stored and processed more efficiently. Then a tree structure-based path splitting approach is proposed to intelligently and automatically split the movement paths of products. Furthermore, based on the proposed new storage model, we design the relational schema to store the path information and time information of tags, and some typical query templates and SQL statements are defined. Finally, we conduct various experiments to measure the effect and performance of our model and demonstrate that it performs significantly better than the baseline approach in both data expression and path-oriented RFID data query performance.
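A minimal relational sketch of such a split storage model, with path information and time information kept in separate tables; the table and column names are assumptions for illustration, not the paper's actual schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- path information: one encoded path string per tag
    CREATE TABLE tag_path (epc TEXT PRIMARY KEY, path TEXT);
    -- time information: one row per stay
    CREATE TABLE tag_time (epc TEXT, loc TEXT, t_in INTEGER, t_out INTEGER);
""")
con.executemany("INSERT INTO tag_path VALUES (?, ?)",
                [("e1", "A.B.O.H"), ("e2", "C.Q.O.K")])
con.executemany("INSERT INTO tag_time VALUES (?, ?, ?, ?)",
                [("e1", "O", 8, 9), ("e2", "O", 8, 9), ("e2", "K", 10, 11)])

# Query template: which tags passed through location O?
rows = con.execute(
    "SELECT epc FROM tag_path WHERE path LIKE ? ORDER BY epc", ("%O%",)
).fetchall()
```

Substring matching is only safe here because location names are single characters; a real schema would match on delimited path segments or on the tree-based split paths the paper proposes.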
... Driven by the numerous potential application benefits and research challenges, RFID traceability networks are becoming an active research and development area [25,46,53,57,58]. Many researchers are currently engaged in developing solutions to address these challenges. ...
... Due to the nature of large-scale traceability applications (e.g., high volume of data, distribution across organizations, complicated relationships such as containment), data models must be appropriately designed. Fortunately, the database research community has recently developed a strong interest in RFID data modeling [2,11,13,25,26,33,39,40,56,57,61]. In this section, we will examine a set of representative research works on data modeling and corresponding query processing techniques for RFID traceability networks. ...
... However, the storage used by RFID-Cuboid is more than that used by DRER because of the additional tables. This additional storage cost is further reduced by [25]. This enhanced work assumes that there are some "gateway" nodes in an RFID network, which have either high fan-in or high fan-out edges, as illustrated in Fig. 12b. ...
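The "gateway" assumption can be expressed as a simple degree test on the movement graph; a sketch with an assumed threshold (the paper's actual criterion may differ):

```python
def find_gateways(edges, threshold=3):
    """Flag nodes whose fan-in or fan-out degree reaches the threshold.

    edges: {(src, dst): traversal_count} movement-graph edges.
    """
    fan_in, fan_out = {}, {}
    for (a, b) in edges:
        fan_out[a] = fan_out.get(a, 0) + 1
        fan_in[b] = fan_in.get(b, 0) + 1
    nodes = set(fan_in) | set(fan_out)
    return {v for v in nodes
            if fan_in.get(v, 0) >= threshold or fan_out.get(v, 0) >= threshold}

# 'O' is fed by three sources and feeds two sinks, so it is flagged:
edges = {("A", "O"): 5, ("B", "O"): 4, ("C", "O"): 6,
         ("O", "H"): 8, ("O", "K"): 7}
gateways = find_gateways(edges)
```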
Article
Full-text available
The emergence of radio frequency identification (RFID) technology brings significant social and economic benefits. As a non-line-of-sight technology, RFID provides an effective way to record movements of objects within a networked RFID system formed by a set of distributed and collaborating parties. A trail of such recorded movements is the foundation for enabling traceability applications. While traceability is a critical aspect of the majority of RFID applications, realizing traceability for these applications raises many fundamental research and development issues. In this paper, we assess the requirements for developing traceability applications that use networked RFID technology at their core. We propose a set of criteria for analyzing and comparing the current existing techniques, including system architectures and data models. We also outline some research opportunities in the design and development of traceability applications.
... 1) The gateway where the RFID reader is located is an in-out-gateway [52]. ...
... Such information typically includes entity type, entity affiliation, physical attributes, and assigned attributes. In a logistics context, examples of these attributes are product type, manufacturer, weight, and price [52]. The additional information may also be collected through various sensors (e.g. ...
Article
Full-text available
A schedule-based system is a system that operates on or contains within a schedule of events and breaks at particular time intervals. Entities within the system show presence or absence in these events by entering or exiting the locations of the events. Given radio frequency identification (RFID) data from a schedule-based system, what can we learn about the system (the events and entities) through data mining? Which data mining methods can be applied so that one can obtain rich actionable insights regarding the system and the domain? The research goal of this paper is to answer these posed research questions, through the development of a framework that systematically produces actionable insights for a given schedule-based system. We show that through integrating appropriate data mining methodologies as a unified framework, one can obtain many insights from even a very simple RFID dataset, which contains only very few fields. The developed framework is general, and is applicable to any schedule-based system, as long as it operates under certain basic assumptions. The types of insights are also general, and are formulated in this paper in the most abstract way. The applicability of the developed framework is illustrated through a case study, where real world data from a schedule-based system is analyzed using the introduced framework. Insights obtained include the profiling of entities and events, the interactions between entity and events, and the relations between events.
... These readings are actually unnecessary because they contain only the same information [3]. In the field of supply chains, the important readings are only the first and the last time the items were detected in the reader's vicinity [4]. Otherwise, there will be a lot of duplicate readings even in a very short time because of the read capability of the RFID readers. ...
... Otherwise, there will be a lot of duplicate readings even in a very short time because of the read capability of the RFID readers. Duplicate readings can slow the system's query performance [4] because the volume of data keeps growing over time. For example, a small supermarket that has 10,000 tagged items will return 10,000 tuples for each reading cycle. ...
Article
Full-text available
Radio Frequency Identification (RFID) is widely used to track and trace objects in supply chain management. However, the massive uncertain data produced by RFID readers are not suitable for direct use in RFID applications. This is due to repetitive readings, which are unnecessary because they contain only the same information. Thus, an approach to remove repetitive readings in supply chains is paramount to minimizing the massive data storage that could affect query performance. We propose a simplified approach, which is suitable for a wide range of application scenarios. Experimental evaluations show that our approach is effective and efficient in terms of removing duplicate readings and compressing the massive data significantly.
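The core of such duplicate removal can be sketched as keeping only the first and last detection of each consecutive (tag, location) run; a simplified illustration assuming readings sorted by tag and time:

```python
def compress_readings(readings):
    """Collapse repetitive readings into stay records.

    readings: (epc, loc, ts) tuples, sorted by epc then time.
    Returns (epc, loc, t_first, t_last) records, keeping only the first
    and last detection of each consecutive run at one location.
    """
    stays = []
    for epc, loc, ts in readings:
        if stays and stays[-1][0] == epc and stays[-1][1] == loc:
            e, c, t_in, _ = stays[-1]
            stays[-1] = (e, c, t_in, ts)   # extend the current run
        else:
            stays.append((epc, loc, ts, ts))
    return stays

readings = [
    ("e1", "A", 1), ("e1", "A", 2), ("e1", "A", 3),
    ("e1", "B", 5), ("e1", "B", 6),
]
stays = compress_readings(readings)
```

Here five raw readings collapse into two stay records, cutting storage while preserving the trace.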
... It can best be defined as "fitness for use" [18]. Recently, the Institute of Medicine shocked the public with a report that 98,000 people die every year due to medical errors [8]. Some of these errors are the result of missing or bad information about drugs, orders and treatments. ...
... A false positive refers to a tag that is not present but is nevertheless captured: besides the RFID tags meant to be read, additional unexpected readings are generated [18]. Duplicate readings refer to tags that are in the scope of a reader for a long time and are read by the reader multiple times. ...
Article
Full-text available
Radio frequency identification (RFID) technology has seen increasing adoption rates in applications that range from supply chain management, asset tracking, medical/health care, people tracking, manufacturing, retail, and warehouses to livestock timing. This technology is used in many applications for data collection. The data captured by RFID readers are usually of low quality and may contain many anomalies. Data quality has become increasingly important to many organizations. This is especially true in the medical/health care field, because minute errors can cost heavy financial and personal losses. In order to provide reliable data to RFID applications, it is necessary to clean the collected data. SMURF is a declarative and adaptive smoothing technique for cleaning unreliable RFID data. However, it does not work well when tags move rapidly in and out of a reader's communication range. The errors need to be cleansed in an effective manner before the data are subjected to warehousing. Factors such as inter-tag distance, tag-antenna distance, number of tags in the read range of the antenna, reader communication range, and velocity of tag movement affect the data cleaning result. Our proposed algorithm considers these factors, together with missing-tag information and tags mistakenly read as present, to dynamically determine the size of the sliding window. Simulation shows our cleansing approach deals with RFID data more accurately and efficiently. Thus, with the aid of the proposed data cleaning technique, we can bring down health care costs, optimize business processes, streamline patient identification processes, and improve patient safety.
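The smoothing-window idea behind SMURF can be illustrated with a fixed-window version: a tag is reported present in an epoch if it was read at least once in that epoch or the preceding `window` epochs. SMURF itself sizes the window adaptively per tag; this simplified sketch uses assumed names:

```python
def smooth(read_epochs, window):
    """Fill short read gaps: epoch t is reported 'present' if the tag
    was actually read in t or in any of the `window` preceding epochs."""
    read = set(read_epochs)
    present = []
    for t in range(min(read), max(read) + 1):
        if any((t - d) in read for d in range(window + 1)):
            present.append(t)
    return present
```

With a two-epoch window, the read sequence [1, 2, 5, 6] becomes continuous presence over epochs 1 to 6, while longer dropouts remain visible as gaps.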
... Note that, generally, readers are considered as simple relays that forward identification information to back-ends. Two readers are said to be neighbours if their coverage areas are not disjoint. ...
... R asks R_i^prev to remove T_i's information from its cache. ...
Article
Radio frequency identification (RFID) is a technology aimed at efficiently identifying products that has greatly influenced manufacturing businesses in recent years. Although the RFID technology has been widely accepted by the manufacturing and retailing sectors, there are still many issues regarding its scalability, security and privacy. With regard to privacy, the sharing of identification information amongst multiple parties is also an issue (especially after the massive outsourcing that is taking place in our global market). Securely and efficiently sharing identification information with multiple parties is a tough problem that must be considered so as to avert the undesired disclosure of confidential information, especially in the context of supply chain management. In this article, we propose a private and scalable protocol for RFID collaborative readers to securely identify RFID tags. We define the general concepts of "next reader predictor" (NRP) and "previous reader predictor" (PRP) used by the readers to predict the trajectories of tags and collaborate efficiently. We also propose a specific Markov-based predictor implementation. By the very nature of our distributed protocol, the collaborative readers can naturally help in mitigating the problem of sharing identification information amongst multiple parties securely, which is essential in the context of supply chain management. The experimental results show that our proposal outperforms previous approaches.
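A first-order Markov next-reader predictor (NRP) in the spirit of this abstract can be sketched as follows; the class name and data layout are illustrative, not the authors' implementation:

```python
from collections import Counter, defaultdict

class NextReaderPredictor:
    """Predict a tag's next reader from observed trajectories
    using first-order Markov transition counts."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, trajectory):
        # Record every observed reader-to-reader transition.
        for a, b in zip(trajectory, trajectory[1:]):
            self.counts[a][b] += 1

    def predict(self, reader):
        # Most frequently observed successor, or None if unseen.
        nxt = self.counts.get(reader)
        return nxt.most_common(1)[0][0] if nxt else None

nrp = NextReaderPredictor()
for traj in [["R1", "R2", "R3"], ["R1", "R2", "R4"], ["R5", "R2", "R3"]]:
    nrp.train(traj)
```

A reader can then pre-fetch tag state from the predicted next reader, which is what lets collaborating readers identify tags without a central back-end lookup on every read.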
... Thiesse et al. (2009) gave fundamental concepts and applications of a system based on Electronic Product Code that facilitates data exchange within supply chains, and thus enables data mining application. Gonzalez et al. (2010) addressed the massive data sets generated by RFID within supply chains, and provided a movement graph model to compactly represent RFID data sets. Lee et al. (2010) proposed a knowledge discovery system to support customer relationship management based on point-of-sales and historical data as well as RFID data. ...
Article
Full-text available
The use of data mining in supply chains is growing, and covers almost all aspects of supply chain management. A framework of supply chain analytics is used to classify data mining publications reported in supply chain management academic literature. Scholarly articles were identified using SCOPUS and EBSCO Business search engines. Articles were classified by supply chain function. Additional papers reflecting technology, to include RFID use and text analysis were separately reviewed. The paper concludes with discussion of potential research issues and outlook for future development.
... RFID tags can be used to identify each individual item, and enormous amounts of location-tracking data are generated [8]. Time-dependency, dynamic changes, huge quantities, and rich implicit semantics are some of the characteristics of RFID data. Data cleaning is necessary for improving the quality of data so that it becomes "fit for use" by users. ...
Article
Full-text available
Radio Frequency Identification (RFID) is a convenient technology employed in various applications. The success of these RFID applications depends heavily on the quality of the data stream generated by RFID readers. The various anomalies found in RFID data limit the widespread adoption of this technology. Our work eliminates the anomalies present in RFID data in an effective manner so that the data can be applied to high-end applications. Our approach is a hybrid of middleware and deferred cleaning, because it is not always possible to remove all anomalies and redundancies in middleware. The processing of the remaining anomalies is deferred until query time, where they are cleaned by business rules. Experimental results show that the proposed approach performs the cleaning in an effective manner compared to the existing approaches.
... Hector Gonzalez et al. [14] put forward a new gateway graph model which can be used to store large amounts of RFID data related to transportation. The model works very well on the large amounts of data produced by moving objects. ...
... EPC1: A[2,3] → B[5] → E[16,18] → F[20] → H[22] → K[27]
EPC2: A[2,3] → B[5] → E[16,18] → F[20] → H[22] → I[24,28] → L[33]
EPC3: A[2,3] → C[7] → E[13] → F[16] → H[23] → J[25] → L[34]
EPC4: A[2,3] → C[7]
EPC5: A[2,3] → D[8,11] → F[18] → H[23] → I[26] ...
Article
Full-text available
Background: RFID technology is being adopted by a number of applications, including supply chain management. Due to the adoption of RFID technology, challenging problems have appeared: the RFID database system handles huge amounts of path-oriented, time-dependent data. Methods: This paper focuses on minimizing query processing time by using the Taguchi optimization method. Simulations show how the factors considered impact RFID data processing efficiency. Findings: The effectiveness of the system depends on factors such as the number of tags accessed at a time, the data preprocessing techniques adopted, and the selection of suitable indexing techniques. However, few of the techniques adopted by existing methods focus on the efficiency of RFID data processing. Applications/Improvements: The Taguchi method is applied to predict the best design combinations that achieve minimum query processing time.
... Raw RFID path data has the form (EPC, Loc, Time_stamp), where EPC is the Electronic Product Code of the tag that uniquely represents a shopping cart, Loc is the location of the reader that detects the tag, and Time_stamp is the time when the RFID reading takes place [36]. These raw data first need to be sorted on EPC and time, and then transformed into stay records of the form (EPC, Loc, T_in, T_out), where T_in is the time when the RFID tag enters the identification area and T_out is the leave time [37]. When an item (i.e., iitem) is put into a shopping cart (i.e., Cart), a raw purchasing record (i.e., (Cart, iitem, Time_stamp)) is also generated, where Time_stamp is the time of detecting the item. ...
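That transformation, sorting raw (EPC, Loc, Time_stamp) triples and merging consecutive readings into stay records, can be sketched directly:

```python
from itertools import groupby

def to_stay_records(raw):
    """Turn raw (epc, loc, ts) readings into (epc, loc, t_in, t_out)
    stay records: sort by EPC and time, then merge each consecutive
    run of readings at the same location."""
    ordered = sorted(raw, key=lambda r: (r[0], r[2]))
    stays = []
    for (epc, loc), grp in groupby(ordered, key=lambda r: (r[0], r[1])):
        ts = [r[2] for r in grp]
        stays.append((epc, loc, ts[0], ts[-1]))
    return stays

raw = [("e1", "B", 5), ("e1", "A", 1), ("e1", "A", 2), ("e2", "O", 8)]
stays = to_stay_records(raw)
```

Because grouping is over consecutive runs, a tag that later returns to the same location correctly produces a second, separate stay record.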
Article
Full-text available
With the quick development of RFID technology and the decreasing prices of RFID devices, RFID is becoming widely used in various intelligent services. Especially in the retail application domain, RFID is increasingly adopted to capture the shopping tracks and behavior of in-store customers. To further enhance the potential of this promising application, in this paper, we propose a unified framework for RFID-based path analytics, which uses both in-store shopping paths and RFID-based purchasing data to mine actionable navigation patterns. Four modules of this framework are discussed, which are: (1) mapping from the physical space to the cyber space, (2) data preprocessing, (3) pattern mining and (4) knowledge understanding and utilization. In the data preprocessing module, the critical problem of how to capture the mainstream shopping path sequences while wiping out unnecessary redundant and repeated details is addressed in detail. To solve this problem, two types of redundant patterns, i.e., loop repeat pattern and palindrome-contained pattern are recognized and the corresponding processing algorithms are proposed. The experimental results show that the redundant pattern filtering functions are effective and scalable. Overall, this work builds a bridge between indoor positioning and advanced data mining technologies, and provides a feasible way to study customers' shopping behaviors via multi-source RFID data.
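One plausible reading of the loop-repeat filter described above is collapsing immediately repeated sub-paths in a shopper's location sequence; the sketch below is illustrative, and the published algorithm may differ in detail:

```python
def collapse_loop_repeats(path):
    """Collapse immediately repeated sub-paths, e.g. A,B,A,B -> A,B,
    so that pacing back and forth does not inflate the mined patterns."""
    path = list(path)
    changed = True
    while changed:
        changed = False
        for k in range(1, len(path) // 2 + 1):   # candidate loop lengths
            i = 0
            while i + 2 * k <= len(path):
                if path[i:i + k] == path[i + k:i + 2 * k]:
                    del path[i + k:i + 2 * k]    # drop the repeated copy
                    changed = True
                else:
                    i += 1
    return path
```

For example, the sequence A, B, A, B, C reduces to A, B, C, which keeps the mainstream path while discarding the redundant loop.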
... According to Fig. 1, there is no central point where all the products converge, so this method is best suited for graphs which can be partitioned [14]. ...
Article
Full-text available
Radio Frequency Identification (RFID) technology is very useful in a broad range of areas. In supply chain management, large volumes of data have to be processed, and RFID tags can be attached to each product to track its movements. In this paper, we discuss various techniques to manage RFID data for supply chain management. By using path encoding methods, we can process RFID data efficiently. We first analyze various path encoding schemes, observe their performance, and compare them. Each encoding scheme has its own advantages and disadvantages, such as poor performance, difficulty with cycles in paths, or handling of long path schemas. In this paper, we explore various path encoding schemes to process RFID data in an efficient manner.
... First of all, RFID data are generated when readers detect tags within a certain area. During detection, the corresponding data are captured from the tags, such as an EPC (Electronic Product Code) and information related to a specific operation [4]. Such information is then processed in a database or data warehouse driven by RFID events. ...
Conference Paper
Full-text available
Radio Frequency Identification (RFID) technology has been widely used at manufacturing sites to support shopfloor management. Huge amounts of RFID-enabled production data have been generated. In order to discover valuable information and knowledge from this RFID big data, the dataset must be cleansed, since it contains a large amount of noise. This paper uses n-dimensional RFID-Cuboids to establish the data warehouse. A big-data cleansing approach is proposed to detect, remove and tidy the RFID-Cuboids so that the reliability and quality of the dataset can be ensured before knowledge discovery. Experiments and discussions are carried out to validate the proposed approach. It is observed that the proposed big-data cleansing approach outperforms other methods, such as statistical analysis, in finding incomplete and missing cuboids.
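As an illustration of the kind of checks such cleansing performs, the sketch below flags records with missing fields and gaps between consecutive stages of the same item; the field names and the one-unit gap tolerance are assumptions for illustration, not the paper's cuboid schema:

```python
def find_dirty_cuboids(cuboids):
    """Flag incomplete records (missing fields) and gaps between the
    consecutive stages of each EPC's trajectory."""
    dirty = []
    by_epc = {}
    for idx, c in enumerate(cuboids):
        if any(c.get(f) is None for f in ("epc", "loc", "t_in", "t_out")):
            dirty.append((idx, "incomplete"))   # a field is missing entirely
            continue
        by_epc.setdefault(c["epc"], []).append(c)
    for epc, recs in by_epc.items():
        recs.sort(key=lambda c: c["t_in"])
        for prev, nxt in zip(recs, recs[1:]):
            if nxt["t_in"] > prev["t_out"] + 1:  # tolerance of 1 time unit (assumed)
                dirty.append((epc, "gap"))       # item unseen between two stages
    return dirty
```

A record with a missing `t_out` is reported as incomplete, and an item that disappears between two stages is reported as a gap.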
... With its great ability to identify each object by a unique Electronic Product Code (EPC), Radio Frequency Identification (RFID) technology enables capturing large volumes of data at high speed and can be used for identifying, locating, tracking and monitoring physical objects without line of sight [1]. The potential of RFID for increasing supply chain efficiency has been stressed repeatedly by practitioners and researchers alike. ...
Article
The applications of Radio Frequency Identification (RFID) and the Electronic Product Code (EPC) in supply chain management have vast potential for improving effectiveness and efficiency in solving supply chain problems. RFID data, however, have their own unique characteristics – including aggregation, location, temporal and history-oriented ones – which have to be fully considered and integrated into the RFID data model constructed for an RFID application system. In this paper, we use an expressive temporal data model for RFID data. This data model is based on the Entity-Relationship (ER) model with minimal extension and highlights the state history and temporal semantics of the RFID system's business processes. We also show common RFID data tracking and monitoring query types, and propose methods to express such queries based on this data model.
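A minimal sketch of how a state-history table with temporal semantics supports the two common query types mentioned, tracking (where was an item at time t) and monitoring (who was at a location during an interval); the `Stay` record and the half-open intervals are illustrative assumptions, not the paper's schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Stay:                      # one row of the state-history table
    epc: str
    location: str
    t_from: int
    t_to: Optional[int]          # None = object is still there (open interval)

def location_at(history, epc, t):
    """Tracking query: where was `epc` at time t?"""
    for s in history:
        if s.epc == epc and s.t_from <= t and (s.t_to is None or t < s.t_to):
            return s.location
    return None

def objects_at(history, location, t1, t2):
    """Monitoring query: which objects overlapped `location` during [t1, t2)?"""
    return sorted({s.epc for s in history
                   if s.location == location
                   and s.t_from < t2 and (s.t_to is None or s.t_to > t1)})
```

Keeping `t_to` open (`None`) for the current state is the usual way such temporal models record "valid until further notice".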
... In addition to inventory and supply chain management, the micro-chip-equipped RFID tag, which has a unique identification code, can be employed in diverse fields such as payment, entrance systems, anti-counterfeit bank notes, home networks and more (Lee and Kim, 2007). The increasingly wide adoption of RFID technology by retailers to track containers, pallets, and even individual items as they progress through the global supply chain, from production units in exporting countries to stores in importing countries via transportation ports, generates huge datasets possessing rich multidimensional information on the movement patterns of objects and their features (Gonzalez et al., 2010; Srinivas et al., 2009). ...
Article
Full-text available
The storage and extraction of information from RFID datasets is a fundamental problem with huge potential benefits for object tracking, product procurement processes and customer movement analysis. In this paper, we design an efficient technique for tracking customers' walking-path sequences using RFID-equipped data. The frequent walking-path sequences of customer movement are extracted by exposing the most visited areas and walks across the warehouse and the typical products selected along the way. We use synthetic RFID datasets to evaluate the proposed technique. The analysis shows that run time and memory usage improve by roughly 50% over the previous method. Sequential pattern mining methods are used in applications such as analysing sequential behaviour in telecommunications, market basket analysis, medical data analysis and electronic government.
... RFID tags can be used to identify each individual item, and enormous amounts of location-tracking data are generated [8]. Time-dependency, dynamic change, huge volume, and rich implicit semantics are some of the characteristics of RFID data. Data cleaning is necessary to improve the quality of the data so that it becomes "fit for use" by users. ...
Conference Paper
Full-text available
RFID is a recent technology that has been widely used in education, supply chain management, the military, airlines, libraries, security, healthcare, animal farming and other areas. RFID tags store unique identification information about objects and communicate with the reader. The data observed by the reader are very dirty and unreliable. Our major aim was to design an RFID-based application that provides efficient means to perform essential information management for the healthcare domain and to supply error-free data to high-end applications. Small mistakes in healthcare can cause huge loss of life and incur massive financial losses, so data quality has become increasingly important to many organizations, especially in healthcare. We therefore simulated RFID in healthcare and then took the dirty RFID data as a challenge, cleaning it effectively using our newly proposed algorithm. Our approach is a hybrid of middleware and deferred cleaning, because it is not always possible to remove all anomalies and redundancies in the middleware; the processing of the remaining anomalies is deferred until query time, where they are cleaned by business rules. Experimental results show that the proposed approach performs cleaning more effectively than existing approaches.
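The hybrid middleware/deferred split can be sketched as follows; the deduplication window and the "same reader before and after" business rule are illustrative assumptions, not the paper's algorithm:

```python
def middleware_pass(readings, dedup_window=2):
    """Middleware stage: drop duplicate (tag, reader) reads arriving within
    `dedup_window` time units of the previous read of the same pair."""
    cleaned, last_seen = [], {}
    for t, tag, reader in sorted(readings):
        key = (tag, reader)
        if key in last_seen and t - last_seen[key] <= dedup_window:
            last_seen[key] = t
            continue                       # redundant re-read: discard now
        last_seen[key] = t
        cleaned.append((t, tag, reader))
    return cleaned

def deferred_fill(cleaned, tag, t):
    """Deferred stage (query time): business rule -- if the same reader saw
    the tag just before and just after t, assume it was present at t too."""
    before = [r for (ts, tg, r) in cleaned if tg == tag and ts < t]
    after  = [r for (ts, tg, r) in cleaned if tg == tag and ts > t]
    exact  = [r for (ts, tg, r) in cleaned if tg == tag and ts == t]
    if exact:
        return exact[0]
    if before and after and before[-1] == after[0]:
        return before[-1]                  # bridge the missed detection
    return None
```

Duplicates are cheap to remove immediately, while missed detections need surrounding context, which is why their handling is deferred to query time.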
... In order to deal with RFID data and to mine valuable information from it effectively, many research studies have recently been conducted [13,14]. RFID data mining can be categorized as follows [10]: RFID data cleaning by data mining [15], RFID data flow analysis, path-based classification and cluster analysis [16], frequent-pattern and sequential-pattern analysis [17], and outlier analysis of RFID data [18]. The author in [10] presented the Path Tree in the RFID Cuboid for RFID data analysis and data mining. ...
Article
Recently, there have been numerous efforts to fuse the latest Radio Frequency Identification (RFID) technology with the Enterprise Information System (EIS). However, in most cases these attempts are centered mainly on the simultaneous multiple-reading capability of RFID technology, and thus neglect the management of the massive data generated from the RFID reader. As a result, it is difficult to obtain flow information for RFID data mining related to real-time process control. In this paper, we propose an advanced process management method, called 'Procedure Tree' (PT), for RFID data mining. Using the suggested PT, we are able to manage massive RFID data effectively and perform real-time process management efficiently. We then evaluate the efficiency of the proposed method after applying it to a real-time process control system connected to the RFID-based EIS. For verification of the suggested system, we collect an enormous amount of data in the Enterprise Resource Planning (ERP) database, analyze characteristics of the collected data, and compute the elapsed time at each stage of process control. The suggested system was able to perform what traditional RFID-based process control systems failed to do, such as real-time process prediction and tracking and inventory control.
... barcode and RFID) are limited in the literature. Typical cases include mining RFID data sets for object tracking and supply chain management, barcode data to trace movements in the retail sector, and RFID data from transportation to observe vehicle transition information (Han et al. 2006; Gonzalez et al. 2010; Han et al. 2010; Kochar and Chhillar 2011; Ding et al. 2012). ...
Article
Full-text available
Radio frequency identification (RFID) has been widely used in the manufacturing field, creating a ubiquitous production environment in which advanced production planning and scheduling (APS) can be enabled. Within such an environment, APS usually requires standard operation times (SOTs) and dispatching rules obtained from time studies or based on past experience. Wide variations exist and frequently cause serious discrepancies in executing plans and schedules. This paper proposes a data mining approach to estimate realistic SOTs and unknown dispatching rules from RFID-enabled shopfloor production data. The approach is evaluated with real-world data from a collaborating company that has used RFID technology to support its shopfloor production for over seven years. The key impact factors on SOTs are quantitatively examined. A reference table with the mined precise and practical SOTs is established for typical operations, and suitable dispatching rules are labeled as managerial implications, aiming to improve the quality and stability of production plans and schedules.
... The digital visibility provided by RFID can be computationally blinding if such massive data sets are not managed properly, which is an important issue for RFID middleware systems. Gonzalez et al. (2010) emphasize how massive RFID datasets can become roadblocks in supply chain management systems. They propose efficient algorithms to process RFID data and convert them into meaningful information for effective decision making. ...
Article
Full-text available
Purpose – The purpose of this paper is to investigate the impact of radio frequency identification (RFID) deployment at an airport baggage‐handling system (BHS). Design/methodology/approach – The impact of number of RFID readers at different power levels with varying conveyor (i.e. baggage‐handling conveyors) speeds on timely delivery of baggage is studied via simulation. The layout of the BHS at the Hong Kong International Airport and data pertinent to its RFID deployment in 2005 are used to build the simulation model. The RFID read logic is based on the equations defined as a function of the number of tags and the time the tags spend in the interrogation zone for each reader in order to capture possible read‐rate issues realistically. Findings – The identification capability of the BHS studied in this paper is a result of its combined ability to identify tags via RFID technology on straight and circulating conveyors, as well as at the manual recovery station for unidentified bags on circulating conveyors. Overall, timely delivery of bags to gates, as a performance metric, increases as the identification capability is improved. The controllable factors that affect the identification capability are the conveyor speed, which determines the time a tag stays in the interrogation zone; the reader antenna power level, which determines the size of the interrogation zone; and the number of reader antennas in the system that increases the likelihood of not missing tags. This paper shows that “the higher the number of reader antennas and the higher the power level on them, the better” approach is not correct. 
Originality/value – Unlike typical simulation studies related to RFID deployment where read‐rate issues are considered to be non‐existent, this paper captures read rate in a realistic manner in the simulation model by incorporating the effect of number of RFID tags in the interrogation zone and time that RFID tags spend in the interrogation zone due to baggage conveyor speed. Such a simulation approach can be used as a system design tool in order to investigate the impact of RFID‐specific parameters on system‐level performance.
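The paper's read-logic equations are not reproduced here, but the qualitative relationship it studies, namely dwell time set by conveyor speed and interrogation-zone size, with the reader's attention shared among the tags present, can be sketched with a toy model; all parameter names and the functional form are assumptions, not the paper's equations:

```python
def read_probability(zone_len_m, speed_m_s, n_tags, reads_per_s=2.0, p_single=0.5):
    """Illustrative read-rate model: a tag dwells zone_len/speed seconds in the
    interrogation zone; the reader's read attempts are shared among the n_tags
    present, and each attempt on a tag succeeds with probability p_single."""
    dwell = zone_len_m / speed_m_s              # time spent in the zone
    attempts_per_tag = reads_per_s * dwell / n_tags
    return 1.0 - (1.0 - p_single) ** attempts_per_tag
```

Consistent with the paper's findings, slowing the conveyor (longer dwell) or enlarging the interrogation zone raises the probability that a bag's tag is read before it leaves the zone.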
... Mousavi et al. (2005) show a case for traceability in the meat processing industry. Houston (2001) addresses bovine traceability, while McGrann and Wiseman (2001) discuss international animal traceability and Gonzalez et al. (2010) present an approach for generic location tracking. A range of technologies, among which RFID (Jones, Clarke-Hill, Comfort, Hillier, & Shears, 2005), has also been used to tag individual products and batches. ...
Article
Full-text available
In this article the authors present a reference model for the registration of economic data that enables the tracking and tracing of product and money flows in the registered data. The model is grounded in the Resource-Event-Agent (REA) ontology, which has its origin in accounting and provides the conceptual foundation for the International Organization for Standardization (ISO) open-edi transaction standard. The use of the reference model is illustrated with an example database that demonstrates the different usage scenarios covered by the model.
Article
A growing number of organisations are investigating the use of Radio Frequency Identification (RFID) as a tool to improve their business processes across the enterprise. When implemented properly, the benefits RFID brings to supply-chain management, logistics, and asset tracking are clearly understood. Yet many companies do not realise the additional value RFID can supply as a data input to a company's Six Sigma and Lean applications, by facilitating operational visibility. In this paper, a systematic approach that integrates the phases of business process reengineering, RFID data-based decision making, laboratory-level prototyping, and pilot implementation at an industry site is discussed. [Received 12 January 2010; Revised 16 August 2010; Accepted 14 December 2010]
Chapter
As the scale of the power grid continues to expand, the traditional distribution network management model cannot meet the requirements of grid development under the new situation. Current distribution network operation and inspection still lacks adequate data collection, making it impossible to establish an informative and intelligent operation-inspection management system; the upper-level production management system also cannot be integrated, due to the lack of operation-inspection marketing data. Aiming at the performance problem of visualizing topology data in distribution network operation and inspection, this paper uses a graph data model to build knowledge, designs graphic elements for data migration, and forms a topology map for intelligent operation and inspection of the distribution network. The research clearly and intuitively displays the specific information of power system equipment and the physical relationships between pieces of equipment, thus forming a data model of the grid diagram. Keywords: Graph database; Topological graph; Graph data model for power grid; Distribution network inspection
Article
We consider the scenario of multiple RFID-tagged objects that simultaneously move across an indoor space where several RFID antennas are placed. We assume that a logical partition of the indoor space into a set of locations is given, along with a set of hard and weak integrity constraints describing both the valid movements of the objects and the capacity of the locations. In this setting, we address the problem of matching the collected readings to the trajectories (namely, the sequences of locations) followed by the target objects. We model this problem as estimating a probability distribution function over the possible matchings of the readings to the locations. The core of our approach is a novel Metropolis-Hastings sampler that is guided by the integrity constraints to distinguish between likely and unlikely ways of interpreting the readings. The challenges of integrating the constraints into the sampler are discussed, and a thorough experimental analysis, where the proposed approach is compared with the state of the art, is provided.
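A toy version of constraint-guided Metropolis-Hastings sampling over matchings; the uniform independence proposal and the shape of the weight function are simplifying assumptions (the paper's sampler is considerably more sophisticated):

```python
import random

def mh_sample(candidates, weight, steps=1000, seed=0):
    """Metropolis-Hastings sketch for matching readings to trajectories:
    candidates[t] lists the locations compatible with the reading at time t,
    and weight(traj) is an unnormalised score that is 0 when a hard integrity
    constraint is violated. With a uniform independence proposal, infeasible
    proposals are simply never accepted."""
    rng = random.Random(seed)
    traj = [c[0] for c in candidates]
    while weight(traj) == 0:                          # find a feasible start
        traj = [rng.choice(c) for c in candidates]
    counts = {}
    for _ in range(steps):
        prop = [rng.choice(c) for c in candidates]    # propose a fresh matching
        if rng.random() < weight(prop) / weight(traj):
            traj = prop                               # accept
        counts[tuple(traj)] = counts.get(tuple(traj), 0) + 1
    return counts                                     # empirical distribution
```

In the test below, a hard constraint forbids the object from changing location between the two time points, and a weak constraint prefers location "A"; the sampler only ever visits the two feasible trajectories.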
Article
Radio frequency identification (RFID) systems, as one of the key components of the Internet of Things (IoT), have attracted much attention in industry and academia. In practice, the performance of RFID systems relies heavily on the effectiveness and efficiency of anti-collision algorithms. A large body of studies has recently focused on anti-collision algorithms such as the Q-algorithm (QA), which has been successfully utilized in the EPCglobal Class-1 Generation-2 protocol. However, the performance of these anti-collision algorithms needs to be further improved. Observing that fully exploiting the pre-processing time can improve the efficiency of the QA algorithm, and with the objective of improving anti-collision performance, we propose a Nested Q-algorithm (NQA), which makes full use of such pre-processing time and incorporates the advantages of both the Binary Tree (BT) algorithm and the QA algorithm. Specifically, based on the expected number of collision tags, the NQA algorithm can adaptively select either BT or QA to identify collision tags. Extensive simulation results validate the efficiency and effectiveness of our proposed NQA (i.e., less running time for processing the same number of active tags) compared to existing algorithms.
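The baseline that NQA builds on, the Gen2-style Q-algorithm, can be sketched as follows; the step size `c`, the initial slot count, and the per-frame handling are simplifications, not the protocol's exact procedure:

```python
import random

def q_algorithm_identify(n_tags, c=0.3, seed=0):
    """Q-algorithm sketch (in the spirit of EPCglobal Class-1 Gen-2): the
    reader announces a slot count Q; each unidentified tag picks a slot in
    [0, 2^Q). A slot with exactly one reply is a successful read; a collision
    nudges the floating-point count Qfp up by c, an idle slot nudges it down."""
    rng = random.Random(seed)
    qfp, remaining, slots_used = 4.0, n_tags, 0
    while remaining > 0:
        q = max(0, min(15, round(qfp)))
        frame = 2 ** q
        picks = [rng.randrange(frame) for _ in range(remaining)]
        for slot in range(frame):
            slots_used += 1
            hits = picks.count(slot)
            if hits == 1:
                remaining -= 1                # exactly one tag replied
            elif hits > 1:
                qfp = min(15.0, qfp + c)      # collision: enlarge next frame
            else:
                qfp = max(0.0, qfp - c)       # idle: shrink next frame
    return slots_used
```

The feedback loop keeps the frame size near the number of unread tags, balancing collisions against idle slots; the total slot count is the usual efficiency metric.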
Article
We discuss an approach for interpreting RFID data in the context of object tracking. It consists in translating the readings generated by RFID-tracked moving objects into semantic locations over a map, by exploiting some integrity constraints. Our approach performs a probabilistic conditioning: it starts from an a priori probability assigned to the possible trajectories, discards the trajectories that are inconsistent with the constraints, and assigns to the others a suitable probability of being the actual one.
Article
A probabilistic framework for cleaning the data collected by Radio-Frequency IDentification (RFID) tracking systems is introduced. What has to be cleaned is the set of trajectories that are the possible interpretations of the readings: a trajectory in this set is a sequence whose generic element is a location covered by the reader(s) that made the detection at the corresponding time point. The cleaning is guided by integrity constraints and consists of discarding the inconsistent trajectories and assigning to the others a suitable probability of being the actual one. The probabilities are evaluated by adopting probabilistic conditioning that logically consists of the following steps. First, the trajectories are assigned a priori probabilities that rely on the independence assumption between the time points. Then, these probabilities are revised according to the spatio-temporal correlations encoded by the constraints. This is done by conditioning the a priori probability of each trajectory to the event that the constraints are satisfied: this means taking the ratio of this a priori probability to the sum of the a priori probabilities of all the consistent trajectories. Instead of performing these steps by materializing all the trajectories and their a priori probabilities (which is infeasible, owing to the typically huge number of trajectories), our approach exploits a data structure called conditioned trajectory graph (ct-graph) that compactly represents the trajectories and their conditioned probabilities, and an algorithm for efficiently constructing the ct-graph, which progressively builds it while avoiding the construction of components encoding inconsistent trajectories.
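The conditioning semantics described above can be shown on a toy instance by materializing the trajectories and renormalising, which is exactly the brute-force computation the ct-graph is designed to avoid at scale:

```python
from itertools import product

def conditioned_probabilities(candidates, priors, consistent):
    """Probabilistic conditioning over trajectories: a priori probabilities
    assume independence across time points; trajectories violating the
    integrity constraints are discarded and the rest renormalised."""
    joint = {}
    for traj in product(*candidates):
        p = 1.0
        for t, loc in enumerate(traj):
            p *= priors[t][loc]               # independence assumption
        if consistent(traj):
            joint[traj] = p                   # keep only consistent trajectories
    total = sum(joint.values())               # mass of the consistent event
    return {traj: p / total for traj, p in joint.items()}
```

Dividing each surviving trajectory's prior by the total mass of consistent trajectories is precisely the "ratio" step described in the abstract.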
Conference Paper
Radio Frequency Identification (RFID) is widely used to track and trace objects in supply chain management. However, the massive uncertain data produced by RFID readers are not suitable for direct use in RFID applications. Following our thorough analysis of key features of RFID objects, this paper proposes a new framework for effectively and efficiently processing uncertain RFID data and supporting a variety of queries for tracking and tracing RFID objects. In particular, we propose an adaptive cleaning method that adjusts the size of the smoothing window according to varying rates of uncertain data, employs different strategies to process uncertain readings, and distinguishes different types of uncertain data according to the positions where they appear. We propose a comprehensive data model suitable for a wide range of application scenarios. In addition, a path coding scheme is proposed to significantly compress massive data by aggregating the path sequences, the positions and the time intervals. Experimental evaluations show that our approach is effective and efficient in terms of compression and traceability queries.
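A minimal sketch of an adaptive smoothing window in this spirit; the thresholds and the grow/shrink policy are assumptions for illustration, not the paper's method:

```python
def adaptive_smooth(readings, t_end, w_min=1, w_max=8):
    """Adaptive smoothing-window sketch: decide, per time unit, whether the
    tag is judged present. The window shrinks when reads are dense (to stay
    responsive) and grows when they are sparse (to bridge missed detections)."""
    present, w = [], w_min
    times = set(readings)
    for t in range(t_end):
        window = range(max(0, t - w + 1), t + 1)
        hits = sum(1 for u in window if u in times)
        rate = hits / len(window)
        if rate > 0.5 and w > w_min:
            w -= 1                      # dense reads: shrink the window
        elif rate == 0 and w < w_max:
            w += 1                      # silence: widen to bridge dropouts
        present.append(hits > 0)
    return present
```

With reads at times 0, 1, 2 and 5, the window stays small while reads are dense, widens during the silent stretch, and the widened window lets the read at time 5 also cover time 6.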
Conference Paper
In the emerging environment of the Internet of Things (IoT), through the connection of billions of radio frequency identification (RFID) tags and sensors to the Internet, applications will generate an unprecedented amount of transactions and data that requires novel approaches in RFID data stream processing and management. Unfortunately, it is difficult to maintain a distributed model without a shared directory or structured index. In this paper, we present a fully distributed model for sovereign RFID data streams. This model combines the Tilted Time Frame and the Histogram to represent the patterns of object flows. It is efficient in space and can be stored in main memory. The model is built on top of an unstructured P2P overlay. To reduce the overhead of distributed data acquisition, we further propose algorithms that use a statistically optimistic number of network calls to maintain the model. The scalability and efficiency of the proposed model are demonstrated through an extensive set of experiments.
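A tilted time frame of flow histograms can be sketched as follows: recent counts are kept at fine granularity, and older counts are rolled up into coarser units. The fan-out, level count, and histogram shape are illustrative assumptions, not the paper's structure:

```python
def merge(hists):
    """Sum a collection of flow histograms (location -> count)."""
    out = {}
    for h in hists:
        for k, v in h.items():
            out[k] = out.get(k, 0) + v
    return out

class TiltedTimeFrame:
    """Tilted-time-frame sketch: level 0 holds the finest-grained histograms;
    once `fanout` of them accumulate, they are merged into one coarser slot
    at the next level (e.g. quarters -> hours -> days)."""
    def __init__(self, fanout=4, levels=3):
        self.fanout = fanout
        self.levels = [[] for _ in range(levels)]   # level 0 is the finest

    def add(self, histogram, level=0):
        lv = self.levels[level]
        lv.append(histogram)
        if len(lv) == self.fanout and level + 1 < len(self.levels):
            rolled = merge(lv)                      # roll up into one slot
            self.levels[level] = []
            self.add(rolled, level + 1)

    def total(self):
        return merge(h for lv in self.levels for h in lv)
```

The number of stored slots stays logarithmic in the covered time span, which is why such a structure fits in main memory.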
Chapter
Radio Frequency Identification (RFID) creates a seamless link between individual objects and their digital counterparts. It allows objects to be uniquely, automatically and individually identified using wireless communications. After more than half a century of development, RFID has become a mainstream driving force and is able to provide various benefits in many different industries. In this chapter, we introduce the basic concepts of RFID technologies and their applications. We discuss the characteristics of RFID data and overview the state-of-the-art research on RFID data management. We also highlight some technical challenges in the management and use of RFID data.
Article
In supply chains, RFID technology is used to locate and track items in real time, producing RFID data with rich temporal and spatial characteristics as well as implicit semantics. In order to support efficient spatiotemporal queries and to locate and track moving objects, we developed an RFID supply-chain-oriented spatio-temporal data model that defines operations on spatio-temporal RFID data, and we implemented these data manipulation methods in a relational database. Finally, experiments operating on spatial and temporal RFID data were performed, and the operating time was analyzed to verify the validity of the data model and its data manipulation.
Article
Radio Frequency Identification (RFID) technology is widely used to trace objects. However, RFID application systems cannot effectively and efficiently use massive uncertain data. This study considers properties of objects captured by sensors and GPS and proposes a comprehensive, extensible model for uncertain data based on the key features of RFID data, suitable for different application scenarios. The model can effectively and efficiently store different RFID data according to these key features, and it supports a variety of queries for tracking and tracing RFID objects.
Conference Paper
With the widespread application of RFID technology, the efficient management of huge amounts of RFID data is very important. In supply chain management applications, the key issue is how to record and process the movement trajectories of objects. Based on a path encoding method using prime numbers, we propose an improved scheme that encodes the flows of objects. It solves the problem of encoding a path with cycles, which we call a "cycling path". It can be proved that our approach is feasible and that the original path can be restored by decoding. The proposed encoding scheme can thus be applied to more practical applications. Experimental results show that it performs well when processing queries.
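The underlying prime-number path encoding (for cycle-free paths) can be sketched as below. Note its limitations: it assumes each position index stays smaller than the corresponding prime, and it breaks down when a location repeats, which is precisely the "cycling path" problem this paper addresses. The location-to-prime map is an assumed example:

```python
from math import prod

PRIMES = {"A": 2, "B": 3, "C": 5, "D": 7}   # one prime per location (assumed map)

def encode(path):
    """Prime path encoding of a cycle-free path: the product of location
    primes identifies the set of visited locations; a CRT-style order number
    x with x mod p_i == position of location i recovers the visit order."""
    path_num = prod(PRIMES[loc] for loc in path)
    x = 0                                    # brute-force CRT solve (tiny demo)
    while any(x % PRIMES[loc] != i + 1 for i, loc in enumerate(path)):
        x += 1
    return path_num, x

def decode(path_num, order_num):
    """Recover the path: factor out the location primes, then sort by the
    position that order_num encodes modulo each prime."""
    locs = [(order_num % p, loc) for loc, p in PRIMES.items() if path_num % p == 0]
    return [loc for _, loc in sorted(locs)]
```

For the path A, B, C this yields the pair (30, 23): 30 = 2 * 3 * 5 names the locations, and 23 satisfies 23 mod 2 = 1, 23 mod 3 = 2, 23 mod 5 = 3, giving the order.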
Article
An offline cleaning technique is proposed for translating the readings generated by RFID-tracked moving objects into positions over a map. It consists in a grid-based two-way filtering scheme embedding a sampling strategy for addressing missing detections. The readings are first processed in time order: at each time point t, the positions (i.e., cells of a grid assumed over the map) compatible with the reading at t are filtered according to their reachability from the positions that survived the filtering for the previous time point. Then, the positions that survived the first filtering are re-filtered, applying the same scheme in inverse order. As the two phases proceed, a probability is progressively evaluated for each candidate position at each time point t: at the end, this probability assembles the three probabilities of being the actual position given the past and future positions, and given the reading at t. A sampling procedure is employed at certain steps of the first filtering phase to intelligently reduce the number of cells to be considered as candidate positions at the next steps, as their number can grow dramatically in the presence of consecutive missing detections. The proposed approach is experimentally validated and shown to be efficient and effective in accomplishing its task.
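The two filtering passes can be sketched as follows, under the simplifying assumption of a symmetric reachability relation between grid cells (so the same relation serves both time directions); the probability evaluation and the sampling strategy are omitted:

```python
def two_way_filter(candidates, reachable):
    """Two-way filtering sketch: keep, at each time point, only the candidate
    cells reachable from a surviving cell at the previous time point (forward
    pass), then re-filter the survivors in reverse time order (backward pass).
    `candidates[t]` is the set of cells compatible with the reading at t;
    `reachable[c]` is the set of cells reachable from c in one time step."""
    def one_pass(cands):
        kept = [set(cands[0])]
        for cur in cands[1:]:
            kept.append({c for c in cur
                         if any(c in reachable[p] for p in kept[-1])})
        return kept
    forward = one_pass(candidates)              # filter in time order
    backward = one_pass(forward[::-1])[::-1]    # re-filter in reverse order
    return backward
```

In the test, the middle reading is ambiguous over three cells, but only one of them is compatible with both the earlier and the later position, so the two passes together pin it down.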
Article
Enterprise RFID data management is highly challenging not only because of the huge volume of data from distributed sources, but particularly because of the dynamic nature of the reader inputs. With user-designated quality of service (QoS) requirements, the data management system must be able to dynamically detect the status changes of the RFID inputs and adjust the processing strategies for continuously maintaining the desired level of QoS. We propose a QoS-aware framework for modeling the enterprise data service problem, and for on-line adaptive processing of distributed RFID data streams. The data processing structure is modeled as a hierarchy of aggregation nodes in accordance with the structure of an organization. Leaf nodes correspond to the RFID streaming inputs. A set of aggregation/deaggregation operations is devised to adjust the processing granularity level based on QoS dynamics. A QoS constrained query issued at any aggregation node is parsed into an aggregation subtree rooted at that node. For QoS-aware processing of the query, several algorithms are designed to dynamically apply proper aggregation/deaggregation operations on selected nodes for raising or lowering the granularity levels or changing the aggregation methods. The goal is to continuously maintain the desired level of QoS under constant variation of the streaming data volume. Prototype development and extensive simulation show that our framework and techniques can handle highly varied RFID streaming inputs and continuously satisfy the QoS constraints.
Conference Paper
The widespread use of road sensors has generated a huge amount of traffic data, which can be mined and put to various uses. Finding frequent trajectories in the road network of a big city helps summarize how traffic behaves in the city. It can be very useful in city planning and traffic routing mechanisms, and may be used to suggest the best routes given the region, road, time of day, day of week, season, weather, events, and so on. Beyond the frequent patterns, even events that are not so frequent, such as those observed during heavy snowfall, other extreme weather conditions, long traffic jams, or accidents, might follow a periodic occurrence and hence be useful to mine. This problem of mining frequent patterns from road traffic data has been addressed in previous work using context knowledge of the city's road network. In this paper, we develop a method to mine spatiotemporal periodic patterns in the traffic data and use these periodic behaviors to summarize the huge road network. The first step is to find periodic patterns in the speed data of individual road sensor stations and to represent each station's periodic behavior, via its periods, using probability distribution matrices. Then, we use density-based clustering to cluster the sensors on the road network based on the similarity between their periodic behaviors as well as their geographical distance, thus combining similar nodes to form a road network with larger but fewer nodes.
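A simple stand-in for the first step, detecting a sensor's period from its speed series, using plain autocorrelation; the paper's probability-distribution-matrix representation and the clustering step are not reproduced:

```python
def dominant_period(series, max_period=None):
    """Return the lag with the highest autocovariance -- a minimal proxy for
    detecting the periodic behavior of one road sensor's speed data."""
    n = len(series)
    max_period = max_period or n // 2
    mean = sum(series) / n
    dev = [x - mean for x in series]            # centre the series
    best_lag, best_score = 1, float("-inf")
    for lag in range(1, max_period + 1):
        score = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n - lag)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

On a toy speed series that repeats every 4 samples, the strongest autocovariance appears at lag 4.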
Conference Paper
Owing to the dynamic nature of the web, it is difficult for a search engine to find the documents relevant to a user query. For this purpose, the search engine maintains an index of downloaded documents stored in a local repository. Whenever a query arrives, the search engine searches the index to find relevant matching results to present to the user. The quality of the matched results depends on the information stored in the index: the more efficient the structure of the index, the better the performance of the search engine. Generally, inverted indexes are based solely on the frequency of keywords across documents. In order to improve search engine efficiency, an improved indexing mechanism for web documents is proposed that keeps context-related information integrated with keyword frequency. The structure is implemented using a Trie. Implementation results on various documents show that the proposed index stores documents efficiently and that search is fast.
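A minimal sketch of a trie index whose postings keep context terms alongside term frequency; the structure and names are illustrative, not the paper's exact design:

```python
class TrieIndex:
    """Trie-based index sketch: the node reached by spelling out a keyword
    stores, per document, the term frequency plus a small set of context
    terms (words co-occurring with the keyword)."""
    def __init__(self):
        self.children = {}       # char -> child node
        self.postings = {}       # doc_id -> (frequency, context terms)

    def insert(self, word, doc_id, context):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, TrieIndex())
        freq, ctx = node.postings.get(doc_id, (0, set()))
        node.postings[doc_id] = (freq + 1, ctx | set(context))

    def search(self, word):
        node = self
        for ch in word:
            if ch not in node.children:
                return {}        # keyword not indexed
            node = node.children[ch]
        return node.postings
```

Compared with a plain inverted index keyed only by frequency, the context set lets the engine rank matches by how well the query's surrounding terms agree with a document's usage of the keyword.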
Conference Paper
Radio Frequency Identification (RFID) technology is widely used in object tracking and tracing. However, the massive uncertain data produced by RFID readers are not effective for direct use in RFID application systems. Based on the position and containment features of RFID objects, this paper proposes a new temporal-based uncertain data model, which can store massive amounts of missed RFID data. In addition, the model stores certain data and inferred data in the same table, which efficiently supports a variety of queries for tracking and tracing RFID objects.
Conference Paper
Full-text available
Abstract—RFID technology generates massive amounts of path-oriented data. A lot of research has gone into modeling the RFID data generated in supply chain management. In this paper, we consider RFID data generated from RFID-tagged passports. This temporal data, sent from different countries to the origin country, can be queried using the MapReduce method. The frequent information about migrants can be organized so as to reduce execution complexity. Compared with supply chain management, long paths are common for RFID-tagged passports. The techniques presented in this paper are very useful for modeling massive amounts of RFID data.
Conference Paper
Radio Frequency Identification (RFID) is an emerging technology which brings enormous productivity benefits in applications where objects have to be identified automatically. RFID data, however, have their own unique characteristics - including aggregation, location, temporal and history-oriented ones - which have to be fully considered and integrated into the RFID data model constructed for an RFID application system. In this paper, we use an expressive temporal data model for RFID data. This data model is based on the Entity-Relationship (ER) model with minimal extension and highlights the state history and temporal semantics of the RFID system's business processes. We also show common RFID data tracking and monitoring query types, and propose methods to express such queries based on this data model.
Article
Recent advances in sensor networks and communication technologies have made the Internet of Things (IoT) a hot research issue. An IoT system can sample and manage the historical and present states of various kinds of physical and virtual objects such as vehicles, lakes, mountains, dams, city traffic conditions, atmosphere quality, and so forth. It is well acknowledged that IoT will greatly change the way people live and work. However, IoT also brings great challenges to the data management community. For instance, the data to be managed in IoT are highly dynamic and heterogeneous. Meanwhile, since sensor sampling data are managed in a centralized manner, the data size can be huge. Moreover, sensor data are intrinsically spatial-temporal data, which may involve complicated spatial-temporal computations in query processing. To meet these challenges, we propose a novel Sea-Cloud-based Data Management (SeaCloudDM) mechanism in this paper. The experimental results show that the SeaCloudDM mechanism provides satisfactory performance in managing and querying massive sensor sampling data, and is thus a viable solution for IoT data management.
Article
RFID technology gives humans a powerful ability to perceive the world, and it produces vast amounts of data; how to store and analyze this mass of information has become a new challenge. This paper proposes a novel data structure for the management and storage of RFID data, which uses an improved form of the T-tree, a list of T-trees (Dual T-tree), together with a path encoding technique to build an efficient spatio-temporal in-memory structure. Based on this memory structure, the paper introduces event processing and query analysis algorithms, called DTTSTQ, and analyzes their time complexity. Experiments on real and synthetic data demonstrate the validity and correctness of the proposed structure and algorithms.
Thesis
Full-text available
In practice, business information systems such as accounting, inventory management, cost calculation, and order management systems are part of a much larger information-processing environment. According to Moody and Shanks it is therefore unhelpful to consider such systems in isolation: in [MS03] they argue and demonstrate that viewing information systems within this broader context is crucial for obtaining high-quality systems, and they accordingly add 'integration' to an existing list of quality factors for information system design. By integration they mean the degree to which the underlying infrastructure models of different business information systems agree with one another, which should ease cooperation between these systems and the exchange and reuse of data. The design of an information system starts from a conceptual design, which represents the intended system without details specific to the chosen technology, such as a web page, a database, or software. A conceptual model thus focuses on representing a conceptual solution to an existing problem before the technological solution is designed. Although such a model represents only part of the final solution, it captures its essence, which can then be elaborated in several technologies at once. Because this solution is essential, a conceptual model must satisfy a set of quality criteria: it must offer a solution that is as simple as possible yet complete.
The solution represented by the conceptual model must also be understandable to people other than its designers, such as the programmers who build the practical solution or the customer who ordered and will use the system. One often also aims for flexibility, meaning that the solution accounts for possible changes in the environment, without conflicting with accepted practice in the wider information-processing environment and without becoming infeasible within the planned budget and schedule. Without losing sight of these quality factors, this thesis focuses on the integration of information system designs as proposed by Moody and Shanks, who demonstrated its substantial benefits. Since many kinds of conceptual models exist, we select two specific kinds: conceptual data models and simulation models. We choose data models because they are among the most essential kinds of conceptual models, representing the foundation of every information system; choosing or designing a data model is one of the most crucial steps in information system design and influences, among other things, the eventual project cost, the flexibility of the resulting system, customer satisfaction, and the integration with other (existing) systems. Where conceptual data models let us design the data an information system keeps about the current state and the history of a relevant part of reality, simulation models let us formulate a design, for a relevant part of the environment, that can deal with potential future situations.
Together, conceptual data models and simulation models thus let us represent the past, present, and future of a relevant part of reality; in this thesis that part is the business-economic context. Since this context is dynamic, and hence inherently unstable, with constantly changing and only partially specified system requirements demanding flexibility, a special approach is needed. This approach, called Design Science [HMJR04], also accounts for how successful information system designs depend on the cognitive and social abilities of the people who use, develop, and design them. To support this flexibility and these social and cognitive abilities, we use an ontology as the basis for designing the conceptual data and simulation models in this thesis. Such an ontology is a shared description of the problem domain (for example, business-economic reality). Its shared nature supports the social and cognitive abilities of the group designing an information system, since the description serves as a common frame of reference, and it allows the description to be reused and improved across successive projects by different groups. This reuse implicitly supports integration between projects, since they are based on (different parts of) the same problem description. The ontology mainly used in this thesis is the Resource-Event-Agent (REA) ontology, developed in the early 1980s for accounting information systems with a view to a shared data environment in which accountants and non-accountants exchange information about the same economic events, such as purchases, sales, and production.
Resources describe the products (goods and services) that are traded and produced. Events describe the occurrences that change the stocks of these products; a sale, for example, decreases the seller's inventory and increases the buyer's. Agents describe the economic actors who produce, sell, and buy the goods and services. The REA ontology has since been used as the basis for accounting information systems, for an ISO open-edi standard [ISO07] for exchanging electronic business documents, as a method for teaching accounting information systems, and much more [GLP08]. In the first chapter of the data-model section we use the REA and UFO ontologies to structure and delimit conceptual data models so that they become easier to interpret, particularly for inexperienced system designers. UFO, the Unified Foundational Ontology, was designed specifically for interpreting conceptual models [BGHS+05]. The structured data models are expected to help inexperienced designers produce conceptual data models that are complete and contain no superfluous parts, incomplete models and superfluous parts being the most common mistakes of novice designers. The structured models offered to them are patterns already proven to be valuable solutions to particular problems within a specific context. By structuring these patterns along the REA and UFO ontologies, a designer should be able to identify more quickly the superfluous and missing parts of a pattern relative to the problem he wants to solve.
The offered structure also lets the designer search other patterns for missing parts of his solution and integrate them into his existing, incomplete solution. Integrating patterns has two advantages: first, it eases integration between systems that contain (parts of) the same patterns; second, it raises the quality of models produced by inexperienced designers. In the second chapter of the data-model section we use the REA ontology as the basis for a conceptual reference data model suited to representing both production and transaction data of multiple trading partners. Because the model can represent the data of several partners simultaneously, integration between their business systems is greatly eased, both among the information systems of each individual partner (for example, a sales system and inventory management) and between the systems of different partners (for example, a purchasing system with a sales system). Representing the production and transaction data of multiple partners is achieved by explicitly integrating both the perspective of the individual partners and that of an independent third party into the data model. This choice makes the model suitable for representing complete value systems: the transactions between trading partners in a value network or supply chain together with the business processes each partner carries out.
In the third and final chapter of the data-model section we use the reference model of the second chapter to build an application for following historical, current, and future product and money flows in a value system. The chosen example shows how arable crops are used for human consumption and animal feed; the animal products are in turn used for human consumption and, together with the crops, processed into consumer products, while the manure flows back to arable farming. The described value system shows how goods and money flows can be traced across transactions between partners (for example, a farmer and a feed mill) and how goods flows can be traced through the partners' production processes (for example, processing crops of different origins into one batch of feed). The application also demonstrates that the same data can represent the transactions between partners (for example, a farmer selling grain to a feed mill and receiving payment), so that money flows can be charted just as goods flows are. It further demonstrates how not only current and past but also future transactions and goods and money flows can be charted, for instance on the basis of contracts and production schedules. In the food industry such systems for tracing current and past goods flows have already proven their worth when contaminated foodstuffs were found or transported livestock turned out to be ill. To the capabilities of such tracing systems we add that not only the source of a contamination but also the destination of the goods, and the money flows, can be identified.
This lets us estimate the economic consequences of such a contamination from contracts and production schedules and, with the available information, work out contingency scenarios to limit those consequences. Such a tracing system can also be useful in other sectors: mapping a complete value network can prevent counterfeit goods from entering regular trade, or money flows from the regular economy from being used to launder or finance illegal activities. Where the second and third chapters of the data-model section present REA-based data models for integrating the information systems of trading partners, the simulation-model section contains REA-based simulation model elements that allow business process models to be integrated across company boundaries, so that simulation models for complete value systems can be developed. The chapters in this section therefore contain no business process models themselves but superstructures that make it possible to reuse and integrate business process models across company boundaries. These superstructures were moreover designed so that they can also serve as stand-alone elements of simulation models for transactions between trading partners, which means abstracting from the business processes that support those transactions. The first chapter of the simulation-model section analyzes the possible configurations of transactions between trading partners (for example, pay first then collect, or collect first then pay) and how these configurations influence a company's internal structure. The analysis focuses on a company's ability to finance its activities with the credit its trading partners grant (for example, payment terms on invoices, or advances).
The structures in this chapter are presented as Petri-net-based workflow models, which make it possible to evaluate whether a given sequence of transactions requires external financing (for example, a bank loan). By evaluating the different configurations of a transaction sequence, the most favorable one (for example, the one requiring the least external financing) can be selected. The second chapter of the simulation-model section builds on these models to also enable statistical analyses of the different configurations: one no longer needs to start from a given transaction sequence, but can take uncertainty and variation into account. By adding REA elements to simulation model elements, the capabilities of the current generation of statistical simulation models are extended from analyzing logistics processes to analyzing complete business models, including financial parameters and results. Unlike the workflow models of the first chapter, the statistical simulation models are layered: the top layer models the transactions between companies, the middle layer models the internal financial structure of a company, and the bottom layer contains process models for individual business processes. This layering allows us both to simulate the individual layers and to integrate business process models into internal-structure models, and those in turn into transaction models. In this thesis we thus develop an approach for building and integrating models within a business-economic context: conceptual models are developed from, and validated against, a business ontology.
The business ontology used in this thesis is the REA ontology, and the thesis shows that REA is a suitable basis for developing and integrating both conceptual data models and simulation models within a business-economic context. With this approach we should be able to equip the business world not only with more robust data models for the business information systems it uses daily, but also with more powerful decision-support tools that provide the information needed to evaluate and predict company performance, both financial (for example, profit margins) and operational (for example, waiting times). Evaluation can be based on the data stored in the business information systems, and prediction on hypothetical data generated by simulation models of future business processes and environmental conditions.
Conference Paper
Full-text available
We present PEEX, a system that enables applications to define and extract meaningful probabilistic high-level events from RFID data. PEEX effectively copes with errors in the data and the inherent ambiguity of event extraction.
Article
Full-text available
RFID (radio frequency identification) has received a great deal of attention in the commercial world over the past couple of years. The excitement stems from a confluence of events. First, through the efforts of the former Auto-ID Center and its sponsor companies, the prospects of low-cost RFID tags and a networked supply chain have come within reach of a number of companies. Second, several commercial companies and government bodies, such as Wal-Mart and Target in the United States, Tesco in Europe, and the U.S. Department of Defense, have announced RFID initiatives in response to technology improvements.
Article
Full-text available
A family of new measures of point and graph centrality based on early intuitions of Bavelas (1948) is introduced. These measures define centrality in terms of the degree to which a point falls on the shortest path between others and therefore has a potential for control of communication. They may be used to index centrality in any large or small network of symmetrical relations, whether connected or unconnected.
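The "falls on the shortest path between others" idea can be illustrated with a brute-force sketch. This is a hypothetical toy implementation (not the paper's formulation, which uses efficient path counting); `adj` is an adjacency dict for a small undirected graph:

```python
from itertools import permutations

def simple_paths(adj, s, t, path=None):
    """Enumerate all simple s-t paths by depth-first search."""
    path = path or [s]
    if s == t:
        yield path
        return
    for n in adj[s]:
        if n not in path:
            yield from simple_paths(adj, n, t, path + [n])

def betweenness(adj, v):
    """Sum, over all other node pairs, of the fraction of shortest
    paths between the pair that pass through v (undirected graph)."""
    score = 0.0
    for s, t in permutations([u for u in adj if u != v], 2):
        paths = list(simple_paths(adj, s, t))
        if not paths:
            continue
        d = min(len(p) for p in paths)
        shortest = [p for p in paths if len(p) == d]
        score += sum(v in p for p in shortest) / len(shortest)
    return score / 2  # each unordered pair was counted twice

# Toy star graph: the hub lies on every shortest path between leaves.
star = {'hub': ['a', 'b', 'c'],
        'a': ['hub'], 'b': ['hub'], 'c': ['hub']}
```

On the star graph, `betweenness(star, 'hub')` is 3.0 (one for each leaf pair) while every leaf scores 0, matching the intuition that the hub controls all communication.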
Conference Paper
Full-text available
RFID technology provides significant advantages over traditional object-tracking technologies and is increasingly adopted and deployed in real applications. RFID applications generate large volume of streaming data, which have to be automatically filtered, processed, and transformed into semantic data, and integrated into business applications. Indeed, RFID data are highly temporal, and RFID observations form complex temporal event patterns which can be very different for various RFID applications. Thus, it is desirable to have a general RFID data processing framework with a powerful language, for the end users to express a variety of queries on RFID data streams, as well as detecting complex events patterns. While data stream management systems (DSMSs) are emerging for optimized stream data processing, they usually lack the language construct support for temporal event detection. In this paper, we discuss a stream query language to provide comprehensive temporal event detection, through temporal operators and extension of sliding-window constructs. With the integration of temporal event detection, a DSMS has the capability to serve as a powerful system for RFID data processing.
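As a sketch of the kind of temporal pattern such a query language targets (the reader names and stream layout here are invented for illustration, not the paper's syntax), the following detects "tag read at one reader, then at another within w time units":

```python
def detect_sequence(stream, first, then, within):
    """Match the pattern: a reading at `first` followed by a reading of the
    same tag at `then` no more than `within` time units later.
    `stream` is an iterable of (time, tag, reader) tuples."""
    pending = {}   # tag -> time of its most recent `first` reading
    matches = []
    for t, tag, reader in sorted(stream):
        if reader == first:
            pending[tag] = t
        elif reader == then and tag in pending and t - pending[tag] <= within:
            matches.append((tag, pending.pop(tag), t))
    return matches

# Hypothetical readings: (time, tag, reader)
events = [(0, 'x', 'dock'), (2, 'x', 'shelf'),
          (0, 'y', 'dock'), (9, 'y', 'shelf')]
```

Here `detect_sequence(events, 'dock', 'shelf', within=5)` reports only tag `'x'`; tag `'y'` reached the shelf too late to match the window.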
Conference Paper
Full-text available
Radio Frequency Identification (RFID) has recently received a lot of attention as an augmentation technology in the ubiquitous computing domain. In this paper we present various sources of error in passive RFID systems, which can make the reliable operation of RFID augmented applications a challenge. To illustrate these sources of error, we equipped playing cards with RFID tags and measured the performance of the RFID system during the different stages of a typical card game. The paper also shows how appropriate system design can help to deal with the imperfections associated with RFID.
Article
Full-text available
Computing multiple related group-bys and aggregates is one of the core operations of On-Line Analytical Processing (OLAP) applications. Recently, Gray et al. [GBLP95] proposed the "Cube" operator, which computes group-by aggregations over all possible subsets of the specified dimensions. The rapid acceptance of the importance of this operator has led to a variant of the Cube being proposed for the SQL standard. Several efficient algorithms for Relational OLAP (ROLAP) have been developed to compute the Cube. However, to our knowledge there is nothing in the literature on how to compute the Cube for Multidimensional OLAP (MOLAP) systems, which store their data in sparse arrays rather than in tables. In this paper, we present a MOLAP algorithm to compute the Cube, and compare it to a leading ROLAP algorithm. The comparison between the two is interesting, since although they are computing the same function, one is value-based (the ROLAP algorithm) whereas the other is position-based (the MOLAP algorithm).
Article
Full-text available
At the heart of all OLAP or multidimensional data analysis applications is the ability to simultaneously aggregate across many sets of dimensions. Computing multidimensional aggregates is a performance bottleneck for these applications. This paper presents fast algorithms for computing a collection of group-bys. We focus on a special case of the aggregation problem: computation of the CUBE operator. The CUBE operator requires computing group-bys on all possible combinations of a list of attributes, and is equivalent to the union of a number of standard group-by operations. We show how the structure of CUBE computation can be viewed in terms of a hierarchy of group-by operations. Our algorithms extend sort-based and hash-based grouping methods with several optimizations, like combining common operations across multiple group-bys, caching, and using pre-computed group-bys for computing other group-bys. Empirical evaluation shows that the resulting algorithms give m...
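A minimal, unoptimized sketch of what the CUBE operator computes may help fix the idea (the rows and dimension names below are hypothetical, and this naive version scans the data once per subset instead of sharing work across group-bys as the paper's algorithms do):

```python
from itertools import combinations
from collections import Counter

def naive_cube(rows, dims):
    """COUNT(*) for the group-by on every subset of `dims`: the union of
    2^len(dims) standard group-bys, keyed by the dimension subset."""
    result = {}
    for k in range(len(dims) + 1):
        for subset in combinations(dims, k):
            result[subset] = Counter(tuple(r[d] for d in subset) for r in rows)
    return result

rows = [{'loc': 'A', 'item': 'x'},
        {'loc': 'A', 'item': 'y'},
        {'loc': 'B', 'item': 'x'}]
cube = naive_cube(rows, ('loc', 'item'))
```

Here `cube[()][()]` is the grand total 3, and `cube[('loc',)][('A',)]` is 2, i.e. the result of `GROUP BY loc` restricted to location A.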
Article
Full-text available
Data cube computation is one of the most essential but expensive operations in data warehousing. Previous studies have developed two major approaches, top-down vs. bottom-up. The former, represented by the MultiWay Array Cube (called MultiWay) algorithm [25], aggregates simultaneously on multiple dimensions.
Conference Paper
Full-text available
The Auto-ID Center is developing low-cost radio frequency identification (RFID) based systems with the initial application as next generation bar-codes. We describe RFID technology, summarize our approach and our research, and most importantly, describe the research opportunities in RFID for experts in cryptography and information security. The common theme in low-cost RFID systems is that computation resources are very limited, and all aspects of the RFID system are connected to each other. Understanding these connections and the resulting design trade-offs is an important prerequisite to effectively answering the challenges of security and privacy in low-cost RFID systems.
Chapter
Data cube computation is one of the most essential but expensive operations in data warehousing. Previous studies have developed two major approaches, top-down vs. bottom-up. For efficient cube computation in various data distributions, this chapter proposes an interesting cube computation method, Star-Cubing, that integrates the strength of both top-down and bottom-up cube computation, and explores a few additional optimization techniques. It utilizes a star-tree structure, extends the simultaneous aggregation methods, and enables the pruning of the group-by's that do not satisfy the iceberg condition. The performance study shows that Star-Cubing is highly efficient and outperforms all the previous methods in almost all kinds of data distributions. Two optimization techniques are emphasized: (1) shared aggregation by taking advantage of shared dimensions among the current cuboid and its descendant cuboids; and (2) prune as soon as possible the unpromising cells during the cube computation using the anti-monotonic property of the iceberg cube measure. No previous cubing method has fully explored both optimization methods in one algorithm. Moreover, a new compressed data structure, star-tree, is proposed using star nodes. In addition, a few other optimization techniques contribute to the high performance of the method.
Article
We introduce the Iceberg-CUBE problem as a reformulation of the datacube (CUBE) problem. The Iceberg-CUBE problem is to compute only those group-by partitions with an aggregate value (e.g., count) above some minimum support threshold. The result of Iceberg-CUBE can be used (1) to answer group-by queries with a clause such as HAVING COUNT(*) >= X, where X is greater than the threshold, (2) for mining multidimensional association rules, and (3) to complement existing strategies for identifying interesting subsets of the CUBE for precomputation. We present a new algorithm (BUC) for Iceberg-CUBE computation. BUC builds the CUBE bottom-up; i.e., it builds the CUBE by starting from a group-by on a single attribute, then a group-by on a pair of attributes, then a group-by on three attributes, and so on. This is the opposite of all techniques proposed earlier for computing the CUBE, and has an important practical advantage: BUC avoids computing the larger group-bys that do not meet minimum support. The pruning in BUC is similar to the pruning in the Apriori algorithm for association rules, except that BUC trades some pruning for locality of reference and reduced memory requirements. BUC uses the same pruning strategy when computing sparse, complete CUBEs. We present a thorough performance evaluation over a broad range of workloads. Our evaluation demonstrates that (in contrast to earlier assumptions) minimizing the aggregations or the number of sorts is not the most important aspect of the sparse CUBE problem. The pruning in BUC, combined with an efficient sort method, enables BUC to outperform all previous algorithms for sparse CUBEs, even for computing entire CUBEs, and to dramatically improve Iceberg-CUBE computation.
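The bottom-up pruning described above can be sketched as follows. This is a simplified toy version of the BUC idea (COUNT only, dictionary partitioning instead of the paper's in-place sorting):

```python
def buc(rows, dims, minsup, group=()):
    """Emit {group: count} for every group-by partition whose count meets
    `minsup`, recursing only into partitions that still qualify, so larger
    group-bys below the threshold are never materialized."""
    if len(rows) < minsup:
        return {}
    out = {group: len(rows)}
    for i, d in enumerate(dims):
        parts = {}
        for r in rows:
            parts.setdefault(r[d], []).append(r)
        for val, part in parts.items():
            out.update(buc(part, dims[i + 1:], minsup, group + ((d, val),)))
    return out

rows = [{'a': 1, 'b': 1}, {'a': 1, 'b': 2}, {'a': 2, 'b': 1}]
iceberg = buc(rows, ('a', 'b'), minsup=2)
```

With `minsup=2`, only the grand total, the a=1 cell, and the b=1 cell survive; every count-1 cell, and every group-by beneath one, is pruned before it is ever computed, which is exactly the Apriori-style advantage of building the cube bottom-up.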
Conference Paper
Radio Frequency Identification (RFID) technology is fast becoming a prevalent tool in tracking commodities in supply chain management applications. The movement of commodities through the supply chain forms a gigantic workflow that can be mined for the discovery of trends, flow correlations and outlier paths, which in turn can be valuable in understanding and optimizing business processes. In this paper, we propose a method to construct compressed probabilistic workflows that capture the movement trends and significant exceptions of the overall data sets, but with a size that is substantially smaller than that of the complete RFID workflow. Compression is achieved based on the following observations: (1) only a relatively small minority of items deviate from the general trend, (2) only truly non-redundant deviations, i.e., those that substantially deviate from the previously recorded ones, are interesting, and (3) although RFID data is registered at the primitive level, data analysis usually takes place at a higher abstraction level. Techniques for workflow compression based on non-redundant transition and emission probabilities are derived; and an algorithm for computing approximate path probabilities is developed. Our experiments demonstrate the utility and feasibility of our design, data structure, and algorithms.
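The raw statistics behind such a probabilistic workflow can be sketched as a first-order transition model estimated from item paths. This is an illustrative simplification (the paper's model also handles emission probabilities and redundancy elimination), with invented location names:

```python
from collections import Counter

def transition_probs(paths):
    """Estimate P(next location | current location) from observed item
    paths: the raw material a compressed probabilistic workflow summarizes."""
    counts, totals = Counter(), Counter()
    for path in paths:
        for a, b in zip(path, path[1:]):
            counts[(a, b)] += 1
            totals[a] += 1
    return {edge: c / totals[edge[0]] for edge, c in counts.items()}

paths = [['factory', 'warehouse', 'store'],
         ['factory', 'warehouse', 'truck']]
probs = transition_probs(paths)
```

Every observed item moved from the factory to the warehouse, so that transition gets probability 1.0, while the warehouse splits its outgoing flow evenly between store and truck.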
Conference Paper
With the advent of RFID (Radio Frequency Identification) technology, manufacturers, distributors, and retailers will be able to track the movement of individual objects throughout the supply chain. The volume of data generated by a typical RFID application will be enormous as each item will generate a complete history of all the individual locations that it occupied at every point in time, possibly from a specific production line at a given factory, passing through multiple warehouses, and all the way to a particular checkout counter in a store. The movement trails of such RFID data form a gigantic commodity flowgraph representing the locations and durations of the path stages traversed by each item. This commodity flow contains rich multi-dimensional information on the characteristics, trends, changes and outliers of commodity movements. In this paper, we propose a method to construct a warehouse of commodity flows, called flowcube. As in standard OLAP, the model will be composed of cuboids that aggregate item flows at a given abstraction level. The flowcube differs from the traditional data cube in two major ways. First, the measure of each cell will not be a scalar aggregate but a commodity flowgraph that captures the major movement trends and significant deviations of the items aggregated in the cell. Second, each flowgraph itself can be viewed at multiple levels by changing the level of abstraction of path stages. In this paper, we motivate the importance of the model, and present an efficient method to compute it by (1) performing simultaneous aggregation of paths to all interesting abstraction levels, (2) pruning low-support path segments along the item and path stage abstraction lattices, and (3) compressing the cube by removing rarely occurring cells, and cells whose commodity flows can be inferred from higher level cells.
Conference Paper
Radio Frequency Identification is gaining broader adoption in many areas. One of the challenges in implementing an RFID- based system is dealing with anomalies in RFID reads. A small number of anomalies can translate into large errors in analytical results. Conventional "eager" approaches cleanse all data upfront and then apply queries on cleaned data. However, this approach is not feasible when several applications define anomalies and corrections on the same data set differently and not all anomalies can be defined beforehand. This necessitates anomaly handling at query time. We introduce a deferred approach for detecting and correcting RFID data anomalies. Each application specifies the detection and the correction of relevant anomalies using declarative sequence-based rules. An application query is then automatically rewritten based on the cleansing rules that the application has specified, to provide answers over cleaned data. We show that a naive approach to deferred cleansing that applies rules without leveraging query information can be prohibitive. We develop two novel rewrite methods, both of which reduce the amount of data to be cleaned, by exploiting predicates in application queries while guaranteeing correct answers. We leverage standardized SQL/OLAP functionality to implement rules specified in a declarative sequence-based language. This allows efficient evaluation of cleansing rules using existing query processing capabilities of a DBMS. Our experimental results show that deferred cleansing is affordable for typical analytic queries over RFID data.
Conference Paper
To compensate for the inherent unreliability of RFID data streams, most RFID middleware systems employ a "smoothing filter", a sliding-window aggregate that interpolates for lost readings. In this paper, we propose SMURF, the first declarative, adaptive smoothing filter for RFID data cleaning. SMURF models the unreliability of RFID readings by viewing RFID streams as a statistical sample of tags in the physical world, and exploits techniques grounded in sampling theory to drive its cleaning processes. Through the use of tools such as binomial sampling and π-estimators, SMURF continuously adapts the smoothing window size in a principled manner to provide accurate RFID data to applications.
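The binomial-sampling view suggests a simple per-tag window-sizing rule: pick the smallest window w of read cycles such that (1 − p)^w ≤ δ, so a tag read with per-cycle probability p is observed at least once with probability ≥ 1 − δ. The sketch below shows that rule only; the function name and defaults are assumptions, not SMURF's actual code, which adapts the window continuously.

```python
import math

def required_window(p_read, delta=0.05):
    """Smallest window (in read cycles) such that a tag with per-cycle
    read probability p_read is observed at least once with probability
    >= 1 - delta, from the bound (1 - p_read) ** w <= delta."""
    if not 0.0 < p_read <= 1.0:
        raise ValueError("p_read must be in (0, 1]")
    if p_read == 1.0:
        return 1  # a perfectly read tag needs no smoothing window
    return math.ceil(math.log(delta) / math.log(1.0 - p_read))

# A tag read half the time needs a 5-cycle window for 95% confidence:
w = required_window(0.5, delta=0.05)   # -> 5
```

Note the tension this rule captures: a larger window hides dropped readings but delays the detection of tags that genuinely left the reader's range.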
Conference Paper
As the size of an RFID tag becomes smaller and the price of the tag gets lower, RFID technology has been applied to a wide range of areas. Recently, RFID has been adopted in business areas such as supply chain management. Since companies can easily obtain movement information for products using RFID technology, it is expected to revolutionize supply chain management. However, the amount of RFID data in supply chain management is huge. Therefore, it requires much time to extract valuable information from RFID data for supply chain management. In this paper, we define query templates for tracking queries and path-oriented queries to analyze the supply chain. We then propose an effective path encoding scheme to encode the flow information for products. To retrieve the time information for products efficiently, we utilize a numbering scheme used in the XML area. Based on the path encoding scheme and the numbering scheme, we devise a storage scheme to process tracking queries and path-oriented queries efficiently. Finally, we propose a method which translates the queries to SQL queries. Experimental results show that our approach can process the queries efficiently. On average, our approach is about 680 times better than a recent technique in terms of query performance.
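The idea of encoding a product's flow as a single comparable key can be sketched with a simple prefix scheme: a path-oriented query ("did this item pass through factory, then warehouse?") becomes a string-prefix test. The names below are hypothetical stand-ins; the paper's actual encoding and XML-style numbering scheme are more compact.

```python
def encode_path(path):
    """Encode a location path as a single dot-separated key."""
    return ".".join(path)

def passed_through(encoded, prefix_path):
    """Path-oriented query: did the item traverse this path prefix,
    starting from its first location?"""
    prefix = ".".join(prefix_path)
    return encoded == prefix or encoded.startswith(prefix + ".")

e = encode_path(["factory", "warehouse", "store"])
# passed_through(e, ["factory", "warehouse"]) -> True
# passed_through(e, ["warehouse"])            -> False (not a prefix)
```

Translating such prefix tests into SQL `LIKE 'factory.warehouse%'` predicates is the kind of rewrite the paper's query translation performs over its own encoding.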
Conference Paper
In this paper, we present the design, implementation, and evaluation of a system that executes complex event queries over real-time streams of RFID readings encoded as events. These complex event queries filter and correlate events to match specific patterns, and transform the relevant events into new composite events for the use of external monitoring applications. Stream-based execution of these queries enables time-critical actions to be taken in environments such as supply chain management, surveillance and facility management, healthcare, etc. We first propose a complex event language that significantly extends existing event languages to meet the needs of a range of RFID-enabled monitoring applications. We then describe a query plan-based approach to efficiently implementing this language. Our approach uses native operators to efficiently handle query-defined sequences, which are a key component of complex event processing, and pipelines such sequences to subsequent operators that are built by leveraging relational techniques. We also develop a large suite of optimization techniques to address challenges such as large sliding windows and intermediate result sizes. We demonstrate the effectiveness of our approach through a detailed performance analysis of our prototype implementation as well as through a comparison to a state-of-the-art stream processor.
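A query-defined sequence such as SEQ(A, B) within a time window, the core construct such systems evaluate, can be sketched as follows. This is a deliberately simplified matcher, not the paper's native operator or language.

```python
def match_seq(events, first, second, window):
    """Find (t1, t2) pairs where an event of type `first` is followed
    by one of type `second` within `window` time units.
    `events` is a time-ordered list of (timestamp, event_type)."""
    pending = []            # timestamps of `first` events awaiting a match
    matches = []
    for t, etype in events:
        # expire `first` events that have fallen out of the window
        pending = [p for p in pending if t - p <= window]
        if etype == first:
            pending.append(t)
        elif etype == second:
            matches.extend((p, t) for p in pending)
    return matches

stream = [(1, "A"), (2, "A"), (3, "B"), (10, "B")]
# B@3 matches both pending A's; B@10 matches none (window expired)
pairs = match_seq(stream, "A", "B", window=5)   # -> [(1, 3), (2, 3)]
```

The optimization problems the paper addresses, large sliding windows and blow-up of intermediate pairs, are already visible here in the `pending` list and the cross-product `extend`.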
Conference Paper
Data analysis applications typically aggregate data across many dimensions looking for unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional answers. Applications need the N-dimensional generalization of these operators. The paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers. The cube treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower dimensional spaces. Aggregation points are represented by an "infinite value", ALL, so the point (ALL, ALL, ..., ALL, sum(*)) represents the global sum of all items. Each ALL value actually represents the set of values contributing to that aggregation.
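The cube operator can be sketched directly from this definition: each row contributes its measure to every combination of kept versus rolled-up (ALL) dimensions, yielding all 2^N group-bys at once. A minimal illustration, not a production implementation:

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # stand-in for the special aggregation value

def cube(rows, dims, measure):
    """Sum `measure` over every subset of `dims`; rolled-up dimensions
    take the value ALL, as in the CUBE operator."""
    result = defaultdict(float)
    n = len(dims)
    for row in rows:
        for k in range(n + 1):
            for keep in combinations(range(n), k):
                key = tuple(row[d] if i in keep else ALL
                            for i, d in enumerate(dims))
                result[key] += row[measure]
    return dict(result)

rows = [
    {"region": "east", "product": "tag", "sales": 10},
    {"region": "west", "product": "tag", "sales": 5},
]
c = cube(rows, ["region", "product"], "sales")
# c[(ALL, ALL)] == 15.0 is the global sum; c[("east", ALL)] == 10.0
```

With 2 dimensions each row lands in 4 cells; the exponential fan-out is exactly why later work on partial materialization (which views to precompute) matters.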
Conference Paper
One research question crucial to RFID technology's wider adoption is how to efficiently transform sequences of RFID readings into meaningful business events. Contrary to traditional events, RFID readings are usually of high volume and velocity, and have attributes representing their reading objects, occurrence times, and spots. Based on these characteristics and the non-deterministic finite automata (NFA) implementation framework, this paper studies the performance issues of RFID complex event processing and proposes corresponding optimization techniques. Our techniques include: (1) taking advantage of negation events or exclusiveness between events to prune intermediate results, thus reducing memory consumption; (2) with complex events' different selectivities, purposefully reordering the join operations between events to improve overall efficiency, thus achieving higher stream throughput; (3) utilizing the slot-based or B+-tree-based approach to optimize processing performance under the time window constraint. We present these techniques' analytical results and validate their effectiveness through experiments.
Article
Data warehousing and on-line analytical processing (OLAP) are essential elements of decision support, which has increasingly become a focus of the database industry. Many commercial products and services are now available, and all of the principal database management system vendors now have offerings in these areas. Decision support places some rather different requirements on database technology compared to traditional on-line transaction processing applications. This paper provides an overview of data warehousing and OLAP technologies, with an emphasis on their new requirements. We describe back end tools for extracting, cleaning and loading data into a data warehouse; multidimensional data models typical of OLAP; front end client tools for querying and data analysis; server extensions for efficient query processing; and tools for metadata management and for managing the warehouse. In addition to surveying the state of the art, this paper also identifies some promising research issues, some of which are related to problems that the database research community has worked on for years, but others are only just beginning to be addressed. This overview is based on a tutorial that the authors presented at the VLDB Conference, 1996.
Conference Paper
Efficient and accurate data cleaning is an essential task for the successful deployment of RFID systems. Although important advances have been made in tag detection rates, it is still common to see a large number of lost readings due to radio frequency (RF) interference and tag-reader configurations. Existing cleaning techniques have focused on the development of accurate methods that work well under a wide set of conditions, but have disregarded the very high cost of cleaning in a real application that may have thousands of readers and millions of tags. In this paper, we propose a cleaning framework that takes an RFID data set and a collection of cleaning methods, with associated costs, and induces a cleaning plan that optimizes the overall accuracy-adjusted cleaning costs by determining the conditions under which inexpensive methods are appropriate, and those under which more expensive methods are absolutely necessary.
Conference Paper
Data captured from the physical world through sensor devices tends to be noisy and unreliable. The data cleaning process for such data is not easily handled by standard data warehouse-oriented techniques, which do not take into account the strong temporal and spatial components of receptor data. We present Extensible receptor Stream Processing (ESP), a declarative query-based framework designed to clean the data streams produced by sensor devices.
Conference Paper
Radio Frequency Identification (RFID) applications are set to play an essential role in object tracking and supply chain management systems. In the near future, it is expected that every major retailer will use RFID systems to track the movement of products from suppliers to warehouses, store backrooms, and eventually to points of sale. The volume of information generated by such systems can be enormous as each individual item (a pallet, a case, or an SKU) will leave a trail of data as it moves through different locations. As a departure from the traditional data cube, we propose a new warehousing model that preserves object transitions while providing significant compression and path-dependent aggregates, based on the following observations: (1) items usually move together in large groups through early stages in the system (e.g., distribution centers) and only in later stages (e.g., stores) do they move in smaller groups, and (2) although RFID data is registered at the primitive level, data analysis usually takes place at a higher abstraction level. Techniques for summarizing and indexing data, and methods for processing a variety of queries based on this framework are developed in this study. Our experiments demonstrate the utility and feasibility of our design, data structure, and algorithms.
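The "items move together" observation motivates storing one stay record per group rather than one row per raw reading. A minimal sketch of that compression, under the simplifying assumption that each item visits a location in one contiguous stay (the record layout is illustrative, not the paper's schema):

```python
from collections import defaultdict

def stay_records(readings):
    """Collapse raw (epc, location, time) readings into stay records
    (location, time_in, time_out, [epcs]): items with an identical
    stay at a location share one record instead of one row each.
    Assumes each item visits a location in a single contiguous stay."""
    spans = defaultdict(list)                  # (epc, loc) -> read times
    for epc, loc, t in readings:
        spans[(epc, loc)].append(t)
    groups = defaultdict(list)                 # (loc, t_in, t_out) -> epcs
    for (epc, loc), ts in spans.items():
        groups[(loc, min(ts), max(ts))].append(epc)
    return [(loc, t_in, t_out, sorted(epcs))
            for (loc, t_in, t_out), epcs in groups.items()]

raw = [("e1", "dc", 1), ("e1", "dc", 3),
       ("e2", "dc", 1), ("e2", "dc", 3),
       ("e3", "store", 5)]
recs = stay_records(raw)   # 5 readings -> 2 records
```

Compression is greatest at early supply-chain stages, where hundreds of items share one pallet-level stay, which is precisely observation (1) above.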
Article
The betweenness centrality index is essential in the analysis of social networks, but costly to compute. Currently, the fastest known algorithms require Θ(n³) time and Θ(n²) space, where n is the number of actors in the network. Motivated by the fast-growing need to compute centrality indices on large, yet very sparse, networks, new algorithms for betweenness are introduced in this paper. They require O(n + m) space and run in O(nm) and O(nm + n² log n) time on unweighted and weighted networks, respectively, where m is the number of links. Experimental evidence is provided that this substantially increases the range of networks for which centrality analysis is feasible.
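The O(nm) unweighted case is Brandes' algorithm: one BFS per source that counts shortest paths, followed by a reverse pass accumulating pair-dependencies. A compact sketch for unweighted graphs given as adjacency lists (ordered pairs are counted, so undirected graphs get twice the conventional score):

```python
from collections import deque

def betweenness(adj):
    """Brandes' betweenness centrality for an unweighted graph given as
    {node: [neighbors]}. Counts every ordered (s, t) pair, so for an
    undirected graph the values are twice the conventional score."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, q = [], deque([s])
        pred = {v: [] for v in adj}       # shortest-path predecessors
        sigma = {v: 0 for v in adj}       # number of shortest paths
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        while q:                          # BFS, counting shortest paths
            v = q.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                      # back-propagate dependencies
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

# Path graph a-b-c: only b lies between another pair of nodes.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = betweenness(path)   # scores["b"] == 2.0, endpoints 0.0
```

The O(n + m) space bound is visible here: per source, only the predecessor lists, counters, and distances are kept, never an n × n matrix.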
Article
NIST has considered the performance of AES candidates on smart-cards as an important selection criterion and many submitters have highlighted the compactness and efficiency of their submission on low end smart cards. However, in light of recently discovered power based attacks, we strongly argue that evaluating smart-card suitability of AES candidates requires a very cautious approach. We demonstrate that straightforward implementations of AES candidates on smart cards, are highly vulnerable to power analysis and readily leak away all secret keys. To illustrate our point, we describe a power based attack on the Twofish Reference 6805 code which we implemented on a ST16 smart card. The attack required power samples from only 100 independent block encryptions to fully recover the 128-bit secret key. We also describe how all other AES candidates are susceptible to similar attacks. We review the basis of power attacks and suggest countermeasures for a secure implementation. Unfortun...
Article
: Data analysis applications typically aggregate data across many dimensions looking for unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional answers. Applications need the N-dimensional generalization of these operators. This paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers. The cube treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower dimensional spaces. Aggregation points are represented by an "infinite value", ALL. For example, the point (ALL,ALL,ALL,...,ALL, sum(*)) would represent the global sum of all items. Each ALL value actually represents the set of...
Article
Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, like total sales. The values of many of these cells are dependent on the values of other cells in the data cube. A common and powerful query optimization technique is to materialize some or all of these cells rather than compute them from raw data each time. Commercial systems differ mainly in their approach to materializing the data cube. In this paper, we investigate the issue of which cells (views) to materialize when it is too expensive to materialize all views. A lattice framework is used to express dependencies among views. We then present greedy algorithms that work off this lattice and determine a good set of views to materialize. The greedy algorithm performs within a small constant...
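The greedy selection over the view lattice can be sketched as follows. Here `costs` maps each view to its row count and `depends[v]` lists the views answerable from v, including v itself (both illustrative inputs); the base cuboid is always materialized, and each round picks the view whose materialization most reduces total answering cost.

```python
def greedy_views(costs, depends, k):
    """Greedy view selection over a lattice: materialize the base
    cuboid, then repeatedly add the view with the largest total
    benefit (cost reduction summed over all views it can answer),
    up to k additional views."""
    top = max(costs, key=costs.get)               # base cuboid
    answer_cost = {v: costs[top] for v in costs}  # cheapest source so far
    chosen = [top]
    for _ in range(k):
        def benefit(v):
            return sum(max(answer_cost[w] - costs[v], 0)
                       for w in depends[v])
        candidates = [v for v in costs if v not in chosen]
        if not candidates:
            break
        best = max(candidates, key=benefit)
        if benefit(best) <= 0:
            break
        chosen.append(best)
        for w in depends[best]:
            answer_cost[w] = min(answer_cost[w], costs[best])
    return chosen

# Tiny lattice: abc (base, 100 rows) -> ab (50) -> a (10).
costs = {"abc": 100, "ab": 50, "a": 10}
depends = {"abc": ["abc", "ab", "a"], "ab": ["ab", "a"], "a": ["a"]}
picked = greedy_views(costs, depends, 1)   # -> ["abc", "ab"]
```

In the first round "ab" wins (benefit 100: it saves 50 rows for both "ab" and "a") over "a" (benefit 90), matching the benefit-per-round reasoning behind the constant-factor guarantee mentioned in the abstract.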
Article
One often hears the claim that smart cards are the solution to a number of security problems, including those arising in point-of-sale systems. This paper argues that many proposed smart card systems still lack effective security for point-of-sale applications. We consider the point-of-sale terminal as a potentially hostile environment to the smart card. Moreover, we discuss several types of modifications that can be made to smart cards to improve their security and address this problem. We prove a set of equivalences among a number of these modifications: private input = private output; trusted input + one-bit trusted output = trusted output + one-bit trusted input; secure input = secure output.
Article
One often hears the claim that smart cards are the solution to a number of security problems, including those arising in point-of-sale systems. In this paper, we characterize the minimal properties necessary for the secure smart card point-of-sale transactions. Many proposed systems fail to provide these properties: problems arise from failures to provide secure communication channels between the user and the smart card while operating in a potentially hostile environment (such as a point-of-sale application.) Moreover, we discuss several types of modifications that can be made to give smart cards additional input /output capacity with a user, and describe how this additional I/O can address the hostile environment problem. We give a notation for describing the effectiveness of smart cards under various environmental assumptions. We discuss several security equivalences among different scenarios for smart cards in hostile environments. Using our notation, these equivalences include: f...
Integrating RFID
  • S Sarma
S. Sarma. Integrating RFID. ACM Queue, 2(7):50–57, October 2004.
Implementing data cubes efficiently
  • V Harinarayan
  • A Rajaraman
  • J D Ullman
V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. In Proc. 1996 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'96), pages 205–216, Montreal, Canada, June 1996.
13.56 MHz ISM band class 1 radio frequency identification tag interface specification. Technical report, MIT Auto-ID Center, 2003.