We propose a dimensionality reduction technique for time series analysis that significantly improves the efficiency and accuracy of similarity searches. In contrast to piecewise constant approximation (PCA) techniques, which approximate each time series with constant-value segments, the proposed method, Piecewise Vector Quantized Approximation, represents each segment by the closest codeword (under a given distance measure) from a codebook of key sequences. The new representation is symbolic, which allows text-based retrieval techniques to be applied to time series similarity analysis. Experiments on real and simulated datasets show that the proposed technique generally outperforms PCA techniques in clustering and similarity searches.
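The encoding step can be sketched as follows; the codebook here is assumed to be given (in practice it would be trained offline with a vector quantizer such as k-means), and Euclidean distance stands in for whatever distance measure is used:

```python
import math

def pvqa_encode(series, seg_len, codebook):
    """Encode a time series as a sequence of codeword indices, one symbol
    per fixed-length segment, choosing the nearest codeword by Euclidean
    distance. The codebook is assumed to be trained offline."""
    symbols = []
    for start in range(0, len(series) - seg_len + 1, seg_len):
        segment = series[start:start + seg_len]
        best = min(range(len(codebook)),
                   key=lambda i: math.dist(segment, codebook[i]))
        symbols.append(best)
    return symbols
```

Applied to a series and a two-codeword codebook, the encoder emits one symbol per segment, e.g. `pvqa_encode([0.1, -0.1, 0.9, 1.2], 2, [[0, 0], [1, 1]])` yields `[0, 1]`; the resulting symbol strings are what text-based retrieval techniques then operate on.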
When an organization embarks on e-commerce it rarely has a chance to re-engineer its existing business applications. However, if these business applications were built using an application framework, then one might hope to reuse many of the existing legacy applications in the new e-commerce context. This paper examines the general issues created by migrating applications to e-commerce, and proposes an architecture for application frameworks that must support e-commerce.
We generalize the optimized support association rule problem by permitting rules to contain disjunctions over uninstantiated numeric attributes. For rules containing a single numeric attribute, we present a dynamic programming algorithm for computing optimized association rules. Furthermore, we propose a bucketing technique for reducing the input size, and a divide-and-conquer strategy that improves the performance significantly without sacrificing optimality. Our experimental results for a single numeric attribute indicate that our bucketing and divide-and-conquer enhancements are very effective in reducing the execution times and memory requirements of our dynamic programming algorithm. Furthermore, they show that our algorithms scale up almost linearly with the attribute's domain size as well as with the number of disjunctions.
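The bucketing idea can be illustrated with a simple equi-depth variant (an illustrative assumption; the paper's own bucketing is constructed so as to preserve optimality of the dynamic program):

```python
def bucket_domain(value_counts, num_buckets):
    """Collapse a sorted numeric domain into roughly equi-depth buckets
    so that downstream dynamic programming runs over buckets instead of
    raw values. value_counts: list of (value, count) sorted by value."""
    total = sum(c for _, c in value_counts)
    target = total / num_buckets
    buckets, current, acc = [], [], 0
    for value, count in value_counts:
        current.append(value)
        acc += count
        if acc >= target and len(buckets) < num_buckets - 1:
            buckets.append(current)
            current, acc = [], 0
    if current:
        buckets.append(current)
    return buckets
```

Reducing a domain of thousands of distinct values to a few hundred buckets is what makes the dynamic program's near-linear scaling in the domain size practical.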
Density-based clustering algorithms are attractive for the task of class identification in spatial databases. In many cases, however, clusters of very different local density exist in different regions of the data space, so DBSCAN [Ester, M. et al., A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In E. Simoudis, J. Han, & U. M. Fayyad (Eds.), Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (pp. 226-231). Portland, OR: AAAI.], which uses a global density parameter, is not suitable. As an improvement, OPTICS [Ankerst, M. et al. (1999). OPTICS: Ordering Points To Identify the Clustering Structure. In A. Delis, C. Faloutsos, & S. Ghandeharizadeh (Eds.), Proc. ACM SIGMOD Int. Conf. on Management of Data (pp. 49-60). Philadelphia, PA: ACM.] creates an augmented ordering of the database representing its density-based clustering structure, but it only generates the clusters whose local density exceeds some threshold, rather than clusters of similar local density, and it does not produce an explicit clustering of the data set. Furthermore, the parameters required by almost all well-known clustering algorithms are hard to determine yet have a significant influence on the clustering result. In this paper, a new clustering algorithm, LDBSCAN, relying on a local-density-based notion of clusters, is proposed to solve these problems. Its parameters are easy to choose and, unlike other density-based clustering algorithms, it takes advantage of LOF [Breunig, M. M., et al. (2000). LOF: Identifying Density-Based Local Outliers. In W. Chen, J. F. Naughton, & P. A. Bernstein (Eds.), Proc. ACM SIGMOD Int. Conf. on Management of Data (pp. 93-104). Dallas, TX: ACM.] to detect noise. The proposed algorithm has potential applications in business intelligence and enterprise information systems.
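A minimal, brute-force sketch of the LOF computation that the noise-detection step builds on (real implementations use spatial indexes rather than the O(n^2) distance matrix used here):

```python
import math

def lof_scores(points, k):
    """Minimal Local Outlier Factor sketch (Breunig et al., 2000):
    points with LOF clearly above 1 lie in regions sparser than their
    neighbours and can be flagged as noise."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbours of each point (excluding the point itself)
    knn = [sorted(range(n), key=lambda j: dist[i][j])[1:k + 1]
           for i in range(n)]
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]

    def lrd(i):  # local reachability density
        reach = [max(k_dist[j], dist[i][j]) for j in knn[i]]
        return len(knn[i]) / sum(reach)

    lrds = [lrd(i) for i in range(n)]
    return [sum(lrds[j] for j in knn[i]) / (len(knn[i]) * lrds[i])
            for i in range(n)]
```

On a tight cluster plus one distant point, the cluster members score close to 1 while the distant point's score is far above 1, which is the property LDBSCAN exploits for noise detection.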
Business process models are an important tool in understanding and improving the efficiency of a business and in the design of information systems. Recent work has evaluated business process modelling languages against upper-level ontologies on the assumption that these ontologies are adequate representations of the general process domain. In this paper, we present a method to test this assumption. Our method is based on principles of cognitive psychology and demonstrated using the BWW and SUMO upper-level ontologies.
In this paper a method is proposed to be used as the first step in the ontology construction process. This method, specially tailored to ontology construction for knowledge management applications, is based on the use of concept maps as a means of expression for the expert, followed by an application that assists in capturing the expert's intention with the goal of further formalizing the map. This application analyses the concept map, taking into account the map topology and the key words used by the expert. From this analysis a series of questions is presented to the expert that, when answered, reduce the map's ambiguity and identify some common patterns in ontological representations, such as generalizations and mereological relations. This information can then be used by the knowledge engineer during further knowledge acquisition sessions, or to direct the expert towards further formalization or improvement of the map. The method was tested on a group of volunteers, all of them engineers working in the aerospace sector, and the results suggest that both the use of concept mapping and the intention-capture step are acceptable from the point of view of the end user, supporting the claim that this method is a viable option for reducing some of the difficulties in large-scale ontology construction.
An overview of a knowledge-based document preparation system, REGENT (report generation tool), is presented. REGENT is a software environment which generates documents from reusable document pieces by planning, executing, and monitoring the document preparation process in an organizational setting. The organizational aspects of the document generation process are incorporated in the system architecture. The documents are constructed from stored document pieces using artificial intelligence methods. The report preparation process is detailed as to the knowledge representation structure and the problem-solving strategy.
The purpose of this work is to combine the advantages of using visual formalisms for the specification of reactive systems with those of using the formal verification and program transformation tools developed for textual formalisms. We have developed a tool suite called ViSta that automatically produces statechart layouts based on information extracted from an informal specification. In this paper, we discuss how ViSta is augmented with a tool that automatically translates statecharts to Z specifications. The informal, statechart and Z specifications are inter-related. This ensures consistency between the different representations, and therefore facilitates the verification and validation effort.
Data fusion in information retrieval has been investigated by many researchers and quite a few data fusion methods have been proposed, but why data fusion can bring improvement in effectiveness is still not very clear. In this paper, we use a geometric probabilistic framework to formally describe data fusion, in which each component result returned from an information retrieval system for a given query is represented as a point in a multi-dimensional space. All the component results and data fusion results can then be explained using geometrical principles. In such a framework, it becomes clear why data fusion can often bring improvement in effectiveness and, accordingly, what the favourable conditions are for data fusion algorithms to achieve better results. The framework can be used as a guideline for applying data fusion techniques more effectively.
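The geometric view can be made concrete with a minimal sketch: each component result is a point in document-score space, and a simple fusion rule takes their centroid, i.e. averages scores (the function name and data layout are illustrative assumptions):

```python
def fuse(results):
    """Each component result is a point in document-score space: a dict
    mapping document id to a normalized score. The fused result is the
    centroid of these points, i.e. CombSUM-style score averaging."""
    docs = set().union(*results)
    m = len(results)
    return {d: sum(r.get(d, 0.0) for r in results) / m for d in docs}
```

Geometrically, the centroid can lie closer to the (unknown) ideal point than most of the component points do, which is one intuition for why fusion often improves effectiveness.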
We study the problem of finding frequent itemsets in a continuous stream of transactions. The current frequency of an itemset in a stream is defined as its maximal frequency over all windows in the stream, from any point in the past until the current state, that satisfy a minimal length constraint. Properties of this new measure are studied, and an incremental algorithm is proposed that can, at any time, immediately produce the current frequencies of all frequent itemsets. Experimental and theoretical analyses show that the space requirements of the algorithm are extremely small for many realistic data distributions.
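The measure can be stated as a brute-force reference implementation for a single item (the paper's contribution is an incremental algorithm that avoids this rescanning by maintaining a small summary):

```python
def current_max_frequency(stream, item, min_len):
    """Brute-force version of the max-frequency measure: the maximal
    frequency of `item` over all suffix windows of the stream that
    satisfy the minimal length constraint min_len."""
    best = 0.0
    for start in range(0, len(stream) - min_len + 1):
        window = stream[start:]
        best = max(best, window.count(item) / len(window))
    return best
```

For example, in the stream `a b a a` with `min_len=2`, the suffix window `a a` gives the item `a` a current frequency of 1.0, higher than its frequency over the whole stream.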
In knowledge systems, pieces of information (evidence, hypotheses, attributes, terms, documents, rules) are usually assumed to carry equal importance and to be independent of each other, although this may not actually be the case. Issues for a logic of weighted queries, with the possibility of also weighting documents and logical connectors (for intelligent retrieval, for example), are presented here, using “min” or t-norms, and soft operators involving p-norms. This logic cannot be a conventional one because, when relative importance between concepts is introduced, the definitions differ for ANDed and ORed weighted queries. A concept of “nought”, a limit case of no-importance queries, and its behaviour under fuzzy set operations is developed; in particular, the notion of an extended membership is introduced. Finally it is shown, with a biomedical example, how to combine importance with soft matching in rule-based systems.
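The asymmetry between weighted conjunction and disjunction can be sketched in the min/max case (a Dubois-Prade-style weighted min/max is assumed here; the paper's exact operator definitions may differ):

```python
def weighted_and(scores_weights):
    """Weighted conjunction in the min/max style: an unimportant term
    (weight 0) cannot hurt the match, reflecting the 'nought' limit case.
    scores_weights: list of (membership_degree, importance) pairs."""
    return min(max(s, 1.0 - w) for s, w in scores_weights)

def weighted_or(scores_weights):
    """Weighted disjunction: an unimportant term cannot help the match."""
    return max(min(s, w) for s, w in scores_weights)
```

Note that the weight enters through `max(s, 1 - w)` in the conjunction but through `min(s, w)` in the disjunction: the two definitions must differ, which is exactly why the logic cannot be a conventional one.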
The foundation of a process model lies in its structural specifications. Using a generic process modeling language for workflows, we show how a structural specification may contain deadlock and lack-of-synchronization conflicts that could compromise the correct execution of workflows. In general, identification of such conflicts is a computationally complex problem and requires the development of effective algorithms specific to the target modeling language. We present a visual verification approach and algorithm that employs a set of graph reduction rules to identify structural conflicts in process models for the given workflow modeling language. We also provide insights into the correctness and complexity of the reduction process. Finally, we show how the reduction algorithm may be used to count the possible instance subgraphs of a correct process model. The main contribution of the paper is a new technique for satisfying well-defined correctness criteria in process models.
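One of the simpler rules in this family, sequential reduction, can be sketched as follows (this particular formulation is illustrative; the paper defines its own rule set for the target language):

```python
def sequential_reduce(edges):
    """Repeatedly eliminate a node with exactly one predecessor and one
    successor, wiring the predecessor to the successor. Conflict
    detection applies a whole set of such rules until the graph is
    trivial (conflict-free) or irreducible (a conflict remains)."""
    edges = set(edges)
    changed = True
    while changed:
        changed = False
        nodes = {n for e in edges for n in e}
        for v in nodes:
            preds = [a for (a, b) in edges if b == v]
            succs = [b for (a, b) in edges if a == v]
            if len(preds) == 1 and len(succs) == 1:
                edges.discard((preds[0], v))
                edges.discard((v, succs[0]))
                if preds[0] != succs[0]:
                    edges.add((preds[0], succs[0]))
                changed = True
                break
    return edges
```

A conflict-free sequential fragment such as start → a → b → end collapses to a single edge; a graph that stays irreducible under the full rule set signals a structural conflict.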
Current business challenges such as deregulation, mergers, globalisation and increased competition have given rise to a new process-centric philosophy of business management. The key issue in this paradigm is the concept of the business process. From a methodological perspective, this movement has resulted in a considerable number of approaches that encourage the modelling of business processes as a key component of any improvement or re-engineering endeavour. However, there is considerable controversy amongst these competing approaches about the most appropriate way of identifying the types and number of relevant processes. Existing business process modelling approaches describe an enterprise in terms of activities and tasks without offering sufficient guidance towards a process-centred description of the organisation. In this paper we advocate the use of a goal-driven approach to business process modelling. A systematic approach to developing and documenting business processes on the basis of explicit or implicit business objectives is put forward. We argue that such an approach should lead to a closer alignment between the intentional and operational aspects of an organisation. Our approach is exemplified through the use of parts of a large industrial application that is currently making use of goal-driven business process modelling.
There is growing interest in abandoning the first-normal-form assumption on which the relational database model is based. This interest has developed from a desire to extend the applicability of the relational model beyond traditional data-processing applications. In this paper, we extend one of the most widely used relational query languages, SQL, to operate on non-first-normal-form relations. In this framework, we allow attributes to be relation-valued as well as atomic-valued (e.g. integer or character). A relation which occurs as the value of an attribute in a tuple of another relation is said to be nested. Our extended language, called SQL/NF, includes all of the power of standard SQL as well as the ability to define nested relations in the data definition language and to query these relations directly in the extended data manipulation language. A variety of improvements are made to SQL: the syntax is simplified, and useless constructs and arbitrary restrictions are removed.
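The relationship between a nested relation and its flat counterpart can be sketched with an unnest operation over a relation-valued attribute (SQL/NF itself queries the nested relation directly; the Python data layout here is only illustrative):

```python
def unnest(relation, attr):
    """Flatten a relation-valued attribute: each tuple is paired with
    each tuple of its nested relation, yielding a first-normal-form
    relation (the classic UNNEST operator)."""
    flat = []
    for tup in relation:
        for inner in tup[attr]:
            row = {k: v for k, v in tup.items() if k != attr}
            row.update(inner)
            flat.append(row)
    return flat
```

A department tuple whose `emps` attribute is itself a relation of employee tuples flattens into one row per (department, employee) pair, which is the representation first-normal-form SQL would have forced from the start.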
Some basic concepts concerning information systems are defined and investigated. With every information system a query language is associated, and its syntax and semantics are formally defined. Some elementary properties of the query language are stated. The presented approach leads to a new organization of information systems. The presented idea has been implemented, and the implementation shows many advantages over other methods.
In general, a probabilistic knowledge base consists of a joint probability distribution on discrete random variables. Though this allows for easy computation and efficient propagation methods, the inherent knowledge is hardly accessible to the user. The concept introduced in this paper permits an interactive communication between man and machine by use of probabilistic logic: the user is able to convey all available know-how to the system, and conversely, knowledge embodied in the distribution is revealed in an understandable way. Uncertain rules constitute the link between commonsense and probabilistic knowledge representation. The concept developed in this paper is partly realized in the probabilistic expert system shell SPIRIT. An application of SPIRIT to a real-life example is described in the appendix.
Ontologies are the backbone of the Semantic Web, a semantic-aware version of the World Wide Web. The availability of large-scale, high-quality domain ontologies depends on effective and usable methodologies aimed at supporting the crucial process of ontology building. Ontology building exhibits a structural and logical complexity that is comparable to the production of software artefacts. This paper proposes an ontology building methodology that capitalizes on the extensive experience drawn from a widely used standard in software engineering: the Unified Software Development Process or Unified Process (UP). In particular, we propose UP for ONtology (UPON) building, a methodology for ontology building derived from the UP. UPON is presented with the support of a practical example in the eBusiness domain. A comparative evaluation with other methodologies and the results of its adoption in the context of the Athena EU Integrated Project are also discussed.
Contemporary information systems (e.g., WfM, ERP, CRM, SCM, and B2B systems) record business events in so-called event logs. Business process mining takes these logs to discover process, control, data, organizational, and social structures. Although many researchers are developing new and more powerful process mining techniques and software vendors are incorporating these in their software, few of the more advanced process mining techniques have been tested on real-life processes. This paper describes the application of process mining in one of the provincial offices of the Dutch National Public Works Department, responsible for the construction and maintenance of the road and water infrastructure. Using a variety of process mining techniques, we analyzed the processing of invoices sent by the various subcontractors and suppliers from three different perspectives: (1) the process perspective, (2) the organizational perspective, and (3) the case perspective. For this purpose, we used some of the tools developed in the context of the ProM framework. The goal of this paper is to demonstrate the applicability of process mining in general and our algorithms and tools in particular.
Process modeling has gained prominence in the information systems modeling area due to its focus on business processes and its usefulness in such business improvement methodologies as Total Quality Management, Business Process Reengineering, and Workflow Management. However, process modeling techniques are not without their criticisms. This paper proposes and uses the Bunge-Wand-Weber (BWW) representation model to analyze the five views — process, data, function, organization and output — provided in the Architecture of Integrated Information Systems (ARIS) popularized by Scheer [39, 40, 41]. The BWW representation model attempts to provide a theoretical base on which to evaluate and thus contribute to the improvement of information systems modeling techniques. The analysis conducted in this paper prompts some propositions. It confirms that the process view alone is not sufficient to model all the real-world constructs required. Some other symbols or views are needed to overcome these deficiencies. However, even when considering all five views in combination, problems may arise in representing all potentially required business rules, specifying the scope and boundaries of the system under consideration, and employing a "top-down" approach to analysis and design. Further work from this study will involve the operationalization of these propositions and their empirical testing in the field.
Given that most elementary problems in database design are NP-hard, the currently used database design algorithms produce suboptimal results. For example, current 3NF decomposition algorithms may continue decomposing a relation even though it is already in 3NF. In this paper we study database design problems whose sets of functional dependencies have bounded treewidth. For such sets, we develop polynomial-time and highly parallelizable algorithms for a number of central database design problems, such as: primality of an attribute; the 3NF test for a relational schema or subschema; and the BCNF test for a subschema. In order to define the treewidth of a relational schema, we associate a hypergraph with it. Note that there are two main possibilities for defining the treewidth of a hypergraph H: one via the primal graph of H and one via the incidence graph of H. Our algorithms apply to the case where the primal graph is considered. However, we also show that the tractability results still hold when the incidence graph is considered instead. It turns out that our results have interesting applications to logic-based abduction. By the well-known relationship between the primality problem in database design and the relevance problem in propositional abduction, our new algorithms and tractability results can easily be carried over from the former field to the latter. Moreover, we show how these tractability results can be further extended from propositional abduction to abductive diagnosis based on non-ground datalog.
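The primal-graph construction can be sketched as follows, assuming one hyperedge per functional dependency X → Y covering the attributes in X ∪ Y (the paper's exact hypergraph encoding may differ):

```python
from itertools import combinations

def primal_graph(fds):
    """Sketch of the primal graph of the hypergraph associated with a
    set of functional dependencies: one hyperedge X ∪ Y is assumed per
    dependency X -> Y, and the primal graph connects every pair of
    attributes sharing a hyperedge. Bounded treewidth of this graph is
    the condition under which the polynomial-time tests apply."""
    edges = set()
    for lhs, rhs in fds:
        hyperedge = set(lhs) | set(rhs)
        for a, b in combinations(sorted(hyperedge), 2):
            edges.add((a, b))
    return edges
```

For the single dependency AB → C, the hyperedge {A, B, C} becomes a triangle in the primal graph; a tree decomposition of this graph is then what the algorithms run over.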
In this paper we propose a logical data model for complex data. Our proposal extends the relational model by using abstract data types for domain specification, and an extended relational algebra is also introduced. The introduction of the parameterized type Geometry(S), where S is a ground set of elements, allows the representation of complex aggregated data. As an example, we discuss how our model supports the definition of geographical DBMSs. Moreover, to show the generality of our approach, we sketch how the model can be used in the framework of statistical applications.
The prediction of query performance is an interesting and important issue in Information Retrieval (IR). Current predictors involve the use of relevance scores, which are time-consuming to compute. Therefore, current predictors are not very suitable for practical applications. In this paper, we study six predictors of query performance, which can be generated prior to the retrieval process without the use of relevance scores. As a consequence, the cost of computing these predictors is marginal. The linear and non-parametric correlations of the proposed predictors with query performance are thoroughly assessed on the Text REtrieval Conference (TREC) disk4 and disk5 (minus CR) collection with the 249 TREC topics that were used in the recent TREC2004 Robust Track. According to the results, some of the proposed predictors have significant correlation with query performance, showing that these predictors can be useful to infer query performance in practical applications.
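A representative score-free predictor in the same spirit is average inverse document frequency, which needs only collection statistics (this particular predictor is an illustrative assumption, not necessarily one of the six studied):

```python
import math

def avg_idf(query_terms, doc_freq, num_docs):
    """A pre-retrieval query performance predictor sketch: average
    inverse document frequency of the query terms. It needs only
    collection statistics (document frequencies), so unlike
    relevance-score-based predictors its cost is marginal."""
    idfs = [math.log(num_docs / doc_freq[t]) for t in query_terms]
    return sum(idfs) / len(idfs)
```

A query of rare, discriminative terms scores higher than one of common terms, and the correlation of such scores with retrieval effectiveness is exactly what the paper assesses.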
In information technology, models are abstract devices used to represent the components and functions of software applications. When a model is general and consistent, it is a useful design tool for unambiguously describing the application. Traditional models are not suitable for the design of hypermedia systems, and, therefore, specific design models and methodologies are needed. In the present article, the requirements for such models are analysed, the characteristics of existing models for hypermedia applications are reviewed, and an abstract model fulfilling the analysed requirements is presented. The model, called Labyrinth, allows 1) the design of platform-independent hypermedia applications; 2) the categorisation, generalisation and abstraction of sparse, unstructured, heterogeneous information in multiple, interconnected levels; 3) the creation of personalisations (personal views) of multiuser hyperdocuments for both groups and individual users; and 4) the design of advanced security mechanisms for hypermedia applications.
The notion of context appears in computer science, as well as in several other disciplines, in various forms. In this paper, we present a general framework for representing the notion of context in information modeling. First, we define a context as a set of objects, within which each object has a set of names and possibly a reference: the reference of the object is another context which “hides” detailed information about the object. Then, we introduce the possibility of structuring the contents of a context through the traditional abstraction mechanisms, i.e., classification, generalization, and attribution. We show that, depending on the application, our notion of context can be used as an independent abstraction mechanism, either in an alternative or a complementary capacity with respect to the traditional abstraction mechanisms. We also study the interactions between contextualization and the traditional abstraction mechanisms, as well as the constraints that govern such interactions. Finally, we present a theory for contextualized information bases. The theory includes a set of validity constraints, a model theory, as well as a set of sound and complete inference rules. We show that our core theory can be easily extended to support embedding of particular information models in our contextualization framework.
Three prevalent abstractions in temporal information are examined by using the machinery of first-order logic. The abstraction of time allows one to concentrate on temporal objects only as they relate to other temporal objects in time. It is represented by a functional relationship between temporal objects and time intervals. The abstraction of identity allows one to concentrate on how an observed phenomenon relates to other phenomena in terms of their being manifestations of the same object. It is represented by a functional relationship between temporal phenomena and “completed” temporal objects. The abstraction of circumstance embodies a focus of attention on particular configurations or states of groups of temporal phenomena. It is represented by functional relationships between these groups and other objects called “events” or “states”. A novel concept, called absolute/relative abstraction, is used to formalize the abstractions of time and identity. The abstraction of circumstance, on the other hand, is an example of aggregation. The significance and use of these abstractions in the representation and processing of historical information is discussed.
Lack of support for Entity-Relationship (E-R) semantics and the disconnect between object-oriented programming languages (OOPLs) and database languages remain key roadblocks to the effective use of object-orientation in information systems development. We present SOODAS, a Semantic Object-Oriented Data Access System that defines and manages the meta-data necessary to support E-R semantics and set level querying and provides related interface generation tools. SOODAS consists of five meta-classes. DomainObject and Relationship provide the capabilities needed to define and manage entities, attributes, relationships, external identifiers, and constraints. Together with the meta-class QueryNode, DomainObject provides an object-oriented, multi-entity querying capability. Queries can be arbitrarily complex and can include cycles and transitive closure. Persistence is provided by the meta-class, PermanentObject, of which DomainObject and Relationship are subclasses. The meta-class, DomainObjectInterface uses the meta-data in DomainObject and Relationship to generate a standard, re-usable interface for displaying and maintaining instances of any entity. Since SOODAS is implemented entirely in Smalltalk, it can be seamlessly integrated with any Smalltalk application.
In this paper we investigate the manipulation of large sets of 2-dimensional data representing multiple overlapping features (e.g. semantically distinct overlays of a given region), and we present a new access method, the MOF-tree. We perform an analysis with respect to the storage requirements and a time analysis with respect to window query operations involving multiple features (e.g. to verify whether a constraint defined on multiple overlays holds inside a certain region). We examine both the pointer-based and the pointerless MOF-tree representations, using as space complexity measures the number of bits used in main memory and the number of disk pages in secondary storage, respectively. In particular, we show that the new structure is space competitive in the average case, both in the pointer version and in the linear version, with respect to multiple instances of a region quadtree and a linear quadtree respectively, where each instance represents a single feature. Concerning the time performance of the new structure, we analyze the class of window (range) queries, posed on the secondary memory implementation. We show that the I/O worst-case time complexity for processing a number of window queries in the given image space is competitive with respect to multiple instances of a linear quadtree, as confirmed by experimental results. Finally, we show that the MOF-tree can efficiently support spatial join processing in a spatial DBMS.
We present an access method for timeslice queries that reconstructs a past state s(t) of a time-evolving collection of objects in O(log_b(n) + |s(t)|/b) I/Os, where |s(t)| denotes the size of the collection at time t, n is the total number of changes in the collection's evolution and b is the size of an I/O transfer. Changes include the addition, deletion or attribute modification of objects; they are assumed to occur in increasing time order and always affect the most current state of the collection (thus our index supports transaction time). The space used is O(n/b) while the update processing is constant per change, i.e., independent of n. This is the first I/O-optimal access method for this problem using O(n/b) space and O(1) updating (in the expected amortized sense, due to the use of hashing). This performance is also achieved for interval intersection temporal queries. An advantage of our approach is that its performance can be tuned to match particular application needs (trading space for query time and vice versa). In addition, the Snapshot Index can naturally migrate data to a write-once optical medium while maintaining the same performance bounds.
An analytic model is developed to integrate two closely related subproblems of physical database design: record segmentation and access path selection. Several restrictive assumptions of the past research on record segmentation, e.g. a single access method and the dominance of one subfile over the other, are relaxed in this model. A generic design process for this integrated performance model is suggested and applied to a relational database. A heuristic procedure and an optimal algorithm are developed for solving the model. Extensive computational results are reported to show the effectiveness of these solution techniques.
In this work access support relations are introduced as a means for optimizing query processing in object-oriented database systems. The general idea is to maintain separate structures (dissociated from the object representation) to redundantly store those object references that are frequently traversed in database queries. The proposed access support relation technique is no longer restricted to relate an object (tuple) to an atomic value (attribute value) as in conventional indexing. Rather, access support relations relate objects with each other and can span over reference chains which may contain collection-valued components in order to support queries involving path expressions. We present several alternative extensions and decompositions of access support relations for a given path expression, the best of which has to be determined according to the application-specific database usage profile. An analytical performance analysis of access support relations is developed. This analytical cost model is, in particular, used to determine the best access support relation extension and decomposition with respect to specific database configuration and usage characteristics.
An important issue for the success of a database application is the effectiveness of its interface. Frequently a relevant part of the programming effort is devoted to the generation of interfaces. Visual programming environments only partly reduce this effort, and, in particular, things become more complicated when data coming from different sources (different views in the same database, or even views from different databases or systems) are to be related and must cooperate in the data navigation and manipulation task. To overcome this problem we present a new database access paradigm based on an algebra over a domain of computational abstractions called “services”, which include both dimensions: the data access computation and the user interaction. This means that the interaction is not implemented using separate constructs, as happens in traditional computational models; on the contrary, since the interaction is an integral part of the service paradigm, the user interaction is computed starting from the declarative specification of the data access itself. The combination of services in a service expression, through the operators defined by the service algebra, makes it possible to generate cooperating user interfaces for complex data navigation and manipulation. Through algebraic properties, which hold from both the data and the user interface point of view, service expressions can be simplified and optimized while preserving their initial semantics. The paper shows the application of the service algebra to the relational environment by means of a simple extension to SQL. Finally, the paper describes a tool, based on a three-tier architecture and on Java technology, for developing and distributing services in a Web environment. Services and combinations of services expressed with the service algebra are automatically translated into Java objects, allowing the rapid development of platform-independent data access services.
A cryptographic key assignment scheme for access control in a user hierarchy is proposed. The users and their owned information items are classified into disjoint sets of security classes, where the hierarchy on security classes is an arbitrary partial order. Based on Newton's interpolation method and a predefined one-way function, each security class Ci is assigned a secret key SKi and some public parameters (P1i, P2i). The information items owned by the security class Ci are encrypted with an available symmetric cryptosystem under the enciphering key SKi. By computing with its assigned secret key and the public parameters, a security class, and only a security class at a higher level, can derive the secret keys of its successors. Thus, only security classes at a higher level can access the information items owned by security classes at a lower level. We also show that the proposed scheme is not only secure but also practical.
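The derivation step in such schemes can be illustrated with a much simpler hash-based construction (an illustrative sketch, not the paper's Newton-interpolation method): for each edge in the partial order, the authority publishes one value that lets a predecessor class recompute its successor's secret key via a one-way function, while revealing nothing without the predecessor's key. All class names and key values below are hypothetical.

```python
import hashlib

def H(data: bytes) -> bytes:
    """One-way function (SHA-256 as a stand-in)."""
    return hashlib.sha256(data).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Secret keys chosen by a trusted authority (hypothetical values).
keys = {"C1": H(b"master-C1"), "C2": H(b"master-C2"), "C3": H(b"master-C3")}

# Partial order: C1 > C2 and C1 > C3 (C1 may read its successors' data).
edges = [("C1", "C2"), ("C1", "C3")]

# Public parameters: one value per edge lets a predecessor re-derive the
# successor's key; without SK of the predecessor, the value is opaque.
public = {(p, c): xor(keys[c], H(keys[p] + c.encode())) for p, c in edges}

def derive(parent_key: bytes, parent: str, child: str) -> bytes:
    """A predecessor derives its successor's secret key."""
    return xor(public[(parent, child)], H(parent_key + child.encode()))

assert derive(keys["C1"], "C1", "C2") == keys["C2"]
```

A class holding only SK of C2 cannot invert H to reach SK of C1, so derivation flows strictly downward in the hierarchy.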
The performance of access methods and the underlying disk system is a significant factor in determining the performance of database applications, especially with large sets of data. While modern hard disks are manufactured with multiple physical zones, where seek times and data transfer rates vary significantly across the zones, there has been little consideration of this important disk characteristic in designing access methods (indexing schemes). Instead, conventional access methods have been developed based on a traditional disk model that comes with many simplifying assumptions such as an average seek time and a single data transfer rate. The paper proposes novel partitioning techniques that can be applied to any tree-like access methods, both dynamic and static, fully utilizing zoning characteristics of hard disks. The index pages are allocated to disk zones in such a way that more frequently accessed index pages are stored in a faster disk zone. On top of the zoned data placement, a localized query processing technique is proposed to significantly improve the query performance by reducing page retrieval times from the hard disk.
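The core of zoned data placement can be illustrated with a simple greedy sketch (not the paper's actual partitioning technique; zone capacities and page access counts below are hypothetical): hotter index pages are assigned to faster zones first.

```python
def place_pages(page_freq, zone_capacity):
    """Greedy zoned placement of index pages.

    page_freq: {page_id: access_count}
    zone_capacity: pages per zone, ordered fastest to slowest
    Returns {page_id: zone_number}, hottest pages in the fastest zone.
    """
    pages = sorted(page_freq, key=page_freq.get, reverse=True)
    placement, i = {}, 0
    for zone, cap in enumerate(zone_capacity):
        for page in pages[i:i + cap]:
            placement[page] = zone
        i += cap
    return placement

# Hypothetical B-tree pages: the root is accessed on every lookup,
# so it lands in zone 0 (the fastest zone with the highest transfer rate).
freq = {"root": 500, "inner": 80, "leaf_a": 10, "leaf_b": 5}
placement = place_pages(freq, [1, 2, 4])
```

Because upper-level pages of a tree index are touched by almost every query, even a small fast zone holding them can cut average retrieval time noticeably.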
A generalized model for the performance optimization of physical relational database access has been developed and implemented. The model consists of a set of algorithms and cost equations. It assists the database designer in specifying and selecting optimal access schemes within given system constraints. The model is an extension of previous work, and is more comprehensive and flexible. It addresses problems that have not been considered in previous models, integrates into one model aspects that were previously treated individually, and produces database access configurations that can work within certain given system constraints.
A large company has many information objects and users, and an important issue is how to control users' access so that only authorized users can reach information objects. Traditional access control models (discretionary access control, mandatory access control, and role-based access control) do not properly reflect the characteristics of the enterprise environment. This paper proposes an improved access control model for the enterprise environment. The characteristics of access control in an enterprise environment are examined, and a task–role-based access control (T–RBAC) model founded on the concept of task classification is introduced. A task is a fundamental unit of business work or business activity. T–RBAC treats each task differently according to its class, and supports task-level access control and a supervision role hierarchy. T–RBAC is a suitable access control model for industrial companies.
Real-time update of access control policies, that is, updating policies while they are in effect and enforcing the changes immediately and automatically, is necessary for many dynamic environments. Examples of such environments include disaster relief and war zones. In such situations, system resources may need re-configuration or operational modes may change, necessitating a change of policies. For the system to continue functioning, the policies must be changed immediately and the modified policies automatically enforced. In this paper, we propose a solution to this problem: we consider real-time update of access control policies in the context of a database system. In our model, a database consists of a set of objects that are read and updated through transactions. Access to the data objects is controlled by access control policies, which are stored in the form of policy objects. We consider an environment in which different kinds of transactions execute concurrently; some of these may be transactions updating policy objects. Updating policy objects while they are deployed can lead to potential security problems. We propose algorithms that not only prevent such security problems, but also ensure serializable execution of transactions. The algorithms differ in the degree of concurrency provided and the kinds of policies each can update.
A computational algorithm aimed at reducing access costs in database and file system applications is presented. The idea is to determine the optimal number of contiguous data blocks, that is, the multiblocking factor, to be transferred to memory in a single access. This choice is not made once per file or relation; rather, a multiblocking factor is selected independently for each index of a relation or file. Different values are determined in order to increase the percentage of useful information transferred during each access and, therefore, to decrease the total number of I/O operations. The effectiveness of the method is shown by experimental results obtained on an actual database. The selection criterion for the multiblocking factor associated with an index is based on measuring the average clustering of the key value occurrences in the stored records.
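The trade-off behind the multiblocking factor can be sketched with a toy cost model (illustrative only, with hypothetical seek and transfer constants; the paper's criterion is based on measured key-value clustering): transferring m contiguous blocks per access saves seeks when matching blocks cluster together, but wastes transfer time when they are scattered.

```python
def io_cost(block_hits, m, seek=8.0, transfer=1.0):
    """Cost of fetching all needed blocks when each access transfers
    m contiguous blocks (hypothetical timings: 8 ms seek, 1 ms/block)."""
    groups = {b // m for b in block_hits}   # distinct multiblock units touched
    return len(groups) * (seek + m * transfer)

def best_multiblock_factor(block_hits, candidates=(1, 2, 4, 8, 16)):
    """Pick the factor minimising the modelled I/O cost."""
    return min(candidates, key=lambda m: io_cost(block_hits, m))

# Well-clustered hits favour a larger factor than scattered ones.
clustered = [0, 1, 2, 3, 8, 9, 10, 11]
scattered = [0, 40, 85, 120, 200, 260, 310, 400]
print(best_multiblock_factor(clustered), best_multiblock_factor(scattered))
# → 4 1
```

This mirrors the abstract's point: an index whose key occurrences cluster in storage deserves a larger multiblocking factor than one whose occurrences are spread out.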
Nowadays, people frequently use keyword-based web search engines to find the information they need on the web. However, many words are polysemous and, when such words are used to query a search engine, its output usually includes links to web pages referring to their different meanings. Moreover, results with different meanings are mixed together, which makes the task of finding the relevant information difficult for users, especially if the user-intended meanings behind the input keywords are not among the most popular on the web. In this paper, we propose a set of semantic techniques to group the results provided by a traditional search engine into categories defined by the different meanings of the input keywords. Unlike other proposals, our method considers the knowledge provided by ontologies available on the web in order to dynamically define the possible categories; thus, it is independent of the sources providing the results that must be grouped. Our experimental results demonstrate the value of the proposal.
A physical database system design should take account of skewed block access distributions, nonuniformly distributed attribute domains, and dependent attributes. In this paper we derive general formulas for the number of blocks accessed under these assumptions by considering a class of related occupancy problems. We then proceed to develop robust and accurate approximations for these formulas. We investigate three classes of approximation methods, based respectively on generating functions, Taylor series expansions, and majorization. These approximations are simple to use and more accurate than the cost estimate formulas obtained under the uniformity and independence assumptions. Thus they are more representative of the actual database environment, and can be utilized by a query optimizer for better performance.
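The baseline that such formulas generalize is Yao's classical block-access estimate, which rests on exactly the uniformity and independence assumptions the paper relaxes: n records packed evenly into m blocks, with k records selected at random. A sketch of that baseline:

```python
from math import comb

def yao_blocks(n, m, k):
    """Yao's estimate of the expected number of blocks accessed when
    k of n records are selected at random, with the records packed
    uniformly (n/m per block) into m blocks.

    E[blocks] = m * (1 - C(n - n/m, k) / C(n, k))
    """
    per_block = n // m
    return m * (1 - comb(n - per_block, k) / comb(n, k))

# Sanity checks: selecting one record touches on average one block;
# selecting every record touches every block.
print(yao_blocks(100, 10, 1))    # → 1.0
print(yao_blocks(100, 10, 100))  # → 10
```

Skewed access patterns, nonuniform attribute domains, and attribute dependence all break the assumptions behind this closed form, which is what motivates the generalized formulas and approximations in the abstract.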
FORAL is a database language designed specifically for access to binary semantic network structures at the DIAM II Infological Level. It is a non-procedural language that has some of the characteristics of natural English. The user writes transactions in terms of real-world things and their attributes rather than in terms of fields, records, and files. A review of the original FORAL led to FORAL II, described in this paper, which offers a more readable syntax, in particular for long documents.
The integration of heterogeneous databases involves two main problems: schema integration and instance integration. At both levels, a mapping from local elements to global elements is specified, and the various conflicts caused by the heterogeneity of the sources have to be resolved. For the detection and resolution of instance-level conflicts we propose an interactive, example-driven approach. The basic idea is to combine an interactive query tool similar to query-by-example with facilities for defining and applying integration operations. This integration approach is supported by a multidatabase query language, which provides special mechanisms for conflict resolution. The foundations of these mechanisms are introduced and their usage in instance integration and reconciliation is presented. In addition, we discuss basic techniques for supporting the detection of instance-level conflicts.
Documents are co-derivative if they share content: for two documents to be co-derived, some portion of one must be derived from the other, or some portion of both must be derived from a third document. An existing technique for concurrently detecting all co-derivatives in a collection is document fingerprinting, which matches documents based on the hash values of selected document subsequences, or chunks. Fingerprinting is hampered by an inability to accurately isolate information that is useful in identifying co-derivatives. In this paper we present spex, a novel hash-based algorithm for extracting duplicated chunks from a document collection. We discuss how information about shared chunks can be used for efficiently and reliably identifying co-derivative clusters, and describe deco, a prototype package that combines the spex algorithm with other optimisations and compressed indexing to produce a flexible and scalable co-derivative discovery system. Our experiments with multi-gigabyte document collections demonstrate the effectiveness of the approach.
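The chunk-hashing idea behind document fingerprinting can be sketched in a few lines (a generic word-k-gram scheme for illustration; spex's chunk extraction and deco's indexing are more selective and scale to multi-gigabyte collections):

```python
import hashlib

def chunks(text, k=4):
    """Overlapping word k-grams ('chunks') of a document."""
    words = text.split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def fingerprint(text, k=4):
    """Hash each chunk; two documents sharing chunks share hash values."""
    return {hashlib.md5(c.encode()).hexdigest() for c in chunks(text, k)}

def resemblance(a, b):
    """Fraction of shared chunk hashes (Jaccard similarity)."""
    fa, fb = fingerprint(a), fingerprint(b)
    return len(fa & fb) / len(fa | fb)

doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox sleeps all day long"
print(resemblance(doc_a, doc_b))  # → 0.1 (one shared chunk of ten)
```

Comparing hash sets rather than raw text is what lets fingerprinting scan an entire collection for co-derivative pairs without pairwise string comparison; the difficulty the abstract highlights is deciding which chunks are actually informative.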
The role of IT in achieving an organisation's strategic development goals has been an area of constant debate. This paper describes the experiences of John Nicholls Builders Ltd, a Cornish building firm, in their attempt to achieve their strategic development goals through the adoption of IT. The implementation stage of the project involved setting a flexible programme and timescale from the start. The company adopted a bottom-up approach whereby potential users were consulted and involved in the process. The support of top management was also crucial for the successful transition to the new system. Although there was no single ready-made solution that could fit the organisation's requirements, the company was able to identify appropriate construction industry software packages and integrate them through the development of an intranet and database system. There is now greater management control, all departments have greater access to information, enabling them to function more effectively and efficiently, and since projections are now more accurate, or available for the first time, management can make long-term strategic plans. These improvements and developments to the business system have improved the operational efficiency, turnover and profitability of the organisation.
Biodiversity research requires associating data about living beings and their habitats, constructing sophisticated models and correlating all kinds of information. The data handled are inherently heterogeneous, being provided by distinct (and distributed) research groups, which collect them using different vocabularies, assumptions, methodologies and goals, and under varying spatio-temporal frames. Ontologies are being adopted as one means of alleviating these heterogeneity problems, thus helping cooperation among researchers. While ontology toolkits offer a wide range of operations, they are self-contained and cannot be accessed by external applications. Thus, the many proposals for adopting ontologies to enhance interoperability in application development are based either on ontology servers or on ontology frameworks. The latter support many functions, but impose application recoding whenever ontologies change, whereas the former support ontology evolution, but only for a limited set of functions. This paper presents Aondê, a Web service geared towards the biodiversity domain that combines the advantages of frameworks and servers, supporting ontology sharing and management on the Web. By clearly separating storage concerns from semantic issues, the service provides independence between ontology evolution and the applications that need the ontologies. The service provides a wide range of basic operations to create, store, manage, analyze and integrate multiple ontologies. These operations can be repeatedly invoked by client applications to construct more complex manipulations. Aondê has been validated on real biodiversity case studies.
An important role of an information system is to provide a representation of a Universe of Discourse which reflects its structure and behaviour. An equally important function of the system is to support communication within an organisation by structuring and coordinating the actions performed by the organisation's agents. In many systems development methods, these different roles that an information system assumes are not explicitly separated, and representation techniques appropriate for one role are uncritically applied to another. In this paper, we propose a unifying framework based on speech act theory which reconciles the representation and communication roles of information systems. In particular, we show how communication can be modelled by means of discourses, which are viewed as sequences of events.
The Unified Modelling Language (UML) lacks precise and formal foundations and semantics for several modeling constructs, such as transition guards or method bodies. These semantic discrepancies and loopholes prevent executability, putting early testing and validation out of reach of UML tools. Furthermore, the semantic gap between high-level UML concepts and the low-level programming constructs found in traditional object-oriented languages prevents the development of efficient code generators. The recent Action Semantics (AS) proposal tackles these problems by extending the UML with yet another formalism for describing behavior, but with a strong emphasis on dynamic semantics. This formalism provides both a metamodel integrated into the UML metamodel and a model of execution for these statements. As a future OMG standard, the AS eases the move towards tool interoperability, and allows for executable modeling and simulation. In this paper we explore a specificity of the AS: its applicability to the UML metamodel, itself a UML model. We show how this approach paves the way for powerful metaprogramming for model transformation. Furthermore, the overhead for designers is minimal, as mappings from the usual object-oriented languages to the AS will be standardized.
Creation and adaptation of workflows is a difficult and costly task that is currently performed by human workflow modeling experts. Our paper describes a new approach for the automatic adaptation of workflows, which makes use of a case base of former workflow adaptations. We propose a general framework for case-based adaptation of workflows and then focus on novel methods to represent and reuse previous adaptation episodes for workflows. An empirical evaluation demonstrates the feasibility of the approach and provides valuable insights for future research.
Integration of data sources opens up possibilities for new and valuable applications of data that cannot be supported by the individual sources alone. Unfortunately, many data integration projects are hindered by the inherent heterogeneities in the sources to be integrated. In particular, differences in the way that real world data is encoded within sources can cause a range of difficulties, not least of which is that the conflicting semantics may not be recognised until the integration project is well under way. Once identified, semantic conflicts of this kind are typically dealt with by configuring a data transformation engine, that can convert incoming data into the form required by the integrated system. However, determination of a complete and consistent set of data transformations for any given integration task is far from trivial. In this paper, we explore the potential application of techniques for integrity enforcement in supporting this process. We describe the design of a data reconciliation tool (LITCHI) based on these techniques that aims to assist taxonomists in the integration of biodiversity data sets. Our experiences have highlighted several limitations of integrity enforcement when applied to this real world problem, and we describe how we have overcome these in the design of our system.
Searching XML data with a structured XML query can improve the precision of results compared with a keyword search. However, the structural heterogeneity of the large number of XML data sources makes it difficult to answer a structured query exactly; as such, query relaxation is necessary. Previous work on XML query relaxation suffers from unnecessary computation of a large number of unqualified relaxed queries. To address this issue, we propose an adaptive relaxation approach which relaxes a query against different data sources differently, based on their conformed schemas. In this paper, we present a set of techniques that supports this approach, including schema-aware relaxation rules for relaxing a query adaptively, a weighted model for ranking relaxed queries, and algorithms for adaptive relaxation of a query and top-k query processing. We discuss results from a comprehensive set of experiments that show the effectiveness and efficiency of our approach.
Due to its flexibility, XML is becoming the de facto standard for exchanging and querying documents over the Web. Many XML query languages, such as XQuery and XPath, use label paths to traverse the irregularly structured XML data. Without a structural summary and efficient indexes, query processing can be quite inefficient due to an exhaustive traversal of the XML data. To overcome this inefficiency, several path indexes have been proposed in the research community. Traditional indexes generally record all label paths from the root element in the XML data and are constructed using the data only. Such path indexes may result in performance degradation due to their large sizes and exhaustive navigations for partial matching path queries, which start with the self-or-descendant axis (“//”). To improve query performance, we propose an adaptive path index for XML data (termed APEX). APEX does not keep all paths starting from the root; instead, it utilizes paths that appear frequently in query workloads. APEX also has the nice property that it can be updated incrementally according to changes in the query workload. Experimental results with synthetic and real-life data sets clearly confirm that APEX typically improves query processing cost by a factor of 2–69 compared with the traditional indexes, with the performance gap increasing with the irregularity of the XML data.
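For contrast with APEX, the exhaustive structural summary built by traditional path indexes (every root-to-element label path mapped to its elements) can be sketched in a few lines; the point of APEX is to keep only workload-frequent paths instead of this full map. The sample document below is hypothetical.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

def build_path_index(xml_text):
    """Exhaustive path index: map each root-to-element label path
    to the list of elements reachable by that path."""
    root = ET.fromstring(xml_text)
    index = defaultdict(list)

    def walk(elem, path):
        path = path + "/" + elem.tag
        index[path].append(elem)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return index

doc = "<lib><book><title>A</title></book><book><title>B</title></book></lib>"
idx = build_path_index(doc)
print([t.text for t in idx["/lib/book/title"]])  # → ['A', 'B']
```

A query like `//title` must scan every stored path for a suffix match in this exhaustive index, which is the navigation cost that a workload-adaptive index avoids for frequent query paths.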