ABSTRACT: Peer-to-Peer (P2P) systems are becoming increasingly popular as they enable users to exchange digital information by participating in complex networks. Such systems are inexpensive, easy to use, highly scalable and do not require central administration. Despite their advantages, however, limited work has been done on employing database systems on top of P2P networks.
ABSTRACT: The dimensionality curse has greatly affected the scalability of high-dimensional indexes. A well-known approach to improving indexing performance is to reduce dimensionality before indexing the data in the reduced-dimensionality space. However, the reduction may cause loss of distance information when the data set is not globally correlated. To reduce loss of information and degradation of search quality, cluster-based dimensionality reduction should be used instead. In this paper, we present an adaptive local dimensionality reduction (LDR) technique which first identifies effective clusters based on Mahalanobis distance and then performs local dimensionality reduction for each cluster. The data points in each cluster of the reduced-dimensionality space are then transformed into single distance values with reference to the centroid of the cluster, and indexed using a single-dimensional index for nearest neighbor search. Unlike an existing LDR technique which uses an index for each cluster, we use one single B+-tree for the whole data set. Extensive performance studies using both real and synthetic data show that the method achieves higher precision compared to existing global dimensionality reduction and local dimensionality reduction methods, and is more efficient in terms of query performance.
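The mapping of clustered points to single distance values, as described in the abstract above, might look roughly like the following sketch. Several things here are assumptions and not taken from the paper: the points are taken to be already in their reduced-dimensionality space, plain Euclidean distance is used, each cluster gets its own key range through a hypothetical `stride` offset, and a sorted Python list stands in for the single B+-tree.

```python
import bisect
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_index(clusters, stride):
    """Map every point to one key and return the entries sorted by key.

    clusters: list of (centroid, points) pairs.
    stride:   assumed upper bound on any point-to-centroid distance, so
              that the key ranges of different clusters never overlap.
    """
    entries = []
    for cid, (centroid, points) in enumerate(clusters):
        for p in points:
            d = euclidean(p, centroid)
            if d >= stride:
                raise ValueError("stride is too small for this cluster")
            entries.append((cid * stride + d, p))
    entries.sort(key=lambda e: e[0])  # the sorted list plays the role of the B+-tree
    return entries


def range_candidates(index, cid, stride, lo, hi):
    """Points of cluster cid whose distance to the centroid lies in [lo, hi]."""
    keys = [k for k, _ in index]
    left = bisect.bisect_left(keys, cid * stride + lo)
    right = bisect.bisect_right(keys, cid * stride + hi)
    return [p for _, p in index[left:right]]


clusters = [
    ((0.0, 0.0), [(1.0, 0.0), (0.0, 2.0)]),
    ((10.0, 10.0), [(10.0, 13.0)]),
]
index = build_index(clusters, stride=100.0)
near_first = range_candidates(index, 0, 100.0, 0.5, 1.5)
near_second = range_candidates(index, 1, 100.0, 2.0, 4.0)
```

A range scan of this kind only yields candidates: points at a similar distance from the centroid can still be far from the query, so a nearest neighbor search would have to check each candidate against the true distance.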
ABSTRACT: 1. INTRODUCTION In this demonstration, we present a system designed to find semantically relevant images that are embedded in HTML documents on the WWW. The system has been implemented in Java on a Sun Sparc machine, and our experimental study showed the effectiveness of the system. 2. IMAGE REPRESENTATION MODEL The main observation that we made is that an embedded image's semantics are typically captured by its surrounding text in the document. We have identified four parts of the textual content that are closely related to the embedded image. These are the image title, image ALT (alternate text), image caption and page title. To represent the image semantics more adequately, we propose the Weight ChainNet model that is based on the concept of lexical chain. A lexical chain (LC) is a sequence of semantically related words in a text. Here, we define it as one sentence that carries certain semantics by its words.
ABSTRACT: The storage structure of videos on disks affects the number of concurrent users a video-on-demand system can support and hence the average waiting time. In this paper, we propose a phase-based striping method which has the desirable characteristic that it guarantees the maximum waiting time. We also develop a data replication scheme to further reduce the average waiting time of the phase-based method. These two schemes, together with the conventional sequential striping scheme, are then employed to optimize the waiting time of videos based on their access pattern. Simulations were conducted using a video server containing 36 videos under different loading and hardware configurations. The results show that such optimization can reduce the waiting time significantly. The replication method could further improve the server performance by over 25%. The overall results indicate that, with limited resources, the phase-based striping method with replication is preferable under heavy loads. Keywords: Video-on-demand, disk striping, replication.
ABSTRACT: A number of indexing structures for temporal databases have been proposed in the past few years. Many have been proposed without much comparison with other indexing structures. In this paper, we present an extensive comparative performance evaluation of the Time Split B-tree (TSB-tree), Append-Only tree (AP-tree), R-tree, and Time-Polygon tree (TP-tree). For the R-tree structure, we examined two ways of representing temporal data. We also extended the TP-tree to cater to key-range time-slice queries. We implemented these indexes, and evaluated them on a wide range of indexing attributes, queries and large data sets. The indexes are constructed on time-invariant key and transaction time, time-invariant key and valid time, and time-varying key and valid time. The TI-Time software from the Time Center (University of Arizona) is extended to generate large data sets with different data distributions of one million versions. Temporal queries such as time-slice, key-range time-slice, and past versions queries are used to benchmark the efficiency of the indexes. Our results provide insights into the strengths and weaknesses of these indexes. With the proliferation of temporal indexes and the lack of extensive performance studies, the experimental results are important to serve as guidelines for selection of a suitable index and the design of a new index.
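The three query types named in the abstract above can be illustrated with a minimal sketch over a flat list of versions. It only shows what each query is commonly understood to return and is not how any of the evaluated indexes works; the `Version` record, the half-open `[start, end)` intervals and the function names are all assumptions made for illustration.

```python
from typing import NamedTuple


class Version(NamedTuple):
    key: int
    start: int  # first time point at which the version holds (inclusive)
    end: int    # time point at which it stops holding (exclusive)
    value: str


def time_slice(versions, t):
    """All versions that hold at time t."""
    return [v for v in versions if v.start <= t < v.end]


def key_range_time_slice(versions, lo, hi, t):
    """Versions that hold at time t and whose key lies in [lo, hi]."""
    return [v for v in versions if lo <= v.key <= hi and v.start <= t < v.end]


def past_versions(versions, key):
    """Every version recorded for one key, oldest first."""
    return sorted((v for v in versions if v.key == key), key=lambda v: v.start)


versions = [
    Version(1, 0, 5, "a"),
    Version(1, 5, 10, "b"),
    Version(2, 3, 8, "c"),
    Version(7, 0, 10, "d"),
]
at_4 = [v.value for v in time_slice(versions, 4)]
keys_1_to_2_at_6 = [v.value for v in key_range_time_slice(versions, 1, 2, 6)]
history_of_1 = [v.value for v in past_versions(versions, 1)]
```

Each function here scans the whole list; a temporal index aims to answer the same questions without that full scan.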
ABSTRACT: Skyline queries are often used on data sets in multi-dimensional space for many decision-making applications. Traditionally, a point p is said to dominate another point q if, in all dimensions, it is no worse than q and is better in at least one dimension. Therefore, the skyline of a data set consists of all points not dominated by any other point. To better cater to application requirements such as controlling the size of the skyline or handling data sets that are not well-structured, various works have been proposed to extend the definition of skyline based on variants of the dominance relationship. However, it is difficult to implement each of these variants separately in a system setting and instead effort must be made to provide a general framework so that these specific implementations can be easily materialized over the framework. In this paper, a generalized framework is proposed for this purpose. Our framework explicitly and carefully examines the various properties that should be preserved in a variant of the dominance relationship so that: (1) the original advantages of skyline can be maintained while adaptivity to application semantics is also catered to and (2) computational complexity is almost unaffected. We prove that traditional dominance is the only relationship satisfying all desirable properties and present some new dominance relationships to illustrate that other skyline variants always have their tradeoff in relaxing some of the properties. We then develop generic algorithms that compute skyline variants subject to the constraints that certain properties are relaxed and illustrate the use of our framework in computing skylines over datasets with missing values. Extensive experimental results are presented to evaluate the efficiency and effectiveness of our framework.
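The traditional dominance relationship and the skyline defined in the abstract above can be written down directly. The sketch below is a naive quadratic version for illustration only, not the generalized framework or the generic algorithms of the paper; it assumes that smaller values are better in every dimension, and the sample data is made up.

```python
def dominates(p, q):
    """True if p is no worse than q in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points):
    """All points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels described as (price, distance to the beach).
hotels = [(50, 8), (60, 3), (80, 1), (70, 5), (90, 2)]
result = skyline(hotels)
# (70, 5) is dominated by (60, 3) and (90, 2) by (80, 1); the other three remain.
```

A point never dominates itself, because the strict-improvement condition fails, so comparing each point against the full list is safe.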