Table 2 - uploaded by Ahmed Elfatatry
Source publication
Searching large XML repositories is a challenging research problem. The application of clustering on a large repository before performing a search enhances the search process significantly. Clustering reduces a search space into smaller XML collections that can be better searched. In this work, we present an enhanced XML clustering by structure met...
Context in source publication
Context 1
... $n_i^r$ is the number of documents in the generated cluster $C_i$ that belong to the actual modelled category, $n^r$ is the number of XML documents that belong to the modelled category, and $n_i$ is the number of XML documents categorized by the clustering algorithm. The result of the accuracy comparison between EXCLS, XCLS and XEdge for the first experiment is shown in Table 2. The input parameter for the XCLS algorithm is a threshold value of 0.9, and the input parameter for XEdge is the number of clusters (k), which equals 4. It is evident that XCLS failed to assign the documents to their clusters correctly, while EXCLS and XEdge clustered all of the documents correctly. ...
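The excerpt above defines the three counts but is truncated before the accuracy formula itself. As a minimal sketch, assuming the commonly used per-cluster definitions precision = $n_i^r / n_i$ and recall = $n_i^r / n^r$ (the source may use a different measure), and assuming each cluster is matched to the category that contributes most of its documents, the counts could be computed as follows. The function name and the label lists are hypothetical and do not come from the source publication.

```python
from collections import Counter


def cluster_precision_recall(true_labels, cluster_ids):
    """Per-cluster precision and recall (assumed definitions, not the paper's)."""
    category_sizes = Counter(true_labels)            # n^r: documents per true category
    cluster_sizes = Counter(cluster_ids)             # n_i: documents per generated cluster
    joint = Counter(zip(cluster_ids, true_labels))   # n_i^r: documents of category r in cluster i

    scores = {}
    for cluster, n_i in cluster_sizes.items():
        # Assumption: match each cluster to the category contributing most documents.
        candidates = {r: n for (c, r), n in joint.items() if c == cluster}
        category, n_ir = max(candidates.items(), key=lambda kv: kv[1])
        scores[cluster] = {
            "category": category,
            "precision": n_ir / n_i,
            "recall": n_ir / category_sizes[category],
        }
    return scores


if __name__ == "__main__":
    # Hypothetical toy data: five documents, two categories, two clusters.
    truth = ["a", "a", "a", "b", "b"]
    clusters = [0, 0, 1, 1, 1]
    for cluster, score in cluster_precision_recall(truth, clusters).items():
        print(cluster, score)
```

Under these assumptions, a clustering that reproduces the modelled categories exactly gives precision and recall of 1.0 for every cluster, which is what a "perfect" result in such a table would correspond to.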
Similar publications
Music is a temporal organization of sounds, and we can therefore assume that any music representation has a structure that reflects some conceptual principles. This structure is hardly explicitly accessible in many encodings, such as, for instance, audio files. However, it appears much more clearly in the language of music notation.
We propose to u...
XML has become an important medium for data exchange, and is frequently used as an interface to – i.e. a view of – a relational database. Although much attention has been paid to the problem of querying relational databases through XML views, the problem of updating relational databases through XML views has not been addressed. In this paper we inv...
The cityEHR is an example of an open source EHR system which stores clinical data as collections of XML documents. The records gathered in routine clinical care are a rich source of longitudinal data for use in clinical studies. We describe how the standard language XQuery can be used to identify cohorts of patients, matching specified criteria. We...
As storage (main memory as well as disk) becomes cheaper, the amount of available information is increasing and it is a challenge to organize it. Our broader aim is to provide a unified framework for efficiently versioning and querying data, documents, as well as any kind of semi-structured information between data and documents, which can be stored...
Citations
... Saleem et al. [15] discussed an approach for creating a mediated schema to integrate XML structures; they used linguistic matchers that extract the semantics of all node labels, along with a tree-mining data structure and label clusters, to find node context. Shalabi and Elfatatry [16] used a structure-based method that enhances XML clustering without summarizing the characteristics of the XML structure; the technique handles homogeneous and heterogeneous XML document datasets of different sizes. Al Hamad [17] developed a mediated schema for integrating heterogeneous XML; the technique decomposes the original schema into subschemas using three levels: ancestor, root, and leaf. ...
Extensible Markup Language (XML) has become widely used on the web to exchange and share data. Its operations and tags help reduce the memory, storage, and processing that the data requires; these features, among others, are the reason behind the rapid spread and adoption of the XML model by many companies. The main contribution of this work is a literature survey of conversion techniques and methods between relational and XML database models, which also raises awareness of these techniques and methods. We review the research approaches and techniques that have been developed for XML conversion. These include, but are not limited to, Document Type Definition (DTD), Document Object Model (DOM), clustering and matching, the query languages Structured Query Language (SQL), XPath, and XQuery, relational storage, relational catalogs, and other methods.
... The authors in [27] clustered XML documents using the PathXP algorithm. PathXP groups documents according to their characteristic features rather than their direct similarity. ...
The main objective of this work is to improve clustering efficiency and performance on very large datasets. The paper aims to improve the quality of XML data clustering by exploiting more features extracted from the source schemas. In particular, it proposes a clustering approach that uses both the content and the structure of XML documents to determine the similarity between them. Content and structure similarity are computed with two different methods and then combined through a weight factor to give the overall document similarity. The structural similarity of XML data is derived from edge summaries, while content similarity is derived from an aggregate of several similarity measures (Jaccard, the cosine measure, and Jensen-Shannon divergence) in one algorithm. We also experimented with Jaccard distance alone as the content measure, together with edge summaries, to show that aggregating content similarity measures can further improve the results. The experiments show that clustering XML documents on structure-only information produces worse solutions in a homogeneous environment, while in a heterogeneous environment clustering produces better results when structure and content are combined. The results show that the proposed approach performs better, and yields higher quality, than both the XEdge and XCLSC approaches.
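The abstract above does not give the aggregation formula, so the following is only an illustrative sketch of the general idea. It assumes that the three content measures are averaged with equal weight, that Jensen-Shannon divergence (base 2) is turned into a similarity as 1 minus the divergence, that the edge summary of a document is simply its set of (parent, child) tag pairs compared with Jaccard, and that a single weight `alpha` mixes structure and content. All function names, `alpha`, and these choices are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cosine(tf_a, tf_b):
    """Cosine similarity of two term-frequency Counters."""
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def js_similarity(tf_a, tf_b):
    """1 - Jensen-Shannon divergence (base 2, so the divergence lies in [0, 1])."""
    total_a, total_b = sum(tf_a.values()), sum(tf_b.values())
    if not total_a or not total_b:
        return 0.0
    divergence = 0.0
    for term in tf_a.keys() | tf_b.keys():
        p = tf_a[term] / total_a   # Counter returns 0 for missing terms
        q = tf_b[term] / total_b
        m = (p + q) / 2
        if p:
            divergence += 0.5 * p * math.log2(p / m)
        if q:
            divergence += 0.5 * q * math.log2(q / m)
    return 1.0 - divergence


def overall_similarity(tokens_a, edges_a, tokens_b, edges_b, alpha=0.5):
    """Weighted mix of structure and content similarity (alpha is an assumed weight)."""
    tf_a, tf_b = Counter(tokens_a), Counter(tokens_b)
    # Assumption: equal-weight average of the three content measures.
    content = (
        jaccard(set(tokens_a), set(tokens_b))
        + cosine(tf_a, tf_b)
        + js_similarity(tf_a, tf_b)
    ) / 3
    # Assumption: edge summaries approximated as sets of (parent, child) tag pairs.
    structure = jaccard(set(edges_a), set(edges_b))
    return alpha * structure + (1 - alpha) * content


if __name__ == "__main__":
    edges = [("book", "title"), ("book", "author")]
    print(overall_similarity(["xml", "clustering"], edges, ["xml", "search"], edges))
```

With `alpha=1.0` this sketch reduces to a structure-only comparison, and with `alpha=0.0` to a content-only one, which gives a simple way to reproduce the kind of structure-versus-combined comparison the abstract describes.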