The main objective of this work is to improve clustering efficiency and performance on very large datasets. This paper aims to improve the quality of XML data clustering by exploiting more features extracted from source schemas. In particular, it proposes a clustering approach that combines both the content and the structure of XML documents to determine the similarity between them. Content similarity and structural similarity are computed by two different methods, and the two results are then combined through a weight factor to obtain the overall document similarity. The structural similarity of XML data is derived from edge summaries, while content similarity is derived by aggregating a set of similarity measures (Jaccard, cosine, and Jensen-Shannon divergence) in a single algorithm. We also experimented with Jaccard distance alone as the content measure, combined with edge summaries, to show that aggregating several content similarity measures further improves the results. The experiments show that clustering XML documents using structural information alone produces worse solutions in a homogeneous environment, whereas in a heterogeneous environment clustering produces better results when structure and content are combined. The results show that the proposed approach outperforms both the XEdge and XCLSC approaches in performance and clustering quality.
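The weighted combination of structural and content similarity described above can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state. Structural similarity is approximated as the Jaccard coefficient over sets of (parent tag, child tag) pairs, a simplified stand-in for edge summaries. Content is treated as a bag of lowercase, whitespace-separated words. The three content measures are averaged with equal weights. Jensen-Shannon divergence is turned into a similarity as 1 - JSD using base-2 logarithms, so that it lies in [0, 1]. The default weight `w = 0.5` is hypothetical. All function names are illustrative and are not taken from the paper.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical helpers: an illustrative sketch, not the paper's implementation.

def edge_set(xml_text: str) -> set:
    """Set of (parent tag, child tag) pairs; a simplified stand-in for edge summaries."""
    root = ET.fromstring(xml_text)
    return {(parent.tag, child.tag) for parent in root.iter() for child in parent}

def term_counts(xml_text: str) -> Counter:
    """Bag of lowercase, whitespace-separated words from all text nodes (assumed tokenisation)."""
    root = ET.fromstring(xml_text)
    return Counter(" ".join(root.itertext()).lower().split())

def jaccard(a, b) -> float:
    """Jaccard similarity of two collections, compared as sets of distinct items."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # two empty sets are treated as identical
    return len(sa & sb) / len(sa | sb)

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of raw term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def js_similarity(a: Counter, b: Counter) -> float:
    """1 - Jensen-Shannon divergence of the two term distributions (base-2 logs)."""
    ta, tb = sum(a.values()), sum(b.values())
    if ta == 0 or tb == 0:
        return 0.0
    vocab = a.keys() | b.keys()
    p = {t: a[t] / ta for t in vocab}
    q = {t: b[t] / tb for t in vocab}
    m = {t: (p[t] + q[t]) / 2 for t in vocab}

    def kl(x, y):
        # Terms with x[t] == 0 contribute nothing; y is the mixture, so y[t] > 0 wherever x[t] > 0.
        return sum(x[t] * math.log2(x[t] / y[t]) for t in vocab if x[t] > 0)

    return 1.0 - (kl(p, m) + kl(q, m)) / 2

def content_similarity(a: Counter, b: Counter) -> float:
    """Assumption: equal-weight average of Jaccard, cosine and JS-based similarity."""
    return (jaccard(a, b) + cosine(a, b) + js_similarity(a, b)) / 3

def document_similarity(doc1: str, doc2: str, w: float = 0.5) -> float:
    """Weighted combination: w * structural + (1 - w) * content similarity."""
    s_struct = jaccard(edge_set(doc1), edge_set(doc2))
    s_content = content_similarity(term_counts(doc1), term_counts(doc2))
    return w * s_struct + (1 - w) * s_content

if __name__ == "__main__":
    doc_a = "<book><title>XML clustering</title><author>Smith</author></book>"
    doc_b = "<book><title>XML mining</title><year>2010</year></book>"
    print(round(document_similarity(doc_a, doc_b), 3))
```

Setting `w = 1` reduces the sketch to structure-only similarity, and replacing `content_similarity` with `jaccard` alone gives a Jaccard-only content baseline, mirroring the comparisons the abstract describes.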