The exponential growth of textual documents has made information retrieval more difficult, particularly for linear retrieval based on word matching, which is generally ineffective. Synonymy leaves many relevant documents unretrieved, because a query and a document may express the same concept with different words, while polysemy causes non-relevant documents to be retrieved, because the same word can carry different meanings. Document clustering can improve retrieval performance according to the hypothesis that documents relevant to the same query tend to fall in the same cluster. This research studied the application of document clustering to improve the effectiveness of document retrieval through cluster-based retrieval in the vector space model. In the first step, the document collection was clustered using a clustering algorithm, and the center of each cluster was selected as its representative. In the second step, the query was matched against all cluster representatives, and all documents in the cluster with the highest similarity to the query were presented to the user. The clustering methods used in this study were partitional methods (the Bisecting K-Means and Buckshot algorithms) and the hierarchical agglomerative method with UPGMA and Complete Link cluster similarity. Retrieval performance was measured using the F-measure, derived from the precision and recall of the retrieval process. The test collections were 1000 news text documents with a known cluster structure and 3000 news text documents with an unknown cluster structure. The results showed that, on these test collections, retrieval based on cluster matching improved performance by 12.3% and 9.5% compared with linear retrieval based on word matching.
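
The two-step procedure and the evaluation measure described in the abstract can be illustrated with a minimal Python sketch. It is not the implementation used in this research. It assumes plain term-frequency vectors, cosine similarity, a mean vector as the cluster center, and the balanced F-measure (equal weight on precision and recall). It also assumes the cluster assignment is already available, so no clustering algorithm is run. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Term-frequency vector (assumption: raw TF, no IDF weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Mean vector of a cluster, used here as the cluster representative."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def cluster_retrieve(query, docs, labels):
    """Return all document ids in the cluster whose center best matches the query.

    docs maps document id to text; labels maps document id to cluster id
    (assumed to come from a clustering step that is not shown here).
    """
    vecs = {doc_id: vectorize(text) for doc_id, text in docs.items()}
    clusters = {}
    for doc_id, cluster_id in labels.items():
        clusters.setdefault(cluster_id, []).append(doc_id)
    reps = {c: centroid([vecs[d] for d in ids]) for c, ids in clusters.items()}
    q = vectorize(query)
    best = max(reps, key=lambda c: cosine(q, reps[c]))
    return sorted(clusters[best])


def f_measure(retrieved, relevant):
    """Balanced F-measure: harmonic mean of precision and recall."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy collection with a known two-cluster structure.
docs = {
    "d1": "stock market shares fall",
    "d2": "market investors sell shares",
    "d3": "football team wins match",
    "d4": "team scores in final match",
}
labels = {"d1": 0, "d2": 0, "d3": 1, "d4": 1}

result = cluster_retrieve("shares market", docs, labels)
print(result, f_measure(result, ["d1", "d2"]))
```

In this sketch the query is compared with one representative per cluster rather than with every document, and the whole best-matching cluster is returned, so precision and recall depend directly on how well the clusters align with the relevant set.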
