Document clustering analysis based on hybrid cuckoo search and K-means algorithm

Date

2021

Publisher

IEEE

Abstract

Clustering is a useful technique for the unsupervised organization of documents on the World Wide Web (WWW). K-means is the most widely used partitioning clustering algorithm. However, it depends on random initialization, which can trap it in a local optimum. Metaheuristic-based clustering has proven effective at reaching a global solution rather than a local one. Cuckoo search (CS) has been widely applied to the clustering problem, but the number of iterations it needs grows dramatically when the data are high dimensional, as documents are. This study analyzes the hybridization of the cuckoo search and K-means algorithms for document clustering. Three hybrid algorithms are investigated and compared. Their performance and efficiency are evaluated on the Reuters-21578 Text Categorization benchmark dataset. The results show that the new approaches produce more compact clusters and improve clustering quality as measured by purity and F-measure.
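The abstract does not say how the three hybrids combine the two algorithms, so the following is only a minimal sketch of one plausible hybridization, not the authors' method: cuckoo search explores candidate centroid sets with Lévy flights, and K-means then refines the best set found. All function names, the parameters `n_nests`, `pa`, `alpha` and `beta`, and the use of the sum of squared errors as the fitness are assumptions made for illustration. Documents are assumed to be already encoded as numeric vectors (for example, vector space model weights).

```python
import math
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def sse(points, centroids):
    """Sum of squared distances to the nearest centroid (lower is more compact)."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def kmeans_step(points, centroids):
    """One Lloyd iteration: reassign points, then recompute each centroid."""
    labels = assign(points, centroids)
    updated = []
    for j, c in enumerate(centroids):
        members = [p for p, l in zip(points, labels) if l == j]
        # An empty cluster keeps its previous centroid.
        updated.append([sum(col) / len(members) for col in zip(*members)]
                       if members else list(c))
    return updated

def levy_step(rng, beta=1.5):
    """Heavy-tailed step length drawn with Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u, v = rng.gauss(0, sigma), rng.gauss(0, 1)
    return u / max(abs(v), 1e-12) ** (1 / beta)

def hybrid_cs_kmeans(points, k, n_nests=8, cs_iters=30, km_iters=10,
                     pa=0.25, alpha=0.01, seed=0):
    """Assumed hybrid: cuckoo search picks the start, K-means refines it."""
    rng = random.Random(seed)
    # Each nest is one candidate solution: a list of k centroids.
    nests = [[list(p) for p in rng.sample(points, k)] for _ in range(n_nests)]
    fitness = [sse(points, n) for n in nests]
    for _ in range(cs_iters):
        best = nests[fitness.index(min(fitness))]
        for i in range(n_nests):
            # Levy flight scaled by the offset from the current best nest.
            cand = [[x + alpha * levy_step(rng) * (x - b) for x, b in zip(c, bc)]
                    for c, bc in zip(nests[i], best)]
            f = sse(points, cand)
            if f < fitness[i]:  # greedy replacement
                nests[i], fitness[i] = cand, f
        # Abandon the worst fraction pa of nests and rebuild them at random.
        worst = sorted(range(n_nests), key=lambda i: fitness[i], reverse=True)
        for i in worst[:int(pa * n_nests)]:
            nests[i] = [list(p) for p in rng.sample(points, k)]
            fitness[i] = sse(points, nests[i])
    centroids = nests[fitness.index(min(fitness))]
    for _ in range(km_iters):
        centroids = kmeans_step(points, centroids)
    return centroids, assign(points, centroids)

def purity(labels, truth):
    """Fraction of points that belong to the majority class of their cluster."""
    total = 0
    for j in set(labels):
        classes = [t for l, t in zip(labels, truth) if l == j]
        total += max(classes.count(c) for c in set(classes))
    return total / len(labels)
```

Calling `hybrid_cs_kmeans(points, k)` returns the refined centroids and one cluster label per point; `purity(labels, truth)` then scores the labels against known categories, as one would with the Reuters-21578 topic labels. Running K-means only after the search is one design choice among several; interleaving K-means steps inside each cuckoo iteration is another.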

Keywords

Cuckoo Search, K-means, Document Clustering, Optimization, Metaheuristic, F-measure, Purity, Vector Space