Academic Journal of Computing & Information Science, 2021, 4(2); doi: 10.25236/AJCIS.2021.040201.
Baiyan Chen, Kai Zhou
School of Computer & Software, Nanjing University of Information Science & Technology, Jiangsu Nanjing, China
Clustering by fast search and find of density peaks (DP) is a recent density-based clustering algorithm. When a single cluster contains multiple density peaks, DP treats each peak as a potential cluster center, which makes it difficult to determine the correct number of clusters in the dataset. To address this problem, a hybrid density peak clustering algorithm, C-DP, is proposed. First, the density peak points are taken as the initial cluster centers and the dataset is divided into sub-clusters. Then, following the Clustering Using Representatives (CURE) algorithm, scattered representative points are selected from each sub-cluster, the clusters containing the pair of representative points with the smallest distance are merged, and a contraction factor parameter is introduced to control the shape of the clusters. Experimental results show that C-DP clusters better than DP, and a comparison of F-measure values shows that C-DP improves clustering accuracy on datasets in which a single cluster contains multiple density peaks.
Density Peak, Hierarchical Clustering, Cluster Merging, Representative Point, Contraction Factor
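The CURE-style merging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the sub-clusters have already been produced by the DP step, uses a farthest-point heuristic to choose scattered representatives, and shrinks them toward the centroid by a contraction factor. The function names, the parameters `num_reps` and `alpha`, and their default values are hypothetical, and the stopping rule for merging is not specified here.

```python
import math

def centroid(points):
    """Arithmetic mean of a list of equal-length coordinate tuples."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dim))

def representatives(points, num_reps, alpha):
    """Pick up to num_reps scattered points, then shrink them toward the centroid.

    alpha is the (assumed) contraction factor: 0 keeps the points in place,
    1 collapses every representative onto the centroid.
    """
    c = centroid(points)
    # Start from the point farthest from the centroid.
    reps = [max(points, key=lambda p: math.dist(p, c))]
    while len(reps) < min(num_reps, len(points)):
        # Next representative: the point farthest from those already chosen.
        reps.append(max(points, key=lambda p: min(math.dist(p, r) for r in reps)))
    return [tuple(x + alpha * (cx - x) for x, cx in zip(p, c)) for p in reps]

def rep_distance(reps_a, reps_b):
    """Smallest distance between any pair of representatives of two clusters."""
    return min(math.dist(a, b) for a in reps_a for b in reps_b)

def merge_closest(clusters, num_reps=4, alpha=0.3):
    """Merge the two clusters whose representative points are closest.

    Returns a new list: the untouched clusters followed by the merged one.
    """
    reps = [representatives(c, num_reps, alpha) for c in clusters]
    pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
    i, j = min(pairs, key=lambda ij: rep_distance(reps[ij[0]], reps[ij[1]]))
    merged = clusters[i] + clusters[j]
    return [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

if __name__ == "__main__":
    # Two nearby sub-clusters (as if one true cluster had two density peaks)
    # and one distant sub-cluster.
    sub_clusters = [[(0, 0), (0, 1)], [(0, 2), (0, 3)], [(10, 10), (10, 11)]]
    print(merge_closest(sub_clusters))
```

Calling `merge_closest` repeatedly reduces the number of sub-clusters by one each time. A larger `alpha` pulls the representatives toward the centroid, which in CURE-like methods makes merging less sensitive to outlying points.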
Baiyan Chen, Kai Zhou. An improved density peaks clustering algorithm based on CURE. Academic Journal of Computing & Information Science (2021), Vol. 4, Issue 2: 1-6. https://doi.org/10.25236/AJCIS.2021.040201.