
International Journal of New Developments in Engineering and Society, 2023, 7(8); doi: 10.25236/IJNDES.2023.070807.

Method for Cleaning Outliers in Massive Data of Internet of Things Based on Hierarchical Clustering Algorithm

Author(s)

An Hailong

Corresponding Author

An Hailong

Affiliation(s)

Shanghai Briup Technology Inc., Kunshan, Jiangsu, 215311, China

Abstract

To clean outliers from massive Internet of Things (IoT) data and improve the security of data mining and storage, this paper proposes an outlier cleaning method based on a hierarchical clustering algorithm. The Levenshtein matching method is used to construct an abnormal-feature analysis model of massive IoT data, and attribute value correlation analysis is combined with it to classify the aggregate words of the data. A dynamic feature weighting method matches the equivalent attribute values of abnormal data values, an evidential reasoning framework realizes hierarchical clustering of the data, and the distribution probability of abnormal values is detected from the matching probability of related attribute values. Combining Levenshtein matching with attribute value correlation analysis, hierarchical clustering is performed on aggregate word vectors, and outliers are cleaned according to the clustering results. Simulation results show that the method improves data purity, reduces interference from abnormal data, lowers computational complexity, and achieves a better matching effect and wider applicability.
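The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract gives no formulas, so the sketch covers only two of the named ingredients, Levenshtein distance and agglomerative (hierarchical) clustering, and leaves out dynamic feature weighting and the evidential reasoning framework. The single-linkage rule, the `threshold` and `min_cluster_size` parameters, and all function names are assumptions introduced here for illustration.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def single_linkage_clusters(records, threshold):
    """Merge the two closest clusters until the closest pair exceeds threshold.

    Single linkage is an assumption; the abstract does not name a linkage rule.
    Clusters are lists of indices into `records`.
    """
    clusters = [[i] for i in range(len(records))]

    def linkage(c1, c2):
        return min(normalized_distance(records[i], records[j])
                   for i in c1 for j in c2)

    while len(clusters) > 1:
        (x, y), dist = min(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        clusters[x].extend(clusters[y])  # x < y, so deleting y keeps x valid
        del clusters[y]
    return clusters


def clean_outliers(records, threshold=0.4, min_cluster_size=2):
    """Keep records in sufficiently large clusters; drop the rest as outliers."""
    clusters = single_linkage_clusters(records, threshold)
    keep = sorted(i for c in clusters if len(c) >= min_cluster_size for i in c)
    return [records[i] for i in keep]


if __name__ == "__main__":
    # Hypothetical attribute values; the last one stands in for a corrupted record.
    readings = ["temp_sensor_01", "temp_sensor_02", "temp_sensor_03",
                "humid_sensor_01", "humid_sensor_02", "@@##corrupt##@@"]
    print(clean_outliers(readings))
```

With these assumed settings, records that stay alone in their own cluster are treated as outliers and removed. The pairwise search makes this sketch far too slow for massive data, so it only shows the logic, not a scalable design.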

Keywords

Hierarchical clustering; Internet of Things; Massive data; Outlier cleaning

Cite This Paper

An Hailong. Method for Cleaning Outliers in Massive Data of Internet of Things Based on Hierarchical Clustering Algorithm. International Journal of New Developments in Engineering and Society (2023) Vol.7, Issue 8: 39-46. https://doi.org/10.25236/IJNDES.2023.070807.
