Academic Journal of Computing & Information Science, 2021, 4(8); doi: 10.25236/AJCIS.2021.040812.
College of Computer Science and Technology, Harbin Engineering University, Harbin, Heilongjiang, 150001, China
Missing data are common in real-world applications, and handling them properly is essential for classification. A common and effective approach is to impute the missing values from an existing, reliable data set. Imputation helps to deal with ambiguity and uncertainty in the data, and the handling of incomplete data sets also arises in noise processing and in improving system robustness. Accurate data and a sound imputation method are therefore essential. This paper proposes a new method for classification with missing data. First, the training data set is optimized so that the classifiers can be trained well. The optimized training set is then used to estimate the missing values with the proposed method. The classifiers are evaluated on test data sets imputed by four different imputation methods, and the resulting Precision, Recall, F1, and ARI scores are compared. The results show that the proposed method performs best overall.
Missing data, Imputation, Machine Learning
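The evaluation protocol summarized in the abstract (impute the test data, classify, then compare Precision, Recall, F1, and ARI) can be sketched as follows. This is a minimal illustration under stated assumptions, not the proposed method, which the abstract does not specify: mean imputation and k-nearest-neighbour imputation stand in as two generic baselines, a 1-nearest-neighbour rule stands in for the classifier, the scores are macro-averaged, ARI is read as the adjusted Rand index, and the toy data and all function names are hypothetical.

```python
from collections import Counter
from math import comb


def mean_impute(rows):
    """Replace None with the column mean of the observed values.
    Assumes every column has at least one observed value."""
    means = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def knn_impute(rows, reference, k=2):
    """Fill None from the k nearest complete reference rows; the distance
    is squared Euclidean over the features observed in the incomplete row."""
    filled = []
    for r in rows:
        obs = [j for j, v in enumerate(r) if v is not None]
        nearest = sorted(
            reference, key=lambda ref: sum((r[j] - ref[j]) ** 2 for j in obs)
        )[:k]
        filled.append([
            sum(n[j] for n in nearest) / len(nearest) if v is None else v
            for j, v in enumerate(r)
        ])
    return filled


def nn_predict(train_X, train_y, X):
    """Stand-in classifier: label of the single nearest training row."""
    preds = []
    for x in X:
        best = min(
            range(len(train_X)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, train_X[i])),
        )
        preds.append(train_y[best])
    return preds


def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall, and F1 (averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    ps, rs, fs = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        ps.append(p)
        rs.append(r)
        fs.append(2 * p * r / (p + r) if p + r else 0.0)
    n = len(labels)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n


def adjusted_rand_index(y_true, y_pred):
    """Adjusted Rand index computed from the contingency counts."""
    sum_ij = sum(comb(v, 2) for v in Counter(zip(y_true, y_pred)).values())
    sum_a = sum(comb(v, 2) for v in Counter(y_true).values())
    sum_b = sum(comb(v, 2) for v in Counter(y_pred).values())
    expected = sum_a * sum_b / comb(len(y_true), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


if __name__ == "__main__":
    # Hypothetical toy data: complete training set, test set with gaps.
    train_X = [[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.2, 4.8]]
    train_y = [0, 0, 1, 1]
    test_X = [[1.1, None], [None, 5.1], [4.9, None]]
    test_y = [0, 1, 1]
    candidates = {
        "mean": mean_impute(test_X),
        "knn": knn_impute(test_X, train_X, k=2),
    }
    for name, imputed in candidates.items():
        pred = nn_predict(train_X, train_y, imputed)
        print(name, macro_prf(test_y, pred), adjusted_rand_index(test_y, pred))
```

Keeping the classifier and the metrics fixed while only the imputation step varies means any difference in the scores can be attributed to the imputation method, which is the kind of comparison the abstract describes.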
Haojian Huang. An efficient method to classification with missing data. Academic Journal of Computing & Information Science (2021), Vol. 4, Issue 8: 63-66. https://doi.org/10.25236/AJCIS.2021.040812.