Academic Journal of Computing & Information Science, 2023, 6(2); doi: 10.25236/AJCIS.2023.060202.
School of Information Engineering, Nanjing University of Finance & Economics, Nanjing, China
As an important branch of bioinformatics, gene microarray data analysis has become one of the frontier fields in the life sciences. Because microarray experiments are costly, gene expression profile data are characterized by small sample sizes, high dimensionality, and class imbalance. Under these conditions, traditional feature selection methods struggle to obtain good results. This paper proposes unbiased SVM-RFE (NobSVM-RFE). Compared with traditional feature selection algorithms, NobSVM-RFE obtains a better feature subset at a lower computational cost. Combining GASMOTE with NobSVM-RFE, this paper further proposes a three-stage algorithm, GA-NobRFE-SVM, which consists of a balancing algorithm, a feature selection algorithm, and a classifier. The experimental results show that GA-NobRFE-SVM effectively improves classification performance on unbalanced gene data.
Unbalanced gene data, Balancing algorithm, GASMOTE, Feature selection, NobSVM-RFE
Botao Hu. GA-NobRFE-SVM: A New Algorithm for Classification of Unbalanced Gene Data. Academic Journal of Computing & Information Science (2023), Vol. 6, Issue 2: 10-15. https://doi.org/10.25236/AJCIS.2023.060202.
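The three-stage structure named in the abstract (balancing, feature selection, classification) can be illustrated with a minimal sketch. It is not the paper's implementation and rests on several assumptions: the balancing stage uses plain SMOTE-style interpolation between minority samples rather than the genetic-algorithm-tuned sampling of GASMOTE; "unbiased" is read here as a linear SVM with no bias term, f(x) = w·x, trained by simple subgradient descent on the hinge loss; and the elimination step drops the feature with the smallest squared weight, as in standard SVM-RFE. All function names, hyperparameters, and the toy data are hypothetical.

```python
import random

def oversample_minority(X, y, rng):
    """SMOTE-style interpolation (assumption: stands in for GASMOTE)."""
    counts = {label: y.count(label) for label in set(y)}
    minority = min(counts, key=counts.get)
    pool = [x for x, label in zip(X, y) if label == minority]
    X_new, y_new = list(X), list(y)
    for _ in range(max(counts.values()) - counts[minority]):
        a, b = rng.sample(pool, 2)          # two distinct minority samples
        t = rng.random()                    # interpolation factor in [0, 1)
        X_new.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        y_new.append(minority)
    return X_new, y_new

def train_unbiased_svm(X, y, cols, lam=0.01, lr=0.1, epochs=100):
    """Linear SVM without a bias term, fitted on the columns in `cols`."""
    w = [0.0] * len(cols)
    n = len(X)
    for _ in range(epochs):
        grad = [lam * wj for wj in w]       # gradient of the L2 penalty
        for x, label in zip(X, y):
            xs = [x[c] for c in cols]
            if label * sum(wj * xj for wj, xj in zip(w, xs)) < 1:
                for j, xj in enumerate(xs):  # hinge-loss subgradient
                    grad[j] -= label * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def svm_rfe(X, y, n_keep):
    """Recursively drop the feature with the smallest squared weight."""
    cols = list(range(len(X[0])))
    while len(cols) > n_keep:
        w = train_unbiased_svm(X, y, cols)
        worst = min(range(len(cols)), key=lambda j: w[j] ** 2)
        del cols[worst]
    return cols

def predict(w, cols, x):
    return 1 if sum(wj * x[c] for wj, c in zip(w, cols)) >= 0 else -1

def make_toy_data(rng, n_major=40, n_minor=10, n_features=10):
    """Hypothetical data: features 0 and 1 carry the signal, the rest are noise."""
    X, y = [], []
    for label, count in ((-1, n_major), (1, n_minor)):
        for _ in range(count):
            row = [rng.gauss(0, 1) for _ in range(n_features)]
            row[0] = rng.gauss(3 * label, 0.3)
            row[1] = rng.gauss(3 * label, 0.3)
            X.append(row)
            y.append(label)
    return X, y

rng = random.Random(0)
X, y = make_toy_data(rng)
X_bal, y_bal = oversample_minority(X, y, rng)     # stage 1: balancing
selected = svm_rfe(X_bal, y_bal, n_keep=2)        # stage 2: feature selection
w = train_unbiased_svm(X_bal, y_bal, selected)    # stage 3: classifier
accuracy = sum(predict(w, selected, x) == t for x, t in zip(X, y)) / len(y)
print(selected, accuracy)
```

Balancing comes before feature selection so that the weights used for ranking are not dominated by the majority class. In practice the final classifier would be evaluated on held-out samples rather than on the training data used here.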