
Academic Journal of Agriculture & Life Sciences, 2024, 5(1); doi: 10.25236/AJALS.2024.050124.

Screening of Molecular Descriptors Based on Random Forest Model and Correlation Analysis

Author(s)

Yifan Fu, Zinuo Hao, Peijie Wu

Corresponding Author

Yifan Fu

Affiliation(s)

Saxo Fintech Business School, University of Sanya, Sanya, 572022, China

Abstract

Breast cancer is one of the most common and most lethal diseases in the world, and the need for effective treatment is urgent. In this study, we explored a method for screening compounds with anti-breast cancer potential by targeting the estrogen receptor alpha subtype (ERα). The data set comprised 729 molecular descriptors for 1974 compounds. First, strict data preprocessing was applied, including missing-value detection, outlier handling, and feature screening, after which 359 valid features remained. Next, two different dimensionality reduction methods, random forest and the entropy method, were each used to select the 30 variables with the most significant effect on ERα activity. A comparative analysis showed that the random forest method had advantages in representativeness and effect. Pearson correlation analysis was then used to progressively eliminate highly correlated variables, yielding the 20 molecular descriptors with the greatest effect on ERα activity. These descriptors included SHsOH, XLogP, etc., and their contribution rates to biological activity ranged from 0.01045 down to 0.00597. The screening method may help identify compounds with anti-breast cancer potential and thereby support the treatment of breast cancer.
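The correlation-based elimination step can be illustrated with a minimal sketch. It is not the authors' implementation: the descriptor names (`d1`, `d2`, `d3`), their values, the importance scores, and the 0.9 correlation threshold are all assumptions made for illustration, since the abstract does not state them. The importance scores are assumed to come from a previously fitted random forest. Descriptors are visited in order of decreasing importance, and a descriptor is kept only if it is not highly correlated with any descriptor already kept.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # treat a constant column as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def select_descriptors(columns, importance, threshold=0.9, top_k=20):
    """Greedily keep the most important descriptors whose absolute
    correlation with every descriptor already kept is below `threshold`.
    The threshold value is an assumption, not taken from the paper."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[other])) < threshold
               for other in kept):
            kept.append(name)
        if len(kept) == top_k:
            break
    return kept


# Toy data: names, values, and scores are hypothetical.
columns = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.1, 4.0, 6.2, 8.1, 9.9],  # almost a multiple of d1
    "d3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
importance = {"d1": 0.010, "d2": 0.009, "d3": 0.006}

selected = select_descriptors(columns, importance, threshold=0.9, top_k=2)
print(selected)  # ['d1', 'd3']
```

Here `d2` is dropped despite its high importance score because it is almost perfectly correlated with `d1`, so the less important but more independent `d3` takes its place. A lower threshold would eliminate more descriptors; a higher one would tolerate more redundancy.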

Keywords

Breast cancer, estrogen receptor alpha subtype, biological activity

Cite This Paper

Yifan Fu, Zinuo Hao, Peijie Wu. Screening of Molecular Descriptors Based on Random Forest Model and Correlation Analysis. Academic Journal of Agriculture & Life Sciences (2024) Vol. 5 Issue 1: 179-184. https://doi.org/10.25236/AJALS.2024.050124.
