Academic Journal of Computing & Information Science, 2023, 6(6); doi: 10.25236/AJCIS.2023.060622.

English Word Difficulty Classifier Based on Random Forest Model

Author(s)

Miao Peng¹, Yujie Wu¹, Yannan Qiu²

Corresponding Author:

Yujie Wu

Affiliation(s)

¹School of Finance, Guangdong University of Foreign Studies, Guangzhou, China, 510006

²School of Accounting, Guangdong University of Foreign Studies, Guangzhou, China, 510006

Abstract

Wordle, a daily puzzle game published by the New York Times, has recently become popular worldwide. Players try to guess a five-letter word in six tries or fewer. Using Wordle's statistical data, this paper first applies the K-means algorithm to cluster solution words by difficulty, thereby quantifying the difficulty of English words, and then analyzes the accuracy and validity of the clustering results. Next, the paper uses a Random Forest model to classify word difficulty into three categories: ‘easy’, ‘normal’ and ‘hard’. The results show that the classification accuracy reaches 0.972 on the training set and 0.978 on the test set.
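
The two-stage pipeline described in the abstract (K-means to derive difficulty labels, then a Random Forest to predict them) can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' implementation. The data are synthetic, and the clustering input (average number of tries) and the two word features are assumptions chosen only to make the example runnable; the abstract does not specify which features were used.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_words = 300

# Synthetic stand-in data (assumption): a latent difficulty drives both the
# players' average number of tries and two hypothetical word features.
latent = rng.uniform(0.0, 1.0, n_words)
mean_tries = 3.5 + 1.5 * latent + rng.normal(0.0, 0.1, n_words)
word_features = np.column_stack([
    1.0 - latent + rng.normal(0.0, 0.1, n_words),  # hypothetical letter-frequency score
    rng.binomial(1, 0.2 + 0.5 * latent),           # hypothetical repeated-letter flag
])

# Step 1: K-means groups the words into three difficulty clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(mean_tries.reshape(-1, 1))

# Cluster ids are arbitrary, so rank clusters by centre (fewer tries = easier).
order = np.argsort(kmeans.cluster_centers_.ravel())
rank_of_cluster = np.empty(3, dtype=int)
rank_of_cluster[order] = np.arange(3)
names = np.array(["easy", "normal", "hard"])
labels = names[rank_of_cluster[cluster_ids]]

# Step 2: a Random Forest learns to predict the cluster label from word features.
X_train, X_test, y_train, y_test = train_test_split(
    word_features, labels, test_size=0.2, random_state=0, stratify=labels
)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

train_acc = forest.score(X_train, y_train)
test_acc = forest.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

Because the data here are synthetic, the printed accuracies are not comparable to the 0.972 and 0.978 reported in the abstract; the sketch only shows how cluster assignments from K-means can serve as training labels for a classifier.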

Keywords

English Word Difficulty Classifier, Random Forest model, K-means algorithm

Cite This Paper

Miao Peng, Yujie Wu, Yannan Qiu. English Word Difficulty Classifier Based on Random Forest Model. Academic Journal of Computing & Information Science (2023), Vol. 6, Issue 6: 140-144. https://doi.org/10.25236/AJCIS.2023.060622.
