Academic Journal of Computing & Information Science, 2023, 6(10); doi: 10.25236/AJCIS.2023.061015.

## Analysis of Wordle's Data Based on a Stepwise Regression Iterative Prediction Model

Author(s)

Tong Shi, Yuxuan Zhao

Corresponding Author:

Tong Shi

Affiliation(s)

Department of Applied Statistics, Anhui University, Hefei, Anhui, China

### Abstract

Wordle is a popular daily puzzle game published by the New York Times. Players must guess a five-letter word in at most six attempts. This paper considers 30 word attributes that may affect the percentage of players who solve the puzzle in each number of attempts. The attributes are encoded numerically, using dummy variables and other methods. A stepwise regression model is then fitted for each percentage to determine which attributes enter its regression equation. The number of repeated letters in a word is found to have the greatest impact on how difficult the word is to guess. Finally, the word EERIE is used as a prediction example and is predicted to be a difficult puzzle.
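To illustrate the kind of procedure the abstract describes, the following is a minimal sketch of forward selection with a partial F-test, written in Python with NumPy. It is not the authors' implementation: the function names (`rss`, `forward_stepwise`), the F-to-enter threshold of 4.0, and the synthetic data are all assumptions made for illustration, and a full stepwise procedure would also re-test and possibly remove variables that were entered earlier.

```python
import numpy as np


def rss(design, y):
    """Residual sum of squares of an OLS fit of y on the design matrix."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def forward_stepwise(X, y, f_enter=4.0):
    """Forward selection: repeatedly add the column with the largest
    partial F statistic, stopping when no candidate reaches f_enter.

    f_enter=4.0 is an assumed, conventional threshold, not a value
    taken from the paper.
    """
    n, p = X.shape
    selected = []
    design = np.ones((n, 1))          # start from the intercept-only model
    current_rss = rss(design, y)

    while len(selected) < p:
        best = None
        for j in range(p):
            if j in selected:
                continue
            cand = np.column_stack([design, X[:, j]])
            cand_rss = rss(cand, y)
            df_resid = n - cand.shape[1]
            if df_resid <= 0 or cand_rss <= 0.0:
                continue
            # Partial F statistic for adding one variable to the model.
            f_stat = (current_rss - cand_rss) / (cand_rss / df_resid)
            if best is None or f_stat > best[0]:
                best = (f_stat, j, cand, cand_rss)
        if best is None or best[0] < f_enter:
            break
        _, j, design, current_rss = best
        selected.append(j)
    return selected


# Synthetic stand-in data (hypothetical): 5 attributes, of which only
# columns 1 and 3 actually influence the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 1] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=200)

selected = forward_stepwise(X, y)
print("Selected attribute columns, in order of entry:", selected)
```

In the setting of the paper, each column of `X` would hold one encoded word attribute and `y` would hold the percentage of players who solved the puzzle in a given number of attempts, with one such model fitted per percentage.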

### Keywords

Stepwise Regression, Regression Equation, F-test, Dummy Variable

### Cite This Paper

Tong Shi, Yuxuan Zhao. Analysis of Wordle's Data Based on a Stepwise Regression Iterative Prediction Model. Academic Journal of Computing & Information Science (2023), Vol. 6, Issue 10: 100-105. https://doi.org/10.25236/AJCIS.2023.061015.
