## Application of BP neural network model based on whale algorithm optimization in text word separation

Hangyu Zeng
School of Statistics and Mathematics, Central University of Finance and Economics, Beijing, 102206, China

Wordle is a daily word-guessing puzzle published by The New York Times that has recently become very popular. In this paper, a SIR infectious disease model is developed to explain the variation in the number of daily reported results and to predict the number of future reports.

A multi-output BP neural network optimized by the whale optimization algorithm is then built to predict the percentage distribution of reported results over the number of tries for the word EERIE on 1 March 2023, giving (1.67, 3.47, 18.75, 39.20, 11.67, 13.63, 11.61)%. The uncertainty in the model arises mainly from the limited sample size and from possible differences between the reported and true values. The confidence of the predictions is measured with the mean absolute percentage error (MAPE), and the model's prediction confidence is approximately 0.812.

The number of repeated letters in a word and the usage rate of the letters that make up the word are considered the two most important dimensions affecting word difficulty, so the K-Means algorithm is used to cluster words along these two dimensions into five categories. The labeled data are then fed into a decision tree model to obtain explicit classification criteria. The main factor affecting the classification turns out to be the usage rate of the word's letters, and the error rate under 10-fold cross-validation is 3.28%. Finally, this model places EERIE in the most difficult category of words.

Four characteristics of the dataset are also identified: (1) the proportion of players choosing the hard mode gradually increases; (2) the difficulty of the game is stable over time; (3) the common-usage rate of the words is stable; and (4) less commonly used words lead to a larger proportion of failures. In summary, following the requirements of the problem, this paper comprehensively analyzes Wordle and the related New York Times data report; the models are constructed rigorously and have both theoretical and practical significance.

Keywords: SIR Infectious Disease Model, Whale Optimization Algorithm, BP Neural Network, K-Means Clustering Algorithm
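
The SIR idea behind the report-count model can be sketched as follows. This is a minimal illustration, not the paper's fitted model: the mapping of compartments to players (S = players who have not started posting results, I = players currently posting, R = players who have stopped), the forward-Euler integration, the function name `simulate_sir` and the parameter values are all assumptions chosen for illustration.

```python
def simulate_sir(n_total, i0, beta, gamma, days, dt=0.1):
    """Integrate the SIR equations with forward Euler and return daily (S, I, R).

    dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - gamma*I,  dR/dt = gamma*I
    Assumed reading for Wordle: S = players who have not started posting results,
    I = players currently posting results, R = players who have stopped posting.
    dt should divide one day evenly (e.g. 0.1, 0.25, 0.5).
    """
    s, i, r = float(n_total - i0), float(i0), 0.0
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_inf = beta * s * i / n_total * dt  # S -> I flow in this step
            new_rec = gamma * i * dt               # I -> R flow in this step
            s -= new_inf
            i += new_inf - new_rec
            r += new_rec
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical parameters, not values fitted in the paper
    hist = simulate_sir(n_total=1_000_000, i0=100, beta=0.35, gamma=0.1, days=365)
    peak_day = max(range(len(hist)), key=lambda d: hist[d][1])
    print(f"I(t) peaks on day {peak_day} at about {hist[peak_day][1]:.0f} players")
```

In practice `beta` and `gamma` would first be fitted to the observed daily report counts (for example by least squares) before the curve is extrapolated; that fitting step is not shown here.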
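
The whale optimization algorithm (WOA) can be sketched as a generic minimizer. This sketch assumes the standard update rules (shrinking encirclement of the best whale, a logarithmic spiral with constant b = 1, and search around a random whale when |A| ≥ 1); the function `whale_optimize` and its parameters are hypothetical. In a WOA-BP setup, the position vector would typically encode the initial weights and thresholds of the BP network and `objective` would be its training error; the abstract does not spell out that coupling, so it is an assumption here.

```python
import math
import random


def whale_optimize(objective, dim, lower, upper, n_whales=20, iters=200, seed=0):
    """Minimize `objective` over the box [lower, upper]^dim with a basic
    Whale Optimization Algorithm. Returns (best_position, best_value)."""
    rng = random.Random(seed)
    whales = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_whales)]
    best = min(whales, key=objective)[:]
    best_fit = objective(best)
    for t in range(iters):
        a = 2 - 2 * t / iters  # decreases linearly from 2 towards 0
        for k, x in enumerate(whales):
            A = 2 * a * rng.random() - a  # |A| >= 1 favours exploration
            C = 2 * rng.random()
            l = rng.uniform(-1, 1)
            if rng.random() < 0.5:
                # encircle the best whale (|A| < 1) or search around a random whale (|A| >= 1)
                target = best if abs(A) < 1 else whales[rng.randrange(n_whales)]
                new = [tj - A * abs(C * tj - xj) for tj, xj in zip(target, x)]
            else:
                # bubble-net attack: logarithmic spiral towards the best whale (b = 1)
                new = [abs(bj - xj) * math.exp(l) * math.cos(2 * math.pi * l) + bj
                       for bj, xj in zip(best, x)]
            # keep the whale inside the search box
            whales[k] = [min(max(v, lower), upper) for v in new]
        # keep the best position found so far (elitism)
        for x in whales:
            f = objective(x)
            if f < best_fit:
                best, best_fit = x[:], f
    return best, best_fit


def sphere(v):
    """Toy objective with its minimum 0 at the origin."""
    return sum(x * x for x in v)


if __name__ == "__main__":
    pos, val = whale_optimize(sphere, dim=5, lower=-10.0, upper=10.0)
    print(f"best value {val:.3e} at {[round(p, 4) for p in pos]}")
```

The sphere function is only a sanity check: the optimizer should drive its value close to 0. Swapping in a BP training-error objective changes nothing in the update loop.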
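
MAPE, the mean absolute percentage error, is MAPE = (1/n) Σ |(y_i − ŷ_i) / y_i|, where y_i are the actual values and ŷ_i the predictions. A minimal sketch follows. The abstract does not state how the confidence of about 0.812 is derived; reading it as 1 − MAPE, as in the usage lines below, is only an assumed convention.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (0.10 means 10%)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    err = mape([100, 200], [110, 180])
    print(f"MAPE = {err:.2f}")          # prints "MAPE = 0.10"
    # Assumed convention (not stated in the paper): confidence = 1 - MAPE
    print(f"1 - MAPE = {1 - err:.2f}")  # prints "1 - MAPE = 0.90"
```

Because MAPE divides by the actual value, categories whose observed percentage is 0 have to be excluded or handled separately.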
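
The two-feature clustering step can be sketched with plain Lloyd's K-Means. The abstract does not give exact feature definitions, so the ones below are assumptions. The repetition feature is the number of letters minus the number of distinct letters (2 for EERIE). The usage-rate feature is the mean relative frequency, within the supplied word list, of the word's letters. Features are z-score standardized so that neither dimension dominates the Euclidean distance. All function names are hypothetical, and the toy word list with k = 3 stands in for the paper's full dataset and five clusters.

```python
import random
from collections import Counter


def word_features(words):
    """Return (repeated_letters, letter_usage_rate) for each word (assumed definitions)."""
    counts = Counter(ch for w in words for ch in w)
    total = sum(counts.values())
    feats = []
    for w in words:
        repeated = len(w) - len(set(w))                     # EERIE -> 5 - 3 = 2
        rate = sum(counts[ch] / total for ch in w) / len(w)  # mean letter frequency
        feats.append((float(repeated), rate))
    return feats


def standardize(points):
    """Z-score each feature column; a constant column is left centred with std 1."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(p, means, stds)) for p in points]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm. Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids are k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest centroid by squared Euclidean distance
        labels = [min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
                  for p in points]
        # update step: each centroid moves to the mean of its members (kept if empty)
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(tuple(sum(c) / len(members) for c in zip(*members))
                       if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


if __name__ == "__main__":
    words = ["EERIE", "CRANE", "MUMMY", "SLATE", "JAZZY", "AUDIO", "FUZZY", "TRAIN"]  # toy list
    labels, _ = kmeans(standardize(word_features(words)), k=3)
    for w, lab in zip(words, labels):
        print(w, lab)
```

The resulting cluster labels are what a decision tree would then be trained on to extract explicit difficulty rules; that step is not shown here.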

