Xinyu Guo, Yuxi Su
School of Applied Mathematics, Beijing Normal University, Zhuhai, Guangdong, 519087, China
This article analyzes tourists' comments on scenic spots using TF-IDF, principal component analysis, and logistic regression to identify the factors that influence tourists' satisfaction with scenic spots. First, the data are pre-processed; the text is then segmented using the precise mode of the jieba word segmenter, and the top 20 high-frequency words are calculated for each scenic spot and hotel. Next, the top 20 words mined from the 50 scenic spots (hotels) are merged into a data pool, the TF-IDF algorithm is used to calculate the weights of the feature terms, and the dimensionality of these weights is reduced by kernel principal component analysis (KernelPCA) to obtain the weight matrix. The data are then processed by both classification and regression. For classification, the scenic spot (hotel) scores are used as class labels for supervised learning with the naive Bayes algorithm, the support vector machine algorithm, the BP neural network method, and the logistic regression method. For regression, the models are evaluated by mean squared error (MSE). In the end, classification achieves a better MSE than regression.
jieba, TF-IDF, PCA, SVM, naive Bayes, BP neural network
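The word-frequency and TF-IDF steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the comments have already been segmented into tokens (for example by jieba in precise mode), uses invented English tokens and spot names in place of real review data, and uses a smoothed IDF, which is only one of several common definitions. The helper names `top_words` and `tfidf` are hypothetical. Dimensionality reduction with KernelPCA and the classifiers are not shown.

```python
import math
from collections import Counter

def top_words(tokens, k=20):
    """Return the k most frequent tokens of one scenic spot's comments."""
    return [word for word, _ in Counter(tokens).most_common(k)]

def tfidf(docs, vocab):
    """Compute a TF-IDF weight matrix (one row per document, one column per vocab term).

    Assumption: smoothed IDF, idf(w) = ln((1 + n) / (1 + df(w))) + 1.
    """
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(word for doc in docs for word in set(doc))
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        rows.append([counts[w] / total * idf[w] for w in vocab])
    return rows

# Hypothetical pre-segmented comments, one token list per scenic spot.
reviews = {
    "spot_A": ["scenery", "beautiful", "ticket", "expensive", "scenery"],
    "spot_B": ["queue", "long", "ticket", "expensive"],
    "spot_C": ["scenery", "beautiful", "service", "friendly"],
}

docs = list(reviews.values())
# Merge each spot's top words into one pool that serves as the vocabulary.
vocab = sorted({w for doc in docs for w in top_words(doc, k=20)})
weights = tfidf(docs, vocab)

for name, row in zip(reviews, weights):
    best = max(range(len(vocab)), key=lambda i: row[i])
    print(name, "highest-weighted term:", vocab[best])
```

In this toy data, "queue" occurs in only one spot's comments, so it receives a higher weight there than "ticket", which occurs in two. That is the behaviour TF-IDF is meant to provide when looking for terms that distinguish one scenic spot from another.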
Xinyu Guo, Yuxi Su. Analysis of Tourists' Satisfaction with Scenic Spots. Academic Journal of Humanities & Social Sciences (2021) Vol. 4, Issue 7: 25-28. https://doi.org/10.25236/AJHSS.2021.040705.