Academic Journal of Computing & Information Science, 2021, 4(5); doi: 10.25236/AJCIS.2021.040510.
Dongyu Lan, Anqi Tian, Yingzhou Wang, Yajie Li
Beijing University of Technology, Beijing, 100124, China
Latent Semantic Indexing (LSI) is a latent structural model that uses statistical methods to analyze large collections of text quickly and accurately. It extracts the latent semantic associations between terms, emphasizes the key meanings of a text, and reduces the adverse effect of word polysemy. By reducing the dimensionality of text vectors, LSI simplifies the text representation while achieving high recall and fast retrieval. Using spam filtering as a running example, this article introduces in detail the theoretical basis of latent semantic indexing, namely singular value decomposition (SVD) and the construction of a multi-dimensional concept space. It then optimizes TF-IDF, the key term-weighting step, with a "Sigmoid function" and a "location factor"; this further emphasizes the relative importance of different words in a text and is better suited to constructing the latent semantic space. Finally, the paper discusses the use of latent semantic indexing in retrieval and search and briefly introduces two applications of LSI: clustering of job descriptions and the construction of a patent information classification system.
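The pipeline outlined in the abstract (weight the terms, factor the term-document matrix with SVD, then compare texts in the reduced concept space) can be sketched for the spam-filtering example as follows. This is a minimal, dependency-free Python sketch under stated assumptions, not the paper's implementation. The abstract does not give the optimized TF-IDF formula, so the sigmoid transform 2/(1 + e^(-tf)) - 1, the smoothed IDF log(1 + N/df), and the subject-line boost `TITLE_BOOST = 1.5` standing in for the location factor are all hypothetical. The toy emails in `DOCS` are invented, and labelling a new email by its most similar training email (cosine similarity) is an assumed decision rule. The SVD is obtained from a Jacobi eigendecomposition of A^T A, and a new email q is folded into the concept space as q_hat = q^T U_k S_k^-1.

```python
import math

# Hypothetical location factor: the abstract names a "location factor" but not
# its value, so this boost for words in the subject line is an assumption.
TITLE_BOOST = 1.5


def tokens(text):
    return text.lower().split()  # naive tokenizer: no stemming, no stop words


def sigmoid_tf(tf):
    """Map a raw count tf >= 0 into [0, 1) so repeated words saturate."""
    return 2.0 / (1.0 + math.exp(-tf)) - 1.0


def weight(tf, df, n_docs, in_title):
    """Assumed TF-IDF variant: sigmoid(TF) x IDF x location factor."""
    idf = math.log(1.0 + n_docs / df)
    return sigmoid_tf(tf) * idf * (TITLE_BOOST if in_title else 1.0)


def doc_vector(title, body, vocab, df, n_docs):
    """Weight every vocabulary term for one (subject, body) email."""
    words, in_title = tokens(title) + tokens(body), set(tokens(title))
    return [weight(words.count(w), df[w], n_docs, w in in_title) for w in vocab]


def jacobi_eigen(b, max_sweeps=50, tol=1e-12):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix."""
    n = len(b)
    a = [row[:] for row in b]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # rotate columns p, q of A and of V
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # rotate rows p, q of A
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)], v


def lsi(a, k):
    """Rank-k SVD A ~ U_k S_k V_k^T of a terms x docs matrix (needs k <= rank)."""
    m, n = len(a), len(a[0])
    # Eigenvalues of A^T A are the squared singular values; its eigenvectors are V.
    ata = [[sum(a[t][i] * a[t][j] for t in range(m)) for j in range(n)] for i in range(n)]
    vals, vecs = jacobi_eigen(ata)
    top = sorted(range(n), key=lambda i: vals[i], reverse=True)[:k]
    sigma = [math.sqrt(max(vals[i], 0.0)) for i in top]
    v_k = [[vecs[d][i] for i in top] for d in range(n)]  # one row per document
    u_k = [[sum(a[t][d] * v_k[d][c] for d in range(n)) / sigma[c] for c in range(k)]
           for t in range(m)]  # U_k = A V_k S_k^-1, one row per term
    return u_k, sigma, v_k


def cosine(x, y):
    nx, ny = math.sqrt(sum(v * v for v in x)), math.sqrt(sum(v * v for v in y))
    return sum(p * q for p, q in zip(x, y)) / (nx * ny) if nx and ny else 0.0


def classify(email, docs, labels, k=2):
    """Label an email with the class of its nearest training email (assumed rule)."""
    vocab = sorted({w for t, b in docs for w in tokens(t) + tokens(b)})
    n = len(docs)
    df = {w: sum(w in tokens(t) + tokens(b) for t, b in docs) for w in vocab}
    cols = [doc_vector(t, b, vocab, df, n) for t, b in docs]
    a = [[cols[d][i] for d in range(n)] for i in range(len(vocab))]  # terms x docs
    u_k, sigma, v_k = lsi(a, k)
    q = doc_vector(email[0], email[1], vocab, df, n)  # words outside vocab are dropped
    # Fold in: q_hat = q^T U_k S_k^-1, the same coordinates as the rows of V_k.
    q_hat = [sum(q[t] * u_k[t][c] for t in range(len(vocab))) / sigma[c] for c in range(k)]
    scores = [cosine(q_hat, row) for row in v_k]
    return labels[max(range(n), key=scores.__getitem__)], scores


# Toy training emails, invented for illustration: (subject, body) pairs.
DOCS = [
    ("win cash prize", "claim free cash prize now"),
    ("free prize offer", "win free offer today"),
    ("project meeting", "agenda project meeting tomorrow"),
    ("meeting notes", "notes project review meeting"),
]
LABELS = ["spam", "spam", "ham", "ham"]

if __name__ == "__main__":
    label, scores = classify(("cash offer", "free cash now"), DOCS, LABELS)
    print(label)  # prints "spam"
```

With this toy data the two retained concepts separate the spam and ham vocabularies, so the sample email, which contains only spam terms, scores close to 1 against both spam emails and 0 against both ham emails. Forming A^T A squares the condition number of A, which is tolerable for a toy matrix but not for a real corpus; in practice a library routine such as NumPy's `numpy.linalg.svd`, or a sparse truncated SVD, would replace `jacobi_eigen`.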
information retrieval, singular value decomposition (SVD), latent semantic indexing (LSI), algorithm improvement, Sigmoid function, location factor
Dongyu Lan, Anqi Tian, Yingzhou Wang, Yajie Li. An overview of the principle, algorithm improvement and application based on the theory of latent semantic indexing. Academic Journal of Computing & Information Science (2021), Vol. 4, Issue 5: 71-75. https://doi.org/10.25236/AJCIS.2021.040510.
Li Yuanyuan, Ma Yongqiang. Weight calculation method of text feature words based on latent semantic index [J]. Computer Application, 2008, 28(6): 1460-1466.
Chen Huahui. A "spam" mail filtering method based on latent semantic index [J]. Computer Application Research, 2000, 10: 17-18.
Huang Xinyi, Zhou Weimin. Research on job description clustering based on latent semantic index [J]. Network New Media Technology, 2017, 6(3): 33-37.
Bi Chen, Ji Duo, Cai Dongfeng. Research on optimization technology of latent semantic index based on patent information [J]. Journal of Shanxi University (Natural Science Edition), 2014, 37(1): 26-33.