Academic Journal of Computing & Information Science, 2025, 8(5); doi: 10.25236/AJCIS.2025.080504.

Research on Multi-Turn Chinese NL2SQL Methods Based on Semantic Rewriting

Author(s)

Shuaike Guo, Yongzheng Yang

Corresponding Author

Shuaike Guo

Affiliation(s)

School of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin, Guangxi, China

Abstract

Natural Language to SQL (NL2SQL) technology is a core interface for human-machine data interaction, and improving its performance in multi-turn Chinese dialogue is a valuable research problem. In Chinese dialogue, unresolved references and omitted content can leave gaps in the understanding of user intent. To address this, the paper proposes RW-T5, a model that integrates a semantic rewriting mechanism. Built on the pre-trained T5 architecture, the model uses hierarchical modeling of dialogue history together with turn-aware encoding to capture the segmentation of semantic units and the temporal dependencies across turns. It introduces a global context injection and bidirectional cross-attention fusion module that captures both the overall semantic focus and fine-grained word-level details. Using a sequence optimization strategy based on multi-dimensional semantic feature fusion, the model makes implicit references explicit and completes omitted semantics, so that subsequent SQL generation receives semantically complete and structurally standardized input. Experiments on CHASE, a large-scale Chinese multi-turn dialogue benchmark dataset, show that the model significantly outperforms other advanced NL2SQL parsing methods. These results support the effectiveness of the dynamic semantic rewriting mechanism and the hierarchical modeling approach, and they offer a practical route to engineering intelligent data interaction systems for Chinese multi-turn dialogue.
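To make the rewriting step concrete, the following is a minimal sketch of how dialogue history and the current question might be serialized with turn markers before being passed to a T5-style rewriter. It is not the authors' implementation: the function name `build_rewriter_input`, the `[TURN k]` and `[CUR]` markers, the `max_turns` parameter, and the example utterances are all assumptions made for illustration. English utterances stand in for the Chinese ones found in CHASE.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One earlier user utterance with its 1-based position in the dialogue."""
    index: int
    text: str


def build_rewriter_input(history: List[str], current: str, max_turns: int = 3) -> str:
    """Serialize history and the current question into one source string.

    The "[TURN k]" and "[CUR]" markers are hypothetical. They only show how
    turn order could be exposed to an encoder; the paper's actual turn-aware
    encoding is not specified in the abstract.
    """
    # Keep only the most recent turns so the source sequence stays short.
    kept = history[-max_turns:] if max_turns > 0 else []
    offset = len(history) - len(kept)
    turns = [Turn(index=offset + i + 1, text=t.strip()) for i, t in enumerate(kept)]
    parts = [f"[TURN {t.index}] {t.text}" for t in turns]
    parts.append(f"[CUR] {current.strip()}")
    return " ".join(parts)


history = ["List all employees in the sales department."]
current = "What about their average salary?"
source = build_rewriter_input(history, current)

# Illustrative (made-up) target: the pronoun "their" is resolved and the
# omitted condition is restored, so the question stands alone.
target = "What is the average salary of employees in the sales department?"

print(source)
print(target)
```

A rewriter trained on such pairs would hand the self-contained `target` question to a single-turn SQL generator, which is the role the abstract assigns to the semantic rewriting mechanism.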

Keywords

Natural Language Processing; Multi-Turn Dialogue Understanding; Semantic Rewriting; NL2SQL

Cite This Paper

Shuaike Guo, Yongzheng Yang. Research on Multi-Turn Chinese NL2SQL Methods Based on Semantic Rewriting. Academic Journal of Computing & Information Science (2025), Vol. 8, Issue 5: 38-45. https://doi.org/10.25236/AJCIS.2025.080504.
