
Academic Journal of Computing & Information Science, 2023, 6(2); doi: 10.25236/AJCIS.2023.060204.

Research on dialect speech recognition based on DenseNet-CTC


Yijie You, Xiangguo Sun

Corresponding Author:
Yijie You

Mechanical Engineering College, Sichuan University of Science and Engineering, Yibin, China


At present, most speech recognition research targets broad dialect regions, and there are few studies on the speech of individual cities. Mainstream methods are mostly built on the ResNet network, using ResNet as the acoustic model and an N-gram model as the language model. In this study, DenseNet is used as the backbone network, and a dataset of the Zigong dialect, a sub-variety of the Sichuan dialect, is taken as the object of speech recognition research. A DenseNet-BiGRU + CTC network is constructed as the acoustic model, and an RNN is used as the language model. Experiments show that the speech recognition model using DenseNet as the backbone achieves higher accuracy than the ResNet-based model: the word error rate (WER) is 3% lower than that of the GRU-CTC network and 5% lower than that of the DPCNN-Attention-CTC method.


Dialect, Speech Recognition, DenseNet, CTC, Acoustic Model

Cite This Paper

Yijie You, Xiangguo Sun. Research on dialect speech recognition based on DenseNet-CTC. Academic Journal of Computing & Information Science (2023), Vol. 6, Issue 2: 23-27. https://doi.org/10.25236/AJCIS.2023.060204.


[1] Hartmann W, Hsiao R, Tsakalidis S. Alternative networks for monolingual bottleneck features[C]//2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2017: 5290-5294.

[2] Mukhedimham Yiminjiang, Aiskar Aimudura, Mijiti Abrimiti. Uyghur speech recognition based on CNN-HMM and RNN [J]. Modern Electronic Technology, 2021, 44(11): 172-176. DOI: 10.16652/j.issn.10

[3] Krishna G, Tran C, Carnahan M, et al. Advancing speech recognition with no speech or with noisy speech[C]//2019 27th European Signal Processing Conference (EUSIPCO). 2019: 1-5.

[4] Kang J, Zhang W Q, Liu J. Gated convolutional networks based hybrid acoustic models for low resource speech recognition[C]//2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). 2017: 157-164.

[5] Nancuoji, Zhuoma, Dugecao. Tibetan speech recognition based on BLSTM and CTC [J]. Journal of Qinghai Normal University (Natural Science Edition), 2019, 35(04): 26-33. DOI: 10.16229/j.cnki.issn1001-7542.2019.04.005.

[6] Sadhu S, Li R, Hermansky H. M-vectors: Sub-band Based Energy Modulation Features for Multi-stream Automatic Speech Recognition[C]//ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2019: 6545-6549.

[7] Yuliani A R, Sustika R, Yuwana R S, et al. Feature transformations for robust speech recognition in reverberant conditions[C]//2017 International Conference on Computer, Control, Informatics and its Applications (IC3INA). 2017: 57-62.

[8] Pan Yuecheng, Liu Zhuo, Pan Wenhao, Cai Dianlun, Wei Zhengsong. An end-to-end Mandarin speech recognition method based on CNN/CTC [J]. Modern Information Technology, 2020, 4(05): 65-68. DOI: 10.19850/j.cnki.2096-4706.2020.05.019.

[9] Yang Deju, Ma Liangli, Tan Linshan, Pei Jingjing. End-to-end speech recognition based on gated convolutional network and CTC [J]. Computer Engineering and Design, 2020, 41(09): 2650-2654. DOI: 10.16208/j.issn1000-7024.2020.09.037.

[10] Dong Jiaren, Liu Guangcong. Research on speech recognition method based on GRU-CTC hybrid model [J]. Modern Computer, 2019(26): 13-16.

[11] Hu Li, Huang Hongquan, Liang Chao, Song Yueyang, Chen Yanming. End-to-end speech recognition based on dual-channel CNN [J]. Sensors and Microsystems, 2021, 40(11): 69-72+83. DOI: 10.13873/J.1000-9787(2021)11-0069-04.