A Lightweight Hybrid Architecture for Speech Recognition

<p>Zhenzhou Liu<sup>1</sup>, Chengdong Weng<sup>1</sup>, Muyuan Liu<sup>1</sup>, Haoxing Xu<sup>1</sup>, Situo Xing<sup>1</sup>, Boyu Luan<sup>1</sup></p>

Academic Journal of Computing & Information Science, 2025, 8(3); doi: 10.25236/AJCIS.2025.080306.

Corresponding Author:

Zhenzhou Liu

Affiliation(s)

¹Beijing 21st Century School, Beijing, China

Abstract

This study proposes a lightweight hybrid architecture for speech recognition that combines four convolutional layers with spectral normalization, two adaptive max-pooling layers, and two fully connected layers with dropout regularization. The design pursues computational efficiency through kernel pruning while keeping inference performance consistent across hardware platforms. Evaluation on noisy speech datasets demonstrates robust recognition accuracy and real-time processing capability. Deployment validation confirms operational stability in edge computing environments, indicating that the architecture is suitable for resource-constrained applications that require energy-efficient speech recognition.
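To give a rough sense of how small such a layer stack can be, the following sketch counts trainable parameters for one possible arrangement of the four convolutional, two adaptive max-pooling, and two fully connected layers named in the abstract. The abstract does not give layer order, channel widths, kernel sizes, pooled output sizes, or the number of output classes, so every number in `STACK` is a hypothetical placeholder, and the helper classes are illustrative rather than the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Conv:
    """Square-kernel 2D convolution (hypothetical sizes)."""
    in_ch: int
    out_ch: int
    kernel: int

    def params(self) -> int:
        # Weights plus biases. Spectral normalization only rescales the
        # weight tensor, so it adds no trainable parameters.
        return self.in_ch * self.out_ch * self.kernel ** 2 + self.out_ch


@dataclass(frozen=True)
class AdaptiveMaxPool:
    """Pools any input to out_size x out_size; has no parameters."""
    out_size: int

    def params(self) -> int:
        return 0


@dataclass(frozen=True)
class Linear:
    """Fully connected layer. Dropout adds no parameters either."""
    in_features: int
    out_features: int

    def params(self) -> int:
        return self.in_features * self.out_features + self.out_features


Layer = Union[Conv, AdaptiveMaxPool, Linear]

# Assumed layout: conv, conv, pool, conv, conv, pool, fc, fc.
# Input is taken to be a single-channel spectrogram; 10 output classes
# is an arbitrary choice for illustration.
STACK: List[Layer] = [
    Conv(1, 16, 3),
    Conv(16, 32, 3),
    AdaptiveMaxPool(16),
    Conv(32, 64, 3),
    Conv(64, 64, 3),
    AdaptiveMaxPool(4),
    Linear(64 * 4 * 4, 128),  # 64 channels x 4 x 4 after the second pool
    Linear(128, 10),
]


def total_params(layers: List[Layer]) -> int:
    """Sum trainable parameters over all layers."""
    return sum(layer.params() for layer in layers)


def flatten_features(layers: List[Layer]) -> int:
    """Feature count entering the first Linear layer.

    Because the last pooling layer is adaptive, this depends only on the
    last conv's channel count and the pooled size, not on input length.
    """
    channels, size = 0, 0
    for layer in layers:
        if isinstance(layer, Conv):
            channels = layer.out_ch
        elif isinstance(layer, AdaptiveMaxPool):
            size = layer.out_size
        elif isinstance(layer, Linear):
            break
    return channels * size * size


if __name__ == "__main__":
    print("flattened features:", flatten_features(STACK))
    print("trainable parameters:", total_params(STACK))
```

Under these assumed sizes the first fully connected layer holds most of the parameters, which is one reason adaptive pooling to a small fixed grid helps keep such a model compact: the flattened feature count stays constant regardless of utterance length.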

Keywords

Speech Recognition, Deep Learning, Convolutional Neural Network, Fully Connected Neural Network, Pooling Layer

Cite This Paper

Zhenzhou Liu, Chengdong Weng, Muyuan Liu, Haoxing Xu, Situo Xing, Boyu Luan. A Lightweight Hybrid Architecture for Speech Recognition. Academic Journal of Computing & Information Science (2025), Vol. 8, Issue 3: 43-50. https://doi.org/10.25236/AJCIS.2025.080306.
