The Development of Simple Speech Recognition Program and Problem Solving

Sibo Song, Zexin Yin, Haoze Liu, Peiyan Han, Siying Song, Zitai Wang

doi:10.25236/AJCIS.2025.080810

Academic Journal of Computing & Information Science, 2025, 8(8); doi: 10.25236/AJCIS.2025.080810.

Corresponding Author:

Sibo Song

Affiliation(s)

Beijing 21st Century School, Beijing, China

Abstract

To meet the edge-side requirements of speech recognition in human-computer interaction with AI refrigerators, this paper designs a concise deep learning architecture for speech recognition. The architecture consists of 4 convolutional layers, 2 pooling layers, and 2 fully connected layers. Its small parameter count and low power consumption make it suitable for edge-side computing and help address the limited hardware processing capabilities of devices such as AI refrigerators. Verification results on the target dataset show that the architecture's speech recognition accuracy can meet the practical requirements of speech command recognition in human-computer interaction with AI refrigerators.
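The layer layout named in the abstract can be sketched as a simple parameter count. This is a minimal illustration only: the abstract does not give the input size, kernel size, channel widths, pooling positions, or number of commands, so every value below (a 1 x 40 x 100 spectrogram input, 3x3 kernels, channel widths of 16/16/32/32, 2x2 pooling after the second and fourth convolutions, 64 hidden units, 10 commands) is a hypothetical assumption, not the authors' configuration.

```python
from dataclasses import dataclass

# All sizes below are illustrative assumptions, not values from the paper.
INPUT_SHAPE = (1, 40, 100)        # (channels, mel bands, frames), hypothetical
CONV_CHANNELS = (16, 16, 32, 32)  # output channels of the 4 convolutional layers
HIDDEN_UNITS = 64                 # width of the first fully connected layer
NUM_COMMANDS = 10                 # number of voice commands, hypothetical


@dataclass
class Layer:
    kind: str         # "conv", "pool", or "fc"
    out_shape: tuple  # output shape after this layer
    params: int       # trainable parameters in this layer


def build_layers(input_shape=INPUT_SHAPE, conv_channels=CONV_CHANNELS,
                 hidden_units=HIDDEN_UNITS, num_classes=NUM_COMMANDS):
    """List the layers of a 4-conv / 2-pool / 2-FC network with parameter counts."""
    channels, height, width = input_shape
    layers = []
    for index, out_channels in enumerate(conv_channels, start=1):
        # 3x3 kernel with "same" padding: weights plus one bias per output channel.
        params = channels * out_channels * 3 * 3 + out_channels
        channels = out_channels
        layers.append(Layer("conv", (channels, height, width), params))
        if index % 2 == 0:
            # 2x2 pooling after every second convolution halves each spatial side.
            height, width = height // 2, width // 2
            layers.append(Layer("pool", (channels, height, width), 0))
    flat = channels * height * width
    layers.append(Layer("fc", (hidden_units,), flat * hidden_units + hidden_units))
    layers.append(Layer("fc", (num_classes,), hidden_units * num_classes + num_classes))
    return layers


def total_parameters(layers):
    """Sum the trainable parameters over all layers."""
    return sum(layer.params for layer in layers)


if __name__ == "__main__":
    layers = build_layers()
    for layer in layers:
        print(f"{layer.kind:<4} out={layer.out_shape} params={layer.params}")
    print("total parameters:", total_parameters(layers))
```

Under these assumed sizes, the first fully connected layer holds most of the parameters, so the flattened feature size after the second pooling layer would be the main lever for keeping such a model small on edge hardware.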

Keywords

AI Refrigerator, Speech Recognition, Lightweight Convolutional Neural Network, Lightweight CNN, Edge Computing

Cite This Paper

Sibo Song, Zexin Yin, Haoze Liu, Peiyan Han, Siying Song, Zitai Wang. The Development of Simple Speech Recognition Program and Problem Solving. Academic Journal of Computing & Information Science (2025), Vol. 8, Issue 8: 67-71. https://doi.org/10.25236/AJCIS.2025.080810.
