Academic Journal of Computing & Information Science, 2025, 8(8); doi: 10.25236/AJCIS.2025.080810.
Sibo Song, Zexin Yin, Haoze Liu, Peiyan Han, Siying Song, Zitai Wang
Beijing 21st Century School, Beijing, China
To meet the requirements of edge-side speech recognition for human-computer interaction with AI refrigerators, this paper designs a compact deep learning architecture for speech recognition tasks. The architecture consists of 4 convolutional layers, 2 pooling layers, and 2 fully connected layers. Its small parameter count and low power consumption make it suitable for edge-side computing on devices with limited hardware processing capability, such as AI refrigerators. Verification results on the target dataset show that the speech recognition accuracy of this architecture reaches a level that can meet the practical requirements of speech command recognition in the human-computer interaction of AI refrigerators.
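As a rough illustration of why a network with 4 convolutional, 2 pooling, and 2 fully connected layers stays small, the sketch below counts the trainable parameters of one such layout. Only the layer counts come from the abstract. The input shape (1 channel, 40 frequency bands, 100 frames), the channel widths, the 3x3 kernels with "same" padding, the 2x2 pooling, the hidden size of 64, and the 10 output classes are all assumptions chosen for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Conv2d:
    in_ch: int
    out_ch: int
    kernel: int = 3  # assumed: square kernel, stride 1, "same" padding

    def params(self) -> int:
        # One kernel per (input channel, output channel) pair, plus one bias per output channel.
        return self.in_ch * self.out_ch * self.kernel ** 2 + self.out_ch

    def out_shape(self, shape):
        _, h, w = shape
        return (self.out_ch, h, w)  # "same" padding keeps height and width


@dataclass
class MaxPool2d:
    size: int = 2  # assumed: 2x2 window with stride 2

    def params(self) -> int:
        return 0  # pooling has no trainable weights

    def out_shape(self, shape):
        c, h, w = shape
        return (c, h // self.size, w // self.size)


@dataclass
class Linear:
    in_features: int
    out_features: int

    def params(self) -> int:
        return self.in_features * self.out_features + self.out_features

    def out_shape(self, shape):
        if prod(shape) != self.in_features:
            raise ValueError(f"expected {self.in_features} inputs, got {prod(shape)}")
        return (self.out_features,)


def build_network(input_shape=(1, 40, 100), num_classes=10):
    """Hypothetical layout: conv-conv-pool, conv-conv-pool, then two dense layers."""
    layers = [
        Conv2d(input_shape[0], 16), Conv2d(16, 16), MaxPool2d(),
        Conv2d(16, 32), Conv2d(32, 32), MaxPool2d(),
    ]
    shape = input_shape
    for layer in layers:
        shape = layer.out_shape(shape)
    # The first dense layer takes the flattened feature map as input.
    layers += [Linear(prod(shape), 64), Linear(64, num_classes)]
    return layers


def summarize(layers, input_shape):
    """Return (total trainable parameters, output shape) for a layer list."""
    shape, total = input_shape, 0
    for layer in layers:
        total += layer.params()
        shape = layer.out_shape(shape)
    return total, shape


if __name__ == "__main__":
    net = build_network()
    total, out = summarize(net, (1, 40, 100))
    print(f"parameters: {total}, output shape: {out}")
    print(f"approx. float32 size: {total * 4 / 1e6:.2f} MB")
```

Under these assumed sizes, most of the parameters sit in the first fully connected layer, so the pooling layers (which shrink the feature map before it is flattened) contribute most to keeping the model small.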
AI Refrigerator, Speech Recognition, Lightweight Convolutional Neural Network, Lightweight CNN, Edge Computing
Sibo Song, Zexin Yin, Haoze Liu, Peiyan Han, Siying Song, Zitai Wang. The Development of Simple Speech Recognition Program and Problem Solving. Academic Journal of Computing & Information Science (2025), Vol. 8, Issue 8: 67-71. https://doi.org/10.25236/AJCIS.2025.080810.