Academic Journal of Computing & Information Science, 2024, 7(11); doi: 10.25236/AJCIS.2024.071113.
Zhuoxi Li
School of Computing, Guangdong Neusoft University, Foshan, China
With the rapid development of artificial intelligence and Internet of Things technology, the smart home has become an important part of modern life. Smart home systems greatly improve user convenience and comfort through voice control, intelligent control, and related technologies. In practical applications, however, smart home devices face resource constraints, especially in battery life and computing power, which makes voice wake-up technology for low-resource environments particularly important. First, low-power voice wake-up can extend the battery life of smart home devices and improve the user experience. Second, voice wake-up with low computational complexity can reduce hardware cost and promote the adoption of smart home systems. Third, efficient and accurate voice wake-up can improve the interaction efficiency and user satisfaction of smart home systems. Voice wake-up in low-resource environments has therefore become a research focus in both academia and industry, and researchers have carried out extensive work on feature extraction, model optimization, and low-power design. In feature extraction, Mel-Frequency Cepstral Coefficients (MFCC) and related algorithms are widely used in speech signal processing. In model optimization, Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), and other models are used to improve the accuracy and robustness of voice wake-up. In low-power design, methods such as event-driven wake-up mechanisms and low-power hardware accelerators have been proposed to reduce the power consumption of voice wake-up systems. However, existing research still has several problems.
First, existing feature extraction algorithms and model optimization methods perform poorly in high-noise environments, which easily leads to false wake-ups and missed wake-ups. Second, low-power design often requires a trade-off between wake-up rate and power consumption, and reducing power consumption while maintaining the wake-up rate remains difficult. Third, most existing voice wake-up systems are designed for specific scenarios and lack generality and scalability. This paper studies low-resource voice wake-up technology for smart home scenarios and proposes a method based on low-power feature extraction and neural network optimization. To address the resource limitations of smart home devices, low-power, efficient feature extraction algorithms such as MFCC and its improved variants are studied. Neural network models suited to low-resource environments, such as DNNs and CNNs, are examined and optimized to improve the accuracy and robustness of voice wake-up. Low-power design methods, such as low-power hardware accelerators and event-driven wake-up mechanisms, are studied to reduce the power consumption of voice wake-up systems. This work offers a new approach to voice wake-up in smart home systems and supports their further development and application.
Smart Home; Voice Wake-Up; Low Power Consumption; Neural Network; Resource Constraint
Zhuoxi Li. Research and Implementation of Low Resource Voice Awakening Technology in Smart Home Scene. Academic Journal of Computing & Information Science (2024), Vol. 7, Issue 11: 96-101. https://doi.org/10.25236/AJCIS.2024.071113.