Academic Journal of Computing & Information Science, 2024, 7(2); doi: 10.25236/AJCIS.2024.070201.

HRTF low-dimensional representation based on deep convolutional autoencoder and attention mechanism

Author(s)

Hongxu Zhang1, Wei Chen2

Corresponding Author:

Wei Chen

Affiliation(s)

1School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China

2School of Software, Henan Polytechnic University, Jiaozuo, China

Abstract

The Head-Related Transfer Function (HRTF) describes how sound is reflected and scattered by the environment and the human body as it travels from a source to the listener's ears, and it carries a large amount of the auditory cue information used for sound localization. Because HRTF sample data are high-dimensional and nonlinear, it is difficult to analyze the relationships between HRTF localization cues, spatial direction, and human morphological features. Traditional low-dimensional representations struggle to capture the complex nonlinear relationships among the multiple auditory cues in HRTFs, which degrades performance. To address this problem, this study proposes an HRTF low-dimensional representation method based on a deep convolutional autoencoder. The method exploits the fact that HRTF spectral features vary continuously in three-dimensional space and integrates the nonlinear relationships of full-space HRTF features by modeling the natural spatial structure of the complete HRTF dataset. First, an attention mechanism is introduced into the encoder; it compensates for the bias introduced when HRTFs are mapped to a 3D tensor for convolution and mines the intrinsic features shared by neighboring spatial directions and adjacent spectra, improving the network's low-dimensional representation ability. Second, dense connectivity is combined with attention mechanisms in the decoder, matched to the characteristics of each level, to ensure that low-dimensional features are propagated effectively. Experiments on several publicly available HRTF datasets show that the proposed model outperforms traditional methods, achieving high-performance low-dimensional representation and reconstruction of HRTFs.
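As a rough illustration of the architecture the abstract describes, the following PyTorch sketch compresses a full-space HRTF tensor with a convolutional encoder that ends in an attention block, and reconstructs it with a decoder whose later stage receives the latent code again as a dense skip connection. The layer widths, the 16 x 16 direction grid, the 128 frequency bins, the latent dimension, and the squeeze-and-excitation style attention are all illustrative assumptions, not values or design details taken from the paper.

```python
# Minimal sketch (assumed shapes and layer choices, not the paper's exact model):
# a convolutional autoencoder over an HRTF tensor with encoder attention and a
# densely connected decoder stage.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (an assumed variant)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)  # reweight channels by learned importance

class HRTFAutoencoder(nn.Module):
    """Encoder compresses an (frequency x azimuth x elevation) HRTF tensor to a
    low-dimensional code; decoder reconstructs it using a dense latent skip."""
    def __init__(self, freq_bins=128, latent_dim=32):
        super().__init__()
        # Frequency bins act as input channels; the 2-D plane is the grid of
        # measurement directions (azimuth x elevation).
        self.enc1 = nn.Sequential(nn.Conv2d(freq_bins, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.enc2 = nn.Sequential(nn.Conv2d(64, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.attn = ChannelAttention(32)            # attention after the last encoder stage
        self.to_latent = nn.Conv2d(32, latent_dim, 1)

        self.dec1 = nn.Sequential(nn.ConvTranspose2d(latent_dim, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True))
        # Dense connection: the second decoder stage sees the previous stage's
        # output concatenated with the (upsampled) latent code.
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(32 + latent_dim, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True))
        self.out = nn.Conv2d(64, freq_bins, 1)

    def forward(self, x):
        z = self.to_latent(self.attn(self.enc2(self.enc1(x))))
        d1 = self.dec1(z)
        z_up = nn.functional.interpolate(z, size=d1.shape[-2:], mode="nearest")
        d2 = self.dec2(torch.cat([d1, z_up], dim=1))
        return self.out(d2), z

# Example: a batch of 4 HRTF magnitude tensors (128 frequency bins on a
# hypothetical 16 x 16 direction grid) reconstructed through the bottleneck.
model = HRTFAutoencoder(freq_bins=128, latent_dim=32)
recon, code = model(torch.randn(4, 128, 16, 16))
print(recon.shape, code.shape)  # torch.Size([4, 128, 16, 16]) torch.Size([4, 32, 4, 4])
```

In the model described by the paper, attention and dense connectivity are applied across multiple decoder levels; the single latent skip here only stands in for that idea to keep the sketch compact.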

Keywords

Head-Related Transfer Function, Convolutional Autoencoder, Attention Mechanism, Spatial Audio

Cite This Paper

Hongxu Zhang, Wei Chen. HRTF low-dimensional representation based on deep convolutional autoencoder and attention mechanism. Academic Journal of Computing & Information Science (2024), Vol. 7, Issue 2: 1-11. https://doi.org/10.25236/AJCIS.2024.070201.
