
Academic Journal of Computing & Information Science, 2022, 5(8); doi: 10.25236/AJCIS.2022.050802.

A Study of Music Genre Classification with Bilinear Convolutional Neural Network


Zhangyong Xu, Yutong Guo, Shirong Dong, Qibei Xue

Corresponding Author:
Zhangyong Xu

School of Information Science and Engineering, East China University of Science and Technology, Shanghai, China


In this paper, we propose a new bilinear convolutional neural network (BCNN) and apply it to music genre classification. The model is built from pre-trained ResNet and DenseNet networks: the output features of the two networks are fused and passed to a classifier. Our experiments are carried out on the GTZAN dataset. We extract the mel spectrogram of every audio clip in the dataset as the input feature and use a fully connected layer as the classifier, in order to study whether the output features of the two classical CNNs are complementary and can together achieve better results on this task. Under the same experimental environment and hyperparameters, the proposed BCNN achieves better classification results than either ResNet or DenseNet alone, with a classification accuracy of about 81.17%.
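The fusion step described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not state how the two feature maps are fused, so this sketch assumes the outer-product (bilinear) pooling commonly used in bilinear CNNs, followed by signed square root and L2 normalisation. The function names `bilinear_fuse` and `classify`, the feature-map sizes, and the random arrays standing in for the ResNet and DenseNet outputs are all hypothetical; only the ten output classes follow from GTZAN, which has ten genres.

```python
import numpy as np


def bilinear_fuse(feat_a, feat_b, eps=1e-12):
    """Fuse two CNN feature maps by bilinear (outer-product) pooling.

    feat_a has shape (C_a, H, W) and feat_b has shape (C_b, H, W).
    Returns a unit-length vector of size C_a * C_b.
    """
    c_a, h, w = feat_a.shape
    c_b = feat_b.shape[0]
    if feat_b.shape[1:] != (h, w):
        raise ValueError("both feature maps must share the same spatial size")
    a = feat_a.reshape(c_a, h * w)
    b = feat_b.reshape(c_b, h * w)
    # Average the channel-by-channel outer product over all spatial positions.
    pooled = (a @ b.T) / (h * w)
    vec = pooled.reshape(-1)
    # Signed square root followed by L2 normalisation (assumed, not from the abstract).
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    return vec / (np.linalg.norm(vec) + eps)


def classify(fused, weights, bias):
    """Single fully connected layer followed by a softmax."""
    logits = weights @ fused + bias
    exp = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return exp / exp.sum()


# Random stand-ins for the two backbone outputs (hypothetical sizes).
rng = np.random.default_rng(0)
feat_resnet = rng.standard_normal((8, 4, 4))
feat_densenet = rng.standard_normal((6, 4, 4))

fused = bilinear_fuse(feat_resnet, feat_densenet)

# Untrained classifier weights; 10 outputs for the 10 GTZAN genres.
weights = rng.standard_normal((10, fused.size)) * 0.01
bias = np.zeros(10)
probs = classify(fused, weights, bias)
```

In a trained model the two feature maps would come from the pre-trained backbones and the classifier weights would be learned; the sketch only shows how two feature maps of different channel counts can be combined into one vector for a fully connected classifier.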


Bilinear Neural Network; Music Genre Classification; Mel Spectrum
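As a rough illustration of the mel spectrogram input feature named above, the following NumPy sketch computes a log-scaled mel spectrogram from a waveform. The frame length, hop size, number of mel bands, the HTK-style mel formula, and the log scaling are assumptions chosen for the example, not the settings used in the paper; the function names are hypothetical, and a synthetic 440 Hz tone stands in for a GTZAN clip.

```python
import numpy as np


def hz_to_mel(hz):
    """Convert frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bank = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        left, centre, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def mel_spectrogram(signal, sr, n_fft=1024, hop=512, n_mels=64):
    """Log-scaled mel spectrogram of shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)
    mel_power = mel_filterbank(sr, n_fft, n_mels) @ power.T
    return 10.0 * np.log10(mel_power + 1e-10)  # decibel scale, offset avoids log(0)


# One second of a 440 Hz tone at 22050 Hz as a stand-in for an audio clip.
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2.0 * np.pi * 440.0 * t)

spec = mel_spectrogram(tone, sr)
peak_band = int(spec.mean(axis=1).argmax())  # mel band with the most energy
```

The resulting two-dimensional array can be treated like an image, which is what allows image-pretrained CNNs to be applied to audio.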

Cite This Paper

Zhangyong Xu, Yutong Guo, Shirong Dong, Qibei Xue. A Study of Music Genre Classification with Bilinear Convolutional Neural Network. Academic Journal of Computing & Information Science (2022), Vol. 5, Issue 8: 12-17. https://doi.org/10.25236/AJCIS.2022.050802.

