
Academic Journal of Computing & Information Science, 2024, 7(6); doi: 10.25236/AJCIS.2024.070607.

TabClusterNet: Enhanced Deep Clustering for Tabular Data Analysis


Shuwei Xu, Zhi Hu, Xiaowei Wang

Corresponding Author:
Shuwei Xu

Software College, Shenyang Normal University, Shenyang, Liaoning, 110034, China


Clustering high-dimensional tabular data is a complex and challenging problem, and traditional clustering techniques often fail to uncover the latent structure hidden in high-dimensional spaces. TabClusterNet is a novel deep clustering model designed specifically for tabular data analysis. It combines the self-supervised encoder-decoder of TabNet with the deep clustering framework of Deep Embedded Clustering (DEC). By pairing TabNet's strong feature-extraction capability with DEC's clustering mechanism, TabClusterNet extracts features well suited to clustering and achieves markedly better performance than conventional methods. The proposed architecture has been extensively validated on several public datasets across multiple evaluation metrics, and closer analysis shows that the model preserves the underlying structure of the data. TabClusterNet achieves substantially improved clustering accuracy, offering insights useful for data analytics and decision support and enabling data scientists and researchers to glean deeper insights from complex datasets.
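The DEC component referenced above assigns each embedded sample to a cluster via a Student's t-kernel and refines the assignments by minimizing a KL divergence to a sharpened target distribution. As a minimal sketch of that mechanism (the embeddings `z` would come from the TabNet encoder, which is omitted here; all function names are illustrative, not from the paper):

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    # Soft assignment q_ij of embedding z_i to centroid mu_j using a
    # Student's t-kernel, as in DEC (Xie et al., 2016).
    dist_sq = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    q = (1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    # Sharpened auxiliary target p_ij that up-weights high-confidence
    # assignments and normalizes by per-cluster frequency.
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)

def clustering_loss(p, q):
    # DEC clustering objective: KL(P || Q), summed over samples.
    return float((p * np.log(p / q)).sum())
```

In training, `p` is periodically recomputed from the current `q` and held fixed while gradients of the KL loss update both the encoder and the centroids.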


Keywords: Deep Clustering, Tabular Data, Self-supervised Learning

Cite This Paper

Shuwei Xu, Zhi Hu, Xiaowei Wang. TabClusterNet: Enhanced Deep Clustering for Tabular Data Analysis. Academic Journal of Computing & Information Science (2024), Vol. 7, Issue 6: 44-52. https://doi.org/10.25236/AJCIS.2024.070607.

