Academic Journal of Computing & Information Science, 2025, 8(2); doi: 10.25236/AJCIS.2025.080208.

Multi-Layer Distillation and Prototype Replay for Class-Incremental Learning

Author(s)

Ce Zhao1, Yiwen Zhang1, Lifeng Yao1, Jingya Wang1, Yonghao Chen1, Naifu Ye2

Corresponding Author:
Jingya Wang
Affiliation(s)

1College of Information and Cyber Security, People’s Public Security University of China, Beijing, China

2School of Policing Information, Shandong Police College, Jinan, China

Abstract

The primary challenge in class-incremental learning is mitigating catastrophic forgetting, in which a model’s performance on previously learned tasks deteriorates drastically as it adapts to new tasks. The problem is even more pronounced in real-world applications, where new data constantly emerges and retraining from scratch is infeasible. In this paper, we propose a class-incremental learning method called MDPR, which integrates self-supervised data augmentation, multi-layer knowledge distillation, and enhanced prototype replay. Together, these strategies enrich the diversity of the features the model extracts, preserve knowledge of previous tasks, and address class imbalance in the classifier. Experimental results on the CIFAR-100 benchmark demonstrate that MDPR effectively mitigates catastrophic forgetting and achieves strong performance on class-incremental learning tasks.
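
To make the interplay of these components concrete, the following is a minimal PyTorch sketch of how one training step might combine multi-layer feature distillation against a frozen previous-task model with replay of stored class-mean prototypes; the self-supervised augmentation branch is omitted for brevity. All names, shapes, and loss weights (multi_layer_distillation, prototype_replay_logits, the 0.5 weight, and so on) are illustrative assumptions for exposition, not the authors' actual MDPR implementation or its hyperparameters.

# Sketch: multi-layer feature distillation + prototype replay for
# class-incremental learning. Names and shapes are illustrative only.
import torch
import torch.nn.functional as F


def multi_layer_distillation(student_feats, teacher_feats):
    """Mean-squared distillation averaged over several intermediate layers."""
    loss = 0.0
    for s, t in zip(student_feats, teacher_feats):
        loss = loss + F.mse_loss(s, t.detach())  # teacher (old model) is frozen
    return loss / len(student_feats)


def prototype_replay_logits(classifier, prototypes, proto_labels, noise_std=0.1):
    """Classify stored old-class prototypes (with Gaussian jitter) so the
    classifier keeps seeing old classes alongside the new-task batch."""
    jittered = prototypes + noise_std * torch.randn_like(prototypes)
    return classifier(jittered), proto_labels


# Toy usage with random tensors standing in for real features.
if __name__ == "__main__":
    classifier = torch.nn.Linear(64, 20)            # 64-d features, 20 classes seen so far
    new_logits = classifier(torch.randn(8, 64))     # current-task batch
    new_labels = torch.randint(10, 20, (8,))        # new classes are 10..19

    # Intermediate feature maps of the current model and the frozen old model.
    student_feats = [torch.randn(8, 32, 8, 8), torch.randn(8, 64, 4, 4)]
    teacher_feats = [f + 0.05 * torch.randn_like(f) for f in student_feats]

    prototypes = torch.randn(10, 64)                # one stored feature mean per old class
    proto_labels = torch.arange(10)
    proto_logits, proto_targets = prototype_replay_logits(classifier, prototypes, proto_labels)

    loss = (F.cross_entropy(new_logits, new_labels)
            + F.cross_entropy(proto_logits, proto_targets)
            + 0.5 * multi_layer_distillation(student_feats, teacher_feats))
    print(float(loss))

In such a setup, the distillation term constrains intermediate representations to stay close to those of the previous-task model, while the replayed prototypes keep old classes represented in the classifier's training signal without storing raw exemplars.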

Keywords

Class-Incremental Learning, Catastrophic Forgetting, Lifelong Learning, Image Classification

Cite This Paper

Ce Zhao, Yiwen Zhang, Lifeng Yao, Jingya Wang, Yonghao Chen, Naifu Ye. Multi-Layer Distillation and Prototype Replay for Class-Incremental Learning. Academic Journal of Computing & Information Science (2025), Vol. 8, Issue 2: 56-63. https://doi.org/10.25236/AJCIS.2025.080208.
