Academic Journal of Computing & Information Science, 2024, 7(5); doi: 10.25236/AJCIS.2024.070513.

The contrastive learning algorithm based on masked image modeling

Author(s)

Feng Xin1,2, Li Xinwei1,2

Corresponding Author:
Li Xinwei
Affiliation(s)

1School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo, China

2Henan Key Laboratory of Intelligent Detection and Control of Coal Mine Equipment, Jiaozuo, China

Abstract

Contrastive learning aims to train an encoder in a self-supervised manner to obtain representations of input images. Traditional contrastive learning methods primarily model relationships between image categories, which lets the model learn representations with strong instance discriminability but overlooks the locally perceptible features of images. To enhance the feature representation capability of contrastive learning, this paper therefore proposes Masked Contrastive Learning (MCLim), a method based on masked image modeling. MCLim introduces the idea of masked image modeling into contrastive learning via a dual-branch structure: the first branch masks the input image before feeding it into the network, so that the model simultaneously performs contrastive learning and restores the original input image, learning representations that are both instance-discriminative and locally perceptive. MCLim restricts the size of the masked area so that this feature augmentation does not degrade model performance, and it adapts to different encoder networks by changing the size of the masked blocks. The projection layer of the first branch doubles as a decoder and reconstructs the input image. In classification experiments on the CIFAR-10 dataset, MCLim achieves a Top-1 accuracy of 89.23%, outperforming other algorithms of the same category.
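
To make the dual-branch design concrete, the following is a minimal PyTorch sketch of the idea the abstract describes: one branch receives a block-masked view and must both match the other branch's embedding (a contrastive objective) and reconstruct the original image through its projection layer. The backbone, mask ratio, patch size, temperature, loss weighting, and all layer sizes below are illustrative assumptions, not the authors' published configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MCLimSketch(nn.Module):
    def __init__(self, dim=128, patch=4, img=32):
        super().__init__()
        # A small CNN stands in for the paper's encoder backbone (assumption).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, 2, 1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Projection head of the masked branch; per the abstract it also
        # serves as a decoder that reconstructs the original input image.
        self.proj = nn.Linear(128, dim)
        self.decoder = nn.Linear(128, 3 * img * img)
        self.patch = patch

    def mask(self, x, ratio=0.3):
        # Zero out random patch-sized blocks. The abstract restricts the
        # masked area so the augmentation does not hurt performance; the
        # 0.3 ratio here is an illustrative choice.
        B, C, H, W = x.shape
        keep = (torch.rand(B, 1, H // self.patch, W // self.patch,
                           device=x.device) > ratio).float()
        keep = keep.repeat_interleave(self.patch, 2).repeat_interleave(self.patch, 3)
        return x * keep

    def forward(self, view1, view2):
        h1 = self.encoder(self.mask(view1))   # first branch: masked input
        h2 = self.encoder(view2)              # second branch: unmasked view
        z1 = F.normalize(self.proj(h1), dim=1)
        z2 = F.normalize(self.proj(h2), dim=1)
        # InfoNCE: matching views are positives, other batch items negatives.
        logits = z1 @ z2.t() / 0.2
        labels = torch.arange(z1.size(0), device=z1.device)
        contrast = F.cross_entropy(logits, labels)
        # Reconstruction: restore the original (unmasked) input image.
        recon = self.decoder(h1).view_as(view1)
        return contrast + F.mse_loss(recon, view1)

# Usage on random data shaped like CIFAR-10 batches:
loss = MCLimSketch()(torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32))

Summing the contrastive and reconstruction terms with equal weight is the simplest combination; in practice the balance between instance discriminability and local perceptibility would be a tunable hyperparameter.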

Keywords

contrastive learning; mask; feature representation; instance discriminability; local perceptibility

Cite This Paper

Feng Xin, Li Xinwei. The contrastive learning algorithm based on masked image modeling. Academic Journal of Computing & Information Science (2024), Vol. 7, Issue 5: 101-108. https://doi.org/10.25236/AJCIS.2024.070513.
