LE-HRNet: A Lightweight and Efficient Human Pose Estimation Network

Cui Lizhi^1,2, Jin Hongwei^1,2

doi:10.25236/AJCIS.2026.090411

Academic Journal of Computing & Information Science, 2026, 9(4); doi: 10.25236/AJCIS.2026.090411.

Corresponding Author:

Jin Hongwei

Affiliation(s)

¹School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo, China

²Henan Key Laboratory of Intelligent Detection and Control for Coal Mine Equipment, Jiaozuo, China

Abstract

Human pose estimation is one of the core research directions in computer vision and has received extensive attention from both academia and industry in recent years. The High-Resolution Network (HRNet) retains and fuses high-resolution features throughout the network by means of its parallel multi-resolution branches, achieving high-precision human keypoint localization. However, its dense stacking of convolutions and complex cross-branch feature interaction lead to a large number of parameters and a high computational cost, which makes real-time deployment difficult in resource-constrained scenarios such as embedded devices and mobile terminals. To address this issue, a lightweight and efficient model, LE-HRNet, is proposed. First, it combines partial channel convolution and pointwise convolution to construct a lightweight residual module (Leanblock), which replaces the original 3×3 convolution modules of HRNet to reduce the parameter count. Second, it integrates a lightweight convolution block attention module (EMA) to build the LeE-MA-Fusionblock module, which compensates for the feature loss caused by the lightweight design. Experiments on the COCO 2017 and MPII datasets show that LE-HRNet has only 4.2M parameters and a computational cost of 1.5 GFLOPs, while achieving an AP of 74.0% on the COCO validation set and a PCKh@0.5 of 90.2% on the MPII validation set, striking a good balance between model complexity and pose estimation accuracy.
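
The abstract does not specify the internal layout of Leanblock, so the following is only a back-of-the-envelope sketch of why replacing dense 3×3 convolutions with a partial convolution plus a pointwise convolution reduces cost. It assumes a FasterNet-style partial convolution (a 3×3 convolution applied to only C // n_div channels, with n_div = 4 assumed, while the remaining channels pass through unchanged) followed by a single 1×1 pointwise convolution, and compares its weight count with an HRNet BasicBlock made of two 3×3 convolutions. All function names are hypothetical; bias and batch-normalization terms are ignored.

```python
# Hypothetical sketch: weight counts of an HRNet BasicBlock vs. an ASSUMED
# Leanblock layout (partial 3x3 conv + 1x1 pointwise conv). Not the authors' code.


def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a k x k convolution without bias (batch-norm terms ignored)."""
    return k * k * c_in * c_out


def basic_block_params(c: int) -> int:
    """HRNet BasicBlock: two 3x3 convolutions mapping C -> C channels."""
    return 2 * conv_params(c, c, 3)


def lean_block_params(c: int, n_div: int = 4) -> int:
    """Assumed Leanblock: a 3x3 conv on only C // n_div channels (the rest are
    passed through unchanged), then a 1x1 pointwise conv mixing all C channels."""
    c_part = c // n_div
    return conv_params(c_part, c_part, 3) + conv_params(c, c, 1)


def block_macs(params: int, h: int, w: int) -> int:
    """Multiply-accumulates when every conv is stride 1 with 'same' padding,
    so each weight is applied once per output pixel."""
    return params * h * w


if __name__ == "__main__":
    c = 32          # width of HRNet-W32's highest-resolution branch
    h, w = 64, 48   # 1/4 resolution of a 256x192 input crop
    base, lean = basic_block_params(c), lean_block_params(c)
    print(base, lean, f"{base / lean:.2f}")                # 18432 1600 11.52
    print(block_macs(base, h, w), block_macs(lean, h, w))  # 56623104 4915200
```

Under these assumptions the partial/pointwise pair needs roughly an order of magnitude fewer weights and multiply-accumulates than two dense 3×3 convolutions at the same width. The actual savings in LE-HRNet depend on the real Leanblock design, its channel widths, and the added EMA fusion blocks, none of which this sketch models.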

Keywords

Human pose estimation; HRNet; Lightweight model; Attention mechanism

Cite This Paper

Cui Lizhi, Jin Hongwei. LE-HRNet: A Lightweight and Efficient Human Pose Estimation Network. Academic Journal of Computing & Information Science (2026), Vol. 9, Issue 4: 86-95. https://doi.org/10.25236/AJCIS.2026.090411.