Enhanced Proximal Policy Optimization for Complex Game AI: Applying Reinforcement Learning to Super Mario

Lei Wang¹, Bo Li², Shengyu Wang³, Tingting Wang⁴

Academic Journal of Computing & Information Science, 2024, 7(11); doi: 10.25236/AJCIS.2024.071120.

Corresponding Author:

Tingting Wang

Affiliation(s)

¹Department of Continuous Education, Chengdu Neusoft University, Chengdu, China

²Department of Intelligent Science and Engineering, Chengdu Neusoft University, Chengdu, China

³Chengdu Shude High School, Chengdu, China

⁴Department of Elementary Education, Chengdu Neusoft University, Chengdu, China

Abstract

This paper presents an optimized implementation of Proximal Policy Optimization (PPO) for controlling an AI agent in the Super Mario environment. By introducing adaptive clipping, a dual-clip objective, and experience replay, the model addresses two common limitations of standard PPO: unstable policy updates and sample inefficiency. Experimental results show that the enhanced PPO model achieves a completion rate exceeding 95% across Super Mario levels while using fewer samples and converging more stably than baseline models. The study highlights the effectiveness of PPO in dynamic decision-making scenarios and provides a foundation for future work in reinforcement learning.
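To make the dual-clip idea concrete, the following is a minimal sketch of a per-sample surrogate objective. It is not taken from the paper: it assumes the standard PPO clipped surrogate plus a commonly used dual-clip variant that bounds the objective from below when the advantage is negative. The function name `dual_clip_ppo_objective` and the default values `eps=0.2` and `c=3.0` are illustrative assumptions. The abstract does not specify the adaptive clipping schedule, so `eps` is a fixed parameter here.

```python
def dual_clip_ppo_objective(ratio, advantage, eps=0.2, c=3.0):
    """Per-sample surrogate objective to be maximized (illustrative sketch).

    ratio     -- probability ratio pi_new(a|s) / pi_old(a|s)
    advantage -- advantage estimate for the sampled action
    eps       -- clipping range (assumed value, not from the paper)
    c         -- dual-clip constant, must be > 1 (assumed value)
    """
    # Standard PPO: clip the ratio to [1 - eps, 1 + eps] and take the
    # pessimistic (smaller) of the unclipped and clipped terms.
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    standard = min(ratio * advantage, clipped_ratio * advantage)

    # Dual clip: with a negative advantage and a very large ratio, the
    # standard term is unbounded below. Flooring it at c * advantage
    # limits how strongly a single sample can push the update.
    if advantage < 0:
        return max(standard, c * advantage)
    return standard


if __name__ == "__main__":
    # Large ratio with a negative advantage: the second clip is active.
    print(dual_clip_ppo_objective(10.0, -1.0))
    # Large ratio with a positive advantage: ordinary PPO clipping applies.
    print(dual_clip_ppo_objective(1.5, 1.0))
```

In a training loop, the negated mean of this quantity over a batch would serve as the policy loss. A vectorized version in an autodiff framework follows the same two-step structure.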

Keywords

PPO, Super Mario, Reinforcement Learning, Game AI, Sample Efficiency

Cite This Paper

Lei Wang, Bo Li, Shengyu Wang, Tingting Wang. Enhanced Proximal Policy Optimization for Complex Game AI: Applying Reinforcement Learning to Super Mario. Academic Journal of Computing & Information Science (2024), Vol. 7, Issue 11: 150-154. https://doi.org/10.25236/AJCIS.2024.071120.
