Academic Journal of Computing & Information Science, 2024, 7(11); doi: 10.25236/AJCIS.2024.071120.
Lei Wang¹, Bo Li², Shengyu Wang³, Tingting Wang⁴
¹Department of Continuous Education, Chengdu Neusoft University, Chengdu, China
²Department of Intelligent Science and Engineering, Chengdu Neusoft University, Chengdu, China
³Chengdu Shude High School, Chengdu, China
⁴Department of Elementary Education, Chengdu Neusoft University, Chengdu, China
This paper presents an optimized implementation of Proximal Policy Optimization (PPO) for controlling an AI agent in the Super Mario environment. The model introduces three enhancements: adaptive clipping, a dual-clip objective, and experience replay. Together they address two common limitations of standard PPO, namely unstable policy updates and poor sample efficiency. Experimental results show that the enhanced PPO model achieves a completion rate above 95% across Super Mario levels. It also requires fewer training samples and converges more stably than the baseline models. These results demonstrate the effectiveness of PPO in dynamic decision-making scenarios and provide a foundation for future work in reinforcement learning.
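The abstract names the dual-clip objective and adaptive clipping but does not give their exact form. The following is a minimal, framework-free sketch under stated assumptions, not the authors' implementation. `dual_clip_ppo_loss` follows the commonly used dual-clip PPO surrogate. For negative advantages, it adds a lower bound `c * A`, with `c > 1`, on top of the standard clipped objective. `adapt_clip_range` is a hypothetical adaptive-clipping rule. Its thresholds and factor (`target_kl`, `factor = 1.5`, `eps_min`, `eps_max`) are illustrative assumptions, loosely modeled on the adaptive KL-penalty rule from the original PPO paper.

```python
from typing import Sequence


def dual_clip_ppo_loss(
    ratios: Sequence[float],
    advantages: Sequence[float],
    eps: float = 0.2,
    c: float = 3.0,
) -> float:
    """Mean dual-clip PPO loss (to be minimized) over a batch.

    ratios[i] = pi_new(a_i | s_i) / pi_old(a_i | s_i); advantages[i] = A_i.
    eps is the standard PPO clip range; c > 1 is the dual-clip bound.
    """
    if c <= 1.0:
        raise ValueError("dual-clip constant c must be > 1")
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal length")
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - eps), 1.0 + eps)
        # Standard PPO: pessimistic minimum of unclipped and clipped surrogates.
        surrogate = min(r * a, clipped_r * a)
        if a < 0:
            # Dual clip: for negative advantages, bound the objective from below
            # so a very large ratio cannot dominate the gradient.
            surrogate = max(surrogate, c * a)
        total += surrogate
    # PPO maximizes the surrogate; negate so an optimizer can minimize it.
    return -total / len(ratios)


def adapt_clip_range(
    eps: float,
    approx_kl: float,
    target_kl: float = 0.01,
    factor: float = 1.5,
    eps_min: float = 0.05,
    eps_max: float = 0.3,
) -> float:
    """Hypothetical adaptive clipping: shrink eps when the policy moved too far
    (large KL), widen it when the policy barely moved, and clamp to a range."""
    if approx_kl > factor * target_kl:
        eps = eps / factor
    elif approx_kl < target_kl / factor:
        eps = eps * factor
    return min(max(eps, eps_min), eps_max)


if __name__ == "__main__":
    # A ratio of 10 with a negative advantage is capped at c * A by the dual clip.
    print(dual_clip_ppo_loss([10.0], [-1.0]))
    print(adapt_clip_range(0.2, approx_kl=0.05))
```

In the first call, standard PPO would keep the surrogate at `r * A = -10`. The dual clip raises it to `c * A = -3`, which bounds the size of the update caused by a single stale sample. The adaptive rule adjusts `eps` between update epochs using a measured KL divergence. Any real schedule should be tuned on the target environment.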
PPO, Super Mario, Reinforcement Learning, Game AI, Sample Efficiency
Lei Wang, Bo Li, Shengyu Wang, Tingting Wang. Enhanced Proximal Policy Optimization for Complex Game AI: Applying Reinforcement Learning to Super Mario. Academic Journal of Computing & Information Science (2024), Vol. 7, Issue 11: 150-154. https://doi.org/10.25236/AJCIS.2024.071120.