Comparative Analysis of A3C and PPO Algorithms in Reinforcement Learning: A Survey on General Environments

Bibliographic Details
Main Authors: Alberto del Rio, David Jimenez, Javier Serrano
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: A3C; CartPole; comparison; environment complexity; Lunar Lander; performance analysis
Online Access: https://ieeexplore.ieee.org/document/10703056/
author Alberto del Rio
David Jimenez
Javier Serrano
author_facet Alberto del Rio
David Jimenez
Javier Serrano
author_sort Alberto del Rio
collection DOAJ
description This research article presents a comparison between two mainstream Deep Reinforcement Learning (DRL) algorithms, Asynchronous Advantage Actor-Critic (A3C) and Proximal Policy Optimization (PPO), in the context of two diverse environments: CartPole and Lunar Lander. DRL algorithms are widely known for their effectiveness in training agents to navigate complex environments and learn optimal policies. Nevertheless, a methodical assessment of their effectiveness across different settings is crucial for understanding their respective advantages and disadvantages. In this study, we conduct experiments on the CartPole and Lunar Lander environments using both the A3C and PPO algorithms and compare their performance in terms of convergence speed and stability. Our results indicate that A3C typically achieves faster training times but exhibits greater instability in reward values, whereas PPO delivers a more stable training process at the expense of longer execution times. Algorithm selection should therefore be evaluated against the target environment and the specific application's needs, balancing training time against stability: A3C is well suited to applications requiring rapid training, while PPO is better suited to those prioritizing training stability.
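The record contains no code, so the following is only a minimal sketch of how a comparison of this kind could be set up. It assumes Gymnasium environments and the Stable-Baselines3 library (neither is stated in the abstract), and it substitutes A2C, the synchronous variant that Stable-Baselines3 ships, for A3C; the timestep budget, environment ids, and evaluation episode count are illustrative placeholders. Wall-clock training time stands in for convergence speed, and the spread of evaluation rewards is a rough proxy for stability.

```python
# Hypothetical reproduction sketch; not the authors' implementation.
import time

import gymnasium as gym
from stable_baselines3 import A2C, PPO
from stable_baselines3.common.evaluation import evaluate_policy


def train_and_evaluate(algo_cls, env_id, total_timesteps=50_000):
    """Train one algorithm on one environment; return wall-clock training
    time plus mean/std episodic reward (a rough stability proxy)."""
    env = gym.make(env_id)
    model = algo_cls("MlpPolicy", env, verbose=0)
    start = time.perf_counter()
    model.learn(total_timesteps=total_timesteps)
    train_time = time.perf_counter() - start
    mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=20)
    env.close()
    return train_time, mean_reward, std_reward


if __name__ == "__main__":
    # "LunarLander-v3" is the id in Gymnasium >= 1.0 (older releases use
    # "LunarLander-v2") and requires the box2d extra to be installed.
    for env_id in ("CartPole-v1", "LunarLander-v3"):
        for algo_cls in (A2C, PPO):
            t, mean_r, std_r = train_and_evaluate(algo_cls, env_id)
            print(f"{algo_cls.__name__} on {env_id}: "
                  f"{t:.1f} s, reward {mean_r:.1f} +/- {std_r:.1f}")
```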
format Article
id doaj-art-41fb2d9528384ef8bd7f7734010a711e
institution OA Journals
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling Record ID: doaj-art-41fb2d9528384ef8bd7f7734010a711e
Indexed: 2025-08-20T01:47:50Z
Language: eng
Publisher: IEEE
Journal: IEEE Access, ISSN 2169-3536
Published: 2024-01-01, vol. 12, pp. 146795-146806
DOI: 10.1109/ACCESS.2024.3472473
IEEE article number: 10703056
Title: Comparative Analysis of A3C and PPO Algorithms in Reinforcement Learning: A Survey on General Environments
Authors: Alberto del Rio (https://orcid.org/0000-0002-6832-4381), Signals, Systems and Radiocommunications Department, Escuela Técnica Superior de Ingenieros de Telecomunicación (ETSIT), Universidad Politécnica de Madrid, Madrid, Spain; David Jimenez (https://orcid.org/0000-0002-7382-4276), Physical Electronics, Electrical Engineering and Applied Physics Department, Escuela Técnica Superior de Ingenieros de Telecomunicación (ETSIT), Universidad Politécnica de Madrid, Madrid, Spain; Javier Serrano (https://orcid.org/0000-0003-2111-187X), Informatic Systems Department, Escuela Técnica Superior de Ingeniería de Sistemas Informáticos (ETSISI), Universidad Politécnica de Madrid, Madrid, Spain
Online Access: https://ieeexplore.ieee.org/document/10703056/
Keywords: A3C; CartPole; comparison; environment complexity; Lunar Lander; performance analysis
spellingShingle Alberto del Rio
David Jimenez
Javier Serrano
Comparative Analysis of A3C and PPO Algorithms in Reinforcement Learning: A Survey on General Environments
IEEE Access
A3C
CartPole
comparison
environment complexity
Lunar Lander
performance analysis
title Comparative Analysis of A3C and PPO Algorithms in Reinforcement Learning: A Survey on General Environments
title_full Comparative Analysis of A3C and PPO Algorithms in Reinforcement Learning: A Survey on General Environments
title_fullStr Comparative Analysis of A3C and PPO Algorithms in Reinforcement Learning: A Survey on General Environments
title_full_unstemmed Comparative Analysis of A3C and PPO Algorithms in Reinforcement Learning: A Survey on General Environments
title_short Comparative Analysis of A3C and PPO Algorithms in Reinforcement Learning: A Survey on General Environments
title_sort comparative analysis of a3c and ppo algorithms in reinforcement learning a survey on general environments
topic A3C
CartPole
comparison
environment complexity
Lunar Lander
performance analysis
url https://ieeexplore.ieee.org/document/10703056/
work_keys_str_mv AT albertodelrio comparativeanalysisofa3candppoalgorithmsinreinforcementlearningasurveyongeneralenvironments
AT davidjimenez comparativeanalysisofa3candppoalgorithmsinreinforcementlearningasurveyongeneralenvironments
AT javierserrano comparativeanalysisofa3candppoalgorithmsinreinforcementlearningasurveyongeneralenvironments