Title: A Study of Q-Learning in the Taxi-v3 Environment: Reinforcement Learning for Optimal Navigation
Authors: Mirza Muhammad Abbas, Abdul Lahad, Areej Fatemah Meghji
Journal: KIET Journal of Computing & Information Sciences
Publisher: Karachi Institute of Economics & Technology Karachi
Country: Pakistan
Year: 2025
Volume: 8
Issue: 1
Language: en
Keywords: Reinforcement Learning, Hyperparameter Tuning, Q-Learning, Taxi-v3
Reinforcement Learning (RL) has proven effective across a variety of domains, including healthcare, robotics, gaming, and autonomous driving. In RL, an agent learns to navigate an environment by trying different actions and receiving feedback in the form of rewards and penalties. Through this iterative process, the agent learns which actions yield the most reward. A widely used model-free RL algorithm is tabular Q-learning, which aims to identify an optimal policy by selecting actions that maximize cumulative reward. This research takes a deeper look at the application of Q-learning in the Taxi-v3 environment, a popular benchmark for evaluating RL algorithms. Specifically, our study focuses on the hyperparameters and their optimization to determine how they affect the agent's performance in Taxi-v3. To assess the agent, we use three measures: the reward it obtains in each episode, the number of steps it takes to finish the task in each episode, and the loss values that indicate how well it predicts the optimal action for a given state. With the initial hyperparameter values, our agent attained its highest reward in episode 1,196; with the fine-tuned values, it achieved the same reward in episode 248. This suggests that optimizing the hyperparameters helps the agent learn an optimal policy earlier in training and improves the effectiveness of Q-learning on tasks that involve navigating grid-based worlds.
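The tabular Q-learning loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Corridor` class is a hypothetical stand-in for Taxi-v3 (a one-dimensional corridor that borrows Taxi-v3's -1 per-step and +20 success rewards), the function names and default hyperparameter values are assumptions, and the abstract does not define its loss, so the temporal-difference error returned by `q_update` is only one plausible reading of it.

```python
import random


def q_update(q, state, action, reward, next_state, done, alpha, gamma):
    """Apply one tabular Q-learning step in place and return the TD error."""
    # Terminal transitions have no future value to bootstrap from.
    best_next = 0.0 if done else max(q[next_state])
    td_error = reward + gamma * best_next - q[state][action]
    q[state][action] += alpha * td_error
    return td_error


def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    row = q[state]
    return row.index(max(row))


class Corridor:
    """Hypothetical toy environment (assumption, not Taxi-v3 itself).

    The agent starts in cell 0 and must reach cell n - 1.
    Action 0 moves left, action 1 moves right.
    """

    def __init__(self, n=6):
        self.n = n
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos = max(0, self.pos - 1) if action == 0 else self.pos + 1
        done = self.pos == self.n - 1
        reward = 20 if done else -1
        return self.pos, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0, max_steps=100):
    """Train a Q-table and record (total reward, steps) for every episode."""
    rng = random.Random(seed)
    env = Corridor()
    q = [[0.0, 0.0] for _ in range(env.n)]
    history = []
    for _ in range(episodes):
        state = env.reset()
        total, steps, done = 0, 0, False
        while not done and steps < max_steps:
            action = epsilon_greedy(q, state, epsilon, rng)
            next_state, reward, done = env.step(action)
            q_update(q, state, action, reward, next_state, done, alpha, gamma)
            state = next_state
            total += reward
            steps += 1
        history.append((total, steps))
    return q, history
```

Calling `train()` with different `alpha`, `gamma`, and `epsilon` values and comparing the episode at which `history` first reaches its best total reward mirrors, in miniature, the kind of comparison the abstract reports. Running the same loop on Taxi-v3 itself would require the Gymnasium package, where the environment exposes 500 discrete states and 6 actions, so the Q-table would need one row per state and six entries per row.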