
A Study of Q-Learning in the Taxi-v3 Environment: Reinforcement Learning for Optimal Navigation


Article Information


Title: A Study of Q-Learning in the Taxi-v3 Environment: Reinforcement Learning for Optimal Navigation

Authors: Mirza Muhammad Abbas, Abdul Lahad, Areej Fatemah Meghji

Journal: KIET Journal of Computing & Information Sciences

HEC Recognition History

Category   From         To
Y          2023-07-01   2024-09-30
Y          2021-07-01   2022-06-30
Y          2020-07-01   2021-06-30

Publisher: Karachi Institute of Economics & Technology, Karachi

Country: Pakistan

Year: 2025

Volume: 8

Issue: 1

Language: en

DOI: 10.51153/kjcis.v8i1.256

Keywords: Reinforcement Learning, Hyperparameter Tuning, Q-Learning, Taxi-v3


Abstract

Reinforcement Learning (RL) has demonstrated its effectiveness across a variety of domains, including healthcare, robotics, gaming, and autonomous driving. RL involves teaching an agent to navigate an environment by trying different actions and receiving feedback in the form of rewards and penalties, leading to an iterative process of learning which actions yield the most reward. A widely used model-free RL algorithm is tabular Q-learning, which aims to identify an optimal policy by selecting actions that maximize rewards. This research takes a deeper look at the application of Q-learning in the Taxi-v3 environment, a popular benchmark for evaluating RL algorithms. Specifically, our study focuses on hyperparameters and their optimization to determine how they impact the agent's performance in the Taxi-v3 environment. To assess performance, we use the rewards the agent obtains in each episode, the steps it takes per episode to finish the task, and the loss values that indicate how well the agent predicted the optimal action for a given state. With the initial hyperparameter values, our agent attained the highest reward in episode 1,196; with the fine-tuned values obtained through hyperparameter optimization, the agent achieved the same reward in episode 248. This behavior after fine-tuning suggests that optimizing the hyperparameters leads the agent to learn an optimal policy early in training and improves the quality of the Q-learning algorithm in solving challenges that involve navigating grid-based worlds.
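The abstract describes tabular Q-learning with episodic training in Taxi-v3. As a minimal sketch of that setup, the snippet below trains a Q-table on Gymnasium's Taxi-v3 with an epsilon-greedy policy. The hyperparameter values (alpha, gamma, epsilon) and the episode count are illustrative placeholders, not the initial or fine-tuned settings reported in the paper.

```python
# Minimal tabular Q-learning on Taxi-v3 (Gymnasium).
# Hyperparameter values are illustrative, not the paper's tuned settings.
import numpy as np
import gymnasium as gym

env = gym.make("Taxi-v3")
n_states = env.observation_space.n
n_actions = env.action_space.n
Q = np.zeros((n_states, n_actions))

alpha, gamma, epsilon = 0.1, 0.99, 0.1  # learning rate, discount, exploration rate
n_episodes = 2000
rng = np.random.default_rng(0)

for episode in range(n_episodes):
    state, _ = env.reset()
    done = False
    total_reward = 0
    while not done:
        # Epsilon-greedy action selection: explore with probability epsilon.
        if rng.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
        Q[state, action] += alpha * (
            reward + gamma * np.max(Q[next_state]) - Q[state, action]
        )
        state = next_state
        total_reward += reward
```

Tracking total_reward and the step count per episode, as the study does, shows the learning curve; with well-chosen hyperparameters the agent reaches its peak reward in far fewer episodes.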

