DefinePK


Analysis of Transmission Control Protocol Incast over Large-scale HPC Clusters


Article Information

Title: Analysis of Transmission Control Protocol Incast over Large-scale HPC Clusters

Authors: S. Khalid, H.M. Abdullah, S.Z. Ahmad

Journal: The Nucleus

HEC Recognition History
Category  From        To
Y         2024-10-01  2025-12-31
Y         2023-07-01  2024-09-30
Y         2022-07-01  2023-06-30
Y         2020-07-01  2021-06-30

Publisher: Pakistan Institute of Nuclear Science & Technology (PINSTECH)

Country: Pakistan

Year: 2021

Volume: 58

Issue: 1-4

Language: English


Abstract

The lifecycle of large-scale applications executing on High-Performance Computing (HPC) clusters relies heavily on the Transmission Control Protocol (TCP) to orchestrate job completion across multiple compute resources. Because HPC clusters use extensive local area network communication to distribute jobs over compute and data nodes, the core network fabric of the cluster carries a heavy load of TCP sessions, which causes above-average packet drops. The result is poor TCP throughput and, in turn, lower overall cluster performance. In this article, we analyze TCP behavior under nominal, average, and heavy transmission loads in a cluster environment to assess alternative solutions to the problem. We also analyze the cumulative queuing behavior of multiple TCP sessions at the contention switch and apply a fine-grained configuration at the network fabric to improve TCP performance. The simulation results show that smaller data flows suffer a significant throughput collapse. The performance of the TCP variants tested indicates that their congestion control mechanisms play a significant role in this degradation and that a scalable solution is needed to improve TCP performance. In this paper, different TCP variants are evaluated on an HPC compute cluster and its data storage to address the TCP Incast problem, and simple solutions are presented. We observe that none of the classical or newer TCP variants performs consistently under a heavy fan-in workload, but better queue management at the network fabric greatly mitigates the problem and improves cluster performance.
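The fan-in throughput collapse described above can be illustrated with a deliberately coarse model. This is a minimal sketch, not the authors' simulation: the names `IncastConfig` and `round_goodput_bps` and every parameter value (1 Gbps port, 1500-byte packets, 64-packet buffer, 10-packet initial window, 200 ms RTO) are hypothetical. It assumes that all senders' first bursts reach the contention switch at the same instant, that the drop-tail queue cannot drain during that instant, and that any drop forces at least one flow into a full retransmission timeout that stalls the whole synchronized read.

```python
from dataclasses import dataclass


@dataclass
class IncastConfig:
    """Hypothetical parameters for a coarse incast model (not from the paper)."""
    link_bps: float = 1e9        # bottleneck port speed toward the receiver (1 Gbps)
    mss_bytes: int = 1500        # bytes per packet
    buffer_pkts: int = 64        # drop-tail output buffer at the contention switch
    init_window_pkts: int = 10   # packets each sender bursts before any ACK returns
    rto_s: float = 0.2           # retransmission timeout (200 ms, assumed)


def round_goodput_bps(n_senders: int, sru_bytes: int, cfg: IncastConfig) -> float:
    """Goodput of one synchronized (barrier) read in which every sender
    returns `sru_bytes` to a single receiver through one switch port.

    Assumptions: all first bursts arrive simultaneously, the queue does not
    drain during the burst, and any drop stalls the barrier for one full RTO.
    """
    pkts_per_sender = -(-sru_bytes // cfg.mss_bytes)            # ceiling division
    burst = n_senders * min(cfg.init_window_pkts, pkts_per_sender)
    dropped = max(0, burst - cfg.buffer_pkts)                    # drop-tail overflow

    total_bits = n_senders * sru_bytes * 8
    transfer_s = total_bits / cfg.link_bps                       # time at line rate
    stall_s = cfg.rto_s if dropped > 0 else 0.0                  # barrier waits for the slowest flow
    return total_bits / (transfer_s + stall_s)


if __name__ == "__main__":
    cfg = IncastConfig()
    for n in (4, 8, 16, 32):
        mbps = round_goodput_bps(n, 256 * 1024, cfg) / 1e6
        print(f"{n:2d} senders: {mbps:8.1f} Mbps")
```

Under these assumptions, goodput stays at line rate until the combined burst exceeds the port buffer, then drops by an order of magnitude because the RTO dwarfs the transfer time, and smaller per-sender blocks fall furthest. Raising `buffer_pkts` only moves the cliff; it stands in loosely for the fabric-side queue configuration mentioned in the abstract, not for any specific mechanism from the paper.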

