
Multimodal Sensor Fusion in Autonomous Driving: A Deep Learning-Based Visual Perception Framework


Article Information

Title: Multimodal Sensor Fusion in Autonomous Driving: A Deep Learning-Based Visual Perception Framework

Authors: Hadi Abdullah, Majeed Ali, Ijaz Khan, Abdullah Faiz, Syed Haider Abbas Naqvi, Ali Majid

Journal: Kashf Journal of Multidisciplinary Research (KJMR)

HEC Recognition History: Category Y, recognized 2024-10-01 to 2025-12-31

Publisher: Kashf Institute of Development & Studies

Country: Pakistan

Year: 2025

Volume: 2

Issue: 6

Language: English

DOI: 10.71146/kjmr490

Keywords: Deep Learning, Object Detection, Visual Perception, LiDAR, Autonomous Driving, Radar, Multimodal Sensor Fusion, Transformer Architecture, RGB Camera, Real-Time Systems


Abstract

Autonomous driving has driven the evolution of multimodal sensor fusion systems, owing to the need for safety, reliability, and real-time environmental awareness. The study proposes FusionNet, a deep learning-based visual perception framework that uses a transformer-enabled intermediate fusion approach to combine RGB camera, LiDAR, and radar data. In contrast to classic early or late fusion techniques, FusionNet uses modality-specific encoders and cross-attention layers to dynamically align and merge semantic and geometric features. Extensive experiments on the KITTI and nuScenes datasets show that FusionNet not only achieves higher mean Average Precision (mAP) than unimodal systems, but also delivers its largest gains in particularly adverse scenarios, such as fog, low light, and occlusion, where unimodal systems perform poorly. The model is real-time capable at 59 milliseconds per frame and remains robust across weather conditions and under sensor degradation. FusionNet also achieves better localization quality at high IoU thresholds and remains robust under modality-dropout training. These findings point to deep multimodal fusion as a building block for future autonomous vehicle perception systems that can be deployed reliably across a wide range of urban and environmental contexts.
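
The abstract does not include implementation details, so the sketch below is only a rough illustration of the intermediate-fusion idea it describes: modality-specific encoders whose outputs are merged by cross-attention, with modality dropout applied during training for robustness to sensor failure. Every name and dimension here (CrossModalFusion, the linear stand-in encoders, token counts, the use of PyTorch's nn.MultiheadAttention) is an assumption for illustration, not the authors' published FusionNet code.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Illustrative intermediate fusion: camera tokens attend to LiDAR
    and radar tokens via cross-attention. Hypothetical structure, not
    the published FusionNet design."""

    def __init__(self, dim=256, heads=8, p_drop=0.1):
        super().__init__()
        # Modality-specific encoders (linear stand-ins for real backbones).
        self.cam_enc = nn.Linear(512, dim)    # e.g. flattened CNN feature maps
        self.lidar_enc = nn.Linear(64, dim)   # e.g. voxel/pillar features
        self.radar_enc = nn.Linear(16, dim)   # e.g. radar point features
        # Camera tokens query the geometric modalities.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.p_drop = p_drop  # probability of dropping a whole modality in training

    def maybe_drop(self, feats):
        # Modality dropout: occasionally zero out an entire modality during
        # training so the model learns to cope with a failed sensor.
        if self.training and torch.rand(()) < self.p_drop:
            return torch.zeros_like(feats)
        return feats

    def forward(self, cam, lidar, radar):
        q = self.cam_enc(cam)                               # (B, N_cam, dim)
        kv = torch.cat([self.maybe_drop(self.lidar_enc(lidar)),
                        self.maybe_drop(self.radar_enc(radar))], dim=1)
        fused, _ = self.cross_attn(q, kv, kv)               # camera attends to LiDAR+radar
        return self.norm(q + fused)                         # residual + layer norm

# Toy usage with made-up batch size and token counts:
model = CrossModalFusion()
cam = torch.randn(2, 100, 512)
lidar = torch.randn(2, 400, 64)
radar = torch.randn(2, 50, 16)
out = model(cam, lidar, radar)  # (2, 100, 256) fused camera-centric features
```

Treating camera tokens as queries and the geometric modalities as keys/values is one common way to realize intermediate fusion; the paper's actual query/key assignment, encoders, and fusion depth may differ.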

