Core technical principles for fusing sensor data with deep learning in intelligent automotive vehicles

Deep learning-based sensor fusion for situational awareness is a notably different approach from classical mathematical modeling. While the underlying core tasks of perception, prediction, and planning remain the same, deep learning tackles situational awareness in an integrated manner, where all tasks are jointly considered and meaningful representations are learned directly from training data. Compared to classical mathematical modeling, deep learning can build a richer representation of the environment and achieve superior levels of performance. However, its success depends on the quality of the data pipeline as well as on the MLOps and deployment solutions that orchestrate the system and facilitate systematic model training and evaluation.

With deep learning, one way to fuse data is to build multiple hierarchical levels of deep learning models that are, roughly speaking, responsible for the perception, prediction, and planning tasks. These tasks are crucial in any autonomous operation. Typically, the lower levels of the neural networks are responsible for tasks such as detecting objects of interest, segmenting the environment into drivable terrain and other classes, and producing depth estimates from individual cameras. In mathematical terms, neural networks compress the raw information from sensors into so-called feature maps, which are used to produce interpretable and actionable outputs from the model, for instance, the locations of surrounding vehicles. These feature maps can be reused in the next layers of models to fuse information across sensors. An example of this would be to build a 360-degree birds-eye-view map of the surroundings by stitching together observations from all sensors.
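The two ideas above can be illustrated with a toy sketch: compressing a raw image into a lower-resolution feature map, and stitching per-camera birds-eye-view grids into one 360-degree map. The function names, the block-averaging "backbone", and the per-cell maximum as the fusion rule are illustrative assumptions, not the design of any particular production system; a real pipeline would use learned convolutions and a learned fusion step.

```python
import numpy as np

def extract_feature_map(image: np.ndarray, stride: int = 4) -> np.ndarray:
    """Toy stand-in for a CNN backbone: compress the raw image into a
    lower-resolution feature map by block-averaging. (A real system would
    use learned convolutions; this only mimics the spatial compression.)"""
    h, w, c = image.shape
    return image[: h - h % stride, : w - w % stride].reshape(
        h // stride, stride, w // stride, stride, c
    ).mean(axis=(1, 3))

def stitch_birds_eye_view(per_camera_bev: list) -> np.ndarray:
    """Fuse per-camera BEV occupancy grids into one 360-degree map by
    taking the per-cell maximum -- one simple fusion rule among many."""
    return np.maximum.reduce(per_camera_bev)

# Four cameras each contribute a 200x200 occupancy grid of the surroundings.
rng = np.random.default_rng(0)
grids = [rng.random((200, 200)) for _ in range(4)]
bev = stitch_birds_eye_view(grids)
```

In a real stack, the per-camera grids would themselves be produced from the feature maps via a learned view transformation rather than given directly.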

These outputs in turn form the inputs for subsequent models that aggregate information across time. In this case, the task could be to predict how other moving agents will behave in the future to support action planning. Concretely, the car would perceive a pedestrian and then predict that they are about to cross the road. After this, the car would plan the most suitable action, e.g. slowing down or giving way.
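The perceive-predict-plan chain in the pedestrian example can be sketched as follows. A constant-velocity extrapolation stands in for the learned trajectory predictor here, and the lane position, function names, and the slow-down trigger are assumptions made purely for illustration.

```python
import numpy as np

def predict_trajectory(track: np.ndarray, horizon: int) -> np.ndarray:
    """Extrapolate a pedestrian's recent (x, y) positions `horizon` steps
    forward with a constant-velocity model -- a classical baseline that a
    learned predictor would replace in a deep learning stack."""
    velocity = track[-1] - track[-2]           # last observed displacement
    steps = np.arange(1, horizon + 1)[:, None]
    return track[-1] + steps * velocity

def will_enter_lane(future: np.ndarray, lane_y: float) -> bool:
    """Planning trigger: does any predicted position cross into the lane?"""
    return bool(np.any(future[:, 1] >= lane_y))

# Pedestrian walking toward the road; the lane edge is at y = 5.
observed = np.array([[0.0, 2.0], [0.2, 2.6], [0.4, 3.2]])
future = predict_trajectory(observed, horizon=5)
slow_down = will_enter_lane(future, lane_y=5.0)  # -> True: slow down / give way
```

The same interface shape holds when the constant-velocity model is swapped for a recurrent or transformer-based predictor: past tracks in, future trajectories out, with the planner consuming the result.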

Other ways in which sensor data can be fused in deep learning-based awareness systems include feeding data from multiple sensors directly to a neural network and building explicit connections between neural networks dedicated to different tasks in the modeling stack. As an example of the former, multiple camera views of a scene can be used for depth estimation, or a combination of Lidar and images can be used to improve 3D object detection or semantic segmentation. Mid-level sensor fusion means feature sharing between models that are responsible for different but related tasks. A case like this would be object detection modules that are responsible for adjacent cameras but have a partially overlapping field of view, in other words, the area the cameras cover in the real world. Finally, late fusion combines the outputs of the sensor- or task-specific models as inputs to later stages of modeling. This is needed when building e.g. the birds-eye-view map mentioned above, or when utilizing a multi-camera-derived point cloud for 3D object detection.
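A minimal sketch of the two ends of this spectrum: early fusion, where a projected Lidar depth map is stacked as an extra input channel before any model runs, and late fusion, where box outputs from two overlapping cameras are merged with duplicate suppression. The function names, box format, and IoU threshold are illustrative assumptions, not a specific production API.

```python
import numpy as np

def early_fusion(rgb: np.ndarray, lidar_depth: np.ndarray) -> np.ndarray:
    """Early fusion: stack the projected Lidar depth map as a fourth
    input channel next to the RGB channels, before the network runs."""
    return np.concatenate([rgb, lidar_depth[..., None]], axis=-1)

def iou(a, b) -> float:
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def late_fusion(dets_a, dets_b, iou_thr: float = 0.5):
    """Late fusion: combine detections from two cameras with overlapping
    fields of view, dropping boxes from camera B that duplicate a box
    already seen by camera A."""
    merged = list(dets_a)
    for b in dets_b:
        if all(iou(a, b) < iou_thr for a in dets_a):
            merged.append(b)
    return merged

cam_a = [[0.0, 0.0, 2.0, 2.0]]                        # one car, camera A
cam_b = [[0.1, 0.0, 2.0, 2.0], [5.0, 5.0, 6.0, 6.0]]  # same car + a new one
merged = late_fusion(cam_a, cam_b)                    # -> two unique objects
```

Mid-level fusion sits between these two: instead of raw inputs or final boxes, the intermediate feature maps of the per-camera networks are shared, which is harder to sketch without a full model but follows the same pattern of combining representations at a chosen depth.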




Nico Holmberg, PhD

Lead AI Scientist

Silo AI

Computer vision expert with experience in building deep learning-based solutions for clients in various industries, with use cases ranging from situational awareness systems for heavy industrial machines to advanced video analytics for safety and security. One of his main interests is optimizing neural network models for deployment on embedded devices and AI accelerators for edge inference applications. Nico holds a PhD in Computational Quantum Chemistry from Aalto University, Finland. During his PhD studies, his research focused on developing new methods to design effective materials for renewable energy applications. Author of 12 research papers with 600+ citations.
