The design of our dense transformer module follows from how different regions of the 3D world project onto a perspective 2D image. Specifically, a column belonging to a flat region in the frontal-view (FV) image projects onto a perspectively distorted area in the bird's-eye view (BEV), whereas a column belonging to a vertical, non-flat region maps to an orthographic projection of a volumetric region in the BEV space. Accordingly, we employ two distinct transformers in our dense transformer module to independently map features from the vertical and flat regions in the FV to the BEV. Reflecting this observation, the vertical transformer expands the FV features into a volumetric lattice to model the intermediate 3D space and then flattens the lattice along the height dimension to generate the vertical BEV features. In parallel, the flat transformer applies the inverse perspective mapping (IPM) algorithm followed by our Error Correction Module to map the flat FV features into the BEV. The transformed vertical and flat features are then merged in the BEV space to generate the composite BEV feature map.
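The data flow described above can be illustrated with a minimal NumPy sketch. This is not the PanopticBEV implementation: all function names, the binary `vertical_mask`, the per-pixel depth distribution `depth_prob`, the additive `correction` standing in for the learned Error Correction Module, and the merge by summation are assumptions made for illustration. The sketch also assumes a pinhole camera above a flat ground plane and that both branches already share one BEV grid, skipping any resampling between them.

```python
import numpy as np

def vertical_transformer(fv_vertical, depth_prob):
    """Lift FV features (C, H, W) into a (C, D, H, W) lattice, then flatten along height."""
    # Assumption: depth_prob (D, H, W) spreads each pixel's features over D depth bins.
    volume = fv_vertical[:, None, :, :] * depth_prob[None, :, :, :]
    return volume.sum(axis=2)  # collapse the height axis -> (C, D, W)

def ipm(fv_flat, fx, fy, cx, cy, cam_height, z_vals, x_vals):
    """Nearest-neighbour inverse perspective mapping under a flat-ground pinhole model."""
    C, H, W = fv_flat.shape
    bev = np.zeros((C, len(z_vals), len(x_vals)), dtype=fv_flat.dtype)
    for i, z in enumerate(z_vals):
        v = int(round(fy * cam_height / z + cy))  # image row of the ground at depth z
        if not 0 <= v < H:
            continue
        for j, x in enumerate(x_vals):
            u = int(round(fx * x / z + cx))  # image column of lateral offset x
            if 0 <= u < W:
                bev[:, i, j] = fv_flat[:, v, u]
    return bev

def dense_transformer(fv, vertical_mask, depth_prob, correction, cam, z_vals, x_vals):
    """Split FV features by region, transform each branch, and merge in the BEV."""
    fv_vertical = fv * vertical_mask          # features of vertical regions
    fv_flat = fv * (1.0 - vertical_mask)      # features of flat regions
    bev_vertical = vertical_transformer(fv_vertical, depth_prob)
    # Hypothetical stand-in for the Error Correction Module: an additive residual.
    bev_flat = ipm(fv_flat, *cam, z_vals, x_vals) + correction
    return bev_vertical + bev_flat            # assumed merge: element-wise sum

# Toy usage: every pixel is marked vertical, depth is uniform over D bins.
C, D, H, W = 2, 3, 4, 6
fv = np.ones((C, H, W))
mask = np.ones((H, W))
depth_prob = np.full((D, H, W), 1.0 / D)
correction = np.zeros((C, D, W))
cam = (1.0, 1.0, 0.0, 0.0, 1.0)  # fx, fy, cx, cy, camera height (made-up values)
bev = dense_transformer(fv, mask, depth_prob, correction, cam,
                        z_vals=[1.0, 2.0, 3.0], x_vals=list(range(W)))
print(bev.shape)  # (2, 3, 6)
```

In a learned model the mask, the depth distribution, and the correction would all be predicted by the network rather than supplied by hand; the sketch only shows how the two branches divide the work and meet again in the BEV.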
A Neural Network-Based Approach to Generate Bird's-Eye-View Panoptic Segmentation Maps Using Frontal-View Monocular Images
These examples demonstrate the performance of our PanopticBEV model on the KITTI-360 and nuScenes datasets. PanopticBEV is the first end-to-end learning approach for directly generating dense panoptic segmentation maps in the bird's-eye view from monocular images in the frontal view. To learn more about our work, please see the Technical Approach section. To view the demo, select a dataset from the drop-down box below and click on an image to see the predicted BEV panoptic segmentation map.