Visual-Inertial Gaussian Splatting (VINGS-Mono) is a monocular SLAM approach that combines Gaussian Splatting with visual-inertial odometry. The system achieves state-of-the-art performance in both indoor and outdoor environments and is particularly strong on large-scale scenes.

We propose a Gaussian Splatting based loop detection and correction method. Instead of using a Bag of Words (BoW) approach for loop detection, we leverage the novel view synthesis capability of Gaussian Splatting: rendering the map from a new viewpoint lets us determine whether a loop has occurred. We then use graph optimization to correct the poses, and use each Gaussian's frame index to correct the 2D Gaussian map.
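
The map-correction step can be illustrated with a minimal sketch. It assumes that each Gaussian stores the index of the keyframe it is attached to, that poses are 4x4 camera-to-world matrices, and that only the Gaussian centers are moved (a full correction would also rotate each Gaussian's orientation). The function names and array layouts are hypothetical and are not taken from the VINGS-Mono implementation.

```python
import numpy as np


def rigid_inverse(T: np.ndarray) -> np.ndarray:
    """Invert a 4x4 rigid transform using R^T and -R^T t."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


def correct_gaussian_centers(centers, frame_idx, old_poses, new_poses):
    """Move each Gaussian center together with its anchoring keyframe.

    centers:   (N, 3) Gaussian centers in the world frame, before correction
    frame_idx: (N,)   keyframe index each Gaussian is attached to (assumed)
    old_poses: (K, 4, 4) camera-to-world poses before graph optimization
    new_poses: (K, 4, 4) camera-to-world poses after graph optimization
    """
    centers = np.asarray(centers, dtype=float)
    frame_idx = np.asarray(frame_idx)
    corrected = np.empty_like(centers)
    for k in np.unique(frame_idx):
        # World-frame correction for keyframe k: undo the old pose, apply the new one.
        delta = new_poses[k] @ rigid_inverse(old_poses[k])
        mask = frame_idx == k
        corrected[mask] = centers[mask] @ delta[:3, :3].T + delta[:3, 3]
    return corrected


# Toy example: keyframe 0 is unchanged, keyframe 1 shifts 0.5 m along x.
old_poses = np.stack([np.eye(4), np.eye(4)])
new_poses = old_poses.copy()
new_poses[1, :3, 3] = [0.5, 0.0, 0.0]
centers = np.array([[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
frame_idx = np.array([0, 1])
corrected = correct_gaussian_centers(centers, frame_idx, old_poses, new_poses)
```

In the toy example, the Gaussian attached to keyframe 0 stays in place, while the one attached to keyframe 1 follows that keyframe's pose update. Grouping by frame index means the correction costs one matrix product per keyframe rather than one per Gaussian.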

Our method does not require depth priors and reconstructs maps with higher visual fidelity and fewer artifacts within a few minutes of training for an indoor scene. It preserves fine textures and effectively captures structural details.
Our method is also robust in outdoor mapping, achieving precise geometric estimation and reconstruction even for sparse and complex structures.
Compared to indoor environments, outdoor scenes pose greater challenges due to significantly longer trajectory lengths (ranging from hundreds to thousands of meters) and larger frame intervals. These factors heavily impact the rendering quality and map storage requirements of baseline methods.
On handheld datasets such as Hierarchical, characterized by random motion trajectories and limited capture ranges, our method achieves high-precision modeling of building edges and surface details.
For the online reconstruction demonstration of large outdoor scenes, we selected KITTI odom08, which has a trajectory length of 3.7 km. The complete Gaussian map consists of 32 million Gaussian ellipsoids.
The KITTI360 scene has a trajectory length of 8.05 km, and the entire Gaussian map contains 51.73 million ellipsoids. We recorded the number of Gaussians throughout the training process and zoomed in on different parts of the map for clearer visualization.
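
To put the two map sizes on a common scale, the short calculation below derives the average number of Gaussians per meter of trajectory from the figures quoted above. It is only back-of-the-envelope arithmetic on the reported totals (the 32 million count is itself rounded), and the helper name is hypothetical.

```python
def gaussians_per_meter(num_gaussians: float, trajectory_km: float) -> float:
    """Average number of Gaussians per meter of trajectory."""
    return num_gaussians / (trajectory_km * 1000.0)


# Totals and trajectory lengths as reported for the two sequences.
kitti_odom08 = gaussians_per_meter(32e6, 3.7)
kitti360 = gaussians_per_meter(51.73e6, 8.05)

print(f"KITTI odom08: {kitti_odom08:.0f} Gaussians per meter")
print(f"KITTI360:     {kitti360:.0f} Gaussians per meter")
```

This works out to roughly 8.6 thousand Gaussians per meter for KITTI odom08 and roughly 6.4 thousand for KITTI360.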
We also ran VINGS-Mono on a mobile phone. We developed the app with the Flutter framework, so it can be deployed on iOS, Android, and Windows platforms.
Additionally, in outdoor scenes under low-light conditions, our method achieves high-quality reconstruction of highly exposed areas, such as illuminated signboards.
To further test the stability and robustness of our method, we collected a large outdoor dataset. This dataset covers the entire campus and was recorded using an iPhone. The covered area spans approximately 1.02 km in length and 0.4 km in width.