Estimating odometry, in which an accumulating position and rotation are tracked, is a critical form of state estimation across robotics, with applications in SLAM, navigation, and control. During deployment of a legged robot, a vision system's tracking can easily be lost. Using only the onboard leg and inertial sensors for odometry is a promising alternative. Previous methods for leg-inertial odometry require either analytical modeling or collecting high-quality real-world trajectories to train a model. Analytical modeling is specific to each robot, requires manual fine-tuning, and does not always capture real-world phenomena such as slippage. Prior work on learned legged odometry still relies on collecting real-world data, which has been shown to perform poorly out of distribution. In this work, we show that it is possible to estimate the odometry of a legged robot without any analytical modeling or real-world data collection. We present Legolas, the first method that accurately estimates quadruped odometry in a purely data-driven fashion. We deploy our method on two real-world quadruped robots in both indoor and outdoor environments. In indoor scenes, our method achieves a relative pose error 73% lower than an analytical filtering-based approach and 87.5% lower than a real-world behavioral cloning approach.
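For reference, relative pose error compares the relative motion between pose pairs of the estimated and ground-truth trajectories rather than their absolute positions. Below is a minimal sketch of a translational relative pose error computation, assuming trajectories given as equal-length lists of 4x4 SE(3) matrices; the exact evaluation protocol (pose-pair spacing, alignment, and handling of the rotational component) may vary.

```python
import numpy as np

def translational_rpe(gt_poses, est_poses, delta=1):
    """Translational relative pose error (RMSE, in meters).

    gt_poses, est_poses: equal-length lists of 4x4 homogeneous SE(3) matrices.
    delta: index offset over which relative motions are compared.
    """
    errors = []
    for i in range(len(gt_poses) - delta):
        # Relative motion over the window, for ground truth and estimate.
        gt_rel = np.linalg.inv(gt_poses[i]) @ gt_poses[i + delta]
        est_rel = np.linalg.inv(est_poses[i]) @ est_poses[i + delta]
        # Error transform between the two relative motions; keep its translation.
        err = np.linalg.inv(gt_rel) @ est_rel
        errors.append(np.linalg.norm(err[:3, 3]))
    return float(np.sqrt(np.mean(np.square(errors))))
```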
In this baseline rollout, Legolas is compared to relevant filtering-based and learning-based baselines. The behavioral cloning baseline, trained from real-world data, fails to even leave the area where the robot started. VINS-Fusion, the visual baseline, fails in this rollout because lighting reflections cause its estimate to diverge. The filtering baseline, built on analytical models, closely tracks the ground-truth trajectory, but its accuracy still falls short of Legolas. (Third-person view and map are for visualization purposes.)
We are able to deploy Legolas across diverse terrains, such as stairs, where it accurately estimates the forward (x), upward (z), and pitch state. Third-person view for visualization purposes.
Deployed on a 180 m outdoor rollout, Legolas accurately predicts the robot's pose with very little drift. Front camera used for visualization.
Legolas outputs a predicted instantaneous velocity and variance for each prediction. We demonstrate the covariance output of Legolas on two of the rollouts above (a minimal integration sketch follows the analyses below).
Analysis: In this scenario, the |Y| displacement and variance are much smaller than the |X| displacement. This matches expectations: the robot is moving forward, so its motion is dominated by |X| displacement with a smaller |Y| displacement from swaying. After completing the first hallway, the robot turns sharply, causing a dip in displacements that is matched by a dip in variance as the robot slows down. The lower variances at slower speeds can also be observed at the start of the plot as the robot begins to move.
Analysis: While going up the stairs, the |X| and |Z| displacements are highly cyclical, and their peaks are matched by peaks in variance wherever the displacement changes quickly. When the robot reaches the new floor, the variances for |X| and |Z| decrease as there is less instantaneous motion.
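As noted above, Legolas predicts an instantaneous velocity together with a variance at each step. Below is a minimal, illustrative sketch of how such per-step body-frame predictions could be dead-reckoned into a planar trajectory; the state layout (vx, vy, yaw rate) and the naive variance accumulation are assumptions for illustration only and do not reflect the actual Legolas parameterization or filter.

```python
import numpy as np

def integrate_planar_odometry(vel_preds, var_preds, dt):
    """Dead-reckon a planar pose from per-step body-frame velocity predictions.

    vel_preds: (N, 3) array of [vx, vy, yaw_rate] in the body frame (assumed layout).
    var_preds: (N, 3) array of predicted variances for the same quantities.
    dt: timestep between predictions, in seconds.
    Returns the (N, 3) world-frame trajectory [x, y, yaw] and a crude per-axis
    accumulated variance (independence assumed; a real filter would propagate
    a full covariance through the motion model).
    """
    pose = np.zeros(3)                    # x, y, yaw in the world frame
    traj, acc_var = [], np.zeros(3)
    for (vx, vy, wz), var in zip(vel_preds, var_preds):
        c, s = np.cos(pose[2]), np.sin(pose[2])
        # Rotate the body-frame velocity into the world frame and integrate.
        pose[0] += (c * vx - s * vy) * dt
        pose[1] += (s * vx + c * vy) * dt
        pose[2] += wz * dt
        acc_var += var * dt * dt          # naive accumulation of variance
        traj.append(pose.copy())
    return np.array(traj), acc_var
```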
We include visualizations of the ground-truth robot trajectories collected for the test dataset.
To showcase how Legolas can be used to improve the robustness of a visual SLAM system, we present the following experiment. It also highlights the large perturbations caused by locomotion.
On its own, the VIO baseline fails to make it past the carpeted area, and loses tracking at the 0:20 mark.
In contrast, when combined with Legolas, the VIO baseline is able to keep mapping.
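One simple way to combine a learned leg-inertial estimate with VIO is to treat the two velocity estimates as independent measurements and fuse them by inverse-variance weighting, falling back to the learned estimate when visual tracking is lost. The sketch below is illustrative only and is not the integration scheme used with the VIO baseline; the function and argument names are hypothetical.

```python
import numpy as np

def fuse_velocity(vio_vel, vio_var, leg_vel, leg_var, vio_tracking_ok):
    """Fuse a VIO velocity estimate with a learned leg-inertial estimate.

    All inputs are per-axis numpy arrays; the learned estimate supplies its own
    predicted variance. When VIO tracking is lost, the learned estimate is used
    alone; otherwise the two are fused by inverse-variance weighting.
    """
    if not vio_tracking_ok:
        return leg_vel, leg_var
    w_vio = 1.0 / vio_var
    w_leg = 1.0 / leg_var
    fused_var = 1.0 / (w_vio + w_leg)
    fused_vel = fused_var * (w_vio * vio_vel + w_leg * leg_vel)
    return fused_vel, fused_var
```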