University of Zurich T-RO Paper: HDVIO2.0 Stabilizes VIO Under Strong Wind and Model Mismatch
Does VIO become unreliable when the wind starts blowing? Visual-inertial odometry (VIO) has become a standard onboard state-estimation method for micro UAVs. However, during high-speed maneuvers, under strong aerodynamic effects, or on long flights with continuous disturbances such as strong wind, conventional systems that rely only on camera and IMU data often accumulate error quickly, and their positioning accuracy degrades rapidly.
To address this problem, the Robotics and Perception Group (RPG) at the University of Zurich proposed HDVIO2.0 (Hybrid Dynamics VIO), published in IEEE Transactions on Robotics. The system remains stable under strong wind and model mismatch. In tests on multiple public UAV dynamics datasets and in real flights with wind speeds of about 25 km/h, it significantly improved trajectory accuracy and wind-disturbance estimation compared with existing methods.
Video source: https://www.youtube.com/watch?v=wUaEp0YGpDM
Technical Challenges
Recent methods such as VIMO, VID, and HDVIO introduce UAV dynamics constraints into VIO. The goal is to use thrust, torque, and other control inputs to distinguish motion caused by the vehicle’s own commands from motion caused by external disturbances, while estimating wind and other external forces.
However, under high-speed flight, strong wind, and inaccurate models, traditional methods face several key limitations:
- Over-simplified dynamics: many methods use only a rigid-body model plus linear drag. Complex aerodynamic effects and thrust-calibration errors are then absorbed into the external-force or IMU-bias estimates, so persistent crosswind is modeled incorrectly and the estimate drifts.
- Translation constraints without rotation constraints: existing methods mostly use dynamics for displacement and velocity, while attitude still depends on gyro integration. Long flights or aggressive maneuvers can therefore produce attitude drift.
- High-fidelity models are hard to embed directly into VIO: models such as NeuroBEM need ground-truth velocity and attitude as inputs. If they are coupled directly with VIO, the estimation errors of the two can amplify each other.
Research Approach
HDVIO2.0 introduces a hybrid dynamics model into a traditional visual-inertial backend, allowing a physics model and neural-network residuals to jointly constrain state estimation and external-force estimation.

Unified Factor-Graph Framework
The research team formulates the entire problem in a sliding-window optimization framework. Visual, inertial, and dynamics residuals are unified through a factor graph. The system maintains keyframe poses, UAV states, IMU biases, external forces, and landmarks within the window. The cost function includes visual reprojection residuals, IMU preintegration residuals, dynamics residuals, and marginalization residuals.
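As a rough sketch of the cost described above, the sliding-window objective can be written as a sum of squared, covariance-weighted residuals. The symbols are illustrative notation chosen for this article and are not necessarily those used in the paper:

```latex
\min_{\mathcal{X}} \;
  \sum_{(l,k)} \left\| \mathbf{r}_{V}^{l,k} \right\|^{2}_{\Sigma_{V}}
+ \sum_{k} \left\| \mathbf{r}_{I}^{k} \right\|^{2}_{\Sigma_{I}}
+ \sum_{k} \left\| \mathbf{r}_{D}^{k} \right\|^{2}_{\Sigma_{D}}
+ \left\| \mathbf{r}_{M} \right\|^{2}
```

Here X collects the window variables (keyframe poses, UAV states, IMU biases, external forces, and landmarks), r_V is the reprojection residual of landmark l in keyframe k, r_I and r_D are the IMU-preintegration and dynamics residuals between consecutive keyframes, and r_M is the marginalization prior.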

Continuous-Time Rotation Modeling
HDVIO2.0 represents body-frame angular velocity with B-splines. Control points are distributed uniformly over time, allowing efficient sampling and differentiation in matrix form. Torque measurements are used to fit the B-spline, with control points optimized to satisfy rigid-body rotational equations. This introduces additional physical constraints into the backend attitude estimation.
For online optimization, each time a new image arrives, control points are added to and removed from a fixed-length B-spline (about 0.1 seconds), which is then locally optimized. Regardless of spline order and control-point spacing, the angular-velocity representation (VelBSpl) converges faster than representing attitude directly (RotBSpl).
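The matrix-form evaluation mentioned above can be illustrated with a minimal sketch. It assumes a uniform cubic B-spline (the article does not fix the spline order), and the function names `eval_omega` and `rotational_residual`, the inertia matrix `J`, and the residual sign convention are hypothetical choices for illustration, not the paper's implementation:

```python
import numpy as np

# Standard basis matrix of a uniform cubic B-spline (4 control points per segment).
M_CUBIC = (1.0 / 6.0) * np.array([
    [1.0, 4.0, 1.0, 0.0],
    [-3.0, 0.0, 3.0, 0.0],
    [3.0, -6.0, 3.0, 0.0],
    [-1.0, 3.0, -3.0, 1.0],
])

def eval_omega(ctrl, dt, t):
    """Evaluate angular velocity and its time derivative at time t.

    ctrl: (N, 3) control points, uniformly spaced by dt seconds (N >= 4).
    Returns (omega, omega_dot), each a 3-vector.
    """
    n_seg = len(ctrl) - 3                    # number of valid spline segments
    i = min(int(t / dt), n_seg - 1)          # segment index, clamped to the last segment
    u = t / dt - i                           # normalized time inside the segment
    P = ctrl[i:i + 4]                        # the 4 control points acting on this segment
    U = np.array([1.0, u, u * u, u ** 3])    # monomial basis
    dU = np.array([0.0, 1.0, 2.0 * u, 3.0 * u * u]) / dt  # its derivative w.r.t. time
    return U @ M_CUBIC @ P, dU @ M_CUBIC @ P

def rotational_residual(J, omega, omega_dot, torque):
    """Euler's rigid-body rotation equation written as a residual:
    J * omega_dot + omega x (J * omega) - torque (zero when the equation holds)."""
    return J @ omega_dot + np.cross(omega, J @ omega) - torque
```

Because the angular acceleration is available in closed form from the same control points, a torque measurement at any timestamp can be turned into a residual on the control points without numerical differentiation.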

Dual-TCN Residual Dynamics Modeling
The dynamics model consists of explicit physics plus learned residuals. The physics model handles thrust, gravity, inertia, and other explicit terms, while residual force and torque absorb aerodynamic drag, thrust-coefficient error, and other complex effects.
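To make the split between explicit physics and a learned residual concrete, here is a minimal sketch of the translational part. The frame conventions (world z up, thrust along body z), the variable names, and the simple constant-acceleration integration step are assumptions for illustration only:

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])  # world frame, z up (assumed convention)

def hybrid_acceleration(R_wb, mass, thrust, f_res_body, f_ext_world):
    """World-frame acceleration from physics plus a residual force.

    R_wb:        (3, 3) rotation from body to world frame.
    thrust:      collective thrust along the body z axis, in newtons.
    f_res_body:  residual force in the body frame (e.g. a network output).
    f_ext_world: external disturbance force in the world frame (e.g. wind).
    """
    f_body = np.array([0.0, 0.0, thrust]) + f_res_body
    return (R_wb @ f_body + f_ext_world) / mass + GRAVITY

def propagate(p, v, R_wb, mass, thrust, f_res_body, f_ext_world, dt):
    """One integration step, holding the acceleration constant over dt."""
    a = hybrid_acceleration(R_wb, mass, thrust, f_res_body, f_ext_world)
    p_next = p + v * dt + 0.5 * a * dt ** 2
    v_next = v + a * dt
    return p_next, v_next
```

In this form the residual force and the external force enter the same equation, which is why an inaccurate residual model tends to leak into the external-force estimate.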
HDVIO2.0 uses two temporal convolutional networks (TCNs): one predicts residual thrust from thrust commands and gyro measurements, while the other predicts residual torque from torque commands and gyro measurements. The inputs are only control commands and IMU data; true velocity and attitude are not required.
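The building block of a TCN is a causal, dilated 1D convolution: each output sample depends only on current and past inputs. The NumPy sketch below shows that block and a small stack of such layers. The layer sizes, dilations, ReLU activation, and the 4-channel input (one thrust command plus three gyro axes) are hypothetical and do not reflect the architecture used in the paper:

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation=1):
    """Causal dilated convolution over time.

    x: (T, C_in) input sequence; w: (K, C_in, C_out) kernel.
    Computes y[t] = sum_k x[t - k * dilation] @ w[k], with zeros before t = 0.
    """
    T, c_in = x.shape
    K, _, c_out = w.shape
    pad = (K - 1) * dilation
    x_pad = np.vstack([np.zeros((pad, c_in)), x])  # pad the past only
    y = np.zeros((T, c_out))
    for k in range(K):
        start = pad - k * dilation                 # shifts the input k * dilation steps back
        y += x_pad[start:start + T] @ w[k]
    return y

def tiny_tcn(x, weights, dilations):
    """Stack of causal dilated convolutions, each followed by a ReLU."""
    h = x
    for w, d in zip(weights, dilations):
        h = np.maximum(causal_dilated_conv1d(h, w, d), 0.0)
    return h

def receptive_field(kernel_size, dilations):
    """Number of input samples that influence one output sample."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

A usage example with random weights: `tiny_tcn(x, [w1, w2], [1, 2])` with `x` of shape `(T, 4)`, `w1` of shape `(3, 4, 8)`, and `w2` of shape `(3, 8, 8)` returns features of shape `(T, 8)`, and a final linear layer applied to the last time step would give a 3-vector residual. Growing the dilation per layer lets a short stack cover a long history of commands and gyro samples.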
During training, the networks learn the residuals by minimizing the difference between the displacement, velocity, and attitude changes obtained by integrating the dynamics and the corresponding changes along reference trajectories from MoCap or offline SLAM. No additional force sensor is required.
Experiments
NeuroBEM Dataset
On the NeuroBEM dataset, the team compared different dynamics models using only thrust/torque and IMU inputs, without running VIO. The comparison focused on force and torque estimation accuracy.
HDVIO2.0 significantly outperformed simple polynomial and BEM models in force-estimation RMSE and approached the performance of NeuroBEM, which requires full-state input. This shows that the hybrid “rigid-body + TCN residual” dynamics model is sufficiently accurate even without full state input.


Blackbird and VID Datasets
In the absence of external disturbances, the team evaluated the trajectory accuracy and generalization of HDVIO2.0 in high-speed flight.
For Blackbird, trajectories with speeds from 0.5 to 9 m/s were used, including images, IMU data, and motor speeds. Some trajectories were used to train the dynamics model, while others were held out for testing. For VID, indoor tests included elastic-rope pulling and payload experiments in a MoCap environment, while outdoor tests used offline SLAM poses for supervision.
The results show that HDVIO2.0 achieved the lowest position and attitude error on most Blackbird trajectories, especially at high speeds such as the Egg 8 m/s trajectory. In VID experiments, HDVIO2.0 achieved the lowest z-axis external-force RMSE in rope-pulling and payload scenarios. It also reduced attitude error in outdoor slow trajectories supervised only by SLAM, indicating usefulness even without MoCap.


Wind Field and Closed-Loop Flight
The team built an indoor side-wind field using three fans, with a maximum wind speed of about 25 km/h. The UAV carried a T265 and Jetson TX2, running HDVIO2.0 using only the T265 left camera and IMU, while MoCap provided ground truth.
Circle and figure-eight trajectories were tested with and without drag plates. The system was further connected to an EKF and controller for closed-loop flight and compared with T265 onboard tracking.
In wind experiments, HDVIO2.0 achieved the lowest lateral-wind-force RMSE across multiple configurations and followed ground-truth force peaks well. Accelerometer bias estimation remained stable, unlike some baselines that drifted under persistent wind. In closed-loop flight, HDVIO2.0 produced smaller trajectory error than T265 and ran in real time at 30 Hz on TX2.


Resources
- Paper: HDVIO2.0: Wind and Disturbance Estimation with Hybrid Dynamics VIO
- Paper link: https://doi.org/10.1109/TRO.2025.3603551
- Code: https://github.com/uzh-rpg/hdvio2.0
