<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Flight Dynamics and Control Lab | Taeyoung Lee</title>
        <description>Taeyoung Lee, geometric mechanics, geometric control, quadrotor, George Washington University</description>
        <link>https://fdcl.engineering.gwu.edu/</link>
        <atom:link href="https://fdcl.engineering.gwu.edu/feed.xml" rel="self" type="application/rss+xml"/>
        <pubDate>Thu, 07 May 2026 01:43:55 +0000</pubDate>
        <lastBuildDate>Thu, 07 May 2026 01:43:55 +0000</lastBuildDate>
        <generator>Jekyll v4.4.1</generator>
        
            <item>
                <title>Equivariant Reinforcement Learning Frameworks for Quadrotor Low-Level Control</title>
                <description>&lt;p&gt;Teaching quadrotor unmanned aerial vehicles (UAVs) to fly autonomously is a high-stakes engineering challenge. 
Because quadrotors are inherently unstable and exhibit complex, nonlinear dynamics, training them with standard model-free Reinforcement Learning (RL) is notoriously data-intensive and risky. 
In a typical RL setup, an agent must experience thousands of “crashes” in simulation to learn basic stability, a process that is computationally expensive and, on real hardware, unsafe.&lt;/p&gt;

&lt;p&gt;Standard multilayer perceptron (MLP)-based RL models treat every new orientation or position as a completely distinct configuration, forcing the agent to relearn the same behavior from scratch in each case. 
To address this, this study introduces &lt;strong&gt;Equivariant Reinforcement Learning&lt;/strong&gt;: 
a framework that embeds the invariance and equivariance properties of the dynamics directly into the neural network architecture, making learning significantly faster and more robust.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
  &lt;img src=&quot;/images/posts/ben/equiv-rl/framework.jpg&quot; width=&quot;800&quot; /&gt;
&lt;/p&gt;

&lt;h3 id=&quot;geometry-as-a-shortcut-the-power-of-equivariance&quot;&gt;Geometry as a Shortcut: The Power of Equivariance&lt;/h3&gt;

&lt;p&gt;The breakthrough in this research lies in identifying rotational and reflectional symmetries present in the quadrotor dynamics. 
By using &lt;strong&gt;Equivariant Multilayer Perceptrons (EMLPs)&lt;/strong&gt;, the drone can learn an optimal control action in one configuration and automatically apply it to every configuration related by those symmetries.&lt;/p&gt;
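&lt;p&gt;To make the symmetry concrete, here is a minimal numerical sketch of yaw equivariance via canonicalization (an illustrative construction, not the EMLP architecture itself): rotating the state by a yaw angle rotates the resulting action by the same angle.&lt;/p&gt;

```python
import numpy as np

def yaw_rotation(psi):
    # Rotation about the vertical (z) axis by angle psi.
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def base_policy(x):
    # Stand-in for a learned network: any fixed nonlinear map.
    return np.tanh(x) + 0.1 * x

def equivariant_policy(x):
    # Canonicalization: rotate the state so its horizontal component
    # points along +x, apply the base policy, rotate the action back.
    psi = np.arctan2(x[1], x[0])
    R = yaw_rotation(psi)
    return R @ base_policy(R.T @ x)

x = np.array([1.0, 2.0, -0.5])      # hypothetical state (e.g. position error)
G = yaw_rotation(0.7)               # an arbitrary yaw rotation
lhs = equivariant_policy(G @ x)     # act on the rotated state
rhs = G @ equivariant_policy(x)     # rotate the action instead
assert np.allclose(lhs, rhs)        # equivariance: the two must agree
```

Because the optimal action in a rotated configuration is just the rotated optimal action, the agent never has to relearn it.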

&lt;div style=&quot;display: flex; justify-content: center; gap: 20px; margin-bottom: 20px;&quot;&gt;
  &lt;img src=&quot;/images/posts/ben/equiv-rl/rot_sym2.jpg&quot; alt=&quot;Rotational Symmetry&quot; style=&quot;width: 50%; height: auto; object-fit: contain;&quot; /&gt;
  &lt;img src=&quot;/images/posts/ben/equiv-rl/ref_sym.jpg&quot; alt=&quot;Reflectional Symmetry&quot; style=&quot;width: 50%; height: auto; object-fit: contain;&quot; /&gt;
&lt;/div&gt;

&lt;h3 id=&quot;two-brains-are-better-than-one-modular-vs-monolithic&quot;&gt;Two Brains are Better Than One: Modular vs. Monolithic&lt;/h3&gt;

&lt;p&gt;A core strategic insight of this work is the move away from “monolithic” (single-agent) brains toward specialized “modular” (multi-agent) architectures.&lt;/p&gt;

&lt;table&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Feature&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Monolithic (Mono-MLP/EMLP)&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Modular (Mod-MLP/EMLP)&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Structure&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Single agent manages all flight aspects.&lt;/td&gt;
      &lt;td&gt;Decoupled: Translation and Yaw modules.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Geometry&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Operates on the full $SO(3)$ group.&lt;/td&gt;
      &lt;td&gt;Splits $SO(3)$ into $S^2$ (thrust) and $S^1$ (yaw).&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Efficiency&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Slower convergence; prone to overfitting.&lt;/td&gt;
      &lt;td&gt;Parallelized learning; superior tracking.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Sample Requirement&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Still struggling at $8 \times 10^5$ timesteps.&lt;/td&gt;
      &lt;td&gt;Near-peak rewards within $\approx 2 \times 10^5$ timesteps.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;By splitting the special orthogonal group $SO(3)$ of three-dimensional rotations into a sphere ($S^2$) for translational control and a circle ($S^1$) for yaw control, the modular approach prevents performance degradation during agile maneuvers. 
As evidenced by the learning curves, the Mod-EMLP architecture achieves significantly higher returns early in training than any other framework.&lt;/p&gt;
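&lt;p&gt;The decomposition can be sketched as follows: given an attitude in $SO(3)$, the thrust direction is the third body-fixed axis (a point on $S^2$) and the heading is the azimuth of the first body-fixed axis (a point on $S^1$). The code below is an illustrative sketch, not the training code:&lt;/p&gt;

```python
import numpy as np

def decompose_attitude(R):
    """Split an attitude R in SO(3) into the thrust direction b3 on the
    sphere S^2 and a heading angle on the circle S^1."""
    b3 = R[:, 2]                    # third body axis: thrust direction
    b1 = R[:, 0]                    # first body axis carries heading info
    yaw = np.arctan2(b1[1], b1[0])  # azimuth of b1 in the horizontal plane
                                    # (well-defined away from b3 horizontal)
    return b3, yaw

# Example: a pure yaw rotation leaves b3 at e3 and shifts the heading.
psi = 0.3
c, s = np.cos(psi), np.sin(psi)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
b3, yaw = decompose_attitude(R)
assert np.allclose(b3, [0.0, 0.0, 1.0])
assert np.isclose(yaw, psi)
```

The translational module only sees quantities tied to $b_3$, while the yaw module only sees the heading, so neither has to model the other's dynamics.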

&lt;p align=&quot;center&quot;&gt;
  &lt;img src=&quot;/images/posts/ben/equiv-rl/learning_curves.jpg&quot; width=&quot;800&quot; /&gt;
&lt;/p&gt;

&lt;h3 id=&quot;zero-shot-sim-to-real-transfer-validation&quot;&gt;Zero-Shot Sim-to-Real Transfer Validation&lt;/h3&gt;

&lt;p&gt;A major hurdle in robotics is “Sim-to-Real” transfer. 
To bridge this gap, this work utilized &lt;strong&gt;Domain Randomization&lt;/strong&gt;, where physical parameters such as mass, arm length, and inertia were uniformly sampled within $\pm 10\%$ of their nominal values during training. 
This forces the policy to be robust rather than environment-specific.&lt;/p&gt;
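&lt;p&gt;A domain-randomization step of this kind can be sketched in a few lines (the parameter names and nominal values here are hypothetical, not the ones used in the study):&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

# Nominal physical parameters (hypothetical values for illustration).
nominal = {"mass": 1.0, "arm_length": 0.17, "inertia_xx": 0.02}

def randomize(params, spread=0.10, rng=rng):
    # Uniformly sample each parameter within +/- spread of its nominal
    # value at the start of each training episode.
    return {k: v * rng.uniform(1.0 - spread, 1.0 + spread)
            for k, v in params.items()}

sampled = randomize(nominal)
for k in nominal:
    # Every sampled value stays within 10% of its nominal value.
    assert 0.10 * nominal[k] + 1e-12 >= abs(sampled[k] - nominal[k])
```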

&lt;p&gt;The framework was validated through a &lt;strong&gt;Zero-Shot Transfer&lt;/strong&gt; process, and the experimental setup at the Flight Dynamics and Control Lab included:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Vicon Valkyrie VK-8 Motion Capture:&lt;/strong&gt; A 12-camera system running at 200 Hz for precise position and attitude estimation.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;NVIDIA Jetson TX2:&lt;/strong&gt; An onboard flight computer managing the RL controller and EKF state estimation in real-time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Analysis of figure-eight Lissajous trajectory flight tests shows that &lt;strong&gt;incorporating equivariant learning within a modular design&lt;/strong&gt; is the key:&lt;/p&gt;

&lt;table&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Metric&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Monolithic MLP (Baseline)&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Modular EMLP (Winner)&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Average Position Error&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;14.63 cm&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;8.44 cm&lt;/strong&gt; (Nearly 42% reduction)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Average Yaw Error&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;7.15 degrees&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;4.33 degrees&lt;/strong&gt; (Nearly 40% reduction)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Generalization&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Must learn each configuration separately, leading to redundant learning.&lt;/td&gt;
      &lt;td&gt;Encodes rotational/reflectional symmetry for 10x more efficient learning.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p align=&quot;center&quot;&gt;
  &lt;img src=&quot;/images/posts/ben/equiv-rl/sim-to-real1.jpg&quot; width=&quot;800&quot; /&gt;
&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
  &lt;img src=&quot;/images/posts/ben/equiv-rl/sim-to-real2.jpg&quot; width=&quot;800&quot; /&gt;
&lt;/p&gt;

&lt;h3 id=&quot;the-future-of-geometric-learning-in-robotics&quot;&gt;The Future of Geometric Learning in Robotics&lt;/h3&gt;

&lt;p&gt;While this study utilized quadrotors, these geometric RL methods are a blueprint for any robotics system with underlying symmetries, from bipedal robots to industrial manipulators. 
Now we must ask:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;If we can eliminate the redundancy of learning through geometric priors, are we finally moving toward an era of “Near-Zero-Data” robotics where physics-informed architectures replace brute-force computation?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p class=&quot;tip&quot;&gt;For a detailed look at the study, check out the &lt;a href=&quot;https://arxiv.org/abs/2502.20500&quot;&gt;research publication&lt;/a&gt; and explore the open-source &lt;a href=&quot;https://github.com/fdcl-gwu/gym-rotor&quot;&gt;training code&lt;/a&gt;.&lt;/p&gt;
</description>
                <pubDate>Tue, 21 Apr 2026 12:50:20 +0000</pubDate>
                <link>https://fdcl.engineering.gwu.edu/equiv-rl</link>
                <guid isPermaLink="true">https://fdcl.engineering.gwu.edu/equiv-rl</guid>
                
                <category>Research</category>
                
                
            </item>
        
            <item>
                <title>Ship-Relative UAV Pose Estimation with 3D LiDAR</title>
                <description>&lt;p&gt;Accurate ship-relative navigation is essential for enabling safe autonomous landing and operations of UAVs at sea. Traditional systems rely heavily on GPS or cameras, but these are often unreliable in maritime environments. For mission-critical applications, GPS is vulnerable to jamming and spoofing, while cameras suffer in low light and harsh weather.&lt;/p&gt;

&lt;p&gt;To address this, we developed a learning-based pose estimation pipeline that uses 3D LiDAR scans to estimate the full six-degree-of-freedom (6DoF) pose of a UAV relative to a ship, trained in simulation and validated with real-world data collected on a US Naval Academy YP689 research vessel.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;challenges-in-shipboard-pose-estimation&quot;&gt;Challenges in Shipboard Pose Estimation&lt;/h2&gt;

&lt;p&gt;LiDAR provides robustness to lighting and visibility compared to cameras, but ship-relative pose estimation introduces several challenges. Scans are often sparse and incomplete, especially at longer ranges. Furthermore, occlusions from ship structures can obscure key features, and when the ship is the only visible object in the scene, the network has limited contextual information to rely on for accurate alignment. These factors make classical registration algorithms like ICP&lt;sup&gt;1&lt;/sup&gt;, as well as learned alignment methods such as PointNetLK&lt;sup&gt;2&lt;/sup&gt;, prone to failure.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
  &lt;img src=&quot;/images/posts/karl/lidar_pipeline.png&quot; loading=&quot;lazy&quot; style=&quot;width: 700px; overflow: hidden&quot; /&gt;
&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;dataset-generation-and-collection&quot;&gt;Dataset Generation and Collection&lt;/h2&gt;

&lt;p&gt;Developing any deep learning model requires large amounts of training data, which is especially difficult to obtain in real maritime settings. To address this, we built a simulation environment in Gazebo using a CAD model of the YP689 training vessel. Within this environment, a virtual Ouster OS0-32 LiDAR was mounted on a UAV model to generate thousands of scans. These synthetic scans capture the ship geometry as viewed from varied positions and orientations representative of actual flight trajectories.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
  &lt;img src=&quot;/images/posts/karl/gazebo_water_img_long.png&quot; loading=&quot;lazy&quot; style=&quot;width: 500px; overflow: hidden&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;In addition to simulation data, we collected real-world validation data during experiments aboard the US Naval Academy’s YP689 research vessel in the Chesapeake Bay, MD. A UAV equipped with a real Ouster OS0-32 collected LiDAR data during full-length flight trajectories around the ship. High-accuracy ground-truth poses were also obtained using inertial integration&lt;sup&gt;3&lt;/sup&gt;, as well as RTK GPS for benchmarking.&lt;/p&gt;

&lt;div style=&quot;display: flex; justify-content: center; gap: 20px;&quot;&gt;
  &lt;img src=&quot;/images/posts/karl/full_lidar_pod.png&quot; style=&quot;width:45%;&quot; loading=&quot;lazy&quot; /&gt;
  &lt;img src=&quot;/images/posts/karl/drone_yp_flight.png&quot; style=&quot;width:45%;&quot; loading=&quot;lazy&quot; /&gt;
&lt;/div&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;proposed-model-architecture&quot;&gt;Proposed Model Architecture&lt;/h2&gt;

&lt;p&gt;The core of the system is a Point Transformer-based neural network adapted for ship-relative pose estimation.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
  &lt;img src=&quot;/images/posts/karl/model_arch.png&quot; loading=&quot;lazy&quot; style=&quot;width: 700px; overflow: hidden&quot; /&gt;
&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Input&lt;/strong&gt;: A single LiDAR scan (downsampled to 1024 points, normalized).&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Feature extraction&lt;/strong&gt;: Learns local and global scan features using self-attention layers.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Keypoint prediction&lt;/strong&gt;: Network predicts ship keypoints in the LiDAR frame, using fixed CAD keypoint embeddings as decoder queries.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Pose estimation&lt;/strong&gt;: The predicted keypoints are matched with their counterparts on the CAD model, and the relative pose is recovered with a closed-form algorithm&lt;sup&gt;5&lt;/sup&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Refinement&lt;/strong&gt;: Finally, a lightweight registration further refines the alignment, where the model output serves as the initial guess.&lt;/li&gt;
&lt;/ul&gt;
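&lt;p&gt;The closed-form alignment step can be illustrated with the classic Kabsch procedure&lt;sup&gt;5&lt;/sup&gt; (a sketch with our own variable names, not the pipeline code): given corresponding point sets, the optimal rotation follows from an SVD of their cross-covariance.&lt;/p&gt;

```python
import numpy as np

def kabsch(P, Q):
    """Closed-form rigid alignment: find R, t minimizing the sum of
    squared distances between R @ P_i + t and Q_i over rotations R and
    translations t. P, Q: (N, 3) arrays of corresponding points (here,
    predicted keypoints and model keypoints)."""
    p_bar, q_bar = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_bar).T @ (Q - q_bar)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_bar - R @ p_bar
    return R, t

# Sanity check: recover a known transform from noiseless correspondences.
rng = np.random.default_rng(1)
P = rng.normal(size=(10, 3))
theta = 0.4
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -0.2, 1.0])
Q = P @ R_true.T + t_true
R_est, t_est = kabsch(P, Q)
assert np.allclose(R_est, R_true) and np.allclose(t_est, t_true)
```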

&lt;p align=&quot;center&quot;&gt;
  &lt;img src=&quot;/images/posts/karl/kabsch_scrnsht.png&quot; loading=&quot;lazy&quot; style=&quot;width: 700px; overflow: hidden&quot; /&gt;
&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;training-procedure&quot;&gt;Training Procedure&lt;/h2&gt;

&lt;p&gt;The model was trained on 20,000 synthetic LiDAR scans. To better reflect real-world conditions, scans were augmented with Gaussian sensor noise and occlusions from flight-deck obstructions. Training on an NVIDIA A100 GPU took about 4 hours to complete 100 epochs on our dataset.&lt;/p&gt;
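&lt;p&gt;The augmentation can be sketched as follows (the noise level and occlusion size below are illustrative assumptions, not the values used in training):&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(2)

def augment_scan(points, noise_std=0.02, occlusion_frac=0.2, rng=rng):
    """Illustrative LiDAR-scan augmentation: add zero-mean Gaussian
    sensor noise, then drop a contiguous azimuth sector of points to
    mimic occlusion by deck structures."""
    noisy = points + rng.normal(scale=noise_std, size=points.shape)
    az = np.arctan2(noisy[:, 1], noisy[:, 0])       # azimuth per point
    start = rng.uniform(-np.pi, np.pi)              # random sector start
    width = occlusion_frac * 2.0 * np.pi            # sector width
    in_sector = width > ((az - start) % (2.0 * np.pi))
    return noisy[~in_sector]                        # keep unoccluded points

scan = rng.normal(size=(1024, 3))                   # stand-in point cloud
out = augment_scan(scan)
assert out.shape[1] == 3 and scan.shape[0] >= out.shape[0]
```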

&lt;p align=&quot;center&quot;&gt;
  &lt;img src=&quot;/images/posts/karl/training_curve.png&quot; loading=&quot;lazy&quot; style=&quot;width: 500px; overflow: hidden&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;A key aspect of this work is that the network was trained &lt;em&gt;only&lt;/em&gt; on simulated data, meaning it never saw real LiDAR scans during training. This allowed us to directly assess the sim-to-real gap and whether a transformer-based approach could generalize from simulation to the real world. In particular, this ability would allow for easier scaling to new or previously unseen ship classes.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;results-and-evaluation&quot;&gt;Results and Evaluation&lt;/h2&gt;
&lt;p&gt;When evaluated on real-world LiDAR scans, the model achieved a mean rotation error of 2.6° and a mean translation error of 0.36 m directly from the network output. After applying a lightweight refinement step with GICP&lt;sup&gt;4&lt;/sup&gt;, the accuracy improved substantially, reducing rotation errors to 0.62° and translation errors to 7 cm. Notably, more than 96% of predictions had less than 20 cm of translation error relative to the ground truth pose.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
  &lt;img src=&quot;/images/posts/karl/real-world-results.png&quot; loading=&quot;lazy&quot; style=&quot;width: 700px; overflow: hidden&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;These results show that a model trained entirely in simulation can generalize effectively to real-world LiDAR scans collected from shipboard UAV operations. Compared to classical methods such as ICP and PointNetLK, our approach proved significantly more robust to sparsity, partial views, and non-uniform point distributions, and it does not require an initial pose estimate.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
  &lt;img src=&quot;/images/posts/karl/trajs_scrnsht.png&quot; loading=&quot;lazy&quot; style=&quot;width: 700px; overflow: hidden&quot; /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;future-directions&quot;&gt;Future Directions&lt;/h2&gt;
&lt;p&gt;Future directions include incorporating environmental disturbances such as sea spray and dynamic occlusions into simulation to better capture real-world operating conditions. Beyond LiDAR alone, more reliable navigation can be achieved through multi-sensor fusion, integrating vision and inertial measurements to complement LiDAR’s robustness. Finally, optimizing the model for embedded hardware such as NVIDIA Jetson platforms would enable efficient, real-time deployment directly on UAV flight computers.&lt;/p&gt;

&lt;hr /&gt;

&lt;p class=&quot;note&quot;&gt;This work is based on my &lt;a href=&quot;https://www.proquest.com/dissertations-theses/ship-relative-uav-pose-estimation-with-3d-lidar/docview/3237086844/se-2&quot;&gt;M.S. thesis&lt;/a&gt; &lt;em&gt;Ship-Relative UAV Pose Estimation with 3D LiDAR&lt;/em&gt; (2025), advised by Prof. Taeyoung Lee. The source code is available at &lt;a href=&quot;https://github.com/fdcl-gwu/point-transformer&quot;&gt;github.com/fdcl-gwu/point-transformer&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;sup&gt;1&lt;/sup&gt; : P.J. Besl and Neil D. McKay. A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):239–256, 1992.&lt;/p&gt;

&lt;p&gt;&lt;sup&gt;2&lt;/sup&gt; : Yasuhiro Aoki, Hunter Goforth, Arun Srivatsan Rangaprasad, and Simon Lucey. PointNetLK: Robust &amp;amp; efficient point cloud registration using PointNet. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7156–7165, 2019.&lt;/p&gt;

&lt;p&gt;&lt;sup&gt;3&lt;/sup&gt; : Kenny Chen, Ryan Nemiroff, and Brett T. Lopez. Direct LiDAR-inertial odometry: Lightweight LIO with continuous-time motion correction. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 3983–3989, 2023.&lt;/p&gt;

&lt;p&gt;&lt;sup&gt;4&lt;/sup&gt; : Aleksandr Segal, Dirk Hähnel, and Sebastian Thrun. Generalized-ICP. In Robotics: Science and Systems, 2009.&lt;/p&gt;

&lt;p&gt;&lt;sup&gt;5&lt;/sup&gt; : Jim Lawrence, Javier Bernal, and Christoph Witzgall. A purely algebraic justification of the Kabsch–Umeyama algorithm. Journal of Research of the National Institute of Standards and Technology, 124, October 2019.&lt;/p&gt;
</description>
                <pubDate>Wed, 27 Aug 2025 17:00:00 +0000</pubDate>
                <link>https://fdcl.engineering.gwu.edu/pointnet</link>
                <guid isPermaLink="true">https://fdcl.engineering.gwu.edu/pointnet</guid>
                
                <category>Research</category>
                
                
            </item>
        
            <item>
                <title>Data-driven Controls of a Flapping Wing UAV</title>
                <description>&lt;!-- With a wingspan of approximately 4 cm, it stands out as one of the larger butterfly species in the region.  --&gt;
&lt;p&gt;The Monarch butterfly, native to North America, is renowned for its long-range flight. 
Each year, millions of Monarchs embark on an incredible migration journey from North America to Mexico, covering a staggering distance of up to 4000 km. 
Flapping-wing aerial vehicles promise similar advantages in energy efficiency and agility over conventional fixed- or rotary-wing types, whose lift-to-drag ratio deteriorates rapidly as their size is reduced. 
&lt;!-- One notable application is the development of Marsbees designed for the low-density atmosphere of Mars, where high efficiency is essential. --&gt;&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Consequently, the flapping wing mechanism has been envisioned as a critical component for micro autonomous drones of the next generation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class=&quot;gallery-box&quot;&gt;
  &lt;div class=&quot;gallery gallery-column-3&quot;&gt;
    &lt;img src=&quot;/images/posts/tejaswi/fwuav_npr_monarch.png&quot; loading=&quot;lazy&quot; style=&quot;width: 350px; overflow: hidden&quot; /&gt;
    &lt;img src=&quot;/images/posts/tejaswi/fwuav_NASA_marsbee.png&quot; loading=&quot;lazy&quot; style=&quot;width: 350px; overflow: hidden&quot; /&gt;
  &lt;/div&gt;
  &lt;!-- &lt;em&gt;Caption&lt;/em&gt; --&gt;
&lt;/div&gt;

&lt;!-- ### Challenges in control systems for FWUAVs --&gt;

&lt;p&gt;While numerous bioinspired robots have been developed, advancements in control systems for flapping-wing UAVs (FWUAVs) lag behind those for more traditional unmanned aerial vehicles like quadrotors. 
This disparity stems from the difficulty of modeling and controlling flapping-wing dynamics. 
Most existing models focus on high-frequency flapping of small wings, neglecting the intricate coupling between wing motion and body dynamics, which is a key feature of butterfly flight.&lt;/p&gt;

&lt;h2 id=&quot;geometric-dynamic-model-and-control&quot;&gt;Geometric dynamic model and control&lt;/h2&gt;

&lt;p align=&quot;center&quot;&gt;
  &lt;img src=&quot;/images/posts/tejaswi/fwuav_flapping_cycle.png&quot; loading=&quot;lazy&quot; style=&quot;width: 700px; overflow: hidden&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Inspired by the Monarch butterfly’s flight, a novel dynamic model has been developed to account for the effects of low-frequency wing flapping and abdomen undulation. 
This provides an elegant, &lt;strong&gt;global formulation&lt;/strong&gt; of the dynamics, avoiding the complexities and singularities associated with local coordinates.
Next, an optimal periodic motion that minimizes energy variation was constructed, and a feedback control system was proposed to asymptotically stabilize it in the sense of Floquet theory&lt;sup&gt;1&lt;/sup&gt;.&lt;/p&gt;

&lt;h2 id=&quot;constrained-imitation-learning-from-optimal-trajectories&quot;&gt;Constrained imitation learning from optimal trajectories&lt;/h2&gt;

&lt;p align=&quot;center&quot;&gt;
  &lt;img src=&quot;/images/posts/tejaswi/state_error_comp.png&quot; loading=&quot;lazy&quot; style=&quot;width: 500px; overflow: hidden&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;For the global motion of the FWUAV, which includes both translation and rotation of the body (6DOF), a periodic optimal controller was utilized after identifying specific wing kinematics parameters and their physical relationship to the aerodynamics. 
To further improve computational efficiency, imitation learning was tailored to distill a set of optimal trajectories into &lt;strong&gt;data-driven feedback control&lt;/strong&gt;. 
Compared with conventional methods, constrained imitation learning (&lt;span style=&quot;color: blue;&quot;&gt;COIL&lt;/span&gt;) eliminates the need to generate additional optimal trajectories online, while simultaneously improving stability properties&lt;sup&gt;2&lt;/sup&gt;.&lt;/p&gt;

&lt;h2 id=&quot;modular-control-scheme-using-visual-inertial-data&quot;&gt;Modular control scheme using visual-inertial data&lt;/h2&gt;

&lt;p&gt;Finally, a vision-based control scheme was proposed to avoid estimating the full state of the flapping wing aerial vehicle in real time. 
A deep neural pose estimator, based on a Siamese network, extracts robust features for better performance than traditional keypoint-based methods&lt;sup&gt;3&lt;/sup&gt;.
Instead of training a monolithic network, we proposed a &lt;em&gt;modular construction&lt;/em&gt; in which a pose estimation network and a control network are concatenated and trained alternately, achieving the complex task of end-to-end perception and control efficiently. 
To improve convergence when combined with the controller, an alternating learning algorithm (&lt;span style=&quot;color: blue;&quot;&gt;ALICE&lt;/span&gt;) was presented to iteratively refine the individual networks so that, ultimately, the vehicle can be guided to perform given maneuvers.&lt;/p&gt;
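&lt;p&gt;The alternating-refinement idea can be sketched on a toy objective (everything below is a hypothetical stand-in, not ALICE itself): hold one module fixed while taking a gradient step on the other, and switch at each iteration.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(3)

w_pose = rng.normal(size=3)      # pose-estimator parameters (stand-in)
w_ctrl = rng.normal(size=3)      # controller parameters (stand-in)

def loss(wp, wc):
    # Stand-in for the end-to-end task loss of the concatenated networks.
    return float(np.sum(wp ** 2) + np.sum((wc - wp) ** 2))

lr = 0.1
history = [loss(w_pose, w_ctrl)]
for it in range(50):
    if it % 2 == 0:              # refine the pose module, controller fixed
        grad = 2 * w_pose - 2 * (w_ctrl - w_pose)
        w_pose = w_pose - lr * grad
    else:                        # refine the controller, pose module fixed
        grad = 2 * (w_ctrl - w_pose)
        w_ctrl = w_ctrl - lr * grad
    history.append(loss(w_pose, w_ctrl))

assert history[0] > history[-1]  # alternating steps reduce the joint loss
```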

&lt;div class=&quot;gallery-box&quot;&gt;
  &lt;div class=&quot;gallery gallery-column-3&quot;&gt;
    &lt;img src=&quot;/images/posts/tejaswi/fwuav_sensorimotor_control.png&quot; loading=&quot;lazy&quot; style=&quot;width: 350px; overflow: hidden&quot; /&gt;
    &lt;img src=&quot;/images/posts/tejaswi/fwuav_best_comparison.png&quot; loading=&quot;lazy&quot; style=&quot;width: 350px; overflow: hidden&quot; /&gt;
  &lt;/div&gt;
  &lt;!-- &lt;em&gt;Caption&lt;/em&gt; --&gt;
&lt;/div&gt;

&lt;p class=&quot;note&quot;&gt;This &lt;a href=&quot;https://scholarspace.library.gwu.edu/etd/gq67js07z&quot;&gt;framework establishes the first nonlinear control system that stabilizes the coupled 6DOF longitudinal and lateral dynamics of FWUAVs&lt;/a&gt; without relying on the common assumptions of averaging or linearization. Possible future directions:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Including flexibility of wings and fluid-structure interactions for improved modeling&lt;/li&gt;
  &lt;li&gt;Learning-based aerodynamic models that balance accuracy with computational efficiency, further advancing FWUAV control systems.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;sup&gt;1&lt;/sup&gt; : Tejaswi, K.C., Kang, C.K. and Lee, T., 2021, May. &lt;a href=&quot;https://doi.org/10.23919/acc50511.2021.9483293&quot;&gt;Dynamics and control of a flapping wing uav with abdomen undulation inspired by monarch butterfly&lt;/a&gt;. In 2021 American control conference (ACC) (pp. 66-71). IEEE.&lt;/p&gt;

&lt;p&gt;&lt;sup&gt;2&lt;/sup&gt; : Tejaswi, K.C. and Lee, T., 2022. &lt;a href=&quot;https://doi.org/10.1109/lra.2022.3194682&quot;&gt;Constrained Imitation Learning for a Flapping Wing Unmanned Aerial Vehicle&lt;/a&gt;. IEEE Robotics and Automation Letters, 7(4), pp.10534-10541.&lt;/p&gt;

&lt;p&gt;&lt;sup&gt;3&lt;/sup&gt; : Tejaswi, K.C., Lee, T. and Kang, C.K., 2024. &lt;a href=&quot;https://doi.org/10.2514/6.2024-0948&quot;&gt;Deep Neural Pose Estimation for a Flapping Wing Unmanned Aerial Vehicle with Visual-Inertial Sensor Fusion&lt;/a&gt;. In AIAA SCITECH 2024 Forum (p. 0948).&lt;/p&gt;

&lt;!-- &lt;ol&gt;
&lt;li&gt;Kodihalli Chandrappa, Tejaswi, &quot;Data-Driven Controls of a Flapping Wing Unmanned Aerial Vehicle Inspired by Monarch Butterfly,&quot; The George Washington University, 2024.&lt;/li&gt;&lt;br&gt; 
&lt;/ol&gt; --&gt;
</description>
                <pubDate>Mon, 20 Jan 2025 17:00:00 +0000</pubDate>
                <link>https://fdcl.engineering.gwu.edu/flapping-wing-uav-control</link>
                <guid isPermaLink="true">https://fdcl.engineering.gwu.edu/flapping-wing-uav-control</guid>
                
                <category>Research</category>
                
                
            </item>
        
            <item>
                <title>Modular Reinforcement Learning for a Quadrotor UAV</title>
                <description>&lt;p&gt;Imagine a quadrotor drone gliding through complex environments, expertly handling tight corners and sharp turns while maintaining precise control of its &lt;em&gt;heading&lt;/em&gt;. Achieving this level of dexterity is no small feat, particularly because traditional reinforcement learning (RL)-based control methods often struggle to balance the delicate dynamics of translational and yawing motions.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;What happens when we give drones a modular brain, designed to focus on specific tasks like balance and direction independently?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&quot;unlocking-the-potential-of-modular-rl&quot;&gt;Unlocking the Potential of Modular RL&lt;/h2&gt;

&lt;p&gt;Traditional RL approaches to controlling quadrotors often rely on an &lt;em&gt;end-to-end monolithic policy&lt;/em&gt;—a single, unified neural network—designed to manage all aspects of flight dynamics simultaneously. While effective in many cases, these models face challenges when dealing with the complex interactions of roll, pitch, and yaw motions. Why? Because rapid yawing motions can destabilize the drone’s other motions: the large rotor accelerations and decelerations required for yaw can amplify imbalances in the remaining axes.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
  &lt;img src=&quot;/images/posts/ben/frameworks.png&quot; width=&quot;800&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Enter &lt;strong&gt;modular reinforcement learning&lt;/strong&gt;. By dividing the quadrotor’s dynamics into two distinct parts—translational and yaw subsystems—modular RL allows specialized agents to manage each task independently. This decomposition not only simplifies the learning complexity but also enhances system robustness, as one module can continue operating even if the other encounters difficulties.&lt;/p&gt;
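&lt;p&gt;How the two modules’ outputs meet at the rotors can be sketched with a standard wrench-to-thrust allocation (the geometry and coefficients below are illustrative assumptions, not our vehicle’s): the translational module supplies the total thrust and roll/pitch moments, while the yaw module independently supplies the yaw moment.&lt;/p&gt;

```python
import numpy as np

# Hypothetical arm length and torque-to-thrust coefficient.
d, c_tf = 0.17, 0.0135

# Wrench-to-rotor-thrust allocation for a plus-configured quadrotor:
# rows map rotor thrusts to (total thrust, roll, pitch, yaw moments).
A = np.array([[1.0,    1.0,   1.0,   1.0],
              [0.0,   -d,     0.0,   d  ],
              [d,      0.0,  -d,     0.0],
              [-c_tf,  c_tf, -c_tf,  c_tf]])

def mix(f, M1, M2, M3):
    # Combine the two modules' outputs into individual rotor thrusts.
    return np.linalg.solve(A, np.array([f, M1, M2, M3]))

T = mix(f=20.0, M1=0.1, M2=-0.05, M3=0.02)
wrench = A @ T
assert np.allclose(wrench, [20.0, 0.1, -0.05, 0.02])
```

Because the yaw moment enters only the last row, the yaw agent can be retrained or tuned without touching the translational agent's output channels.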

&lt;h2 id=&quot;a-look-at-the-learning-curves&quot;&gt;A Look at the Learning Curves&lt;/h2&gt;

&lt;p align=&quot;center&quot;&gt;
  &lt;img src=&quot;/images/posts/ben/learning_curve.png&quot; width=&quot;650&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;The proposed modular RL policies, namely &lt;span style=&quot;color: red;&quot;&gt;CMP&lt;/span&gt; and &lt;span style=&quot;color: blue;&quot;&gt;DMP&lt;/span&gt;, achieved faster convergence and superior performance compared to their monolithic counterpart, &lt;span style=&quot;color: green;&quot;&gt;NMP&lt;/span&gt;. By focusing on a specific subtask, each agent can optimize its learning for its assigned role, avoiding the complexities and inefficiencies of learning a single, generalized policy.&lt;/p&gt;

&lt;p&gt;Additionally, we explored centralized and decentralized inter-module communication between two agents. Centralized coordination allowed agents to share information during training, resulting in improved synergy and higher performance. Decentralized agents, while effective, occasionally faced coordination challenges in high-demand scenarios.&lt;/p&gt;

&lt;h2 id=&quot;from-simulation-to-sky&quot;&gt;From Simulation to Sky&lt;/h2&gt;

&lt;p align=&quot;center&quot;&gt;
  &lt;img src=&quot;/images/posts/ben/sim2real.png&quot; width=&quot;700&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Training RL agents in simulation is just the beginning; the real flight test comes in deploying them in the physical world. By employing sim-to-real techniques such as domain randomization (introducing variability in the simulator) and policy regularization (smoothing control signals), the trained RL agents transferred seamlessly from simulation to reality.&lt;/p&gt;
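&lt;p&gt;Policy regularization of this kind can be sketched as a penalty on consecutive-action differences (the weight below is an illustrative assumption): jittery control signals are discouraged because they transfer poorly to real motors.&lt;/p&gt;

```python
import numpy as np

def smoothness_penalty(a_t, a_prev, weight=0.1):
    # Penalize the change between consecutive actions; smoother control
    # signals incur penalties closer to zero.
    return -weight * float(np.sum((a_t - a_prev) ** 2))

a_prev    = np.array([0.5, 0.5, 0.5, 0.5])    # previous rotor commands
a_jittery = np.array([1.0, 0.0, 1.0, 0.0])
a_smooth  = np.array([0.52, 0.48, 0.5, 0.5])

# The smoother action incurs a smaller penalty (closer to zero).
assert smoothness_penalty(a_smooth, a_prev) > smoothness_penalty(a_jittery, a_prev)
```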

&lt;p&gt;In flight tests, our modular framework consistently outperformed traditional monolithic approaches, particularly during aggressive maneuvers requiring high yaw rates. While traditional models struggled to maintain stability at high yaw rates (e.g., 40 deg/s), the modular agents demonstrated remarkable precision and adaptability. Another advantage? If one module needs fine-tuning, it can be adjusted without retraining the entire system, saving time and resources.&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;
  &lt;img src=&quot;/images/posts/ben/flight_traj.png&quot; width=&quot;800&quot; /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;whats-next-for-modular-rl-in-robotics&quot;&gt;What’s Next for Modular RL in Robotics?&lt;/h2&gt;

&lt;p&gt;Our work marks an important step forward in drone autonomy, but the implications go far beyond quadrotors. From self-driving cars to robotic arms in manufacturing, the modular RL approach promises smarter, more efficient, and more adaptable solutions.&lt;/p&gt;

&lt;p&gt;So, next time you see a drone executing a flawless aerial maneuver, remember—it might just have a modular brain guiding its every move.&lt;/p&gt;

&lt;p class=&quot;tip&quot;&gt;For a detailed look at the study, check out the &lt;a href=&quot;https://doi.org/10.1109/LRA.2024.3511412&quot;&gt;research publication&lt;/a&gt; and explore our &lt;a href=&quot;https://github.com/fdcl-gwu/gym-rotor-modularRL&quot;&gt;training code&lt;/a&gt;.&lt;/p&gt;
</description>
                <pubDate>Tue, 14 Jan 2025 12:01:35 +0000</pubDate>
                <link>https://fdcl.engineering.gwu.edu/reinforcement-learning-quadrotor</link>
                <guid isPermaLink="true">https://fdcl.engineering.gwu.edu/reinforcement-learning-quadrotor</guid>
                
                <category>Research</category>
                
                
            </item>
        
    </channel>
</rss>