Traversing the Narrow Path: A Two-Stage Reinforcement Learning Framework for Humanoid Beam Walking

Abstract

Traversing narrow paths is challenging for humanoid robots due to the sparse and safety-critical footholds required. Purely template-based or end-to-end reinforcement learning-based methods struggle on such harsh terrain. This paper proposes a two-stage training framework for narrow-path traversal that couples a template-based foothold planner with a low-level foothold tracker from Stage-I training and a lightweight perception-aided foothold modifier from Stage-II training. With a curriculum that progresses from flat ground to narrow paths across stages, the resulting controller learns to robustly track and safely modify foothold targets, ensuring precise foot placement over narrow paths. The framework preserves the interpretability of the physics-based template and exploits the generalization capability of reinforcement learning, which eases sim-to-real transfer. The learned policies outperform purely template-based and purely reinforcement learning-based baselines in success rate, centerline adherence, and safety margins. On a Unitree G1 humanoid robot, the controller traversed a 0.2 m wide, 3 m long beam in 20 trials without a single failure.

Framework

Stage 1 — Robust low-level tracking on flat ground.

We first train a low-level controller to realize template footsteps while staying stable under small, randomized goal perturbations (“disturbance-target training”). The policy uses only proprioception and gait phase, and it runs at a high rate (100 Hz in deployment) to output joint targets. The demo uses v_x = 0.5 m/s; during training, commands and targets are varied.
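The idea of perturbing foothold targets during training can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `perturb_foothold_target`, the uniform noise model, and the bounds (3 cm in x and y, 0.1 rad in yaw) are all assumptions chosen for the example.

```python
import numpy as np


def perturb_foothold_target(nominal_xy_yaw, rng, max_xy=0.03, max_yaw=0.1):
    """Add a small uniform perturbation to a template foothold (x, y, yaw).

    max_xy (metres) and max_yaw (radians) are hypothetical bounds.
    """
    nominal = np.asarray(nominal_xy_yaw, dtype=float)
    bounds = np.array([max_xy, max_xy, max_yaw])
    # Sample each component independently inside [-bound, +bound].
    noise = rng.uniform(-bounds, bounds)
    return nominal + noise


rng = np.random.default_rng(0)
nominal = np.array([0.25, 0.10, 0.0])  # hypothetical template foothold
perturbed = perturb_foothold_target(nominal, rng)
```

The low-level policy would then be rewarded for reaching `perturbed` rather than `nominal`, so that it learns to track targets that deviate slightly from the template.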

Stage 2 — Residual footstep planner in beam simulation.

A high-level planner refines the 3D LIPM/XCoM template with a small residual (Δx, Δy, Δψ) for the swing foot only. The planner is event-driven: it is queried at step transitions and its output is held between events. Its inputs are the same proprioception as in Stage 1 plus a compact elevation window from LiDAR. Demo conditions: v_x = 0.5 m/s; beam width = 0.20 m; LiDAR elevation grid of 11 × 17 points at 0.1 m resolution, covering x: 0.1–1.1 m and y: −0.8–0.8 m. This minimal representation matches the one used on hardware exactly.
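A minimal sketch of the quantities named above, under stated assumptions. The XCoM formula (CoM position plus velocity divided by the LIPM natural frequency) and the 11 × 17 grid extents follow from the text; the CoM height, the residual limits, the lateral offset, and all function names are hypothetical, and the simple "foot at the XCoM plus an offset" rule stands in for the template rather than reproducing the paper's planner.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2


def xcom(com_pos, com_vel, com_height):
    """Extrapolated centre of mass of the linear inverted pendulum model."""
    omega = np.sqrt(G / com_height)  # LIPM natural frequency
    return np.asarray(com_pos, dtype=float) + np.asarray(com_vel, dtype=float) / omega


def refine_foothold(template, residual, limits=(0.05, 0.05, 0.15)):
    """Add a clipped residual (dx, dy, dpsi) to a template foothold (x, y, psi).

    The limits are hypothetical; they keep the learned correction small.
    """
    lim = np.asarray(limits, dtype=float)
    return np.asarray(template, dtype=float) + np.clip(residual, -lim, lim)


def elevation_grid_xy():
    """Sample points of the 11 x 17 elevation window at 0.1 m spacing."""
    xs = np.linspace(0.1, 1.1, 11)    # forward direction
    ys = np.linspace(-0.8, 0.8, 17)   # lateral direction
    return np.meshgrid(xs, ys, indexing="ij")


# Example: walking at 0.5 m/s with an assumed CoM height of 0.7 m.
xi = xcom(com_pos=[0.0, 0.0], com_vel=[0.5, 0.0], com_height=0.7)
template = np.array([xi[0], xi[1] + 0.1, 0.0])  # assumed lateral offset of 0.1 m
target = refine_foothold(template, residual=[0.02, -0.08, 0.0])
grid_x, grid_y = elevation_grid_xy()
```

Clipping the residual keeps the refined foothold close to the template, which is one simple way to retain the template's interpretability while still allowing perception-driven corrections.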

Real-world deployment — Unitree G1.

We deploy both policies asynchronously on the robot: the low-level tracking policy runs at 100 Hz and sends joint position targets; a joint PD controller tracks them at 1 kHz. The residual planner is event-driven, updating on step transitions with zero-order hold between events. Real-world conditions mirror Stage 2: v_x = 0.5 m/s, beam = 0.20 m, LiDAR 11 × 17 @ 0.1 m. This architecture yields reliable beam traversal, precise foot placements, and clean sim-to-real transfer without a heavy vision stack.
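The rate structure described above can be illustrated with a single-threaded simulation. This is only a sketch: the real system runs the components asynchronously, whereas here one 1 kHz loop drives everything. The step period of 0.4 s, the PD gains, the unit-inertia joint, and the stub functions `plan_residual` and `policy` are all assumptions made for the example.

```python
def pd_torque(q_target, q, qd, kp=40.0, kd=1.0):
    """Joint-level PD law; the gains are hypothetical."""
    return kp * (q_target - q) - kd * qd


def plan_residual(tick):
    """Stub for the residual planner; returns (dx, dy, dpsi)."""
    return (0.0, 0.0, 0.0)


def policy(residual):
    """Stub for the low-level tracking policy; returns one joint target."""
    return 0.1


def simulate(duration_s=1.0, pd_hz=1000, policy_hz=100, step_period_s=0.4):
    dt = 1.0 / pd_hz
    ticks = int(round(duration_s * pd_hz))
    policy_every = pd_hz // policy_hz               # 10 PD ticks per policy step
    step_every = int(round(step_period_s * pd_hz))  # ticks between step events
    counts = {"pd": 0, "policy": 0, "planner": 0}
    held_residual = (0.0, 0.0, 0.0)
    q_target, q, qd = 0.0, 0.0, 0.0
    for k in range(ticks):
        if k % step_every == 0:
            # Step transition: query the planner, then hold its output.
            held_residual = plan_residual(k)
            counts["planner"] += 1
        if k % policy_every == 0:
            # 100 Hz: the policy sees the held residual (zero-order hold).
            q_target = policy(held_residual)
            counts["policy"] += 1
        # 1 kHz: PD tracking of the latest joint target (unit inertia).
        tau = pd_torque(q_target, q, qd)
        qd += tau * dt
        q += qd * dt
        counts["pd"] += 1
    return counts


counts = simulate()
```

Over one simulated second the PD loop runs ten times per policy step, and the planner is only queried at the assumed step transitions, which shows how lightweight the high-level component is relative to the control loops.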

BibTeX

@misc{huang2025traversingnarrowpathtwostage,
  title={Traversing the Narrow Path: A Two-Stage Reinforcement Learning Framework for Humanoid Beam Walking},
  author={TianChen Huang and Wei Gao and Runchen Xu and Shiwu Zhang},
  year={2025},
  eprint={2508.20661},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2508.20661},
}
