Within the next decade, chances are your neighbors, or maybe even you, will be sharing a home with a robot roommate that lends a hand with everyday tasks. At Carnegie Mellon University, researchers in Ding Zhao’s Safe AI Lab are teaching robots to perform better by the day. The team has introduced a new machine learning framework that trains four-legged robot “dogs” to perform versatile tasks like pouring soda, organizing shoes, and even cleaning up cat litter.

Human2LocoMan combines immersive teleoperation, large-scale data collection, and a modular learning framework to teach quadrupedal robots complex manipulation skills via human demonstrations and teleoperated robot movements. It bridges the “embodiment gap” between humans and robots and marks a significant advance in quadrupedal manipulation learning.

“Human2LocoMan is powerful enough to help robots handle situations and objects they have never seen before,” explains Zhao, associate professor of mechanical engineering. “This new pretraining approach particularly helps in tasks that require precise localization and smooth motions.”

While LocoMan, a versatile quadrupedal platform from Zhao’s lab, has previously demonstrated its capabilities in tasks like opening drawers, Human2LocoMan pushes the boundary by enabling the robot to acquire versatile, generalizable, real-world autonomous skills through cross-embodiment learning.


The Human2LocoMan system uses an XR headset and stereo cameras to capture both human and robot manipulation motions. For human data collection, a person wears the headset and performs tasks naturally. For teleoperation, the system maps human hand motions to the robot’s grippers and head motions to the robot’s torso, effectively expanding the robot’s workspace and enhancing its active sensing.

To leverage this cross-embodiment data effectively, the team developed the Modularized Cross-embodiment Transformer (MXT), a learning architecture designed to efficiently transfer knowledge between humans and robots. By pretraining MXT on human demonstrations and fine-tuning it with robot data, the system enables LocoMan to perform complex manipulation tasks like cat litter scooping, even in previously unseen scenarios.
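To make the modular, cross-embodiment idea concrete, the sketch below shows one way such a policy could be organized: an embodiment-specific tokenizer and action head wrapped around a shared transformer trunk, pretrained on human demonstrations and then fine-tuned on robot data by reusing only the trunk. This is a minimal illustration, not the team’s MXT implementation; the class and function names (ModularPolicy, behavior_clone), dimensions, behavior-cloning loss, and random stand-in data are all assumptions.

# Minimal sketch of a modular cross-embodiment policy (illustrative only;
# not the authors' MXT code). Embodiment-specific tokenizer and head wrap
# a shared transformer trunk that is transferred from human to robot data.
import torch
import torch.nn as nn

class ModularPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        # Embodiment-specific tokenizer: maps raw observations to tokens.
        self.tokenizer = nn.Linear(obs_dim, d_model)
        # Shared trunk: intended to carry knowledge across embodiments.
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=512,
                                           batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Embodiment-specific action head: maps trunk features to actions.
        self.head = nn.Linear(d_model, act_dim)

    def forward(self, obs):
        # obs: (batch, sequence_length, obs_dim)
        x = self.tokenizer(obs)
        x = self.trunk(x)
        return self.head(x)

def behavior_clone(policy, demos, epochs=1, lr=1e-4):
    # Simple behavior cloning on (observation, action) demonstration pairs.
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        for obs, act in demos:
            loss = nn.functional.mse_loss(policy(obs), act)
            opt.zero_grad()
            loss.backward()
            opt.step()

# 1) Pretrain on human demonstrations (random stand-in data, made-up sizes).
human_policy = ModularPolicy(obs_dim=64, act_dim=12)
human_demos = [(torch.randn(8, 10, 64), torch.randn(8, 10, 12)) for _ in range(4)]
behavior_clone(human_policy, human_demos)

# 2) Fine-tune on robot data: swap in a robot-specific tokenizer and head,
#    while reusing the pretrained shared trunk.
robot_policy = ModularPolicy(obs_dim=96, act_dim=18)
robot_policy.trunk.load_state_dict(human_policy.trunk.state_dict())
robot_demos = [(torch.randn(8, 10, 96), torch.randn(8, 10, 18)) for _ in range(4)]
behavior_clone(robot_policy, robot_demos)

The key design choice the sketch tries to convey is that only the input and output modules are tied to a particular body, so the shared trunk can be pretrained on plentiful human data and carried over to the robot even though observation and action spaces differ.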

“Human2LocoMan shows that human data can unlock scalable, generalizable learning for robots with very different embodiments,” said Yaru Niu, the lead author of this research and a Ph.D. candidate in the Department of Mechanical Engineering.

This research was presented at Robotics: Science and Systems (RSS) 2025.