Robots that can learn to safely navigate warehouses

by Lynn Shea

Advances in chips, sensors, and AI algorithms are enabling robots to continuously learn how to plan routes, avoid obstructions, and operate safely in large dynamic warehouse environments.

Robots have been working in factories for many years. But given the complex and diverse tasks they perform, as well as related safety concerns, most of them operate inside cages or behind safety glass to limit or prevent interaction with humans.

In warehouse operations, where goods are continuously sorted and moved, robots can be neither caged nor stationary. And while large corporations like Amazon have already incorporated robots into their warehouses, these are highly customized and costly systems: the robots are designed to work within one particular facility, on predefined grids or well-defined pathways, under centralized programming that carefully directs their activity.

“For robots to be most useful in a warehouse, they will need to be smart enough to deploy in any facility easily and quickly; able to train themselves to navigate in new dynamic environments; and most importantly, be able to safely work with humans, as well as sizeable fleets of other robots,” said Ding Zhao, the principal investigator and assistant professor of mechanical engineering.


At Carnegie Mellon University, a team of engineers and computer scientists has drawn on its expertise in advanced manufacturing, robotics, and artificial intelligence to develop the warehouse robots of the future.

The collaboration was formed at the university’s Manufacturing Futures Institute (MFI), which funds such research with grants from the Richard King Mellon Foundation. The foundation made a lead $20 million grant in 2016 and an additional $30 million grant in May 2021 to support advanced manufacturing research and development at MFI.

Zhao and Martial Hebert, the dean of the School of Computer Science and a professor at the Robotics Institute, are leading the warehouse robot project. They have investigated multiple reinforcement learning techniques that have shown measurable improvements over previous methods in simulated motion planning experiments. The software used in their test robot has also performed well in path planning experiments at Mill 19, MFI’s collaborative workspace for advanced manufacturing.

“Thanks to the advances in chips, sensors, and advanced AI algorithms, we are at the cusp of revolutionizing manufacturing robots,” said Zhao. The team applies its previous work on self-driving cars to developing warehouse robots that can learn multi-task path planning via safe reinforcement learning, training robots to adapt quickly to new environments and operate safely alongside workers and human-operated vehicles.

MAPPER: Robots that can learn to plan their own pathways

The group first developed a method that enables robots to continuously learn to plan routes in large dynamic environments. The Multi-Agent Path Planning with Evolutionary Reinforcement (MAPPER) learning method lets robots explore by themselves and learn by trial and error, much as human babies accumulate experience over time to handle various situations.
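The article does not spell out MAPPER's evolutionary mechanism. As a rough illustration only, the Python sketch below shows one generic evolutionary step that could sit between rounds of reinforcement learning training: the worst-performing policies in a population are replaced by slightly mutated copies of the best ones. Representing a policy as a flat list of weights, the hypothetical function name `evolve`, and the Gaussian mutation are all assumptions for illustration, not MAPPER's published details.

```python
import random


def evolve(population, fitness, n_replace, mutation_std=0.05, seed=0):
    """One generic evolutionary step over a population of policies (hypothetical).

    population: list of policies, each a flat list of float weights (assumption).
    fitness:    one score per policy, e.g. average episode return.
    The n_replace lowest-scoring policies are overwritten by Gaussian-mutated
    copies of the n_replace highest-scoring ones; the input is not modified.
    """
    if not 0 < n_replace <= len(population) // 2:
        raise ValueError("n_replace must be between 1 and half the population")
    rng = random.Random(seed)
    # Indices sorted from best to worst fitness.
    ranked = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    best, worst = ranked[:n_replace], ranked[-n_replace:]
    new_population = [list(policy) for policy in population]
    for src, dst in zip(best, worst):
        # Copy a strong policy into a weak slot, perturbing each weight slightly.
        new_population[dst] = [w + rng.gauss(0.0, mutation_std) for w in population[src]]
    return new_population
```

With `mutation_std=0.0`, a four-member population `[[0.0], [1.0], [2.0], [3.0]]` with fitness `[0.0, 1.0, 2.0, 3.0]` and `n_replace=1` becomes `[[3.0], [1.0], [2.0], [3.0]]`: the best policy simply overwrites the worst. Keeping the population size fixed means weak learners are pruned without changing how many robots are being trained.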

The decentralized method eliminates the need to program the robots from a powerful central command computer. Instead, each robot makes independent decisions based on its own local observations. The setting is partially observable: a robot’s onboard sensors detect dynamic obstacles only within a 10–30-meter range. Through reinforcement learning, however, the robots can keep training themselves, continually if not indefinitely, to handle unknown dynamic obstacles.
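To make the decentralized, partially observable setting concrete, the sketch below shows how a single robot's observation might be assembled: only obstacles within its sensing range are kept, expressed relative to the robot and ordered nearest first. The `Obstacle` type, the `local_observation` function, and the 10-meter default (the low end of the 10–30-meter range mentioned above) are illustrative assumptions. In a real robot the range limit comes from the sensors themselves rather than from filtering a global list, as this simulation-style helper does.

```python
import math
from dataclasses import dataclass


@dataclass
class Obstacle:
    """A dynamic obstacle with position (m) and velocity (m/s) in world coordinates."""
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0


def local_observation(robot_xy, obstacles, sensing_range=10.0):
    """Return what one robot can see: obstacles within sensing_range.

    Each entry is (dx, dy, vx, vy) relative to the robot, nearest first.
    Obstacles beyond the range are invisible to this robot, which is what
    makes the problem partially observable.
    """
    rx, ry = robot_xy
    visible = []
    for ob in obstacles:
        dx, dy = ob.x - rx, ob.y - ry
        dist = math.hypot(dx, dy)
        if dist <= sensing_range:
            visible.append((dist, (dx, dy, ob.vx, ob.vy)))
    visible.sort(key=lambda item: item[0])
    return [features for _, features in visible]
```

Each robot would call this with its own position and feed the result to its own onboard policy. No robot needs another robot's observation, which is why adding robots mainly adds onboard computation rather than load on a central computer.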


In November 2021, Ding Zhao and his students demonstrated their warehouse robot’s capabilities to Pennsylvania Senators Ryan Aument, Joe Pittman, and Pat Stefano and Representatives Josh Kail and Natalie Mihalek, who were touring Carnegie Mellon’s College of Engineering.

Such smart robots can make it easier and quicker for warehouses to deploy large robot fleets. Because each robot does its computation with its own onboard resources, computational complexity grows only mildly as the number of robots increases, which makes it easier to add, remove, or replace robots.

Energy consumption could also be reduced when robots travel shorter distances because they are able to independently learn to plan their own efficient paths. And the “decentralized and partially observable” setting reduces communication and computation energy compared with classical centralized methods.

RCE: Robots that prioritize safety while pursuing programmed goals

Another successful study applied a constrained model-based reinforcement learning method with Robust Cross-Entropy (RCE).

Researchers must explicitly consider safety constraints for a learning robot so that it does not sacrifice safety to finish its tasks. For example, the robot must reach its goal without colliding with other robots, damaging goods, or interfering with equipment.
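The core of a constrained cross-entropy planner, the family of methods RCE belongs to, can be sketched in a few lines of Python. The planner samples candidate action sequences, scores each with a reward and a safety cost, and refits its sampling distribution to an elite set. If enough candidates satisfy the cost limit, the elites are the highest-reward safe ones; otherwise the elites are simply the lowest-cost candidates, so the search moves toward safety first. In RCE the reward and cost would be predicted by learned models, and the robustness to model error that gives the method its name is not shown here. `reward_fn` and `cost_fn` are hypothetical stand-ins supplied by the caller, actions are assumed one-dimensional and clipped to [-1, 1], and all names and defaults are assumptions.

```python
import random
import statistics


def select_elites(rewards, costs, cost_limit, n_elites):
    """Pick elite indices: best-reward safe candidates, else lowest-cost ones."""
    feasible = [i for i, c in enumerate(costs) if c <= cost_limit]
    if len(feasible) >= n_elites:
        # Enough safe candidates: optimize reward among them only.
        return sorted(feasible, key=lambda i: rewards[i], reverse=True)[:n_elites]
    # Too few safe candidates: move the distribution toward lower cost first.
    return sorted(range(len(costs)), key=lambda i: costs[i])[:n_elites]


def constrained_cem(reward_fn, cost_fn, horizon, cost_limit,
                    n_samples=200, n_elites=20, n_iters=10, seed=0):
    """Return a planned action sequence (list of floats) of length `horizon`."""
    rng = random.Random(seed)
    mean = [0.0] * horizon
    std = [1.0] * horizon
    for _ in range(n_iters):
        # Sample candidate action sequences, clipped to the action bounds.
        samples = [
            [min(1.0, max(-1.0, rng.gauss(m, s))) for m, s in zip(mean, std)]
            for _ in range(n_samples)
        ]
        rewards = [reward_fn(a) for a in samples]
        costs = [cost_fn(a) for a in samples]
        elites = [samples[i] for i in select_elites(rewards, costs, cost_limit, n_elites)]
        # Refit the sampling distribution to the elites, one time step at a time.
        mean = [statistics.fmean(col) for col in zip(*elites)]
        std = [statistics.pstdev(col) + 1e-3 for col in zip(*elites)]
    return mean
```

For example, `constrained_cem(sum, lambda a: sum(max(x - 0.5, 0.0) for x in a), horizon=5, cost_limit=0.0)` plans five actions where reward favors large actions but any action above 0.5 counts as a violation. The elite rule keeps pulling the plan back under that limit instead of letting reward maximization push actions to the 1.0 clip.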

“Although reinforcement learning methods have achieved great success in virtual applications, such as computer games, there are still a number of difficulties in applying them to real-world robotic applications. Among them, safety is paramount,” said Zhao.

Creating safety constraints that hold at all times and under all conditions goes beyond traditional reinforcement learning methods into the increasingly important area of safe reinforcement learning, which is essential to deploying such new technologies.


Mengdi Xu, a third-year Ph.D. student at the CMU Safe AI Lab, works with an intelligent manufacturing manipulation robot.

The team evaluated its new RCE method in Safety Gym, a set of virtual environments and tools for measuring progress toward reinforcement learning agents that respect safety constraints while training. The results showed that the approach enabled the robot to learn to complete its tasks with far fewer constraint violations than state-of-the-art baselines. The method also achieved sample efficiency several orders of magnitude better than constrained model-free reinforcement learning approaches.

CASRL: Robots that can learn to adapt to current conditions

To further address how robots can navigate safely in typical warehouse environments where people and other robots move freely, creating what researchers call non-stationary disturbances, the group employed a Context-Aware Safe Reinforcement Learning (CASRL) method: a meta-learning framework in which robots learn to adapt safely to non-stationary disturbances as they occur.

In addition to workers or other robots moving around a warehouse, the CASRL method would also enable robots to learn to navigate safely in other situations, such as inaccurate sensor measurements, broken robot parts, or obstructions like trash or other obstacles in the environment. The team also applies CASRL to tool manipulation and interaction with humans, which can be applied directly to assembly in manufacturing.

“Non-stationary disturbances are everywhere in real-world applications, providing infinite variations of scenarios. An intelligent robot should be able to generalize to unseen cases rather than just memorize the examples provided by humans. This is one of the ultimate challenges in trustworthy AI,” said Zuxin Liu, a third-year Ph.D. student at the Safe AI Lab at CMU, supported by the MFI award.

Zhao explains that the robot must learn to determine whether its previously trained planning policies are still suitable for the current situation. The robot updates its policy online based on recent local observations, so it can adapt easily to new situations with unseen disturbances while still guaranteeing safety with high probability. Given the past several seconds or minutes of sensing data, the robot can automatically infer and model the potential disturbances and update its planning policy accordingly. Zhao’s team has further extended the method to task-agnostic online reinforcement learning, in which a robot continuously learns to solve unseen tasks online: it not only adapts to unseen but similar tasks, but also identifies distinct tasks and learns to solve them.
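The online adaptation loop described here can be illustrated with a deliberately simplified stand-in. In the sketch below, a robot moving along one axis keeps a sliding window of recent prediction errors, treats their average as an estimate of an unknown constant disturbance (for example, a steady push or a miscalibrated wheel), and subtracts that estimate when choosing its next move. CASRL itself uses a learned context model and safety constraints that are not reproduced here; the `DriftEstimator` class, the one-dimensional dynamics, and every parameter are hypothetical.

```python
from collections import deque


class DriftEstimator:
    """Hypothetical context estimator: infers a constant disturbance from recent data."""

    def __init__(self, window=10):
        # Only the most recent `window` residuals are kept, so old data ages out
        # and the estimate can track a disturbance that changes over time.
        self.residuals = deque(maxlen=window)

    def update(self, predicted_next, observed_next):
        self.residuals.append(observed_next - predicted_next)

    def estimate(self):
        if not self.residuals:
            return 0.0
        return sum(self.residuals) / len(self.residuals)

    def needs_adaptation(self, threshold):
        # Flag when recent data no longer matches the nominal model's assumptions.
        return abs(self.estimate()) > threshold


def compensated_action(x, target, drift_hat, max_step=1.0):
    # Nominal model: x_next = x + u. Subtract the estimated drift, then clip.
    u = target - x - drift_hat
    return max(-max_step, min(max_step, u))


def simulate(true_drift=0.3, target=5.0, steps=20, window=10):
    est = DriftEstimator(window)
    x = 0.0
    for _ in range(steps):
        u = compensated_action(x, target, est.estimate())
        predicted = x + u            # what the nominal model expects
        x = x + u + true_drift       # what actually happens
        est.update(predicted, x)
    return x, est
```

After a few steps of `simulate()`, the estimate settles near the true drift of 0.3 and the robot holds its target position despite the push. The `needs_adaptation` check is one simple way to express the question of whether the old policy is still suitable; a learned context model would replace the running average.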

In each of the above studies, the new models and methods improved upon prior ways of training robots to move about safely and effectively in new and changing environments. Such incremental successes are essential to achieving the ultimate goal: the verifiable level of trustworthiness required for better warehouse robots.

The team will continue working on deployment in manufacturing logistics and assembly manipulation. Zhao will also work on a new MFI-funded project on generating safety- and security-critical digital twins and metaverse environments, which will be a critical tool for developing trustworthy intelligent manufacturing robots safely and efficiently.

“The future of the next generation of manufacturing is now,” said Zhao.

Pictured top: Zuxin Liu, a third-year Ph.D. student at the CMU Safe AI Lab, operates the intelligent manufacturing logistics robot.

Zhao’s robotic research was also funded by the Pennsylvania Infrastructure Technology Alliance.