RoboTool enables creative tool use in—you guessed it—robots

by Kaitlyn Landram

Large language models developed in the Mechanical Engineering Department with Google DeepMind enable robots to “brainstorm” creative tool use and perform seemingly impossible tasks.

If an ingredient is out of reach on a high pantry shelf, it wouldn’t take you more than a few seconds to find a step stool, or maybe just a chair, to stand on and bring the ingredient within reach. This simple solution is the outcome of a complex problem-solving approach researchers call creative tool use. While using tools for their intended purpose is a useful skill, creative tool use relies not only on the ability to identify tools and use them efficiently, but also on the ability to predict outcomes, making it a hallmark of advanced intelligence that few animals other than humans have mastered.

Researchers in Carnegie Mellon University’s Department of Mechanical Engineering, in collaboration with Google DeepMind, have posed this question: Because humans understand creative tool use, can we teach robots to make use of it too?

Image: A quadrupedal robot and a robotic arm explore six challenging tasks that represent three types of creative tool-use behaviors.

“The fundamental challenge of creative tool usage is that it is by definition an ‘unknown unknown’ problem, meaning no demonstration,” said Ding Zhao, an associate professor of mechanical engineering and the director of CMU Safe AI Lab. “Otherwise, it is about learning, but not creation. Therefore, we must bring in external knowledge to help the robots to brainstorm. Large Language Models, which extract all knowledge from the internet, are the perfect ingredients in this scheme.”

To explore this, researchers developed RoboTool, a creative tool-use system built on large language models (LLMs). The system accepts natural language instructions that describe a robot’s environment, including the size and positioning of objects in its workspace, along with other embodiment-related constraints. RoboTool then outputs directly executable Python code as a plan to complete the task.
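To make the input side of this concrete, here is a minimal sketch of how a workspace description might be turned into a natural language prompt for an LLM. It is an illustration only, not RoboTool’s actual code: the `SceneObject` class, the `build_prompt` function, the prompt wording, and the example objects and numbers are all hypothetical, and no model is called.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SceneObject:
    """Hypothetical record for one object in the robot's workspace."""
    name: str
    size_m: Tuple[float, float, float]      # (length, width, height) in meters
    position_m: Tuple[float, float, float]  # (x, y, z) in meters


def build_prompt(objective: str,
                 objects: List[SceneObject],
                 constraints: List[str]) -> str:
    """Assemble a plain-text prompt from a high-level objective,
    the objects in the workspace, and embodiment-related constraints."""
    lines = [f"Objective: {objective}", "Objects in the workspace:"]
    for obj in objects:
        lines.append(
            f"- {obj.name}: size {obj.size_m} m, position {obj.position_m} m"
        )
    lines.append("Constraints:")
    lines.extend(f"- {c}" for c in constraints)
    lines.append("Respond with executable Python code that completes the objective.")
    return "\n".join(lines)


# Example values are made up for illustration.
prompt = build_prompt(
    objective="Grab the milk carton.",
    objects=[
        SceneObject("milk_carton", (0.07, 0.07, 0.20), (0.90, 0.00, 0.10)),
        SceneObject("stick", (0.40, 0.02, 0.02), (0.30, 0.20, 0.01)),
    ],
    constraints=["The arm can reach at most 0.7 m from its base."],
)
print(prompt)
```

Note that the prompt states only the high-level objective and the facts of the scene. Deciding that a tool is needed at all is left to the model.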

“As opposed to existing models that provide robots with concrete directions, such as ‘use this fork to eat that cake,’ we only provide our robot with a high-level objective, like ‘eat the cake,’” explained Mengdi Xu, a Ph.D. candidate in mechanical engineering.

RoboTool was put to the test when the research team asked two different robots to each perform three tasks, one for each type of behavior: tool selection, sequential tool use, and tool manufacturing.

Tool selection was assessed by tasking a robotic arm to grab a milk carton out of its reach, and asking a four-legged robot to move from one sofa to another, minding the gap in between. Both robots had to choose the most appropriate tool among multiple options to succeed. To pass the test, they demonstrated a broad understanding of object size and shape, as well as the ability to analyze the relationship between these properties and the ultimate objective.
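The kind of reasoning involved can be illustrated with a small hand-written rule. This is not how RoboTool works, since RoboTool relies on LLMs rather than a fixed rule. The sketch only shows how an object’s size relates to an objective such as crossing a gap. The `Tool` type, the `select_bridge` function, the overlap margin, and all numbers are assumptions made for the example.

```python
from typing import NamedTuple, Optional, Sequence


class Tool(NamedTuple):
    """Hypothetical candidate object the robot could use as a bridge."""
    name: str
    length_m: float
    max_load_kg: float


def select_bridge(tools: Sequence[Tool],
                  gap_m: float,
                  robot_mass_kg: float,
                  overlap_m: float = 0.1) -> Optional[Tool]:
    """Return the shortest tool that spans the gap and carries the robot.

    A tool is feasible if it covers the gap plus `overlap_m` of support on
    each side and its load rating is at least the robot's mass.
    Returns None when no candidate is feasible.
    """
    feasible = [
        t for t in tools
        if t.length_m >= gap_m + 2 * overlap_m and t.max_load_kg >= robot_mass_kg
    ]
    return min(feasible, key=lambda t: t.length_m, default=None)


# Example values are made up for illustration.
candidates = [
    Tool("short_strip", 0.5, 2.0),
    Tool("plank", 1.2, 30.0),
    Tool("long_board", 2.0, 30.0),
]
choice = select_bridge(candidates, gap_m=0.8, robot_mass_kg=12.0)
print(choice.name if choice else "no feasible tool")
```

In this toy setup the rule rejects the strip for being too short and too weak, then prefers the shorter of the two remaining boards. The appeal of an LLM-based system is that such relationships do not have to be written out by hand for every task.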

Sequential tool use required robots to use a series of tools in a specific order to reach the goal. The team’s robots demonstrated this by moving blocks together to climb a sofa, and by using a stick to push a can onto a piece of paper and then pulling the paper to bring the can within reach.

Tool manufacturing called for the robots to accomplish tasks by crafting tools from available materials, like using a kickboard and a pipe to create a lever to lift a cube. This test required the robots to discern implicit connections among objects and assemble components through manipulation.

“This capability is important in robotics because it enables robots to do tasks that originally seemed impossible,” said Peide Huang, a Ph.D. candidate in mechanical engineering.

Moving forward, the team will incorporate vision models into the system to unlock even stronger perception and reasoning capabilities. They are also looking to develop more interactive ways for humans to participate in and guide robots’ creative tool use.