Robots Let ChatGPT Touch the Real World Thanks to Microsoft – Ars Technica

A drone flying over a city.


Last week, Microsoft researchers announced an experimental framework for controlling robots and drones using the language properties of ChatGPT, a popular AI language model created by OpenAI. Using natural language commands, ChatGPT can write special code that controls robot movements. A human then sees the results and adjusts as needed until the task is complete.

The research came in a paper titled “ChatGPT for Robotics: Design Principles and Model Abilities,” written by Sai Vemprala, Rogerio Bonatti, Arthur Bucker, and Ashish Kapoor of the Microsoft Autonomous Systems and Robotics Group.

In a demonstration video, Microsoft shows robots – apparently controlled by code written by ChatGPT while following human instructions – using a robotic arm to arrange blocks into a Microsoft logo, flying a drone to inspect the contents of a shelf, and locating objects using a robot with onboard vision.

Microsoft’s “ChatGPT for Robotics” demonstration video.

To get ChatGPT to communicate with robots, the researchers taught it a custom robotics API. When given instructions like “fetch the ball,” ChatGPT can generate robotics control code just as it would write a poem or complete an essay. After a human has inspected and edited the code for accuracy and safety, the code can be executed and the robot’s performance evaluated.

In this way, ChatGPT accelerates robot control programming, but it is not an autonomous system. “We emphasize that the use of ChatGPT for robotics is not a fully automated process,” the paper says, “but rather serves as a tool to augment human capacity.”
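The loop described above – describe a high-level API to ChatGPT in the prompt, let it generate a short control program, then have a human review it before execution – might look something like the sketch below. The function names (`move_to`, `grasp`, and so on) are hypothetical stand-ins for illustration, not Microsoft’s actual robotics API.

```python
# Hypothetical high-level robotics API of the kind a prompt might describe
# to ChatGPT. These names are illustrative stand-ins, not Microsoft's library.

ROBOT_LOG = []  # records each commanded action so a human reviewer can audit the run


def get_position():
    """Return the robot's current (x, y, z) position (stubbed for this sketch)."""
    return (0.0, 0.0, 1.0)


def move_to(x, y, z):
    """Command the robot to move to an absolute position."""
    ROBOT_LOG.append(("move_to", x, y, z))


def grasp():
    """Close the gripper."""
    ROBOT_LOG.append(("grasp",))


# The kind of short program ChatGPT might emit for "fetch the ball",
# given only the API above and a known ball position.
def fetch_ball(ball_position):
    home = get_position()
    x, y, z = ball_position
    move_to(x, y, z + 0.2)  # approach from above
    move_to(x, y, z)        # descend to the ball
    grasp()                 # pick it up
    move_to(*home)          # return to the starting point


fetch_ball((1.0, 2.0, 0.5))
```

The point of the design is that ChatGPT never drives motors directly: it only composes calls to a small, human-defined API, which keeps the generated code short enough for the mandatory human review step.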

A diagram provided by Microsoft explaining how ChatGPT for Robotics works.


While it appears that most of the feedback to ChatGPT (in terms of the success or failure of its actions) comes from humans in the form of text, the researchers also claim to have had some success feeding visual data into ChatGPT itself. In one example, researchers tasked ChatGPT with commanding a robot to catch a basketball using feedback from a camera: “ChatGPT is able to estimate the appearance of the ball and the sky in the camera image using SVG code. This behavior hints at a possibility that LLMs keep an implicit world model that goes beyond text-based probabilities.”

Although the results seem rudimentary for now, they represent early attempts to use the hottest technology du jour – large language models – for robot control. According to Microsoft, a ChatGPT interface could open up robotics to a much wider audience in the future.

“Our goal with this research is to see if ChatGPT can think beyond text, and reason about the physical world to help with robotics tasks,” says a blog post from Microsoft Research. “We want to help people interact with robots more easily, without having to learn complex programming languages or the details of robotic systems.”
