If you've ever fantasized about robots walking like humans, grabbing objects, or even helping you tidy up your room, Nvidia's latest AI model family, "Cosmos," might make your dream come true. This series of foundation AI models has been trained on 20 million hours of footage of "human walking, hand movements, and object manipulation," making it essentially a "world exploration guide" tailor-made for robots.
Cosmos acts like an experienced "mentor," teaching robots how to understand and interact with the physical world. Unlike language models that learn to generate text by reading books and social media posts, Cosmos is designed to generate images and 3D models of the physical world. Its goal is to enable robots to "see" and "understand" their surroundings, so they can better perform tasks. At the CES conference in Las Vegas, Nvidia's CEO demonstrated Cosmos' capabilities. He played a video simulating warehouse activities, showing how Cosmos generates realistic scenes, such as boxes falling off shelves. These scenes can not only be used to train robots to recognize accidents but also help them learn how to handle unexpected situations.
The training process for Cosmos is like a "marathon." By analyzing 20 million hours of real-life footage, it learned how humans walk, move their hands, and manipulate objects. The footage covers a wide range of everyday scenarios, from simple grabbing actions to complex walking paths, and Cosmos draws on what it learned from them when it generates new scenes. This training method has made Cosmos an "all-around coach" for robots. Whether the robot works in an industrial factory or serves in a home, Cosmos helps it learn more complex skills. For instance, Cosmos can generate a video showing a robot taking a box off a shelf, then allow the robot to repeatedly watch and mimic the actions until it can perform the task proficiently.
Nvidia also launched a robot simulation platform called Isaac, designed to help robots learn new tasks more efficiently. This platform functions like a "virtual training camp," where robots can undergo extensive simulated training without the risks of the real world. The new features of the Isaac platform allow robot manufacturers to generate large amounts of synthetic training data from just a few examples. For instance, if you want a robot to learn how to grab a specific object, you only need to provide a few examples of the action, and Isaac can generate thousands of simulated scenarios for the robot to practice in a virtual environment. This "less is more" training method not only saves time and costs but also significantly improves the robot's learning efficiency.
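To make the "few examples in, thousands of scenarios out" idea concrete, here is a toy Python sketch. It is not Isaac's API or Nvidia's method; the function name `augment_demonstrations`, the waypoint format, and the random-offset approach are all assumptions chosen for illustration. It simply takes a couple of hand-written grasp trajectories and produces many slightly shifted copies, as if the object had been placed in a slightly different spot each time.

```python
import random


def augment_demonstrations(demos, n_variants, noise=0.01, seed=0):
    """Return n_variants shifted copies of a few seed demonstrations.

    Hypothetical helper for illustration only (not part of Isaac).
    demos: list of trajectories, each a list of (x, y, z) waypoints in meters.
    noise: maximum shift per axis, in meters.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    variants = []
    for i in range(n_variants):
        base = demos[i % len(demos)]  # cycle through the seed demos
        # One random offset per variant, applied to every waypoint,
        # mimicking an object placed in a slightly different position.
        offset = [rng.uniform(-noise, noise) for _ in range(3)]
        variants.append(
            [tuple(c + o for c, o in zip(point, offset)) for point in base]
        )
    return variants


# Two made-up grasp demonstrations with three waypoints each.
seed_demos = [
    [(0.0, 0.0, 0.0), (0.1, 0.0, 0.2), (0.2, 0.0, 0.1)],
    [(0.0, 0.1, 0.0), (0.1, 0.1, 0.2), (0.2, 0.1, 0.1)],
]

synthetic = augment_demonstrations(seed_demos, n_variants=1000)
print(len(synthetic))
```

A real simulation platform would vary far more than position (lighting, object shape, physics, camera angle), but the principle is the same: a handful of demonstrations becomes a large practice set.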
Cosmos has a wide range of potential applications, from factories to homes. In factories, Cosmos can help industrial robots learn how to perform assembly tasks more efficiently, and even to identify and handle anomalies on production lines. For example, when a machine malfunctions, Cosmos can generate a simulation video teaching the robot how to quickly locate the problem and fix it. At home, Cosmos can assist service robots in learning how to tidy rooms, grab objects, and even care for the elderly and children. Imagine that in the future, household robots not only help you clean but also take care of children when you're busy and even bring you tea when you're sick. All of this relies on the "careful guidance" of Cosmos.
However, a robot's execution capabilities are limited by its hardware. Even if Cosmos can generate perfect simulation scenarios, the robot must still be able to carry them out. If a robot's mechanical arm is not flexible enough or its sensors are not sensitive enough, it will fail to perform the task accurately. Still, Nvidia's Cosmos model opens up a new path for robot development, allowing robots to better integrate into our lives. Perhaps, before long, robots will not only be our assistants but also our friends.
(Writer:Juliy)