One of the most popular tropes in sci-fi, from James Bond to Knight Rider, is the intelligent car sidekick. It can drive itself, see its surroundings, and carry on conversations. With advances in autonomous vehicle and sensor technologies, engineers are bringing us closer to this science fiction reality every day. But when it comes to certain elements, such as intelligent navigation and conversation, the real world has fallen woefully behind. Until, that is, CMU-SV professor Ian Lane created Capio.
“We’re taking this technology to a level that mimics human capacity,” says Lane, “enabling people to interact with a machine in the same way they would interact with a human.”
Outwardly, Capio doesn’t look like much: a simple black bar mounted on the car’s dash, equipped with a camera and audio sensors. But despite its humble appearance, Lane’s technology addresses the problems that trip up earlier in-car systems. Using computer vision-based approaches, the system can track the facial movements and gestures of everyone in the car. This way, the car can follow the conversation the way a human can, telling the difference between when the passengers are talking to each other and when they’re addressing Capio directly.
“We’re taking this technology to a level that mimics human capacity, enabling people to interact with a machine in the same way they would interact with a human.”
Ian Lane, Assistant Research Professor, Carnegie Mellon University Silicon Valley
Typically, to get GPS directions or find nearby restaurants, drivers rely on their smartphones to tell them where to go. But this presents a number of problems. Looking down at or typing on a phone while driving is a serious safety concern and one of the leading causes of fatal accidents in the U.S. Some newer vehicles have voice-activated GPS systems built directly into the car, but if there is any background noise in the cabin, these systems break down.
“Though these systems have improved dramatically over the last few years—Siri and Alexa, for instance—there are still challenges,” says Lane. “Often, these systems fail when there are many people speaking at the same time. For instance, if you’re driving in your car and the kids are screaming in the back seat, the system doesn’t work. We’ve developed technology that understands individual speakers, so even if there are three or four people speaking at the same time, the system can pick out one person from that speech and recognize them with high accuracy.”
Not only can the system follow conversation as a human does, it can also learn the way a human does. Using deep learning, Capio builds on its ability to pick out individual voices, becoming more accurate through its interactions with people over time, much as children learn to pick their parents’ voices out of a crowd.
This hands-free, contextually aware human-computer interaction system not only solves the problem of drivers using their phones while on the road; combined with its onboard computer vision technology, it also lets people put their hands to more helpful use. If you’re driving down the road and see a restaurant that looks interesting, all you have to do is point to it and ask Capio if it’s any good. Using GPS data and a full internet connection, Capio can pinpoint the exact restaurant you are referring to and pull up its online reviews, then recommend similar restaurants nearby that you might like as much, or more.
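To make the point-and-ask idea concrete, here is a minimal sketch of how a pointing gesture might be matched to a nearby place. It is an illustration only, not Capio's actual method: the function names, the `(name, lat, lon)` place tuples, the 15-degree tolerance, and the 200-meter range are all assumptions made for this example. The sketch supposes the vision system reports the pointing direction as an angle relative to the car's heading, and that GPS supplies the car's position and heading.

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing from point 1 to point 2, in degrees [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in meters."""
    r = 6371000.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlon / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def angle_diff(a, b):
    """Smallest absolute difference between two compass angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def resolve_pointed_place(car_lat, car_lon, car_heading_deg, point_offset_deg,
                          places, max_angle_deg=15.0, max_range_m=200.0):
    """Return the name of the place best aligned with the gesture, or None.

    point_offset_deg is the gesture angle relative to the car's heading
    (hypothetical convention: +90 means pointing to the right).
    """
    target = (car_heading_deg + point_offset_deg) % 360.0
    best, best_err = None, max_angle_deg
    for name, lat, lon in places:
        if distance_m(car_lat, car_lon, lat, lon) > max_range_m:
            continue  # too far away to be what the passenger is pointing at
        err = angle_diff(bearing_deg(car_lat, car_lon, lat, lon), target)
        if err <= best_err:
            best, best_err = name, err
    return best

# Made-up coordinates: a car heading north, passenger pointing right (east).
places = [("Cafe East", 37.4100, -122.0590), ("Diner North", 37.4110, -122.0600)]
print(resolve_pointed_place(37.4100, -122.0600, 0.0, 90.0, places))
```

A real system would also have to cope with gesture noise, GPS error, and several candidates along the same line of sight; the fixed tolerance here simply stands in for that uncertainty.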
“These car systems are just the beginning for Capio,” Lane says. “The future of contextually aware, human-computer interaction systems will mean that eventually, every interaction we have with the machines we encounter on a daily basis will feel just as comfortable as speaking to a person.”