Machine learning applications are seamlessly interwoven into everyday tasks—Alexa, Siri, and Cortana compile our grocery lists and answer questions. But sending information to and from the cloud is not ideal for privacy, convenience, or energy efficiency. Diana Marculescu, a professor in the Department of Electrical and Computer Engineering, is discovering efficient and accurate ways to run machine learning applications on mobile devices instead of relying solely on the cloud.
Most machine learning applications carry computing-power, data-storage, and runtime costs that make them hard to run on a mobile device. Marculescu studies the energy efficiency of neural networks, a type of machine learning model built from many densely interconnected nodes. A neural network learns to perform a function, such as classifying images, by identifying correlations within a set of training data.
“Many people are looking at ways to represent everything that happens in neural networks, in terms of computation, with fewer precision bits,” said Marculescu. “If you quantize the values, rather than use continuous values, you don’t lose a lot of precision or accuracy in the results, but save runtime, storage, and improve energy efficiency. Storage efficiency is a big deal because all of these applications require massive amounts of data.”
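The quantization idea Marculescu describes can be illustrated with a minimal sketch: represent 32-bit floating-point weights with 8-bit integers plus a single scale factor. The function names and values below are invented for illustration, not taken from her group's code.

```python
import numpy as np

def quantize(weights, num_bits=8):
    """Uniformly quantize an array of float weights to signed integers."""
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits
    scale = np.max(np.abs(weights)) / qmax  # map the largest weight to qmax
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the quantized integers."""
    return q.astype(np.float32) * scale

weights = np.array([0.81, -0.42, 0.05, -0.93], dtype=np.float32)
q, scale = quantize(weights)
approx = dequantize(q, scale)
# Each 32-bit weight now occupies 8 bits -- a 4x storage saving --
# while the reconstruction error stays small.
```

Each weight shrinks from 32 bits to 8, cutting storage fourfold, while the round-trip error per weight is bounded by half the scale step, which is why accuracy typically degrades only slightly.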
For example, a designer may want to implement an image classification system on a smartphone, but the system needs to satisfy a certain power or storage constraint so the user doesn't need to charge the phone constantly. With the work from Marculescu's group, the designer can determine what configuration should be used.
Most recently, Marculescu and the students in her group have developed a framework called HyperPower that introduces energy efficiency and runtime as constraints in the design process for neural networks. The framework treats power efficiency as just as important as accuracy.
To find the best design for machine learning systems, designers usually have to experiment before arriving at a configuration that yields the best results. But as they experiment, it's unclear where they are in the process: how accurate will the resulting configuration be? Are they moving in the right direction? To remove that uncertainty, Marculescu's team developed a way to rule out configurations that are likely to be too slow or too power-hungry, restricting the search space and reaching a solution faster.
“With our framework, you don’t have to wait until you enumerate many configurations to find a solution,” said Marculescu. “Say I want to build a neural network for image classification for an autonomous vehicle application. The system has to run in real-time because you want to have certain things happen at a certain time and not later, for safety reasons. If you have those constraints, you want to eliminate any configuration that’s not going to fit them.”
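The pruning step described above can be sketched in a few lines: estimate each candidate network's runtime and power with a cheap cost model, and discard any candidate that violates the budgets before paying for expensive training and accuracy evaluation. The cost models, budgets, and configuration options below are all invented for illustration; HyperPower's actual predictive models are more sophisticated.

```python
from itertools import product

LAYER_OPTIONS = [2, 4, 8]       # candidate network depths (hypothetical)
WIDTH_OPTIONS = [32, 64, 128]   # candidate channels per layer (hypothetical)

RUNTIME_BUDGET_MS = 10.0        # real-time deadline
POWER_BUDGET_W = 2.0            # mobile power envelope

def estimated_runtime_ms(layers, width):
    # Toy cost model: runtime grows with compute (layers x width^2).
    return 1e-4 * layers * width ** 2

def estimated_power_w(layers, width):
    # Toy cost model: power grows with the number of active units.
    return 0.002 * layers * width

def feasible(layers, width):
    # Keep only configurations that meet both budgets.
    return (estimated_runtime_ms(layers, width) <= RUNTIME_BUDGET_MS
            and estimated_power_w(layers, width) <= POWER_BUDGET_W)

candidates = list(product(LAYER_OPTIONS, WIDTH_OPTIONS))
survivors = [c for c in candidates if feasible(*c)]
# Only the surviving configurations move on to (much costlier)
# training and accuracy evaluation.
print(f"{len(survivors)} of {len(candidates)} configurations remain")
```

Here the deepest, widest candidate blows both budgets and is dropped immediately, so the expensive accuracy search runs over a smaller, guaranteed-feasible space.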
The framework has great implications for moving machine learning applications from the cloud to a mobile device. For example, suppose a botanist in the field wants information about a particular plant. Without a network connection to the cloud, they could determine whether it's edible or poisonous.
Running these applications locally could be not only faster but also more private. If you run medical health applications on a local device, you can preserve the privacy of the individual. Likewise, when smart appliances make autonomous decisions about your home (whether to turn the temperature up or down, or what items are missing from your fridge, for example), that information should stay private.
“As much as we can make these a local decision rather than having to rely on the cloud, it’s a good thing,” said Marculescu.