A world where voice control is practically everywhere could be here sooner than you think, thanks to MIT

Seemingly
ready to kick off the age of voice-controlled electronics, MIT researchers have built
a low-power chip specialized for automatic speech recognition. Whereas a
cellphone running speech-recognition software might require about 1 watt of
power, the new chip requires between 0.2 and 10 milliwatts, depending on the
number of words it has to recognize.


This
could translate to a power savings of 90 to 99%, which could make
voice control practical for simple electronic devices, including power-constrained
devices that have to harvest energy from their environments or go months between
battery charges.

“Speech
input will become a natural interface for many wearable applications and
intelligent devices,” said Anantha Chandrakasan, the Vannevar Bush Professor of
Electrical Engineering and Computer Science at MIT, whose group developed the
chip. “The miniaturization of these devices will require a different interface
than touch or keyboard. It will be critical to embed the speech functionality
locally to save system energy consumption compared to performing this operation
in the cloud.”

According
to Michael Price, a graduate student in electrical engineering and computer
science who led the chip's design, the team didn't develop the technology for any
particular application. “We have tried to put the infrastructure in place to
provide better trade-offs to a system designer than they would have had with
previous technology, whether it was software or hardware acceleration,” he
said.

Today's best-performing speech recognizers are based on
neural networks, and much of the new chip’s circuitry is devoted to
implementing speech-recognition networks as efficiently as possible.

Of course, even the most power-efficient speech recognition
system can quickly drain a device’s battery if run without interruption. That’s
why the chip includes a simpler “voice activity detection” circuit that monitors
ambient noise to determine whether it might be speech. If so, the chip fires up
its larger, more complex speech-recognition circuit.
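
The gating idea can be sketched in a few lines of Python. This is a deliberately crude energy-threshold detector; the threshold, frame length, and signals are illustrative assumptions, not values from the MIT chip:

```python
import math

def frame_energy(samples):
    """Mean squared amplitude of one audio frame."""
    return sum(s * s for s in samples) / len(samples)

def is_speech(samples, threshold=0.01):
    """Crude voice-activity detector: wake the full recognizer
    only when frame energy rises above an ambient-noise threshold."""
    return frame_energy(samples) > threshold

# Simulated frames: near-silence vs. a louder burst
silence = [0.001 * math.sin(i) for i in range(160)]
speech = [0.5 * math.sin(0.3 * i) for i in range(160)]

print(is_speech(silence))  # False: the recognizer stays asleep
print(is_speech(speech))   # True: fire up the recognizer
```

Real detectors look at more than raw energy (spectral shape, for instance), which is exactly the complexity-versus-power tradeoff the researchers explored.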

For experimental purposes, the researchers’ chip included three
different voice-activity-detection circuits with different degrees of
complexity and power demands. Which circuit is most power efficient depends on context, but
in tests simulating a wide range of conditions, the most complex of the three
led to the greatest power savings for the system. Although it consumed
almost three times as much power as the simplest circuit, it generated far
fewer false positives; the simpler circuits often chewed through their energy
savings by spuriously activating the rest of the chip.
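
A back-of-envelope calculation shows why fewer false positives can beat a cheaper detector. All the numbers below are made up for illustration; only the roughly threefold power ratio between the detectors comes from the article:

```python
# Illustrative power figures (milliwatts), not measurements from the paper.
VAD_SIMPLE_MW = 0.02    # simple detector, always on
VAD_COMPLEX_MW = 0.06   # ~3x the simple detector, per the article
RECOGNIZER_MW = 5.0     # full recognizer while awake

def avg_power(vad_mw, wake_fraction):
    """Average system power if false positives (plus real speech)
    keep the recognizer awake wake_fraction of the time."""
    return vad_mw + wake_fraction * RECOGNIZER_MW

# Suppose the simple detector's false positives keep the recognizer
# awake 10% of the time, and the complex one only 1%.
simple = avg_power(VAD_SIMPLE_MW, 0.10)    # 0.02 + 0.50 = 0.52 mW
complex_ = avg_power(VAD_COMPLEX_MW, 0.01) # 0.06 + 0.05 = 0.11 mW
print(simple, complex_)
```

Under these assumptions the pricier detector cuts average system power nearly fivefold, because the dominant cost is waking the big circuit, not running the small one.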

Typically,
a neural network consists of thousands of processing “nodes” capable of simple
computations but densely connected to each other. In the type of network
commonly used for voice recognition, the nodes are arranged into layers. Voice
data are fed into the bottom layer of the network, whose nodes process and pass
them to the nodes of the next layer, whose nodes process and pass them to the
next layer, and so on. The output of the top layer indicates the probability
that the voice data represent a particular speech sound.
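
A toy forward pass illustrates the layered structure just described. The layer sizes, weights, and use of ReLU and softmax are invented for illustration and are not taken from the chip's actual network:

```python
import math

def layer(inputs, weights, biases):
    """One dense layer: each node weights its inputs, adds a bias,
    and applies a ReLU nonlinearity before passing data upward."""
    outputs = []
    for w_row, b in zip(weights, biases):
        s = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(max(0.0, s))
    return outputs

def softmax(scores):
    """Top layer: turn scores into probabilities over speech sounds."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Tiny network: acoustic features in, phoneme probabilities out.
features = [0.2, -0.1, 0.4]
h1 = layer(features, [[0.5, -0.3, 0.8], [0.1, 0.9, -0.2]], [0.0, 0.1])
h2 = layer(h1, [[1.0, -0.5], [-0.4, 0.7]], [0.05, 0.0])
probs = softmax(h2)
print(probs)  # probabilities over two hypothetical speech sounds; they sum to 1
```

The chip's job is to evaluate many such weighted sums per frame of audio while moving as few weights as possible across the memory bus.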


One
issue here is that a voice-recognition network is too big to fit in a chip’s
onboard memory. This is a problem because going off-chip for data is much more
energy intensive than retrieving it from local stores. To avoid this, the MIT
researchers’ design concentrates on minimizing the amount of data that the chip
has to retrieve from off-chip memory.


A node
in the middle of a neural network might receive data from a dozen other nodes
and transmit data to another dozen. Each of those two dozen connections has an
associated “weight,” a number that indicates how prominently data sent across
it should factor into the receiving node’s computations. The first step in
minimizing the new chip’s memory bandwidth is to compress the weights
associated with each node. The data are decompressed only after they’re brought
on-chip.
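
A minimal sketch of this idea, assuming simple linear (scale-based) quantization; the paper's actual compression scheme may differ:

```python
def compress(weights, bits=8):
    """Quantize float weights to small signed integers plus one
    shared scale factor, shrinking off-chip memory traffic."""
    qmax = 2 ** (bits - 1) - 1              # 127 for 8 bits
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) for w in weights], scale

def decompress(q_weights, scale):
    """Recover approximate weights once they're brought on-chip."""
    return [q * scale for q in q_weights]

weights = [0.82, -0.31, 0.05, -0.77, 0.4]
q, scale = compress(weights)
restored = decompress(q, scale)

# 8-bit codes move 4x less data over the memory bus than 32-bit floats,
# at the cost of a small quantization error per weight.
print(max(abs(a - b) for a, b in zip(weights, restored)))
```

Because each weight is fetched from off-chip memory in its compressed form, the bandwidth (and hence energy) cost of every fetch drops by roughly the compression ratio.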


The team’s research was funded through the Qmulus
Project, a joint venture between MIT and Quanta Computer, and the chip was
prototyped through the Taiwan Semiconductor Manufacturing Company’s University
Shuttle Program.


Source: MIT