Researchers at the Signal Analysis and Interpretation Laboratory at USC test their drunk-speech detector. Clockwise from top: Shri Narayanan, Daniel Bone and Matt Black.
Last October, Republican presidential candidate Rick Perry delivered a bizarre speech in New Hampshire during which he bounced on his heels, giggled, and struggled to stay on point. Many people thought that he was drunk, and the claims became so widespread that Perry had to publicly deny them. But short of going back in time and taking a blood sample, there was no way to be sure. That is, until a team of researchers from USC came along.
Professor Shri Narayanan and his team designed the winning entry in the latest challenge from the International Speech Communication Association (ISCA). The challenge asked participants to design a computer system that could identify drunk people from their speech patterns. The team's winning system could tell if someone was drunk about 70 percent of the time.
Narayanan said the topic isn't exactly what their lab is known for.
"It wasn't a core problem in our lab, that we are interested in the state of inebriation of people."
What they are interested in is speech. The researchers in Professor Narayanan’s lab have developed computer programs that dissect speech patterns and then tie them to hidden states, such as emotions, or to conditions such as autism.
“Let’s consider the example of a child with autism," said Narayanan. "A prototypical way of characterizing that is to say that their intonation sounds atypical. We are interested in how to quantify this atypicality in intonation."
Many doctors currently use the way a child talks as an indicator of autism. Narayanan thinks his computer programs will soon be able to do it better.
“[Computers] can look at micro-variations and intonations in the beginning of my sentence now and compare it with [my voice] three minutes later," he explained. "That’s difficult for humans to do, or maybe even impossible, but computers can build analytic tools to try to measure these things."
Professor Narayanan's laboratory focuses mostly on impaired speech, with the aim of helping physicians better diagnose their patients. But the drunkenness detection challenge was no gimmick. Since there's a reliable way to measure drunkenness (blood alcohol content), the challenge gave researchers a concrete way to see whether their systems were any good. So his team signed up.
The data the team used came from researchers in Germany, who recorded recruited subjects speaking while sober, then gave the subjects a few beers and recorded them again.
Narayanan said the German researchers also measured the subjects' blood alcohol content to determine which ones were legitimately drunk.
The professor's six-man team included recent Ph.D. Matt Black and third-year doctoral candidate Daniel Bone, who showed Off-Ramp the system. Black queued up an example of a sober German utterance - a banal statement where the speaker asks his car to play a CD. He then played an utterance from the same speaker, with the same prompt, but after drinking lots of beer.
It didn't sound like the first.
"He adds an extra 'gut' at the end," said Bone, laughing. "I think that means 'good.'"
Black said they started with the speaking rate. It seemed like an intuitive thing to measure, because they expected drunk speakers to talk more slowly. But the computers didn't find it such a useful measurement.
"We found that the rate features weren’t very helpful," said Black, "but then we found other features that were less interpretable that ended up allowing us to win the competition.”
Black said that the most important thing was to build a system that was robust and could measure thousands of different features, regardless of whether the research team could hear them.
“How these happen sometimes is beyond human interpretability," said Narayanan. "And that’s what we refer to as black box. We really can’t interpret the internals of these algorithms and how they choose data, how they slice them and dice them, and it’s not often nicely packaged to listen to. But what we can see with certainty are outcomes.”
At a computer station in the back of the laboratory, the team was eager to finally show off its new system. Black put on a headset and sat in the pilot’s seat, where he began saying commands to familiarize the program with his voice.
He uttered three "sober" sentences, three "drunk" sentences and then a test sentence.
"Now it’s going through and training the system," said Narayanan. “It’s extracting spectral voice quality features, extracting prosodic features, intonation and so on, it’s now crunching, organizing it in order of importance.”
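The training step Narayanan describes, pulling features out of each recording and then ordering them by importance, can be illustrated with a toy sketch. This is not the USC team's system. The two features here (RMS energy and zero-crossing rate), the pure tones standing in for recordings, and the Fisher-style ranking score are all assumptions chosen only to keep the example small and runnable.

```python
import math
from statistics import mean, pvariance


def synth_utterance(freq_hz, amplitude, n_samples=800, sample_rate=8000):
    """Hypothetical stand-in for a recording: a pure tone, not real speech."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n_samples)]


def rms_energy(signal):
    """Root-mean-square amplitude, a crude loudness measure."""
    return math.sqrt(mean(x * x for x in signal))


def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    return crossings / (len(signal) - 1)


# Illustrative feature set; a real system would measure far more than two.
FEATURES = {"rms_energy": rms_energy, "zero_crossing_rate": zero_crossing_rate}


def extract(signal):
    """Turn one utterance into a dict of named feature values."""
    return {name: fn(signal) for name, fn in FEATURES.items()}


def fisher_score(a, b):
    """Squared gap between class means, relative to the spread within classes."""
    return (mean(a) - mean(b)) ** 2 / (pvariance(a) + pvariance(b) + 1e-12)


def rank_features(sober, drunk):
    """Return feature names ordered from most to least discriminative."""
    scores = {
        name: fisher_score([u[name] for u in sober], [u[name] for u in drunk])
        for name in FEATURES
    }
    return sorted(scores, key=scores.get, reverse=True)


# Three "sober" and three "drunk" toy utterances. The classes differ in pitch
# but share the same loudness values, so only one feature separates them.
amplitudes = (1.0, 0.9, 1.1)
sober = [extract(synth_utterance(f, a)) for f, a in zip((200, 210, 220), amplitudes)]
drunk = [extract(synth_utterance(f, a)) for f, a in zip((150, 160, 170), amplitudes)]

ranking = rank_features(sober, drunk)
print(ranking)
```

In this made-up data the zero-crossing rate ranks first because it tracks pitch, which is the only thing that differs between the two groups, while loudness tells the ranking nothing. Nothing here says which features actually mattered in the team's system.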
A few minutes later the results were in. The computer came up with a score of five out of seven, or about 70 percent. Narayanan said it was a pretty good score - about on par with human accuracy, in fact - and said it was time to try a clip from Rick Perry's speech.
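The score itself is simple arithmetic: the share of utterances the system labeled correctly. A minimal sketch follows, with made-up labels, since the article does not say which of the seven utterances the system got right.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels for seven utterances; five of the seven predictions match.
actual = ["sober", "sober", "sober", "drunk", "drunk", "drunk", "sober"]
predicted = ["sober", "sober", "drunk", "drunk", "drunk", "sober", "sober"]

score = accuracy(predicted, actual)
print(f"{score:.1%}")  # prints 71.4%
```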
They ran the audio and waited. The results appeared. Perry was sober.
Narayanan didn’t want to speculate on whether the computer got that one wrong, but he said that, hopefully, by the time the next presidential election rolls around, they'll be absolutely sure.