Podcasting Has a Search Problem That Software Could Fix

Podcasts are missing out on listeners because search engines can't reach into specific episodes. Voice recognition technology is ready to fix that.

Risk! host Kevin Allison wants you to find his show’s stories. (Photo: Erika Kapin Photography)

Podcasting has a discovery problem.

Users looking for information posted online as audio can’t search it the way they can search text. Kevin Allison, host of the storytelling podcast Risk!, knows it. He told The Observer in a phone call that he often feels his podcast, which has a million downloads each month, has first-person accounts that relate to major news topics (such as coming out, gender transition and surviving abuse), but there’s no good way for interested web surfers to find those specific episodes.

Most podcasts run on shoestring budgets, and their staffs often don’t have the bandwidth to post detailed show notes, let alone transcribe episodes. Voice recognition software could make podcasts search friendly by giving search engines something to index. Then, when you go looking for a hot topic, you’ll find podcasts that discuss it as readily as news stories and blog posts. The technology exists; it is just waiting for someone to build it into podcasters’ workflow.

But could software make useful transcripts?

“The short answer is yes, you could do that,” Alex Rudnicky, a professor in the Computer Science Department at Carnegie Mellon University, said in a phone call with The Observer. He is part of the team behind the open source voice recognition software CMU Sphinx. Prof. Rudnicky said that many voice recordings could yield a good enough transcript to enable Google (GOOGL) and Bing to index their content. “If your error rate is not more than 30%, software will be able to find keywords in it,” Prof. Rudnicky said. “You don’t need super high accuracy to get the gist of what you’re talking about.”
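To make the 30% figure concrete, here is a minimal sketch in Python. It assumes "error rate" means word error rate (word-level edit distance divided by the length of the correct transcript), which is a common measure but not one Prof. Rudnicky specified in the interview. The function names and sample sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edits needed to turn the reference words seen so far
    # into the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word dropped by the recognizer
                            curr[j - 1] + 1,      # extra word inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def keywords_found(transcript, keywords):
    """Return the keywords that survive in a (possibly noisy) transcript."""
    words = set(transcript.lower().split())
    return [k for k in keywords if k.lower() in words]


# Hypothetical example: what was said vs. what a recognizer might produce.
spoken = "start your indoor seed garden in the winter with good light"
machine = "start you're indoor seed garden and the winter with could light"

print(round(word_error_rate(spoken, machine), 2))
print(keywords_found(machine, ["indoor", "seed", "garden", "winter"]))
```

In this made-up sample the recognizer gets three of eleven words wrong, an error rate of roughly 27%, yet every word a gardener would plausibly search for is still in the text.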

A transcript would capture the distinctive words that people actually search for. Today, someone looking for guidance on starting an indoor seed garden in the winter might not find Melinda Myers’s podcast on the topic because, judging by its text, the page doesn’t have nearly as much information as the many blog posts out there, even though the audio file has loads of it.
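As a rough illustration of what "something to index" means, the sketch below builds a toy inverted index in Python, mapping each word to the episodes whose transcripts contain it. This is not how Google or Bing work internally; the episode IDs and transcript snippets are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(transcripts):
    """Map each word to the set of episode IDs whose transcript contains it."""
    index = defaultdict(set)
    for episode_id, text in transcripts.items():
        for word in tokenize(text):
            index[word].add(episode_id)
    return index


def search(index, query):
    """Return the episodes whose transcripts contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical transcripts keyed by hypothetical episode IDs.
transcripts = {
    "gardening-ep12": "Today we talk about starting an indoor seed garden in winter",
    "stories-ep40": "A winter storm story told live on stage",
}
index = build_index(transcripts)
print(search(index, "seed garden"))
```

Without a transcript, the only words available to index are the ones in the episode title and show notes, which is why an information-rich episode can lose out to a short blog post.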

In fact, Rudnicky said that’s just the beginning of what’s possible now. Software can also use emotional cues signalled by word choice to identify when episodes hit high points. A SoundCloud-hosted episode could be tagged to indicate when conversations get heated or when participants exhibit delight or excitement. Tags like that might help newcomers quickly sample the best part of an episode before deciding whether to listen to the whole thing.
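A very simple version of that idea can be sketched in Python. It assumes the transcript arrives as timestamped segments and uses two tiny, hand-picked cue-word lists; both the word lists and the threshold are hypothetical, and a production system would more likely rely on a trained model than on a fixed lexicon.

```python
# Hypothetical cue-word lists, chosen only for illustration.
CUES = {
    "heated": {"ridiculous", "wrong", "never", "furious", "unacceptable"},
    "delight": {"wonderful", "love", "amazing", "hilarious", "thrilled"},
}


def tag_segments(segments, min_hits=2):
    """Tag transcript segments whose word choice suggests a high point.

    segments: list of (start_seconds, text) pairs.
    Returns a list of (start_seconds, label) pairs.
    """
    tags = []
    for start, text in segments:
        # Strip surrounding punctuation so "wrong!" matches "wrong".
        words = [w.strip(".,!?;:\"'").lower() for w in text.split()]
        for label, cues in CUES.items():
            hits = sum(1 for w in words if w in cues)
            if hits >= min_hits:
                tags.append((start, label))
    return tags


# Hypothetical timestamped transcript.
segments = [
    (0, "Welcome to the show."),
    (95, "That is ridiculous, you are just wrong!"),
    (260, "I love it, that was amazing."),
]
print(tag_segments(segments))
```

A player could turn each tag into a marker on the episode's timeline, letting a newcomer jump straight to the argument at 1:35 or the laugh at 4:20.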

Automatically tagging key moments would enable one sort of skimming. The ability to scan a transcript would enable another. In an influential Digg story on why audio never goes viral, Stan Alcorn points out that audio has a skimming problem. He writes, “You can’t skim sound. An instant of video is a still, a window into the action that you can drag through time at will. An instant of audio, on the other hand, is nothing.” Skimming was invented by readers of text.

Services that do this already exist. Speechmatics, for example, is a cloud-based automatic transcription service. Its founder, Dr. A J Robinson, told The Observer via email that his system could deliver a quality transcript from an audio recording. Users upload audio to the service and download text; the service also has an API. Speechmatics did not immediately respond to a request for comment about whether existing podcasts had used the service.

John Dumas pays for transcripts of his Entrepreneur On Fire podcast, which is rapidly closing in on 1,000 episodes. He does it more for his followers than for search, though, because many of them are hearing impaired. He said it costs him about a dollar per minute. Mr. Allison says that price point is a bit too steep for his show right now.

Dumas said that searches for specific topics aren’t driving much traffic to his shows. “The majority of podcast listeners are really just podcast listeners,” he said; he doesn’t find many people coming in to hear just one episode because it answers a question they have. This may simply be because there is no good way to find answers in podcasts the way most of us go looking: by entering a search into Google.

Mr. Allison agreed. He said, “The only people who know what’s going on on Risk! are people who listen to Risk! And there are topics that come up on the show that are relevant to things that are going on in the culture.”

It doesn’t have to stay that way.
