Speech recognition challenge

DigitalDirect · February 4, 2010, 10:26pm

We’re looking for GPGPU based speech recognition technology for an unusual application.

One of our customers has an audio archive comprising almost 50,000 hours of material, almost all of which is from one speaker, over a 40+ year period.

The speaker uses a rather limited English vocabulary, around 1200 words (according to one hour-long sample), along with roughly 100 specialized non-English words.

The goal is to generate transcripts which are close enough that a human editor, familiar with the specialized vocabulary and speech of the lecturer, can “touch up” the transcripts in, say, 30% of real time.
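For a sense of scale, here is a rough back-of-the-envelope sketch of what that editing target implies. The archive size and the 30% figure come from the description above; the 1,600 working hours per editor per year is purely an assumed value for illustration.

```python
# Rough estimate of human editing effort for the archive.
ARCHIVE_HOURS = 50_000   # from the post: "almost 50,000 hours"
EDIT_FRACTION = 0.30     # from the post: touch-up in about 30% of real time

# Assumption (not from the post): one editor works about 1,600 hours a year.
EDITOR_HOURS_PER_YEAR = 1_600


def editing_hours(archive_hours: float, edit_fraction: float) -> float:
    """Hours of editing needed if touch-up takes edit_fraction of audio time."""
    return archive_hours * edit_fraction


def editor_years(hours: float, hours_per_year: float) -> float:
    """Convert editing hours into full-time editor-years."""
    return hours / hours_per_year


total_hours = editing_hours(ARCHIVE_HOURS, EDIT_FRACTION)
total_years = editor_years(total_hours, EDITOR_HOURS_PER_YEAR)

print(f"{total_hours:,.0f} editor hours")
print(f"{total_years:.1f} editor-years")
```

Even at the 30% target the editing pass is a multi-year effort for a single editor, which is why the quality of the automatic first pass matters so much.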

There are two uses of the transcripts: firstly, for closed captions, and secondly, as input for subsequent semantic analysis and abstraction of topics.

The audio quality, from the human perspective, is quite good, but the recordings are made in a variety of venues, using different microphones and so forth. Most of the lectures were given in small to medium-sized rooms, with perhaps 50-200 persons in the audience. Fortunately, very little was recorded outdoors.

Presumably the technology allows the creation of custom acoustic, speech and language models, all optimized for this one speaker. The inputs, so to speak, are the audio files and manually prepared, accurate transcripts as training material.

We’d be interested in hearing from researchers active in this field, and exploring development of what’s required to accomplish this.

cbuchner1 · February 4, 2010, 11:30pm

I don’t think CUDA is strictly required for this job.

A lot of CPU based voice recognition toolkits are doing pretty well and have been on the market for >10 years, being constantly improved over time. A couple of commercial solutions are available and you should be able to train such software for this particular speaker and his particular vocabulary.

However if such a training sequence requires the speaker to repeat particular phrases, then you might have a problem - assuming that this person is no longer available. ;)

DigitalDirect · February 4, 2010, 11:48pm

The problem is that commercial applications, at least at the consumer level, are very sensitive to the speaker, acoustical background, microphone and so forth. And they aren’t fast enough to work at large scale.
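To make the scale concern concrete, here is a minimal sketch of how long a full pass over the archive would take. The real-time factors and machine counts below are hypothetical values chosen only for illustration; they are not measurements of any product.

```python
# Wall-clock time to transcribe the whole archive once.
ARCHIVE_HOURS = 50_000  # from the thread: "almost 50,000 hours"


def wall_clock_days(archive_hours: float, real_time_factor: float, workers: int) -> float:
    """Days needed to process the archive.

    real_time_factor is processing time divided by audio duration
    (1.0 means one hour of audio takes one hour to transcribe).
    workers is the number of machines running in parallel.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return archive_hours * real_time_factor / workers / 24


# Hypothetical scenarios, not benchmarks.
scenarios = [
    ("1 machine at real time", 1.0, 1),
    ("8 machines at real time", 1.0, 8),
    ("8 machines at 10x faster than real time", 0.1, 8),
]

for label, rtf, workers in scenarios:
    days = wall_clock_days(ARCHIVE_HOURS, rtf, workers)
    print(f"{label}: {days:,.1f} days")
```

A single real-time recognizer would need well over five years of continuous running, so either many machines or a much faster recognizer is needed.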

Exactly. The “input” materials are the audio files, and the transcriptions, that’s it. We can’t go back to the original speaker.

S.Warris · February 5, 2010, 12:24pm

You could ask these guys:
http://www.rug.nl/ai/onderzoek/onderzoeksprogrammaas/LanguageSoundCognition
They have a noise-robust and reasonably fast speech recognition system (used in a commercial setting by www.soundintel.com).
It is not GPU-based, although their system could be recoded for GPGPU. I know this because I helped develop their early code base ;-)

I’m also an alumnus here ;-)

DigitalDirect · February 5, 2010, 7:20pm

Who’s the best person to contact there? Sounds like a good resource.

S.Warris · February 8, 2010, 10:11am

I think Tjeerd Andringa is the key member here. He is the program director and has been a researcher in this field for many years:

http://www.rug.nl/staff/t.c.andringa/index

And with a CC to Professor Dr. Schomaker:

http://www.rug.nl/staff/l.r.b.schomaker/index
