> On Mon, 2 Mar 1998, Don Colton wrote:
>
> >I see postings with lines like this:
> >
> > > What signal to noise ratios do you get in the Audio Wizard?
> >
> > 34
>
> Does anyone know how NS computes SNR? How does it know without
> comparing the signal coming from the mic against another signal
> that there is noise? How does it calculate the distortion bins
> (whatever it is)?
I don't know how nat speak does it, but I have two algorithms I use for
calculation of speech SNR without separate access to the speech and noise
levels. The easiest is to assume that the background noise changes its
spectral characteristics fairly slowly, relative to speech. This is not a
bad assumption, since speech is produced by a bunch of devices (vocal
cords, tongue, lips, etc.) that can change position very fast, but your
office background noise (in general) is produced by devices that have
acoustic spectra that can only change slowly (computer fans, ventilation
systems, idle printers, fluorescent lights). So a spectrum of the noise
taken just before the start of speech or just after the end of speech is
fairly typical of the noise that occurs during speech.
That gives you an estimate of the noise. To get an estimate of the
speech, simply assume that speech and noise are absolutely uncorrelated.
There are a few problems with this: the Lombard effect, where adding
noise changes how we speak, and the fact that any noise that physically
vibrates the person talking modulates the speech (you can observe this
in automotive speech). In general, though, it gives pretty close
results. So, you can subtract the mean square noise estimate (from just
before the speech) from the mean square speech+noise (measured during
speech) to get an approximation of the speech power without noise;
uncorrelated signals add in power, not in RMS amplitude. This breaks down
at really severe SNR (we're lucky to get 10dB in a car) but works fine at
20dB or higher (office stuff).
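
A minimal sketch of that subtraction in Python, for anyone who wants to try
it. This only illustrates the idea above and is not how NatSpeak does it
(I don't know that). The function names are made up for the example, and it
assumes you have already picked out a noise-only stretch and a speech
stretch by hand or with some endpoint detector.

```python
import math


def mean_square(samples):
    """Average power of a block of samples."""
    return sum(x * x for x in samples) / len(samples)


def estimate_snr_db(noise_only, speech_plus_noise):
    """Estimate speech SNR in dB from a noise-only and a speech stretch.

    Assumes speech and noise are uncorrelated, so their powers add and
    the noise power can be subtracted from the power measured during
    speech. Returns None when the estimate is not meaningful (no noise,
    or the speech stretch is no louder than the noise stretch).
    """
    noise_power = mean_square(noise_only)
    total_power = mean_square(speech_plus_noise)
    speech_power = total_power - noise_power
    if noise_power <= 0.0 or speech_power <= 0.0:
        return None
    return 10.0 * math.log10(speech_power / noise_power)
```

For a synthetic check, a unit-amplitude sine (power 0.5) on top of noise
with power 0.01 should come out at roughly 17 dB.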
Now, if you want to get a measurement that really lets you compare
microphones, you have to filter the speech into a bunch of frequency
bands (12 minimum) and compute SNR in each of these bands, then an
aggregate SNR based on all the band-limited ratios. This lets you look at
the SNR as an adapted recognizer sees it, not just as one raw energy
level.
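
Here is a rough Python sketch of the per-band version. Several things in it
are assumptions made for illustration and not anything stated above:
equal-width linear bands (a real front end would more likely use something
like mel or critical bands), a 0 dB floor for bands with no measurable
speech, and a plain average of the per-band dB values as the aggregate. It
uses a naive DFT so it needs nothing beyond the standard library, which is
only practical for short frames. Both frames must be the same length.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum of one frame, bins 0 .. N/2."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def band_powers(frame, num_bands=12):
    """Sum the power spectrum into equal-width bands (an assumption)."""
    spec = power_spectrum(frame)
    bins = len(spec)
    powers = []
    for b in range(num_bands):
        lo = b * bins // num_bands
        hi = (b + 1) * bins // num_bands
        powers.append(sum(spec[lo:hi]))
    return powers


def band_snrs_db(noise_frame, speech_frame, num_bands=12, floor_db=0.0):
    """Per-band SNR in dB, using power subtraction within each band."""
    noise = band_powers(noise_frame, num_bands)
    total = band_powers(speech_frame, num_bands)
    snrs = []
    for noise_power, total_power in zip(noise, total):
        speech_power = total_power - noise_power
        if noise_power <= 0.0 or speech_power <= 0.0:
            snrs.append(floor_db)  # no usable speech estimate in this band
        else:
            ratio_db = 10.0 * math.log10(speech_power / noise_power)
            snrs.append(max(floor_db, ratio_db))
    return snrs


def aggregate_snr_db(snrs):
    """Aggregate as the mean of the per-band dB values (an assumption)."""
    return sum(snrs) / len(snrs)
```

The point of the floor is that a band where the speech adds nothing should
count as "no help to the recognizer", not drag the aggregate to minus
infinity.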
Comments from the Dragons?
> Judging from the fact that I get worse results with better
> mics, NS is picking up ambient noise in the room, like the
> TV in the other room, my breathing noises, hissing from the
> radiator, etc etc :-)
Do you junk your models and do a full retrain when you switch to the
"better" mics? Changing frequency response means you are essentially
a different person talking, and need to readapt the system.
Now, nat speak adjusts your mic gain based on your speech levels. If the
SNR is indeed higher, then you should be picking up less of the
radiators, TV, etc. when the gain is adjusted. Just a guess.
> Regards,
> Mark http://csam.montclair.edu/Faculty/Hubey.html
>
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> "Man will never reach the moon regardless of all future scientific advances."--- Dr. Lee De Forest, inventor of the vacuum tube and father of television.
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>
Joseph S. Wisniewski | Views expressed are my own, and don't reflect
Ford Motor Company | those of the Ford Motor Co. or affiliates.
Project Sapphire | LeMans, Daytona, Bonneville, and Sebring are
jwisniew@ford.com | just races, won by people driving Ford cars!