>On Mon, 2 Mar 1998, H. M. Hubey wrote:
>
>levels. The easiest is to assume that the background noise changes its
>spectral characteristics fairly slowly, relative to speech. This is not a
>bad assumption, since speech is produced by a bunch of devices (vocal
>cords, tongue, lips, etc.) that can change position very fast, but your
>office background noise (in general) is produced by devices that have
>acoustic spectra that can only change slowly (computer fans, ventilation
>systems, idle printers, fluorescent lights). So a spectrum of the noise
>taken just before the start of speech or just after the end of speech is
>fairly typical of the noise that occurs during speech.
Ok. That makes sense. So the noise is, or should be, something with a
flat/constant spectrum (i.e. white noise)?
As I speak I watch the power spectrum, and it seems to show
constant energy at all frequencies, which is strange since speech
is supposed to be concentrated at the lower frequencies. I tried
changing the pitch of my voice to see if it had any effect, but
I could not see it on the spectrum. For some reason the noise
level rises faster than the speech level when I turn up the volume
controls for speech. The only noise in the room is the fan
or the speakers. Maybe it is picking up something through the
cable from the monitor or from the computer itself. Shouldn't
the mic cables be shielded or maybe constructed coaxially?
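
To check the "is it flat?" question numerically rather than by eye, here is
a rough sketch of the idea quoted above: average the power spectrum over a
stretch recorded just before speech starts, then measure how flat it is.
Everything here is an assumption for illustration: the function names, the
64-sample frame length, and the use of spectral flatness (geometric mean over
arithmetic mean) as the whiteness measure. It uses a slow naive DFT so that it
needs nothing beyond the Python standard library.

```python
import cmath
import math
import random

def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of one frame of samples."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum

def average_spectrum(samples, frame_len):
    """Average the power spectra of consecutive non-overlapping frames."""
    if len(samples) < frame_len:
        raise ValueError("need at least one full frame of samples")
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    spectra = [power_spectrum(f) for f in frames]
    return [sum(col) / len(spectra) for col in zip(*spectra)]

def spectral_flatness(spectrum):
    """Geometric mean / arithmetic mean of the bins above DC.

    Closer to 1 for a flat spectrum, closer to 0 for a tonal one.
    """
    eps = 1e-12  # keeps log() defined for empty bins
    vals = [p + eps for p in spectrum[1:]]
    geo = math.exp(sum(math.log(v) for v in vals) / len(vals))
    return geo / (sum(vals) / len(vals))

# Synthetic stand-in for "samples taken just before speech starts".
random.seed(0)
pre_speech = [random.gauss(0.0, 0.01) for _ in range(64 * 8)]
noise_spec = average_spectrum(pre_speech, 64)
print("flatness of noise estimate:", round(spectral_flatness(noise_spec), 2))
```

With real recordings, a flatness well below 1 on the pre-speech stretch would
suggest hum or some other tonal pickup rather than white noise.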
>bands (12 minimum) and compute SNR in each of these bands, then an
>aggregate SNR based on all the band-limited ratios. This lets you look at
>the SNR as an adapted recognizer sees it, not just as one raw energy
>level.
How does it calculate what it calls "distortion bins"? What are they?
How can it tell what is being distorted?
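
For reference, here is how I read the band-limited SNR described in the
quoted passage, as a minimal sketch only. How the program really does it, and
what its "distortion bins" are, I do not know. The assumptions are mine:
12 equal-width bands (a real recognizer may well use perceptually spaced
ones), speech power estimated by subtracting the noise estimate from the
noisy frame, and the aggregate taken as the plain mean of the per-band dB
values. The name `band_snr_db` is made up for this example.

```python
import cmath
import math

def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of one frame of samples."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum

def band_snr_db(noise_frame, speech_frame, num_bands=12):
    """Return (per-band SNRs in dB, mean of those SNRs).

    noise_frame: samples taken just before or after speech.
    speech_frame: samples taken during speech (speech plus noise).
    Both frames are assumed to have the same length.
    """
    noise = power_spectrum(noise_frame)[1:]    # drop the DC bin
    noisy = power_spectrum(speech_frame)[1:]
    if len(noise) != len(noisy) or len(noise) < num_bands:
        raise ValueError("frames must match and cover at least num_bands bins")
    width = len(noise) // num_bands            # equal-width bands (assumed)
    floor = 1e-12                              # avoids log of zero
    snrs = []
    for b in range(num_bands):
        lo, hi = b * width, (b + 1) * width
        n_pow = sum(noise[lo:hi]) + floor
        # Speech power estimate: noisy-band power minus the noise estimate.
        s_pow = max(sum(noisy[lo:hi]) - n_pow, floor)
        snrs.append(10 * math.log10(s_pow / n_pow))
    return snrs, sum(snrs) / len(snrs)

# Synthetic check: weak broadband "noise" plus one strong tone as "speech".
n = 48
noise = [sum(0.01 * math.cos(2 * math.pi * k * t / n) for k in range(1, 25))
         for t in range(n)]
noisy = [noise[t] + math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
snrs, aggregate = band_snr_db(noise, noisy)
print("best band:", snrs.index(max(snrs)), "aggregate dB:", round(aggregate, 1))
```

The point of the per-band view is that one loud band cannot hide several
bands where the noise dominates, which a single raw energy ratio would do.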
>Do you junk your models and do a full retrain when you switch to the
>"better" mics? Changing frequency response means you are essentially
>a different person talking, and need to readapt the system.
I have not gotten to training it. I am still at the voice quality
check. I lost my previous speech files because of an HD crash and
I don't want to spend a lot of time re-training without being sure that
the voice quality is the best I can produce.
>Now, nat speak adjusts your mic gain based on your speech levels. If the
>SNR is indeed higher, then you should be picking up less of the
>radiators, TV, etc. when the gain is adjusted. Just a guess.
It is still surprising to me how people can get such a high SNR.
Regards,
Mark http://csam.montclair.edu/Faculty/Hubey.html
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
*The glory of mathematics is that we do not have to say what we are talking about. ---Richard Feynman
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=