
Is there a "Best" Microphone for Speech Recognition Software?

Some Thoughts on Objectively Evaluating Microphone Accuracy

Jon W. Wahrenberger

Although the qualities of a specific microphone used for speech recognition are often touted by microphone dealers and users alike - often with strongly stated opinions - it appears that few objective criteria are actually employed in such evaluations. Nuance, for instance, provides microphone compatibility recommendations and highly rates the $25 Andrea NC-91, while other more expensive microphones - including those with active noise canceling systems - are rated less highly. Yet Nuance does not publish or describe the methods or criteria by which it evaluates microphones. At the same time, most microphone dealers tout the Sennheiser ME3 as the best and sell it for $160 or more without providing objective evidence of its superiority. Is there a significant difference among microphones? If so, how can this difference be quantified? And does any difference in microphone quality justify the huge price differential?

Many factors influence the usefulness and accuracy of a microphone for a given user and may prevent the reliable extrapolation of results from one user to other users. Some of these factors include:

  1. Variability in microphone performance from one user to another by virtue of differing tone, pitch and volume.
  2. Differing techniques from one user to another, including microphone positioning, dictation technique, and varying levels of external noise contamination.
  3. Differing interactions between microphones and computer hardware, including soundcards and external sound adapters. One microphone may not work as well when paired with some systems compared with others.
  4. Bias resulting from the "expectation" that a new microphone will be superior. Could this expectation cause a user to dictate or enunciate more carefully, making the expectation self-fulfilling?
  5. Physical preferences (or barriers) that make one microphone more comfortable or logistically practical for one user compared with another.

During the last few decades, great strides have been made in the study design and statistical methods used to evaluate pharmaceuticals, devices and other interventions in medical care. The goal of such methods is to better identify true effects and to reduce the likelihood of three common causes of erroneous conclusions: chance, bias and confounding.

  • " Chance " refers to the natural variability which occurs in events and possibility that a given result may simply be the result of random variation.
  • " Bias " refers to the process in which our expectations may unintentionally alter our study design, results assessment, and study analysis to favor our expected outcome.
  • " Confounding " refers to the mixing of effects. As an example of confounding, if we tested "microphone A" in one group of users and "microphone B" in a separate group and didn't realize that the "Microphone B" group had far more users with speech recognition experience, the finding of better accuracy in the "B" microphone might be attributed to the microphone but could actually result from the more sophisticated dictation style in these users.

How can the qualities we seek in a microphone used with speech recognition software be honestly and accurately evaluated? I would propose that an honest appraisal of a microphone's quality will occur only if we apply the same rigor to the evaluation that we bring to the task of developing new drugs. Specifically, we should evaluate a microphone and compare it to other microphones with the following in mind:

  1. The microphone must be evaluated in such a manner as to reduce the likelihood that chance, bias and confounding will affect the results (as discussed above).
  2. The microphone should be evaluated in several user types, including those with high, medium and low pitched voices, and perhaps in users of varying voice "volumes" or intensity.
  3. The microphone should be evaluated in conditions of varying (but reproducible) external noise contamination circumstances.
  4. Microphones should not be evaluated in isolation, but rather in direct head-to-head comparison with other microphones.
  5. Results of microphone evaluation should be analyzed using modern statistical methods (a simple example of such an analysis is sketched after this list).
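
As one illustration of point 5, a paired comparison is a natural fit: each user reads the same standardized passage with both microphones, and the per-user differences are tested against run-to-run variation. The sketch below is a minimal example in Python using SciPy; the accuracy figures are invented placeholders, not real measurements:

```python
from scipy import stats

# Hypothetical per-user accuracy (% of words correct) for the SAME five users
# reading the same standardized passage with microphone A and then microphone B.
# These numbers are invented placeholders, not real measurements.
accuracy_mic_a = [96.2, 94.8, 97.1, 93.5, 95.9]
accuracy_mic_b = [97.0, 95.1, 97.4, 94.9, 96.3]

# A paired t-test asks whether the per-user differences are larger than
# run-to-run random variation would plausibly explain.
t_stat, p_value = stats.ttest_rel(accuracy_mic_b, accuracy_mic_a)

mean_diff = sum(b - a for a, b in zip(accuracy_mic_a, accuracy_mic_b)) / len(accuracy_mic_a)
print(f"Mean accuracy difference (B minus A): {mean_diff:.2f} points, p = {p_value:.3f}")
```

A paired design is attractive here because each user serves as his or her own control, which removes much of the user-to-user variability described earlier.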

The actual methods by which a microphone is objectively evaluated and compared could take several forms. The simplest method in some regards, though with significant technical challenges nonetheless, would be to create an artificial "model" of a human user to which a microphone is appropriately attached and from which a standardized text such as the "rainbow passage" is read. Such a model might take the form of a small speaker which can reasonably approximate the actual tone, pitch and quality of a human voice, and which would play pre-recorded dictation from experienced users of varying voice pitches. In such a system, the software would need to be set up with a sound level calibration (audio setup wizard) using the actual voice to be evaluated. The advantage of such a system would be to completely remove human variability from the process: the same recorded passage could be played several times through a given microphone and then again through another microphone. Issues of voice fatigue, user variation, user expectations, etc. would be eliminated, and accuracy could subsequently be evaluated using standard statistical methods. The disadvantage of such a system, obviously, is the difficulty (impossibility?) of creating a system which emulates human speech accurately enough that the results would truly reflect the same user actually using the microphone.
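
Whichever apparatus is used, the accuracy of each run still has to be scored against the standardized text. A common metric for this is the word error rate, computed as an edit distance between the reference passage and the recognized output. Here is a minimal sketch in Python; the two sentences at the bottom are illustrative examples, not output from any actual recognition run:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Illustrative comparison against the opening of the "rainbow passage".
reference = "When the sunlight strikes raindrops in the air they act as a prism and form a rainbow"
recognized = "When the sunlight strikes rain drops in the air they act is a prism and form a rainbow"
print(f"Word error rate: {word_error_rate(reference, recognized):.1%}")
```

A lower word error rate corresponds to higher accuracy, and the per-run rates become the raw material for the statistical comparisons described above.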

 

A second and perhaps less technically challenging (but more logistically complicated) method would be to test microphones with human subjects, but in such a manner that variation in technique and user expectation is excluded or accounted for, and with all results evaluated rigorously using modern statistical methods. To do this, it would be necessary to "blind" the user as to the type of microphone being used. This could be accomplished by having someone else attach the microphone so the user does not see it; to account for microphones that attach differently, it might even be plausible for two microphones to be worn at once (e.g., a Sennheiser ME3 and another standard headset microphone), with the user not knowing which microphone is plugged in and actually being evaluated. The key to such a method would be multiple "runs" with each microphone, with a standardized text being read in each case and the accuracy results averaged and evaluated using standard statistical methods.
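
One simple way to organize such a blinded comparison is to generate a randomized run order ahead of time, known only to the person attaching the microphones. The sketch below (Python, with hypothetical microphone names and placeholder accuracy scores) shows how the schedule and per-microphone averaging might be set up:

```python
import random
from statistics import mean

# Hypothetical blinded test plan: the operator (not the speaker) knows which
# microphone is live on each run; the speaker simply reads the standardized passage.
MICROPHONES = ["Sennheiser ME3", "Andrea NC-91", "Stock headset"]
RUNS_PER_MIC = 5

schedule = [mic for mic in MICROPHONES for _ in range(RUNS_PER_MIC)]
random.shuffle(schedule)  # randomize run order across microphones

# After testing, accuracy scores (placeholders here) are grouped and averaged per microphone.
results = {mic: [] for mic in MICROPHONES}
for mic in schedule:
    accuracy = 95.0 + random.uniform(-2.0, 2.0)  # stand-in for a measured accuracy score
    results[mic].append(accuracy)

for mic, scores in results.items():
    print(f"{mic}: mean accuracy {mean(scores):.2f}% over {len(scores)} runs")
```

Randomizing the run order also spreads fatigue and learning effects evenly across the microphones, rather than letting them accumulate on whichever microphone happens to be tested last.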

With either system, it would also be appropriate to evaluate the extent to which contaminating noise alters the results, in order to assess the "noise canceling" effects of the microphone. For this to be fairly assessed, the noise contamination must be standardized and reproducible. Ideally such "contaminating" noise would come from a speaker placed at a fixed distance in front of the model or user and emitting a specific audio signal at a reproducible decibel level. Whether this should be music, human conversation, more random or assorted types of sounds, or a combination of all of these, would need to be considered.
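
Part of making the noise reproducible is playing the noise clip at the same level every time. The sketch below is a minimal illustration in Python/NumPy of scaling a digital noise clip to a chosen RMS level in dB relative to full scale before playback; the sample rate, target level and use of white noise are assumptions for illustration only, and the physical playback chain would still need to be calibrated with a sound level meter:

```python
import numpy as np

def scale_to_target_dbfs(noise: np.ndarray, target_dbfs: float) -> np.ndarray:
    """Scale a clip so its RMS level matches a target in dB relative to full scale."""
    rms = np.sqrt(np.mean(noise ** 2))
    current_dbfs = 20 * np.log10(rms)
    gain = 10 ** ((target_dbfs - current_dbfs) / 20)
    return np.clip(noise * gain, -1.0, 1.0)

# Example: one second of white noise at 44.1 kHz, scaled to -30 dBFS before playback.
rng = np.random.default_rng(0)
noise = rng.uniform(-1.0, 1.0, 44100)
calibrated = scale_to_target_dbfs(noise, target_dbfs=-30.0)
print(f"Calibrated RMS level: {20 * np.log10(np.sqrt(np.mean(calibrated ** 2))):.1f} dBFS")
```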

Will anyone go to the trouble of devising a system to objectively evaluate microphones used with speech recognition software? If the process were approached rigorously, the results would be truly beneficial to users of speech recognition software, who struggle to optimize their accuracy while sorting through the marketing and "hype" from microphone dealers. Until such an evaluation system is developed, it seems we should treat microphone recommendations with some degree of skepticism. Most of us have tried several microphones and have our favorite. The maxim that "there is no one best microphone for everyone" is likely quite true!

 

