The following article, originally written
in 1996, identifies four top contenders for speech recognition in a
physician's office. Happily for most physicians in 2005, there are no
longer four top contenders but rather one very clear choice:
Dragon NaturallySpeaking® Medical V8.
However, I still find it interesting to review the
transition towards this point in time.
Most of the products listed in this article are no
longer in production.
One of them, Dragon Dictate, has been supplanted by Dragon
NaturallySpeaking.
Most of the others, unfortunately, were commercially unsuccessful,
and the companies which had produced them are no longer in business.
Voice Recognition Software in a Medical Office
Medical record keeping has improved significantly over the
past few decades. First, a few short words about the patient, dx
and rx scribbled on a blank sheet of paper. Then medical records
and patient notes dictated and transcribed on IBM Selectric
typewriters. In the 1980s, typewriter ribbons and white-out gave
way to computers. And now Voice Activated Software.
In the 1990s voice activated software, a.k.a. speech
recognition, emerged as the vanguard in word processing
technology. The early pioneers in this field were poorly received
for a few reasons. First, the hardware on the market was not yet
up to the task. For instance, Kurzweil A.I. introduced VoiceRAD
when 386 processors were standard operating equipment but
insufficient to drive the software. The result was a barely
functional system which generated negative word-of-mouth about
voice recognition. Further, when face to face with the
technology, the physician was often mystified as to why he or she
was unable to produce reports with the ease and skill
demonstrated by the salesperson who seemed to master the
software. The answer is that the salesperson had a specific
script from which they worked. And the nature of the product is
that both the speed and recognition capability of voice
recognition software improve with use or "training". Of
course, in medicine, a bewildering variety of pathology is
"reported" or dictated on the voice recognition system
so the physician generally doesn't repeat the same words in
report after report. As a result, the physician needed to use
'discrete speech', whereby one must pause for as much as 1/5 of a
second between words. You - had - to - speak - like - this.
In 1994, with the advent of the Pentium processor and the
lowered cost of memory (RAM, or Random Access Memory, now around
$40 per megabyte), the hardware was sufficient to drive voice
recognition software. And even better, a system that cost $35,000
in 1993 was priced at $15,000 in 1994, hardware included. Voice
software systems became technological breakthroughs that would
even pay for themselves by costing less than the annual salary of
a typical transcriptionist. The ease of installing the necessary
sound card (the voice hardware) was improving. The sophistication
of the database, the speed and quality of the recognition and the
lowered cost of hardware all led to a situation in which it was
almost beginning to make sense to strongly consider voice
recognition for certain specific offices. However, there is a
difference between 'almost beginning to make sense' and actually
being appropriate for the average office. In 1994 the field of
voice recognition was far away from actually being cost effective
for all but the most unusual medical facility.
Pentium processors with 120 megahertz clock speeds appeared in
1995 and are considered the norm in hardware specifications in
1996. And of course, the recognition capacity of voice
recognition systems has increased dramatically. Now that Pentium
processors offer clock speeds significantly over 160 megahertz,
the pause in discrete speech has dropped to 1/10 of a second, or
possibly even less in most instances. In addition, some of the
programs offer continuous speech recognition for digits and other
small vocabulary situations.
Now there are not only 'speaker dependent' programs, but also the
beginnings of usable 'speaker independent' programs. The
difference between the two is that 'speaker independent'
programs are usable right out of the box. As soon as you unpack
the program and install it into your computer you can start
speaking. Many of the speaker dependent programs require an
'enrollment period' during which you teach the computer your
specific 'voice profile.' This isn't such a bad idea, except that
it frequently takes up to 3 hours before you voice activate your
first meaningful page. That is quite a tall order to ask of a
busy physician who is looking for methods of streamlining his or
her office practice, not looking for new projects to undertake.
Voice activated software is now available in a wide variety of
packages, at many different price ranges. There are systems
available for as little as $395, and of course one can spend many
thousands of dollars on a system as well. Many programs do not
require a specific sound card, but are compatible with the large
variety of cards that now appear as standard equipment in the
modern computer system.
Most of the current systems are either speaker independent or
require only a minimum of training. They can be installed by the
computer neophyte on most Pentiums. And while most computers are
being sold with more and more RAM, many of the voice systems are
requiring less of it. Typical hardware requirements are 16 to 32
megabytes of RAM to run the program, 60 megabytes of hard disk
space to store the program on the hard drive and a Pentium
processor.
Summary of Currently Available Systems - 1996
There are four good contenders for the physician who wishes to
take advantage of technology to improve efficiency. In no
particular order they are:
Dragon
Kurzweil
IBM
Kolvox
It is not so much that one is better than the others, but
rather that they have different features, and each is more
appropriate for certain specific uses.
Kurzweil, one of the original developers and purveyors
of voice recognition, has by far the most sophisticated 'Medical
Reporting' system available. The use of extensive 'triggers'
provides for very rapid report production once the user is quite
familiar with the system. These triggers are analogous to voice
macros in which speaking one or two words can produce an entire
line, paragraph or even page of text. More importantly, this text
will allow for the use of fill-ins within these lines or
paragraphs. Thus, as an example, in discussing a knee exam the
physician might wish to indicate that it was normal. However, if
the physician did not wish to be terse and merely say 'normal
knee' but rather wished to discuss each of the portions of the
physical exam, and indicate that each of them was normal, he or
she might say "knee exam - normal - left". This could
lead to a passage such as "The patient's left knee was
examined and was seen to be within normal limits. There was no
evidence of ligamentous laxity nor evidence of meniscal injury.
There was no effusion present and the range of motion was
full."
Therefore, even though the dictator needs to use discrete
speech, with a brief pause between words, the finished product
can still be produced in less time than using more conventional
means.
Additionally, the specific words which are offered are
entirely under the control of the physician, and can be changed
'on the fly'. If there are multiple physicians in the office
using the same software, each physician may have different
standard wording for each portion of each examination.
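The trigger mechanism described above can be pictured as a lookup from a short spoken phrase to a stored template with fill-in slots, with each physician keeping an individual set of templates. The Python sketch below is purely illustrative: it is not Kurzweil's implementation, and the names used (TEMPLATES, expand_trigger, the physician keys "dr_a" and "dr_b", and the shorter second wording) are hypothetical. It assumes the spoken phrase arrives as text in the form "exam - finding - fill-in", as in the knee example.

```python
# Hypothetical sketch of trigger expansion with fill-ins.
# Not Kurzweil's actual design; every name here is invented for illustration.

# Each physician keeps an individual set of templates, keyed by (exam, finding).
TEMPLATES = {
    "dr_a": {
        ("knee exam", "normal"): (
            "The patient's {side} knee was examined and was seen to be "
            "within normal limits. There was no evidence of ligamentous "
            "laxity nor evidence of meniscal injury. There was no effusion "
            "present and the range of motion was full."
        ),
    },
    "dr_b": {
        # A second physician may prefer terser standard wording.
        ("knee exam", "normal"): "Examination of the {side} knee was normal.",
    },
}


def expand_trigger(physician, spoken):
    """Expand a recognized trigger such as 'knee exam - normal - left'."""
    parts = [p.strip().lower() for p in spoken.split(" - ")]
    if len(parts) != 3:
        raise ValueError("expected 'exam - finding - fill-in'")
    exam, finding, side = parts
    template = TEMPLATES[physician][(exam, finding)]
    # The fill-in ("left" or "right") is dropped into the stored wording.
    return template.format(side=side)


print(expand_trigger("dr_a", "knee exam - normal - left"))
print(expand_trigger("dr_b", "knee exam - normal - left"))
```

Changing a physician's standard wording 'on the fly' corresponds, in this sketch, to replacing one entry in that physician's dictionary; the other physicians' entries are untouched.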
There is a price to pay for this sophistication, however. The
base cost of the Kurzweil VoiceMED systems, as these are called,
is $6,000 for the first user, with additional substantial fees
for each additional user. (Kurzweil is the only company which has
a per user price.) VoiceORTHO costs $8,000 for the first user.
There are additional fees for any associated hardware and
training. Also, there is a learning curve with this, as with most
new software. Because of the substantial pre-formatting of the
wording of hundreds of exams, the physician needs to learn to
anticipate what the computer is expecting in order to allow for
the most rapid report formation. However, most physicians
generally work within the same specialty. The result is that an
orthopedic surgeon doesn't have to learn the wording or
formatting for chest exams or eye exams. Therefore, once the
physician has learned their area, report generation and
turnaround are faster, in fact immediate, and best of all free. A
sophisticated Kurzweil system can automatically bring up patient
demographics, print prescriptions, automatically fax, and include
ICD-9 codes.
Dragon has a suite of excellent products called Dragon
Dictate. They were initially designed for the disability market,
allowing users to voice activate their computers with an absolute
minimum of keystrokes. This is quite convenient for quadriplegics
and others for whom use of the hands is difficult or impossible.
More recently Dragon has significantly broadened its market,
aiming at general office staff, transcriptionists, attorneys,
journalists and, of course, physicians. The suite has an excellent
underlying voice recognition engine, and is offered at prices
ranging from $395 (not recommended in any way for a busy
physician) to $1700. The addition of the DragonMED medical
vocabulary, available for an additional $495, turns this into a
first class medical transcription device. It is certainly a very
reasonable alternative to the more expensive solutions. In fact,
with the top of the line software costing less than 1 month's
salary of a transcriptionist, it is an excellent money-saving
option for today's medical practice.
IBM, obviously a leader in many computer areas, has an
excellent product called IBM VoiceTYPE Dictation. It is by far
the most widely publicized, with national media advertising on
such programs as the Oscars and other very high profile
occasions. Price, including the medical vocabulary, is $1500.
Some degree of training or enrollment is necessary. However, it
does run using a PCMCIA card, thus allowing for its use on a
portable, or laptop computer.
Kolvox is the only company which does not produce its
own voice recognition engine or platform but rather embellishes
the underlying engine from either Kurzweil, Dragon or IBM. What
Kolvox does is to place on top of the underlying engine a series
of voice macros which will make the system more readily usable
right out of the box. They offer mailing list and faxing
conveniences as well as mail merge functions. These are more
useful for the legal profession, and in fact they do have a
product called LawTALK. However, their OfficeTALK is a good
contender for the physician who wishes to have a more
sophisticated system than some of the bare bones systems, and is
willing to pay around $1500 for it.
So, where are we in May 1996? There are a number of excellent
alternatives. Handwriting continues to be one of them. Or you can
dictate into a hand held recorder and provide the tapes to a
transcription department, either in house, at $12 - $18 per hour,
or send them to an outside service that will charge by the line.
The transcripts will be returned to you in a few hours, or days, and
you will then have the pleasure of proofreading them.
Or you can take the plunge and start using the technology
which will be so prevalent during the 21st Century: Voice
Recognition.
_____________________________________________________________________________________
Eric S. Fishman, M.D. is a practicing Orthopedic Surgeon in
West Palm Beach, Florida. After purchasing VoiceORTHO for his
practice in 1994 he and his wife founded 21st Century Eloquence,
a company which provides voice recognition software from all of
the major voice companies in the U.S., including IBM, Kolvox,
Dragon and Kurzweil. 21st Century Eloquence can be reached by
phone: 1-800-245-2133