With the barrage of articles,
conferences, white papers and Webinars that
heralded, analyzed and explained the new Federal
Rules of Civil Procedure dealing with
electronically stored information, or ESI, by
now every lawyer and litigation support
professional should know that all forms of
electronic information are discoverable.
However, relatively few professionals have had
experience handling large volumes of digital
voice recordings — possibly because counsel are
not aware of options for processing such data,
and possibly because of misperceptions about the
costs or technical challenges of such
processing. This article provides an overview of
processing voice files and describes how
technology from Nexidia can be used to review
and produce such files.
As a preliminary observation,
voice recordings are created by a number of
business systems including:
-
Cell phone and company phone
and voicemail systems
-
Support call centers
maintained by software and other companies
-
Online Web conferences or
presentations
-
Transactional or trading
information
Most voice systems will have
discrete files for each conversation or call and
many of the file formats will be proprietary;
e.g., Dictaphone, Nice, Verint and many others.
The size of the files can vary considerably
depending on how many bits per second were used
to record the voice and what codec (compression
and decompression) techniques were used by the
system.
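As a rough illustration of how bit rate drives file size, the sketch below estimates the size of a constant-bit-rate recording. The function name and the two bit rates are assumptions chosen for illustration; actual sizes depend on the codec and on any container overhead the recording system adds.

```python
def recording_size_bytes(duration_seconds: float, bits_per_second: int) -> int:
    """Approximate size of a constant-bit-rate recording.

    Ignores headers and container overhead, so treat the result as an estimate.
    """
    return int(duration_seconds * bits_per_second / 8)


# A five-minute (300-second) call at two illustrative bit rates.
telephone_quality = recording_size_bytes(300, 64_000)  # 64 kbit/s: 8 kHz sampling, 8 bits per sample
heavily_compressed = recording_size_bytes(300, 8_000)  # 8 kbit/s: an assumed low-bit-rate codec

print(telephone_quality)   # 2400000 bytes, about 2.4 MB
print(heavily_compressed)  # 300000 bytes, about 0.3 MB
```

The same call can therefore differ in size by nearly an order of magnitude depending on how it was recorded, which matters when estimating storage and transfer for a large collection.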
Voice recordings can be central
to the issues in the case, for example:
-
Did a customer actually place
the order in question?
-
Were messages sexually
harassing?
-
Had the manufacturer been put
on notice by earlier customer complaints?
-
Did a company’s employees
discuss pricing with its competitors?
Perhaps because of the
spontaneous and informal manner in which voice
recordings are made, these recordings can be the
most candid representation of what the
participants were really thinking at the time.
Furthermore, the tone and inflection with which
words are actually spoken can provide a clear
indication of the meaning intended by the spoken
words; there can be much less ambiguity with
recorded spoken words than with textual
representation of words. Witnesses may have a
much harder time explaining their recorded voice
than their written words.
The basic steps used for voice
files are the same as those for other forms of
ESI: identify potentially responsive files,
preserve, gather, review and produce them.
Most voice file systems provide
metadata that can be used to conduct the initial
identification, preservation and gathering of
potentially relevant files. Such metadata could
include date, time, employee’s name, phone
number used to call in, and transaction ID.
Using metadata as selection criteria can
dramatically reduce the volume of information
that has to be processed.
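A minimal sketch of metadata-based culling follows, assuming the voice system's metadata has already been exported into simple records. The `CallRecord` fields mirror the metadata listed above, but the class, the function name and the sample data are hypothetical and do not correspond to any particular vendor's export format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CallRecord:
    path: str            # location of the voice file
    call_date: date
    employee: str
    caller_number: str
    transaction_id: str


def select_candidates(records, start, end, employees):
    """Keep records dated within [start, end] whose employee is a named custodian."""
    wanted = {name.lower() for name in employees}
    return [
        r for r in records
        if start <= r.call_date <= end and r.employee.lower() in wanted
    ]


# Hypothetical sample data.
records = [
    CallRecord("calls/0001.wav", date(2006, 3, 14), "J. Smith", "555-0100", "T-1001"),
    CallRecord("calls/0002.wav", date(2006, 7, 2), "J. Smith", "555-0101", "T-1002"),
    CallRecord("calls/0003.wav", date(2006, 3, 20), "A. Jones", "555-0102", "T-1003"),
]

selected = select_candidates(records, date(2006, 3, 1), date(2006, 3, 31), ["j. smith"])
print([r.path for r in selected])  # ['calls/0001.wav']
```

Only the files that survive this cull need to go on to the more expensive review steps.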
Once the preliminary selection
has been made, the real challenge lies ahead: how
to select just those messages that are relevant
and screen them for privilege. There are four
methods that can be used, each with a different
set of costs, processing characteristics and
benefits:
-
Listen to each recording.
This will take some multiple of the number
of hours of conversation that were recorded
as the listeners will have to take breaks,
pause, rewind, replay, etc. There is
anecdotal evidence that the attention of
listeners drops off over time and that this
method does not yield 100 percent accuracy.
-
Transcribe each recording and
then search the resulting text. This will be
time-consuming and expensive, and errors can
be introduced when the reporter mishears what
was said or does not accurately transcribe
what he or she heard.
-
Programmatically convert
voice to text and then search the resulting
text. This can have the advantage of being
less expensive than the first two methods,
but the computer time required to convert to
text can be appreciable and the quality of
the conversion process can vary widely with
the quality of the underlying voice file.
For example, a low bit rate recording of a
cell phone message may yield very poor
results.
-
Index the phonemes that occur
in the speech and then search for the
phonemes. Phonemes are the individual sounds
that speakers of a particular language use;
for example, the “k” sound of cool and keel.
North American English has 38; some
languages, like Spanish or French, have fewer,
and some, like UK English, have more.
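To make the idea of searching phonemes rather than words concrete, here is a toy sketch. It is not Nexidia's method: the pronunciation table, the phoneme symbols (ARPAbet-style) and the exact-match search are all simplifying assumptions, and a production system would score approximate matches against the audio rather than compare symbols exactly.

```python
# Hypothetical pronunciation table: search term -> phoneme sequence.
PRONUNCIATIONS = {
    "cool": ["K", "UW", "L"],
    "keel": ["K", "IY", "L"],
}


def find_term(stream, term):
    """Return every index in the phoneme stream where the term's phonemes occur in order."""
    target = PRONUNCIATIONS[term]
    n = len(target)
    return [i for i in range(len(stream) - n + 1) if stream[i:i + n] == target]


# Assumed phoneme index for the spoken phrase "the keel is cool".
stream = ["DH", "AH", "K", "IY", "L", "IH", "Z", "K", "UW", "L"]

print(find_term(stream, "keel"))  # [2]
print(find_term(stream, "cool"))  # [7]
```

Both words begin with the same "k" phoneme despite their different spellings, which is why the search operates on sounds rather than on text.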
Nexidia uses its patented
technology to create an index of the phonemes
that were used in the voice files and then
enables the user to search for the phonemes that
are used in the terms the user enters in
queries. According to Nexidia, indexing which
phonemes occur in speech is more reliable than
trying to determine which complete words were
spoken. Nexidia originally developed its
technology for use in managing corporate call
centers where there was a need to audit or
review what a company’s employees were saying.
Nexidia claims several advantages
to its indexed phoneme approach. The first is
rapid processing — the voice files can be
converted to searchable indices at the rate of
176 hours of recordings per hour of CPU time
(that’s more than 8,000 hours of recordings in a
single day on a dual CPU computer) so that even
large collections can be indexed in relatively
short order. The second advantage is that the
searches are more accurate than just listening
to the recordings or converting voice to text
and then searching text. The third is that the
cost of the processing is much less than the
cost of listening or transcribing.
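The throughput figure can be checked with simple arithmetic. The sketch below assumes the quoted rate of 176 hours of recordings per CPU hour holds steady and scales linearly with the number of CPUs, which is a simplification; the function names and the 50,000-hour collection are illustrative only.

```python
HOURS_INDEXED_PER_CPU_HOUR = 176  # rate quoted by Nexidia


def hours_indexed(wall_clock_hours, cpus, rate=HOURS_INDEXED_PER_CPU_HOUR):
    """Hours of recordings indexed in the given wall-clock time, assuming linear scaling."""
    return wall_clock_hours * cpus * rate


def wall_clock_hours_needed(recording_hours, cpus, rate=HOURS_INDEXED_PER_CPU_HOUR):
    """Wall-clock hours needed to index a collection, assuming linear scaling."""
    return recording_hours / (cpus * rate)


print(hours_indexed(24, 2))                       # 8448 hours in one day on two CPUs
print(round(wall_clock_hours_needed(50_000, 2)))  # about 142 hours for a 50,000-hour collection
```

Under these assumptions, a collection of 50,000 recorded hours would take roughly six days to index on a single dual CPU machine.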
The Nexidia voice search
interface permits the user to use “and,” “or”
and “not” Boolean-type logic, and permits the
user to set required adjacency in seconds for
search terms. The result display is interesting
— each search term is assigned a color and the
user can tell by looking at the depiction of the
sound file how close the search terms occurred
to each other.
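The adjacency setting can be thought of as a proximity test on the times at which each search term was detected. The sketch below is an illustration under stated assumptions, not Nexidia's interface: the hit times, the dictionary layout and the function name are hypothetical.

```python
def within_seconds(hits_a, hits_b, max_gap):
    """Return (a, b) pairs of hit times that are no more than max_gap seconds apart."""
    return [(a, b) for a in hits_a for b in hits_b if abs(a - b) <= max_gap]


# Hypothetical hit times, in seconds from the start of one recording.
hits = {
    "price": [12.4, 95.0],
    "competitor": [14.1, 300.2],
}

# "price" AND "competitor" within five seconds of each other.
pairs = within_seconds(hits["price"], hits["competitor"], 5.0)
print(pairs)  # [(12.4, 14.1)]

# A NOT condition: the recording qualifies only if an excluded term has no hits.
excluded_term_absent = not hits.get("refund", [])
print(excluded_term_absent)  # True
```

A recording would be returned by the combined query only if `pairs` is non-empty and the excluded term is absent.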

Reviewers can assign tags such as
“Responsive” or “Privileged” to files, and
Nexidia can output those files in the desired
format (typically WAV) along with the associated
metadata for production to other parties.
Nexidia offers its technology on
a services basis but also licenses its
software for clients who would prefer to process
their own data and have the expertise to handle
the various sound file formats found within
their organization. Nexidia’s processing charges
are typically based on a startup charge, which
will include a certain number of hours of voice
recordings.
The technology is also available
through Nexidia partners such as Fios, National
Data Conversion and TechLaw Solutions.
For further information: