Reviewing and Producing Digital Voice Files

By Joe Howie
 


With the barrage of articles, conferences, white papers and Webinars that heralded, analyzed and explained the new Federal Rules of Civil Procedure dealing with electronically stored information, or ESI, by now every lawyer and litigation support professional should know that all forms of electronic information are discoverable. However, relatively few professionals have had experience handling large volumes of digital voice recordings — possibly because counsel are not aware of options for processing such data, and possibly because of misperceptions about the costs or technical challenges of such processing. This article provides an overview of processing voice files and describes how technology from Nexidia can be used to review and produce such files.

As a preliminary observation, voice recordings are created by a number of business systems including:

  • Cell phone and company phone and voicemail systems
  • Support call centers maintained by software and other companies
  • Online Web conferences or presentations
  • Transactional or trading information

Most voice systems will have discrete files for each conversation or call and many of the file formats will be proprietary; e.g., Dictaphone, Nice, Verint and many others. The size of the files can vary considerably depending on how many bits per second were used to record the voice and what codec (compression and decompression) techniques were used by the system.
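As a rough illustration of how bit rate drives file size, the sketch below estimates the audio payload of a recording from its bit rate and duration. The function name and the two sample bit rates are assumptions chosen for illustration, not figures taken from any of the systems named above, and real files will also carry header and metadata overhead.

```python
def estimated_size_bytes(bit_rate_bps: int, duration_seconds: float) -> float:
    """Approximate audio payload size, ignoring container headers and metadata."""
    return bit_rate_bps * duration_seconds / 8  # 8 bits per byte


ten_minutes = 10 * 60  # call length in seconds

# Two illustrative bit rates: a lightly compressed and a heavily compressed recording.
for label, bps in [("64 kbit/s", 64_000), ("8 kbit/s", 8_000)]:
    size_mb = estimated_size_bytes(bps, ten_minutes) / 1_000_000
    print(f"{label}: about {size_mb:.1f} MB for a 10-minute call")
```

Under these assumptions the same ten-minute call differs in size by a factor of eight, which is why the recording settings matter when estimating storage and transfer needs.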

Voice recordings can be central to the issues in the case, for example:

  • Did a customer actually place the order in question?
  • Were messages sexually harassing?
  • Had the manufacturer been put on notice by earlier customer complaints?
  • Did a company’s employees discuss pricing with its competitors?

Perhaps because of the spontaneous and informal manner in which voice recordings are made, these recordings can be the most candid representation of what the participants were really thinking at the time. Furthermore, the tone and inflection with which words are actually spoken can provide a clear indication of the meaning intended by the spoken words; there can be much less ambiguity with recorded spoken words than with textual representation of words. Witnesses may have a much harder time explaining their recorded voice than their written words.

The basic steps used for voice files are the same as those for other forms of ESI: identify potentially responsive files, preserve, gather, review and produce them.

Most voice file systems provide metadata that can be used to conduct the initial identification, preservation and gathering of potentially relevant files. Such metadata could include date, time, employee’s name, phone number used to call in, and transaction ID. Using metadata as selection criteria can dramatically reduce the volume of information that has to be processed.
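The kind of metadata filtering described above might look something like the following sketch. The CallRecord fields, the sample records and the select_candidates helper are all hypothetical; an actual voice system would export its own field names and formats.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CallRecord:
    """Hypothetical metadata exported from a voice recording system."""
    file_name: str
    started_at: datetime
    employee: str
    caller_number: str
    transaction_id: str


def select_candidates(records, employees, start, end):
    """Keep only calls handled by the named employees within the date range."""
    return [
        r for r in records
        if r.employee in employees and start <= r.started_at <= end
    ]


# Made-up sample data for illustration only.
records = [
    CallRecord("call_001.wav", datetime(2008, 3, 4, 9, 15), "A. Smith", "555-0100", "T-1001"),
    CallRecord("call_002.wav", datetime(2008, 6, 20, 14, 2), "B. Jones", "555-0101", "T-1002"),
    CallRecord("call_003.wav", datetime(2008, 3, 18, 16, 40), "A. Smith", "555-0102", "T-1003"),
    CallRecord("call_004.wav", datetime(2007, 11, 2, 10, 5), "A. Smith", "555-0103", "T-1004"),
]

candidates = select_candidates(
    records,
    employees={"A. Smith"},
    start=datetime(2008, 1, 1),
    end=datetime(2008, 12, 31, 23, 59, 59),
)
print([r.file_name for r in candidates])
```

In this toy data set, filtering on one employee and one calendar year cuts four recordings down to two before any audio has to be reviewed.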

Once the preliminary selection has been made, the real challenge lies ahead: how to select just those messages that are relevant and screen them for privilege. There are four methods that can be used, each with a different set of costs, processing characteristics and benefits:

  • Listen to each recording. This will take some multiple of the number of hours of conversation that were recorded as the listeners will have to take breaks, pause, rewind, replay, etc. There is anecdotal evidence that the attention of listeners drops off over time and that this method does not yield 100 percent accuracy.
  • Transcribe each recording and then search the resulting text. This will be time-consuming and expensive, and errors can be introduced when the reporter mishears what was said or does not accurately transcribe what he or she heard.
  • Programmatically convert voice to text and then search the resulting text. This can have the advantage of being less expensive than the first two methods, but the computer time required to convert to text can be appreciable and the quality of the conversion process can vary widely with the quality of the underlying voice file. For example, a low bit rate recording of a cell phone message may yield very poor results.
  • Index the phonemes that occur in the speech and then search for the phonemes. Phonemes are the individual sounds that speakers of a particular language use; for example, the “k” sound of cool and keel. North American English has 38; some languages, like Spanish or French, have fewer, and some, like UK English, have more.
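To make the fourth method more concrete, here is a toy sketch of searching a phoneme sequence rather than text. It is not Nexidia's algorithm: the hand-written pronunciation table (using ARPAbet-style symbols), the made-up phoneme track and the find_term helper are all assumptions for illustration, and a real system would score approximate matches rather than require exact ones.

```python
# Toy pronunciation table using ARPAbet-style symbols (hypothetical, hand-written).
PRONUNCIATIONS = {
    "cool": ["K", "UW", "L"],
    "keel": ["K", "IY", "L"],
    "price": ["P", "R", "AY", "S"],
}


def find_term(phoneme_track, term):
    """Return start times (seconds) where the term's phoneme sequence occurs.

    phoneme_track is a list of (start_time_seconds, phoneme) pairs.
    """
    target = PRONUNCIATIONS[term]
    symbols = [phoneme for _, phoneme in phoneme_track]
    hits = []
    for i in range(len(symbols) - len(target) + 1):
        if symbols[i:i + len(target)] == target:
            hits.append(phoneme_track[i][0])
    return hits


# A made-up phoneme track for a recording containing "keel ... price ... cool".
track = [
    (0.0, "K"), (0.1, "IY"), (0.2, "L"),
    (1.5, "P"), (1.6, "R"), (1.7, "AY"), (1.8, "S"),
    (3.0, "K"), (3.1, "UW"), (3.2, "L"),
]

print(find_term(track, "price"))  # [1.5]
print(find_term(track, "cool"))   # [3.0]
```

Note that "cool" and "keel" are spelled with different first letters but begin with the same phoneme, so the search works on sound rather than spelling.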

Nexidia uses its patented technology to create an index of the phonemes that were used in the voice files and then enables the user to search for the phonemes that are used in the terms the user enters in queries. According to Nexidia, indexing which phonemes occur in speech is more reliable than trying to determine which complete words were spoken. Nexidia originally developed its technology for use in managing corporate call centers where there was a need to audit or review what a company’s employees were saying.

Nexidia claims several advantages to its indexed phoneme approach. The first is rapid processing — the voice files can be converted to searchable indices at the rate of 176 hours of recordings per hour of CPU time (that’s more than 8,000 hours of recordings in a single day on a dual CPU computer) so that even large collections can be indexed in relatively short order. The second advantage is that the searches are more accurate than just listening to the recordings or converting voice to text and then searching text. The third is that the cost of the processing is much less than the cost of listening or transcribing.
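The throughput figure quoted above can be checked with simple arithmetic. The sketch below assumes both CPUs run around the clock and scale perfectly, which is an idealization; the 50,000-hour collection is a made-up example.

```python
HOURS_INDEXED_PER_CPU_HOUR = 176  # rate quoted in the article
CPUS = 2                          # dual CPU computer
HOURS_PER_DAY = 24

daily_capacity = HOURS_INDEXED_PER_CPU_HOUR * CPUS * HOURS_PER_DAY
print(daily_capacity)  # 8448 hours of recordings per day


def days_to_index(recorded_hours, cpus=CPUS):
    """Wall-clock days needed, assuming perfectly parallel CPUs running nonstop."""
    return recorded_hours / (HOURS_INDEXED_PER_CPU_HOUR * cpus * HOURS_PER_DAY)


# Hypothetical collection of 50,000 recorded hours.
print(round(days_to_index(50_000), 1))  # 5.9
```

The result, 8,448 hours per day, is consistent with the "more than 8,000 hours" figure in the text.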

The Nexidia voice search interface permits the user to use “and,” “or” and “not” Boolean-type logic, and permits the user to set required adjacency in seconds for search terms. The result display is interesting: each search term is assigned a color, and the user can tell by looking at the depiction of the sound file how close together the search terms occurred.
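The Boolean and adjacency options can be pictured with a small sketch. This is not the Nexidia interface or API; the within_seconds helper, the hit times and the file names are hypothetical, and the example simply assumes that each search term yields a list of hit times (in seconds) within a recording.

```python
def within_seconds(hits_a, hits_b, max_gap_seconds):
    """Return (time_a, time_b) pairs where the two terms occur close together.

    hits_a and hits_b are lists of hit times in seconds for two search terms.
    """
    return [
        (a, b)
        for a in hits_a
        for b in hits_b
        if abs(a - b) <= max_gap_seconds
    ]


# Made-up hit times for two search terms within one recording.
price_hits = [12.4, 95.0, 310.2]
competitor_hits = [15.1, 200.7, 306.9]

close_pairs = within_seconds(price_hits, competitor_hits, max_gap_seconds=5)
print(close_pairs)  # [(12.4, 15.1), (310.2, 306.9)]

# Boolean logic across files, using sets of file names that contain each term.
files_with_pricing = {"call_001.wav", "call_003.wav", "call_009.wav"}
files_with_competitor = {"call_003.wav", "call_007.wav"}

both_terms = files_with_pricing & files_with_competitor    # "and"
either_term = files_with_pricing | files_with_competitor   # "or"
pricing_only = files_with_pricing - files_with_competitor  # "and not"
print(sorted(both_terms), sorted(pricing_only))
```

Requiring terms to fall within a few seconds of each other narrows results to passages where the two subjects were plausibly discussed together, rather than anywhere in a long call.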
 

Reviewers can assign tags such as “Responsive” or “Privileged” to files, and Nexidia can output those files in the desired format (typically WAV) along with the associated metadata for production to other parties.
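As a rough sketch of how tags and metadata might travel together into a production set, the example below builds a simple CSV manifest. The record layout, the tag names' handling and the production_manifest helper are assumptions for illustration, not Nexidia's output format; the rule of producing responsive files while withholding privileged ones is likewise an assumed workflow.

```python
import csv
import io

# Hypothetical review results: file name, metadata, and reviewer tags.
reviewed = [
    {"file": "call_001.wav", "date": "2008-03-04", "employee": "A. Smith",
     "tags": {"Responsive"}},
    {"file": "call_002.wav", "date": "2008-03-05", "employee": "B. Jones",
     "tags": set()},
    {"file": "call_003.wav", "date": "2008-03-18", "employee": "A. Smith",
     "tags": {"Responsive", "Privileged"}},
]


def production_manifest(items):
    """Return CSV text listing responsive, non-privileged files with their metadata."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["file", "date", "employee"])
    for item in items:
        if "Responsive" in item["tags"] and "Privileged" not in item["tags"]:
            writer.writerow([item["file"], item["date"], item["employee"]])
    return buffer.getvalue()


print(production_manifest(reviewed))
```

Here only call_001.wav reaches the manifest: call_002.wav was never tagged responsive, and call_003.wav is held back as privileged.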

Nexidia offers its technology on a services basis but also licenses its software for clients who would prefer to process their own data and have the expertise to handle the various sound file formats found within their organization. Nexidia’s processing charges are typically based on a startup charge, which will include a certain number of hours of voice recordings.

The technology is also available through Nexidia partners such as Fios, National Data Conversion and TechLaw Solutions.


This article appeared originally in the November 2008 ALSP Update, the monthly publication of the Association of Litigation Support Professionals and is reprinted with permission. Read more about this nonprofit membership organization at www.alsponline.org.

www.HowieConsulting.com

When you have to get it write
For more information, email Joe Howie, Joe@HowieConsulting.com