Update: Following a recent debate about the nature of this article, the following points should be highlighted.
At no point is any blame attributed to any individual. The aim of this article was to raise an issue, a small technical issue, in a small chapter of a very large document.
The primary issue is the exchange of PST files versus the exchange of a load file. This is a spectacularly insignificant issue when dealing with litigation worth millions or billions, and in terms of Jackson’s report as a whole. Unfortunately, the author of this article happens to be one of those who have to deal with that exchange of documents, and for those who conduct this type of work it becomes very important. The article explains why, from a technical perspective, one suggestion mentioned by Jackson is probably not effective.
It is not suggested that Justice Jackson would ever be expected to know about this subject. It is not even something a lawyer dealing with a case would ever really know about; it is far too low level.
The other issue is the expectation of costs. Lawyers, like vendors, can predict their costs from the information available, but predicting those costs early on, without accurate information, is very difficult and can be misleading. It is not unusual for vendors or consultants to be told that there will be X amount of data for Y custodians, only for X or Y to increase 2, 3 or even 10 fold, which can result in a 2, 3 or 10 fold increase in costs. Until vendors and consultants get their hands on the data, conduct their data mapping exercise and start processing, they often do not have a good grip on the costs.
Vendors are aware of this and are reluctant to “over-promise and under-deliver”.
Original Article
The report by Lord Justice Jackson into litigation costs in the UK, published last month, is seen as a major influence on electronic discovery in the UK, with Chapter 40 dedicated to electronic discovery.
It may well be as critical as commentators suggest, but that does not mean it will steer the industry in the right direction.
In paragraphs 5.10 and 5.11 of Chapter 40 (Volume 2) of the report, Lord Justice Jackson states that:
5.10 Format of disclosure. Parties sometimes make disclosure in a format which is unhelpful to the other side and which requires duplication of work already carried out by the disclosing party. This causes duplication of costs. It has been suggested that this could be avoided by the parties being required to make disclosure in a suitable format. One example is for disclosure to be of PST files in native format rather than disclosure of the image of the document (TIFFS or individual message files). Much valuable information in the PST file is lost in the process of conversion to TIFFs or message files.
5.11 It has been emphasised by practitioners that it is vital to preserve the context of a document, for example by preserving the folder structure of the data, or the file hierarchy (and making disclosure in native format), rather than disclosing documents separately and not within their original folders. The paper equivalent would be to disclose files of papers which are labelled by subject or by person (showing their place in the relevant transactions), rather than to disclose an enormous box of papers in unlabelled files with, for example, meeting minutes scattered throughout. One experienced practitioner has suggested that parties should be required to produce disclosure in a manner that is cost effective for both parties, in the event that a specific format cannot be agreed upon.
This strongly implies that Lord Justice Jackson has been given some very bad advice, as anyone with even a cursory knowledge of electronic discovery procedures would see the glaring problems with this statement:
One example is for disclosure to be of PST files in native format rather than disclosure of the image of the document
Some of the reasons this statement is so far removed from the reality of electronic discovery processing are as follows:
1) The first thing that happens to a PST file during processing is that it is broken into MSG files so that the items within it can be processed.
- Therefore, for PST files to be exchanged, each PST file would be broken into messages, then reassembled, only to be broken down again by the receiving party. This is the very definition of duplication of effort.
2) The whole of a PST file is unlikely to be disclosed: certain messages or attachments will be withheld from the other side, making the exchange of a whole PST impossible.
3) Native files are often not exchanged for reasons of confidentiality: TIFF images can be redacted, whereas native files cannot.
4) Review platforms do not review PST files, so why exchange data in a format that is not going to be reviewed?
5) Folder paths within PST files are an important issue, but the folder path of each item can be preserved during review, and that information can therefore be exchanged; it just needs to be requested. If the path cannot be displayed in review, it cannot be exchanged, in which case providing the whole PST file will be of no more assistance than exchanging the standard load file.
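To illustrate point 5, the sketch below shows how a folder path can travel as an ordinary field in a Concordance-style DAT load file. It assumes the delimiters commonly used as Concordance defaults (ASCII 20 between fields, þ (ASCII 254) around each value, and ® (ASCII 174) in place of line breaks inside a value). The field names DOCID, CUSTODIAN, FOLDERPATH and SUBJECT, the sample records and the helper names are all hypothetical; a real production should follow whatever field list and delimiters the parties agree.

```python
# Minimal sketch of a Concordance-style DAT load file writer.
# Assumption: the commonly used Concordance default delimiters below;
# agree the actual delimiters and field list with the other side.
FIELD_SEP = "\x14"  # ASCII 20, separates fields
QUOTE = "\xfe"      # ASCII 254 (þ), wraps each value
NEWLINE = "\xae"    # ASCII 174 (®), replaces line breaks inside a value


def dat_line(values):
    """Quote each value, swap embedded line breaks, join with FIELD_SEP."""
    cells = []
    for value in values:
        value = (value.replace("\r\n", NEWLINE)
                      .replace("\n", NEWLINE)
                      .replace("\r", NEWLINE))
        cells.append(QUOTE + value + QUOTE)
    return FIELD_SEP.join(cells)


def build_load_file(headers, records):
    """Return the DAT text: a header row, then one row per document."""
    rows = [dat_line(headers)]
    for record in records:
        # Missing fields are written as empty values so columns stay aligned.
        rows.append(dat_line([record.get(h, "") for h in headers]))
    return "\r\n".join(rows) + "\r\n"


# Hypothetical example: the PST folder path is just another field.
HEADERS = ["DOCID", "CUSTODIAN", "FOLDERPATH", "SUBJECT"]
RECORDS = [
    {"DOCID": "DOC0000001", "CUSTODIAN": "Smith",
     "FOLDERPATH": "Inbox\\Project X", "SUBJECT": "Meeting minutes"},
    {"DOCID": "DOC0000002", "CUSTODIAN": "Smith",
     "FOLDERPATH": "Sent Items", "SUBJECT": "Re: Meeting minutes"},
]

dat_text = build_load_file(HEADERS, RECORDS)
```

The receiving party can then map FOLDERPATH to a field in its review platform like any other metadata, without ever needing the original PST file.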
Finally, and most importantly, it’s simply a really bad idea.
Two companies that exchange documents correctly, following an agreed standard (e.g. a Concordance load file) with all of the fields correctly aligned, can expect to have the data up and running, ready for review, on the day of the exchange. However, a company given tens of thousands of emails and documents in PST files will spend a long time loading the PST files into its processing engine, breaking them into MSG files and attachments, and dealing with errors, corruption and encryption issues, as well as long file paths or any other problems encountered.
Eventually, after all of the processing and error correction, the electronic discovery company will end up with native files, text files and a load file: data which can then be placed into a review platform. This data, produced by the eDiscovery company after all that work and cost, is exactly the same product that is currently available for exchange under the standard method. Moreover, this processing of the PST files would take place after the other side had already conducted the same process and then reversed it to create those PST files.
The process being hinted at by Lord Justice Jackson would look something like this:
1) Party A – Break PST files into MSG Files
2) Party A – Deal with errors/corruption/encryption
3) Party A – Extract Text
4) Party A – Create Load File
5) Party A – Create PST files
6) Party A – Exchange PST with Party B
7) Party B – Break PST files into MSG Files
8) Party B – Deal with errors/corruption/encryption
9) Party B – Extract Text
10) Party B – Create Load File
11) Party B – Load data into review
Compare this with the current system, which is:
1) Party A – Break PST files into MSG Files
2) Party A – Deal with errors/corruption/encryption
3) Party A – Extract Text
4) Party A – Create Load File
5) Party A – Exchange Load File with Party B
6) Party B – Load data into review
From this it can be seen that the method hinted at by Lord Justice Jackson quite literally doubles the processing work: every processing step is carried out twice, once by each party. For a document discussing the increasing costs of litigation and how they can be addressed, this is surely not the intended outcome.
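The two workflows can also be compared mechanically. The sketch below simply encodes both step lists as written in this article and reports which processing tasks end up being performed by both parties; the data structure and the function name are illustrative only.

```python
# The two workflows from this article, as (party, task) pairs.
PROPOSED = [
    ("A", "Break PST files into MSG files"),
    ("A", "Deal with errors/corruption/encryption"),
    ("A", "Extract text"),
    ("A", "Create load file"),
    ("A", "Create PST files"),
    ("A", "Exchange PST with Party B"),
    ("B", "Break PST files into MSG files"),
    ("B", "Deal with errors/corruption/encryption"),
    ("B", "Extract text"),
    ("B", "Create load file"),
    ("B", "Load data into review"),
]

CURRENT = [
    ("A", "Break PST files into MSG files"),
    ("A", "Deal with errors/corruption/encryption"),
    ("A", "Extract text"),
    ("A", "Create load file"),
    ("A", "Exchange load file with Party B"),
    ("B", "Load data into review"),
]


def duplicated_tasks(workflow):
    """Tasks carried out by both Party A and Party B, sorted alphabetically."""
    party_a = {task for party, task in workflow if party == "A"}
    party_b = {task for party, task in workflow if party == "B"}
    return sorted(party_a & party_b)


for name, workflow in (("Proposed", PROPOSED), ("Current", CURRENT)):
    print(f"{name}: {len(workflow)} steps, duplicated: {duplicated_tasks(workflow)}")
```

Run as-is, it reports 11 steps for the proposed workflow against 6 for the current one, with the four core processing tasks appearing twice in the proposed workflow and none duplicated in the current one.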
Lord Justice Jackson cannot be expected to know this, so the blame must lie with his advisors.
In paragraph 5.13, Lord Justice Jackson also states that:
5.13 Practitioners and judges familiar with e-disclosure also recommend that parties obtain estimates of the potential e-disclosure costs at an early stage. Such estimates can be discussed between the parties and produced to the court if necessary. Indeed, the production of such estimates would be an essential step if the court is going to undertake “costs management” of business litigation, as discussed in chapter 48 below.
Estimates are among the most common requests solicitors make of electronic discovery vendors. How much will it cost? How long will it take? Vendors are always seen to be dragging their heels on this issue.
This is not because they are unwilling to quote, but because they lack information.
The costs of electronic discovery processing can be described as a function of data size (GB), number of files and number of errors.
The more data, the more files and the more errors, the more it costs. Unfortunately, none of this information is available until the project has been all but completed.
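As a rough illustration of why early quotes are so fragile, the sketch below treats processing cost as a simple linear function of the three drivers named above. The linear form, the per-GB, per-file and per-error rates, and the best- and worst-case inputs are all assumptions made purely for illustration, not any vendor's actual pricing; the point is that a quote is only as good as the low and high guesses fed into it.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical unit rates; real vendor pricing will differ."""
    per_gb: float
    per_file: float
    per_error: float


def processing_cost(gb, files, errors, rates):
    """Assumed linear model: each driver adds cost in proportion to its size."""
    return gb * rates.per_gb + files * rates.per_file + errors * rates.per_error


def cost_range(low, high, rates):
    """Cost bounds from (gb, files, errors) tuples for the best and worst case."""
    return processing_cost(*low, rates), processing_cost(*high, rates)


# Hypothetical rates and best/worst-case inputs, for illustration only.
rates = Rates(per_gb=100.0, per_file=0.01, per_error=5.0)
best, worst = cost_range(low=(5, 50_000, 100),
                         high=(40, 2_000_000, 5_000),
                         rates=rates)
print(f"Best case: {best:,.0f}  Worst case: {worst:,.0f}")
```

With these made-up figures the range runs from 1,500 to 49,000 currency units, a spread of more than thirty-fold, which is roughly the position a vendor is in before seeing the data.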
Data sizes: The data size estimates provided by clients are often wildly inaccurate. Statements such as “there will be no more than 500 MB of data per custodian, and there will be 10 custodians” often result in 20 custodians with 2 GB of data each being delivered. It is these huge changes in data volume that make it so difficult for vendors to quote.
File Count: The more files there are, the more work there is to do. 2 GB of data could be a single file, or it could be 2 million files, or anywhere in between. It is not until processing has started that the file count can be established.
Errors: The more errors and the more complex they are the longer the project will take and the more it will cost.
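The data-size example given above is worth spelling out, because the two misestimates multiply. The helper below is purely illustrative and assumes decimal units (500 MB = 0.5 GB) and the same volume for every custodian.

```python
def total_gb(custodians, gb_per_custodian):
    """Total volume, assuming every custodian holds the same amount of data."""
    return custodians * gb_per_custodian


estimated = total_gb(10, 0.5)   # "no more than 500 MB per custodian, 10 custodians"
delivered = total_gb(20, 2.0)   # what actually arrives

custodian_factor = 20 / 10      # twice as many custodians
volume_factor = 2.0 / 0.5       # four times as much data each
growth = delivered / estimated  # the two factors multiply

print(f"Estimated {estimated} GB, delivered {delivered} GB, {growth}x growth")
print(f"{custodian_factor} x {volume_factor} = {custodian_factor * volume_factor}")
```

A doubling of custodians combined with a fourfold increase in data per custodian gives an eightfold increase in volume (5 GB estimated against 40 GB delivered); where pricing is driven by volume, the quote is out by a similar factor.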
In addition to these costs, there is the cost of review, which cannot be known until the data has been loaded into review. 10 GB of data going into processing could be 100,000 files culled down to just 1,000, which are quick to review; or it could be 50,000 files culled down to 40,000.
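The two review scenarios differ far more than the processed volumes suggest. The sketch below compares them; the review rate of 50 documents per hour is a hypothetical figure chosen only to make the comparison concrete, and real rates vary widely with document type and complexity.

```python
# Hypothetical reviewer throughput, for illustration only.
DOCS_PER_HOUR = 50


def review_load(processed_files, files_after_culling, docs_per_hour=DOCS_PER_HOUR):
    """Return (fraction of files left for review, estimated review hours)."""
    fraction_kept = files_after_culling / processed_files
    hours = files_after_culling / docs_per_hour
    return fraction_kept, hours


scenarios = {
    "Scenario 1": (100_000, 1_000),
    "Scenario 2": (50_000, 40_000),
}

for name, (processed, culled) in scenarios.items():
    kept, hours = review_load(processed, culled)
    print(f"{name}: {kept:.0%} of files go to review, about {hours:,.0f} review hours")
```

At the assumed rate, the second scenario needs forty times the review effort of the first (800 hours against 20), even though it started with half as many processed files.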
There are many variables, which make it impossible to give an honest assessment of cost early on. Even so, vendors could and should do a better job of estimating costs and data sizes, using sampling, concept searching and early case assessment.
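One simple form of sampling is to process a few representative custodians first and extrapolate. The sketch below does only that: it scales the smallest, average and largest sampled values up to the full custodian count. It is a hypothetical helper with made-up sample figures, not a statistical confidence interval, and it assumes the sampled custodians are representative of the rest, which in practice should be checked.

```python
from statistics import mean


def extrapolate(sample, total_custodians):
    """Scale per-custodian sample values (GB, file counts, ...) to all custodians.

    Returns (low, central, high) based on the smallest, mean and largest
    sampled value. Assumes the sample is representative.
    """
    if not sample:
        raise ValueError("sample at least one custodian")
    return (
        min(sample) * total_custodians,
        mean(sample) * total_custodians,
        max(sample) * total_custodians,
    )


# Hypothetical sample: three of twenty custodians processed up front.
sampled_gb = [1.5, 2.0, 2.5]
sampled_files = [18_000, 25_000, 40_000]

print("GB:", extrapolate(sampled_gb, 20))
print("Files:", extrapolate(sampled_files, 20))
```

With these made-up figures the volume estimate comes out at 30 to 50 GB, giving the client a range to plan around rather than a single figure that may later grow several-fold.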
Conclusion: From reading Chapter 40, it appears that either Lord Justice Jackson was given bad advice, or those with the technical knowledge have failed to get their message across.