Electronic Discovery: Is Concept Searching “a reasonable search”?

July 28, 2009 — 585

It's a small sentence, but a big question: “Is concept searching a reasonable search?”

In the UK the process of electronic discovery is defined by the Civil Procedure Rules, Part 31. Rule 31.7 defines the duty of search and states:

1) When giving standard disclosure, a party is required to make a reasonable search for documents falling within rule

Rule 31.7 then goes on to state:

(2) The factors relevant in deciding the reasonableness of a search include the following –

(a)     the number of documents involved;
(b)     the nature and complexity of the proceedings;
(c)     the ease and expense of retrieval of any particular document; and
(d)     the significance of any document which is likely to be located during the search.

Neither “reasonable” nor “reasonableness” is defined, either in the Civil Procedure Rules or in UK law in general. The test is quite literally based on what is reasonable. A guide for this “reasonableness” test is often built upon case law. For example, in UK criminal law there is a wealth of case law on what is “reasonable force” to defend yourself.

But there is only one case where the issue of reasonable search in electronic discovery has been addressed in the UK, the DigiCel case, and it did not relate to concept searching. So what is reasonable?

Firstly, let us address two extreme cases and use them to calibrate our own judgement.

Example 1: Low number of documents.

A single PST file, with 2000 emails, from a single custodian is provided for review. The case is valued at £10 million.

Is it reasonable to cull this data down using concept searching? This author would suggest not, as that volume of documents could be worked through rapidly in a linear review. Concept searching could be used to enhance the review, but not to cull the data (though with such a low number of documents there would be unlikely to be any real benefit).

Example 2: An extremely high number of documents

A file server has 100 million documents on it. These 100 million documents are, in general, shared documents within a massive company. Ten custodians are being investigated and they could have saved their data anywhere in this data set; it has therefore been decided to review the whole data set for relevant documents. The case is valued at £10 million.

Is it reasonable to search and cull this data set with concept searching? This author would say it is, and this is the reasoning:

100 million documents is a LOT to read, and would cost millions to review, let alone process. In fact, reading all of the documents, including the processing, could quickly come to a sum approaching the £10 million value of the case, i.e. it is probably not economically feasible to review the data. This alone would make the review of all of these documents unreasonable.

Is there any other metric we can use to try to measure the reasonableness of this review? We could use a bit of maths.

Firstly we need to estimate how many documents we could find from the custodians. We are going to use high numbers, as this will err on the side of caution (why will become clear shortly).

We will assume a custodian creates 10 documents a day for every working day (200 days a year). We will further assume that the period of relevance for the particular incident is 5 years. This means that there is a maximum of 200*10*5 documents per custodian, or 10,000 documents per custodian, which means that there will be 100,000 documents created by the 10 custodians in total. We will further assume that 100% of these documents are relevant (a highly over-optimistic assumption).
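The estimate above can be written out as a few lines of Python. This is only a sketch of the arithmetic; the function and constant names are made up for illustration, and the inputs are the deliberately high assumptions stated in the text.

```python
# Deliberately high assumptions taken from the worked example.
DOCS_PER_DAY = 10          # documents created per custodian per working day
WORK_DAYS_PER_YEAR = 200   # working days in a year
YEARS_IN_SCOPE = 5         # period of relevance, in years
CUSTODIANS = 10            # custodians under investigation


def upper_bound_documents(custodians, docs_per_day, work_days_per_year, years):
    """Return (documents per custodian, documents for all custodians)."""
    per_custodian = docs_per_day * work_days_per_year * years
    return per_custodian, per_custodian * custodians


per_custodian, total = upper_bound_documents(
    CUSTODIANS, DOCS_PER_DAY, WORK_DAYS_PER_YEAR, YEARS_IN_SCOPE
)
print(per_custodian, total)  # 10000 100000
```

Changing any one input shows how sensitive the upper bound is to that assumption.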

Therefore, out of the 100 million documents, 100,000 documents will be relevant, so only 1 in 1,000 is likely to be relevant. If the expected set of relevant documents was a maximum of 10,000 then the probability of any one document being relevant is 1 in 10,000. That is a very low number. In reality it's highly unlikely that 10 relevant documents are going to be produced a day; the total number of “relevant documents” could be as low as a few thousand, which could bring the probability to 1:50,000, a very low number.

This author would state that a 1:1,000 or 1:10,000 chance of any given document being relevant is such a low probability that it would be unreasonable to review all of those documents.

Therefore culling methods have to be used, which would lead to concept searching. To put it another way, if it's unreasonable to review the data via a linear method, the only reasonable option is to review by non-linear methods.

These calculations, combined with the pressing issue of costs, would lead us to conclude that it is not only reasonable to use concept searching tools, but required, in this extreme case.
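The three ratios quoted above can be checked with a short sketch. The helper name is hypothetical, and 2,000 is simply one reading of “a few thousand” that matches the 1:50,000 figure.

```python
TOTAL_DOCS = 100_000_000  # documents on the file server


def one_in_n(relevant_docs, total_docs):
    """Return N such that roughly 1 document in N is relevant."""
    return total_docs // relevant_docs


# 100,000 = upper bound; 10,000 = lower estimate; 2,000 = assumed "few thousand".
for relevant in (100_000, 10_000, 2_000):
    print(f"{relevant:,} relevant documents: 1 in {one_in_n(relevant, TOTAL_DOCS):,}")
```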

Middle Ground

But what about the middle ground? What about 10 million documents, with 20 custodians? Calculations should be done on the cost of linear review versus concept searching.
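One rough way to frame that comparison is sketched below. Every figure in it (the cost per document, the percentage of documents kept after culling, and the cost of the concept searching tool) is a hypothetical placeholder rather than a real market rate, and the function names are invented for illustration.

```python
def linear_review_cost(n_docs, cost_per_doc):
    """Cost of having reviewers read every document."""
    return n_docs * cost_per_doc


def culled_review_cost(n_docs, kept_percent, cost_per_doc, tool_cost):
    """Cost of the tool plus review of only the documents kept after culling."""
    kept_docs = n_docs * kept_percent // 100
    return tool_cost + kept_docs * cost_per_doc


# All values below are hypothetical placeholders, in pounds.
N_DOCS = 10_000_000
COST_PER_DOC = 1       # assumed review cost per document
KEPT_PERCENT = 5       # assumed share of documents left after culling
TOOL_COST = 250_000    # assumed cost of running the concept searching tool

linear = linear_review_cost(N_DOCS, COST_PER_DOC)
culled = culled_review_cost(N_DOCS, KEPT_PERCENT, COST_PER_DOC, TOOL_COST)
print(f"linear: £{linear:,}  culled: £{culled:,}")
```

The point is not the particular totals but that the comparison can be re-run with whatever rates a given case actually faces.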

What are the expected costs involved and what are the expected benefits? Sampling of data may give an indication of the volume of relevant versus non-relevant documents. If there is a very small proportion of relevant documents in a mass of irrelevant documents then concept searching has to be seriously considered to cull the documents down.

These methods of calculation are not going to stand up, by a long way, to a statistician's enquiry, but they can be used to try to gain an idea of the costs involved and the benefits of concept searching versus linear review.
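As a sketch of what such sampling might look like, the snippet below takes a list of reviewer decisions on a random sample (True meaning “relevant”) and estimates the proportion of relevant documents. The Wilson score interval used for the range is this sketch's own choice, not something the article prescribes, and the sample figures are invented for illustration.

```python
import math


def estimate_prevalence(sample_labels, z=1.96):
    """Return (observed proportion, lower bound, upper bound).

    sample_labels: list of booleans, True where a reviewer marked the
    sampled document as relevant. z=1.96 gives roughly a 95% interval
    (Wilson score interval).
    """
    n = len(sample_labels)
    p = sum(sample_labels) / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical sample: 2,000 documents reviewed, 4 found relevant.
sample = [True] * 4 + [False] * 1996
p, low, high = estimate_prevalence(sample)
print(f"observed {p:.4f}, plausible range {low:.4f} to {high:.4f}")
```

If even the upper end of the range is tiny, that supports culling before review.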


iamdavo Says:
July 29, 2009 at 8:09 pm

Nice analysis. I do think that one must also consider error rates when making these calculations. Having humans manually cull through millions of documents you can plan on a 3% error rate. Concept Search compared to keyword has a much higher recall rate and could, at the very least, dramatically reduce document populations for reviewers. Concept searching costs have come way down and could be used in all discovery now. Document volumes are certainly one problem, but I would add accuracy and recall as other considerations. Thanks for the article. Really enjoyed reading it.

585 Says:
July 29, 2009 at 8:16 pm
Thanks and absolutely.

I would put error rates for humans far far higher than 3%, with huge volumes of documents not even reviewed or reviewed poorly, as legal teams just skip them due to the necessity of speed.

The issue of errors will be covered shortly in a future article.

