Review platforms are there to allow law firms, or investigators, to find relevant documents, through the process of electronic discovery, far quicker than manually searching and reading paper documents ever could.
The issue is that concept searching tools seem rather “fuzzy”: there is no definitive formula that you can point to and say “this is how it works”. In fact, even most of those using these tools don’t really know why one document ends up in a cluster and another does not. But does that mean concept searching tools should not be used?
10 years ago a review of documents could be done, in the main, on paper, by printing out the documents or taking them straight from the filing cabinet. However, as time goes on there are more emails and more documents. More email accounts are being created and more people in different positions are using email. With email now used for everything from bank accounts and water bills to dentists and doctors, people’s email boxes are growing in volume.
Also, the amount of time we have, per day, to email has increased. It used to be that we could only email from 9 to 5, Monday to Friday, 40 hours a week. Now, with company laptops, broadband, 3G cards, and the beloved BlackBerry, people are hardly ever without access to their email, radically increasing the potential to send emails.
This, combined with increased data retention in the corporate environment, means that if data is to be reviewed over a few years, e.g. 2007 to 2009, for say 20 custodians, it’s likely to be huge. The initial data set could well involve millions, if not tens of millions, of documents.
In this case we will assume that there are 20 million documents, all of which are email or email attachments, including data from backup tapes and archives. Suppose the standard de-duplication procedure reduces the data down to 14 million documents, and keyword searching then reduces it to 8 million documents. That leaves 8 million documents after de-duplication and keyword searching, which is a staggeringly large number of documents.
Assuming that a team of junior lawyers is used, as a first review, to get rid of everything that is not obviously relevant, before the review in earnest begins, how long would it take to review all of the documents, quickly?
If each document takes just 60 seconds to look at, to decide if it is potentially relevant and mark it in the review platform as “relevant” or “non-relevant”, it would take about 13,000 days (assuming 10 hour days) to complete this task.
This is clearly not a realistic option. If 100 lawyers were put on it, it would take just 130 days, assuming 100% productivity, but that’s just the first cull, and that’s a LOT of money.
Another option would be to use fewer keywords, to home in on the relevant data. This may reduce the documents to 1 million, but that’s still roughly 1,700 days. The next option is to speed up the review: don’t spend 60 seconds reading the document, but spend 10 seconds skimming it. This now brings the initial cull to about 278 days, which means that 10 lawyers could complete it in about 28 days (assuming 10 hours a day, 7 days a week).
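The arithmetic behind these estimates can be checked with a short sketch. The function name `review_days` and its parameters are made up for illustration; the inputs are the document counts, review times and 10 hour days assumed in the text, with 100% productivity throughout.

```python
SECONDS_PER_HOUR = 3600


def review_days(documents, seconds_per_document, reviewers=1, hours_per_day=10):
    """Days needed for a first-pass review, assuming 100% productivity."""
    total_hours = documents * seconds_per_document / SECONDS_PER_HOUR
    return total_hours / (hours_per_day * reviewers)


# Scenarios taken from the text above.
scenarios = [
    ("8 million docs, 60 seconds each, 1 reviewer", review_days(8_000_000, 60)),
    ("8 million docs, 60 seconds each, 100 reviewers", review_days(8_000_000, 60, reviewers=100)),
    ("1 million docs, 60 seconds each, 1 reviewer", review_days(1_000_000, 60)),
    ("1 million docs, 10 seconds each, 1 reviewer", review_days(1_000_000, 10)),
    ("1 million docs, 10 seconds each, 10 reviewers", review_days(1_000_000, 10, reviewers=10)),
]

for label, days in scenarios:
    print(f"{label}: {days:,.1f} days")
```

The unrounded results are roughly 13,333, 133, 1,667, 278 and 28 days, so the figures quoted in the text are approximations of these.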
But that means that 10 lawyers, probably junior, are working very hard, very quickly, doing a very boring, very repetitive task for a very long time. The probability of error is huge; in fact it can be guaranteed, but that is accepted.
Humans make errors. It’s not malicious, it’s just one of those things, and it’s “proportional” to accept that this occurs. The trouble is that the errors are not known, there is no log of the decision making process, and it costs a huge amount of money to scan these documents and then apply a quick decision making process to them.
But if it’s known that errors will occur, and that this will be expensive, why do it? Because, until the past few years, there has been no other option.
But now concept searching tools change this. It is entirely possible, using modern technology, to use automated filters, and even decision making tools, to cull down a data set using clustering and concept searching.
These tools can take a data set of millions of documents and cull it down to hundreds of thousands, using either clusters to home in on the target documents, or documents that are known to be relevant to find similar documents.
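To make the second approach a little less “fuzzy”, here is a minimal sketch of using known-relevant documents to find similar ones. It is a stand-in, not how any particular product works: it scores documents by simple word overlap (cosine similarity on word counts), whereas real concept searching tools use their own, typically more sophisticated, methods. The names `term_vector`, `cosine_similarity` and `find_similar`, the 0.3 threshold and the sample emails are all invented for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-case the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(seed_texts, candidates, threshold=0.3):
    """Return (doc_id, score) pairs scoring at or above the threshold
    against any known-relevant seed document, best match first."""
    seeds = [term_vector(text) for text in seed_texts]
    if not seeds:
        return []
    hits = []
    for doc_id, text in candidates.items():
        vector = term_vector(text)
        score = max(cosine_similarity(vector, seed) for seed in seeds)
        if score >= threshold:
            hits.append((doc_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical documents, invented for this example.
known_relevant = ["Please move the funds to the offshore account before the audit"]
unreviewed = {
    "a": "Move the remaining funds to the offshore account today",
    "b": "Lunch on Friday? That new cafe is open",
    "c": "The audit starts Monday, make sure the account is clean",
}

for doc_id, score in find_similar(known_relevant, unreviewed):
    print(doc_id, round(score, 2))
```

Here documents "a" and "c" are kept and "b" is culled. Common words such as "the" inflate the scores in this sketch, which is one reason real tools go well beyond raw word counts.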
Do concept searching tools make mistakes? Of course.
The difference is that the errors can be measured, as it’s a repeatable process, and it’s a lot cheaper and a lot quicker than a human cull. Currently this technology cannot compare to a detailed review, and that is not what it’s being sold as. It’s another culling tool, like keyword searching or de-duplication.
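One way such errors might be measured, offered as an assumption rather than a description of any tool’s built-in method, is to have a person review a random sample of the documents the tool culled out and count how many were in fact relevant. The function `estimate_miss_rate`, its parameters and the stand-in `is_relevant` check are hypothetical, and the margin uses a rough normal approximation.

```python
import math
import random


def estimate_miss_rate(discarded_ids, is_relevant, sample_size, seed=0):
    """Estimate the share of culled documents that were actually relevant.

    is_relevant stands in for a human reviewer's decision on one document.
    Returns (rate, margin), where margin is a rough 95% margin of error.
    """
    if not discarded_ids:
        return 0.0, 0.0
    rng = random.Random(seed)  # fixed seed, so the same sample is drawn every run
    sample = rng.sample(discarded_ids, min(sample_size, len(discarded_ids)))
    missed = sum(1 for doc_id in sample if is_relevant(doc_id))
    rate = missed / len(sample)
    margin = 1.96 * math.sqrt(rate * (1 - rate) / len(sample))
    return rate, margin


# Made-up example: 10,000 culled documents, of which every 50th is relevant.
culled = list(range(10_000))
rate, margin = estimate_miss_rate(culled, lambda doc_id: doc_id % 50 == 0, 500, seed=1)
print(f"Estimated miss rate: {rate:.1%} (+/- {margin:.1%})")
```

Because the sample is drawn from a fixed seed, the check can be re-run and audited, which is the kind of log a quick human cull does not leave behind.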
Concept searching will not work for every project, and may not always be suitable, but in large cases it should be considered.