Electronic Discovery: The Benefits of Concept Searching Tools

Non-linear review, or concept searching, differs from linear review in its methodology but not in its final aim.

Review platforms exist to let law firms and investigators find relevant documents during electronic discovery far more quickly than manually searching and reading paper documents ever could.

The issue is that concept searching tools seem rather "fuzzy": there is no definitive formula that you can point to and say "this is how it works". In fact, even most of the people using these tools don't really know why one document ends up in a particular cluster and another does not. But does that mean concept searching tools should not be used?

The Problem

Ten years ago a document review could, in the main, be done on paper, by printing out the documents or taking them straight from the filing cabinet. Over time, however, the volume of email and electronic documents has grown. As more email accounts are created, people in more and more roles are using email, and with everything from bank accounts and water bills to dentists and doctors now corresponding electronically, people's mailboxes keep growing.

The number of hours each day in which we can send email has also increased. It used to be that we could only email from 9 to 5, Monday to Friday (40 hours a week), but now, with company laptops, broadband, 3G cards and the beloved BlackBerry, people are hardly ever without access to their email, which radically increases the potential volume of email sent.

This, combined with increased data retention in the corporate environment, means that if data is to be reviewed over a few years, e.g. 2007 to 2009, for, say, 20 custodians, it is likely to be huge. The initial data set could well involve millions, if not tens of millions, of documents.

Working Example:

In this case we will assume that there are 20 million documents, all of which are emails or email attachments, including data from backup tapes and archives. Suppose the standard de-duplication procedure reduces the data to 14 million documents, and keyword searching then reduces it to 8 million. That still leaves 8 million documents to review after de-duplication and keyword searching, which is a staggeringly large number.

Assuming that a team of junior lawyers is used for a first-pass review, to get rid of everything that is not obviously relevant before the review begins in earnest, how long would it take to get through all of the documents quickly?

If each document takes just 60 seconds to look at, decide whether it is potentially relevant, and mark in the review platform as "relevant" or "non-relevant", the task would take roughly 13,300 days (assuming 10-hour days) to complete.

This is clearly not a realistic option. If 100 lawyers were put on it, it would still take about 133 days, assuming 100% productivity, and that's just the first cull, and that's a LOT of money.

Another option would be to use fewer keywords to home in on the relevant data. This may reduce the documents to 1 million, but that's still roughly 1,670 days. The next option is to speed up the review: rather than spending 60 seconds reading each document, spend 10 seconds skimming it. This brings the initial cull down to about 278 days, which means that 10 lawyers could complete it in about 28 days (assuming 10 hours a day, 7 days a week).
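The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that bakes in the same assumptions as the example (a fixed number of seconds per document, 10-hour days and 100% productivity); the function name `review_days` is purely illustrative.

```python
# Sketch of the review-time arithmetic in the working example.
# Assumptions (taken from the text): a fixed time per document,
# 10-hour review days and 100% productivity.

SECONDS_PER_HOUR = 3600


def review_days(num_docs: int, seconds_per_doc: float,
                hours_per_day: float = 10, reviewers: int = 1) -> float:
    """Days needed for `reviewers` people to look at `num_docs` documents."""
    total_hours = num_docs * seconds_per_doc / SECONDS_PER_HOUR
    person_days = total_hours / hours_per_day
    return person_days / reviewers


scenarios = [
    ("8M docs, 60 s each, 1 reviewer", 8_000_000, 60, 1),
    ("8M docs, 60 s each, 100 reviewers", 8_000_000, 60, 100),
    ("1M docs, 60 s each, 1 reviewer", 1_000_000, 60, 1),
    ("1M docs, 10 s each, 1 reviewer", 1_000_000, 10, 1),
    ("1M docs, 10 s each, 10 reviewers", 1_000_000, 10, 10),
]

for label, docs, secs, people in scenarios:
    print(f"{label}: {review_days(docs, secs, reviewers=people):,.0f} days")
```

Running it prints about 13,333, 133, 1,667, 278 and 28 days for the five scenarios, which is where the rounded figures in the text come from.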

But that means that 10 lawyers, probably junior, are working very hard and very quickly on a very boring, very repetitive task for a very long time. The probability of error is huge; in fact, errors can be guaranteed, but that is accepted.

Humans make errors. It's not malicious, it's just one of those things, and it is considered "proportional" to accept that this occurs. The trouble is that the errors are not known: there is no log of the decision-making process, and it costs a huge amount of money to scan these documents and then apply a quick decision-making process to them.

But if it's known that errors will occur, and that this will be expensive, why do it? Because, until the past few years, there has been no other option.

But concept searching tools now change this. Using modern technology, it is entirely possible to apply automated filters, and even decision-making tools, to cull down a data set through clustering and concept searching.

These tools can take a data set of millions of documents and cull it down to hundreds of thousands, either by using clusters to home in on the target documents, or by using documents that are known to be relevant to find similar documents.
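To make the "find similar documents" idea concrete, here is a rough sketch in Python. It is an assumption-laden stand-in, not how any particular commercial tool works: it scores documents by plain TF-IDF cosine similarity against seed documents already known to be relevant. Real concept searching products use their own, often proprietary, techniques that aim to match related concepts even when the wording differs, whereas TF-IDF only rewards shared words. The helper names and the 0.1 threshold are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lower-case and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    # Weight each term by its count in the document times a smoothed IDF,
    # so words that appear in few documents count for more.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_seeds(docs: list[str], seed_indices: set[int],
                  threshold: float = 0.1) -> list[tuple[int, float]]:
    # Score every non-seed document by its best match against any seed
    # and keep those above the (hypothetical) threshold, best first.
    if not seed_indices:
        return []
    vectors = tfidf_vectors(docs)
    results = []
    for i, vec in enumerate(vectors):
        if i in seed_indices:
            continue
        score = max(cosine(vec, vectors[s]) for s in seed_indices)
        if score >= threshold:
            results.append((i, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


docs = [
    "Quarterly pricing agreement with the distributor attached",
    "Revised pricing agreement for the distributor, please review",
    "Lunch on Friday?",
    "Team football results",
]
print(rank_by_seeds(docs, seed_indices={0}))
```

With this toy data only the second pricing email is returned, because it shares several weighted terms with the seed; the two unrelated emails score zero.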

Do concept searching tools make mistakes? Of course.

The difference is that the errors can be measured, as it's a repeatable process, and it's a lot cheaper and a lot quicker than a human cull. Currently this technology cannot compare to a detailed review, and that is not what it's being sold as: it's another culling tool, like keyword searching or de-duplication.
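One way to put numbers on those errors, assumed here because the post does not name a method, is to have a human review a random sample and compare those calls with the tool's keep/discard decisions, using the standard precision and recall measures. This minimal Python sketch uses hypothetical function names; the fixed random seed illustrates the repeatability point, since re-running the check draws the same sample.

```python
import random
from dataclasses import dataclass


@dataclass
class CullStats:
    precision: float  # share of documents the tool kept that a human judged relevant
    recall: float     # share of relevant documents (per the human) that the tool kept


def draw_sample(doc_ids, size: int, seed: int = 42) -> list:
    # Hypothetical helper: a fixed seed means the same sample is drawn
    # every time the check is re-run.
    rng = random.Random(seed)
    return rng.sample(list(doc_ids), size)


def measure_cull(tool_kept: set, human_relevant: dict, sample_ids: list) -> CullStats:
    """Compare the tool's keep/discard calls with a human's calls on a sample.

    tool_kept: ids of documents the concept searching cull kept.
    human_relevant: sampled id -> True if a human reviewer judged it relevant.
    """
    tp = fp = fn = 0
    for doc_id in sample_ids:
        kept = doc_id in tool_kept
        relevant = human_relevant[doc_id]
        if kept and relevant:
            tp += 1  # correctly kept
        elif kept:
            fp += 1  # kept, but not relevant
        elif relevant:
            fn += 1  # discarded, but relevant: the costly error
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return CullStats(precision, recall)


# Toy example: six sampled documents, of which the tool kept three.
sample_ids = [1, 2, 3, 4, 5, 6]
tool_kept = {1, 2, 3}
human_relevant = {1: True, 2: True, 3: False, 4: True, 5: False, 6: False}
print(measure_cull(tool_kept, human_relevant, sample_ids))
```

In the toy data two of the three kept documents are relevant and one relevant document was discarded, so precision and recall both come out at two thirds.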

Concept searching will not work for every project, and may not always be suitable, but in large cases it should be considered.

