Electronic Discovery: What is Concept Searching?

There is much discussion about concept searching and its benefits (and there are quite a few articles on this site about the very subject). But what is it, and how is it deployed?

What is concept searching?

Concept searching is a method of searching files based not on keywords, but on the subject matter of the document, paragraph, or sentence. This is different from keyword searching, which requires an exact keyword hit.

Example 1:

“Do you want to watch football tonight, I bet Chelsea scores?”

Example 2:

“Do you want to watch the game tonight, I bet Chelsea scores?”

If the term “football” were used as a keyword, Example 1 would be found but Example 2 would not, even though both are clearly about football. A concept searching tool would be expected to find both examples because, as the name implies, it looks for concepts rather than just keywords.
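The difference between the two searches can be sketched in a few lines of Python. This is only a toy illustration: the CONCEPT_TERMS map and the function names are hypothetical and hand-built for this example, whereas real concept searching tools derive such associations from the data itself.

```python
import re

# Hypothetical, hand-built map from a concept to words that signal it.
# Real tools learn these associations from the data; this is illustration only.
CONCEPT_TERMS = {
    "football": {"football", "chelsea", "game", "scores", "match", "goal"},
}

def tokens(text):
    """Lowercase the text and return the set of words it contains."""
    return set(re.findall(r"[a-z]+", text.lower()))

def keyword_hit(text, keyword):
    """Exact keyword search: the keyword itself must appear as a word."""
    return keyword.lower() in tokens(text)

def concept_hit(text, concept, min_terms=2):
    """Toy concept search: at least min_terms related words must appear."""
    return len(tokens(text) & CONCEPT_TERMS[concept]) >= min_terms

example_1 = "Do you want to watch football tonight, I bet Chelsea scores?"
example_2 = "Do you want to watch the game tonight, I bet Chelsea scores?"

for name, text in (("Example 1", example_1), ("Example 2", example_2)):
    print(name,
          "keyword:", keyword_hit(text, "football"),
          "concept:", concept_hit(text, "football"))
```

The keyword search finds only Example 1, while the toy concept search finds both, because Example 2 still contains several words associated with football.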

How is Concept Searching Deployed?

Concept Searching can be deployed in a variety of ways, and under many different names, depending on the vendor/service provider/consultant. We will attempt to cover the most common names and methodologies below.

Clustering/Buckets

This is, perhaps, the most commonly known use of concept searching. It takes a set of data and breaks it into groups of “similar documents”, e.g. one group could contain documents relating to football, another documents about meetings, etc. The number and size of the groups depend on the documents and on the concept searching tool being used. For example, 10,000 documents could be broken into 10 groups of 1,000 documents each, or 1,000 groups of 10 documents each. Equally, there could be 1 group of 5,000 and 5 groups of 1,000. There is a near infinite number of combinations. The more advanced tools on the market allow the operator a degree of control over the sizing and nature of the groups.
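As a rough sketch of how such grouping can work, the Python below compares documents by the words they share (cosine similarity over word counts) and places each document in the first group it resembles closely enough. This is a simplified stand-in chosen for illustration, not the method of any particular product; the function names and the threshold parameter are assumptions, and the threshold plays the role of the operator control mentioned above.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Turn a document into a bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def cluster(docs, threshold):
    """Greedy single-pass clustering.

    Each document joins the first group whose first member is at least
    `threshold` similar to it; otherwise it starts a new group.
    Returns a list of groups, each a list of document indices.
    """
    vectors = [vectorize(d) for d in docs]
    groups = []
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups

docs = [
    "fantasy football league picks this week",
    "fantasy football league trade offer",
    "contract renewal terms for the supplier",
    "supplier contract terms and signature",
    "team meeting agenda for monday",
]

print(cluster(docs, threshold=0.5))  # a looser setting gives fewer, larger groups
print(cluster(docs, threshold=0.9))  # a stricter setting gives more, smaller groups
```

Changing the threshold changes how many groups come out of the same data, which is why the same 10,000 documents can end up in 10 groups or in 1,000.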

The advantage of these groupings is that they allow resources to be focused effectively, e.g. the groupings that appear to be about football, parties, and other junk material can either be left unreviewed or just scanned quickly. If there are 500 documents in one group, a sample review of that group shows that they all relate to fantasy football, and all the email titles appear to relate to fantasy football, then it may (case dependent) be reasonable to skip the rest of the group. Equally, if there is a group of 1,000 documents related to “contracts”, the senior reviewers can be dedicated to that cluster and a detailed review can be conducted early on in the case, rather than the documents passing through one or two reviewers before they get reviewed in detail.

Auto-Tagging/Predictive Marking

This methodology works on the same technology, identification of documents through concepts, but rather than creating multiple groups it will create a couple of groups, or possibly only one. Generally a small sample of documents that are similar to each other is provided to search against a very large number of unknown documents. The concept searching tool will search the unknown documents for those which are “similar” to the known document set. Documents that are found to be similar will then be identified and clustered together. This type of technology can be deployed in several different ways, e.g.

  • Looking for documents which are similar to a document already disclosed
  • Looking for documents that are similar to documents that are known to be junk, e.g. if there is a lot of social networking email traffic, this could be used to identify much of the spam and remove it from the review set
  • If “hot documents” are found during the initial review, these can be used to identify other similar documents.
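The idea of searching unknown documents against a small known set can be sketched as follows. This is a minimal Python illustration under stated assumptions: the known documents are merged into one word-count profile, and unknown documents are scored against it by cosine similarity. The find_similar function and its threshold are hypothetical names for this example; commercial tools use more sophisticated models.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Turn a document into a bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def find_similar(seed_docs, unknown_docs, threshold=0.3):
    """Return (index, score) pairs for unknown documents that resemble
    the seed set, most similar first."""
    # Merge the known documents into a single word-count profile.
    profile = Counter()
    for doc in seed_docs:
        profile.update(vectorize(doc))
    scored = [(i, cosine(profile, vectorize(doc)))
              for i, doc in enumerate(unknown_docs)]
    hits = [(i, score) for i, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Known junk: a small sample of fantasy football emails.
seed_docs = [
    "fantasy football league picks this week",
    "fantasy football trade offer for your league",
]
unknown_docs = [
    "your fantasy football league results",
    "draft contract for the supplier",
    "football league trade news",
]

for index, score in find_similar(seed_docs, unknown_docs):
    print(index, round(score, 2), unknown_docs[index])
```

Here the two football emails are flagged as similar to the known junk and the contract email is not. The same approach applies when the seed set is a disclosed document or a set of hot documents.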

Concept Searching “words”

Some concept searching tools allow the searching of words or paragraphs. In these circumstances the tool is doing just what is described above, but on a more focused paragraph or sentence rather than an entire document. This is particularly important when dealing with large documents: if a 500 page document is important or “hot” because of one paragraph, concept searching for documents similar to the whole document will often produce junk. In these cases concept searching for the key paragraph may be more effective.
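A small Python sketch shows why the paragraph level can matter. It scores a key paragraph against a longer document as a whole and then against each of its paragraphs separately. The stopword list, the sample text, and the function names are all assumptions made for this illustration, and word-count cosine similarity is again a simplified stand-in for what a real tool does.

```python
import math
import re
from collections import Counter

# Small, hypothetical stopword list so common words do not dominate scores.
STOPWORDS = {"the", "of", "and", "to", "for", "was", "be", "by", "will"}

def vectorize(text):
    """Bag of lowercase word counts, ignoring stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def best_paragraph(key_paragraph, document):
    """Return (index, score) of the paragraph most similar to the key.
    Paragraphs are assumed to be separated by blank lines."""
    key = vectorize(key_paragraph)
    paragraphs = document.split("\n\n")
    scores = [cosine(key, vectorize(p)) for p in paragraphs]
    index = max(range(len(scores)), key=scores.__getitem__)
    return index, scores[index]

key_paragraph = "The supplier agreed to backdate the contract."
document = "\n\n".join([
    "Minutes of the quarterly meeting. Attendance was recorded and the agenda was approved.",
    "The supplier agreed to backdate the contract before the audit.",
    "Catering for the summer party will be arranged by the office team.",
])

whole_score = cosine(vectorize(key_paragraph), vectorize(document))
index, paragraph_score = best_paragraph(key_paragraph, document)
print("whole document:", round(whole_score, 2))
print("best paragraph:", index, round(paragraph_score, 2))
```

The whole-document score is diluted by the unrelated paragraphs, while the paragraph-level score picks out the one passage that matters. With a 500 page document the dilution would be far greater.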

Keyword Searching, is this still needed?

With the advent of concept searching, should keyword searching still be conducted? This subject will be covered in the next article on concept searching.
