IBM Acquires PSS Systems

The lines between compliance and e-discovery are drawing closer. New York’s IBM has acquired California-based PSS Systems. PSS Systems develops Atlas, an information governance suite that helps users analyze and assess the risks of retaining data, formulate policies, and automate workflows to dispose of excess data. According to Ron Ercanbrack, vice president of enterprise content management at IBM, Atlas is a “logical complement” to other IBM products such as Content Collector and eDiscovery Analyzer and Manager. “With the acquisition of PSS Systems, we are able to expand our portfolio with a broader set of legal solutions that for the first time link corporate legal hold policies to the reality of how information is managed and disposed of,” explains Ercanbrack.


Electronic Discovery: Using Keywords to Cull?

The previous article, on concept searching and keywords, ended with the following paragraph:

The use of keywords will depend not only on the case, but on the tools and the service being used. Some companies will apply keywords, and then only those documents responsive to the keyword search will ever be available for review. Other companies/services will apply the keyword search after the data has been loaded into a review platform. These two different service offerings can make a huge difference to the legal strategy, the results, and the costs.

Why such a big difference?

The problems are immediately obvious, once the marketing spiel is stripped away.

Option 1 – Cull before review. This is one option for reviewing data:

  • Data is collected, creating the “Collected Set”.
  • The collected data is culled to create a “Review Set”. Culling could involve keyword searching, date filtering, etc.
  • The Review Set is reviewed, producing the “Disclose Set”. Methods of searching the data during the review include concept searching, keywords, dates, etc.

This method has a couple of problems, and one benefit. The main problem is that key documents can be missed. No matter how well thought out the keywords used to create the review set are, data will be culled, so it is possible (and probable) that documents of interest are culled out along with the huge mass of other documents. Keywords change and date ranges shift as the understanding of the case develops. Returning to the example of football (used in the previous two articles): if the keyword “football” is used, but it later turns out that a key custodian uses the term “footie” instead, what happens then? Re-searching and re-processing of all of the documents that are not in review?
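
To make this concrete, here is a minimal sketch of the Option 1 pipeline in Python, including the “footie” problem. The Document structure, keywords, and dates are illustrative assumptions, not any vendor’s actual format or API.

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class Document:
        path: str
        text: str
        created: date

    KEYWORDS = {"football"}
    DATE_FROM, DATE_TO = date(2008, 1, 1), date(2009, 12, 31)

    def cull(collected: list[Document]) -> list[Document]:
        # Only documents surviving this cull ever reach the review platform.
        return [
            d for d in collected
            if DATE_FROM <= d.created <= DATE_TO
            and any(k in d.text.lower() for k in KEYWORDS)
        ]

    collected_set = [
        Document("mail/001.txt", "Who won the football on Saturday?", date(2009, 3, 2)),
        Document("mail/002.txt", "Great footie result on Saturday!", date(2009, 3, 2)),
    ]
    review_set = cull(collected_set)  # 002 is culled out and never seen by reviewers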

The benefit of this method (initial culling with keywords) is simply cost, and nothing else.

Most professional vendors will allow data to be moved from the Collected Set into the Review Set as the case changes, but there will often be an additional cost to this, possibly a significant one. There will also be time delays, processing and loading fees, etc., and these costs and delays can discourage the movement of data into the Review Set.

Option 2 – A subtle difference.

The idea is simple: load all of the data into the review platform, everything except the files that would never be reviewed in that case (e.g. movie files, MP3s, etc.). All of the culling can then be done within the review platform. This way, a decision does not have to be made early on about what is relevant or which keywords to use. All that is removed is the absolute junk.

Data for review can then be moved into an appropriate folder/bucket/location for review, using whatever means are required. This “review set” can then be searched, filtered, concept searched, near de-duped, etc., by the reviewers as normal. I.e. the reviewers will see only the data they need to see, but everything is in the review platform. If/when new terms are identified by the reviewers (dates, words, file types, etc.) they can be applied to the entire data set, and the new documents made available for review almost instantly. This is quick to do, as the data is already in the review platform. If it’s quick it’s easy, and therefore it should be cheap.
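
By contrast, here is a minimal sketch of Option 2, reusing the Document class from the sketch above. Everything except obvious junk is loaded, and a “cull” is just a query over the full set, so a newly discovered term like “footie” can be applied instantly, with no reprocessing or reloading. The junk-extension list is an illustrative assumption.

    JUNK_EXTENSIONS = (".mp3", ".avi", ".mov")

    def load(collected: list[Document]) -> list[Document]:
        # Load everything into the platform except files that would never be reviewed.
        return [d for d in collected if not d.path.lower().endswith(JUNK_EXTENSIONS)]

    def search(platform: list[Document], keywords: set[str]) -> list[Document]:
        # A "review set" here is just a saved search, not a separate processing run.
        return [d for d in platform if any(k in d.text.lower() for k in keywords)]

    platform = load(collected_set)
    first_pass = search(platform, {"football"})             # what Option 1 would see
    second_pass = search(platform, {"football", "footie"})  # new term: instant, no reload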

Vendors can’t process that much data!

The obvious objection to Option 2 is that “vendors can’t process all of the data for a case, hence the initial cull”. But this is not entirely true – it depends on the nature of the cull. For example, if the data is to be culled by keyword then the data must be processed in the first place: the file must be opened, the text extracted, and then the document searched. Therefore much, if not all, of the processing work has already been done.
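
The point can be shown in a few lines: to cull by keyword, every file must be opened and its text extracted before it can be searched, and that extraction is the bulk of the processing work. In this sketch, extract_text is a stand-in for whatever extraction a real processing tool performs.

    def extract_text(path: str) -> str:
        # The expensive step: open the file and pull out its text.
        # (Real tools handle containers, encodings, and errors; this is a stand-in.)
        with open(path, encoding="utf-8", errors="ignore") as f:
            return f.read()

    def keyword_cull(paths: list[str], keywords: set[str]) -> list[str]:
        hits = []
        for path in paths:
            text = extract_text(path)  # processing happens here, for every file...
            if any(k in text.lower() for k in keywords):
                hits.append(path)      # ...whether or not the file survives the cull
        return hits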

There are historical reasons why this has not been done. One of them is TIFFing: data often needed to be TIFFed prior to loading into a review platform, which is very time consuming and resource intensive. Thankfully, this is no longer required, with many review platforms taking away many of the problems seen previously. Another reason is storage: high quality data storage was not cheap, so storing 100 GB of data in a data centre was expensive simply because of the hardware involved. But hardware prices have come down (in line with Moore’s law). This may still be a problem for the giant, multi-terabyte cases, but for the tens or hundreds of GB it is less likely to be an issue.

There will be occasions when Option 1 is the better method, or when neither Option 1 nor 2 is suitable and an entirely different approach is required. But, generally, it’s worth considering all the options and not letting the legal strategy for review be led by a rigid processing/conveyor-belt pricing structure.

Really the question is not whether you cull data with keywords, but at what stage that should happen.

Electronic Discovery: Early Case Assessment v Review Platforms

Early Case Assessment: Why not just use a linear review platform?

Early case assessment is the new buzzword in town. Last year it was concept searching; before that it was near de-duping, and long before that it was simply de-duplication.

A variety of tools on the market – Recommind, Clearwell, Nuix – offer Early Case Assessment features. These are good tools, with lots of R&D behind them and a growing market share, but does this mean that the linear review platforms, the RingTails, the iCONECTs, the Relativities, are a thing of the past?

No, and for two reasons.

Firstly, the Early Case Assessment tools are often not geared up for the highly detailed linear review that will be required once the bulk of the documents has been culled. RingTail, for example, has incredibly granular capabilities, from using multiple highlighting colors to rotating individual pages in a document that have not been scanned correctly. Most of those producing early case assessment tools recognize this and currently recommend that their tool is used for a first review (hence the name), with the detailed work done in the more heavyweight linear review tools. This alone means that the linear review platforms are here for a while yet; of course, those building the ECA platforms are no doubt working on producing more detailed review capability, introducing TIFFing, redaction, etc. The ECA camp is, almost certainly, going to move into the linear review market space sooner rather than later.

Secondly, the reverse is also true: some “linear” review platforms have moved into the market space of the ECA tools. The linear review platforms that have not moved on, the Summations and Concordances, are certainly a thing of the past, but some review platforms have evolved. Relativity and iCONECT probably have a strong future ahead of them, as they have moved with the technology. Internal tools, such as Documatrix and KrollOntrack InView, have also evolved and developed along the same lines, but are not available for purchase and so are not discussed here.

Relativity has taken an approach much like Apple has with iPhone apps, allowing third-party vendors to make software for its product. ContentAnalyst and Equivio are the two big ones. This means that Relativity leverages the knowledge and experience of other companies, rather than having to build everything itself. RingTail is building a similar package to link it to Attenex.

The net result is that Relativity can provide a linear review, a non-linear review, or an early case assessment in a single platform. This means that once data has been loaded into the review platform it can be culled, clustered, de-duped, near de-duped, and generally treated as if it were in an early case assessment tool.

Data can be loaded into Relativity and culled down using the methods of an ECA platform. Then, once a set of data has been chosen for review, it can simply be released/tagged or otherwise identified for a full-scale, detailed linear review. This can be done either by the vendor or by the reviewers.

The beauty of this type of solution is that if there is an error in the culling process, whether too much or too little data has been identified for linear review, it can easily be untagged or re-tagged and moved between the ECA and linear review phases. The data is not moving between platforms – it is not moving from a Nuix to a RingTail, or a Recommind to an Introspect, but staying inside Relativity – and this is very cheap and almost instant to do. The data is moving between phases, not platforms.
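
Here is a minimal sketch of what “moving between phases, not platforms” amounts to: a tag change on a record that never leaves the system. The tagging model below is an assumption for illustration, not Relativity’s actual API.

    from enum import Enum

    class Phase(Enum):
        ECA = "early case assessment"
        LINEAR = "linear review"

    # Every document stays in one platform; only its phase tag changes.
    tags: dict[str, Phase] = {"DOC-001": Phase.ECA, "DOC-002": Phase.ECA}

    def release_for_review(doc_id: str) -> None:
        # Promote a document to detailed linear review: a tag change, not a re-load.
        tags[doc_id] = Phase.LINEAR

    def recall(doc_id: str) -> None:
        # Culling error? Pull it back: cheap and almost instant, the data never moves.
        tags[doc_id] = Phase.ECA

    release_for_review("DOC-001")
    recall("DOC-001")  # no export, no reprocessing, no loading fees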

So, why the big issue about Early Case Assessment?

If linear review platforms can also conduct non-linear review, why the big buzz about early case assessment?

There are probably several reasons for this.

  • The Megan Fox effect. Megan Fox is a good-looking woman, no doubt, and that may influence our opinion of her as an actress and of the quality of her films. Is it any wonder advertisers use good-looking men and women to sell products? We associate beauty with quality. Early Case Assessment tools generally look brilliant: clean, simple interfaces, easy to use, intuitive, etc. The new breed of linear review platforms with concept searching may have all the functionality of an ECA tool, but it is not immediately obvious; those capabilities are hidden.
  • Pricing: Traditionally the linear review tools have had a high price and have been seen as expensive, in time and money, for lawyers to review in. Early Case Assessment tools have a different pricing model, and their aim is not to make a lawyer review everything, but just to take an initial look. This is not necessarily cheaper than processing all of the data, putting it into a review platform, and then seamlessly moving between linear and non-linear review as needed, but at first look it does appear cheaper.
  • Processing Myth: Processing is hard work, period! Even processing 1,000 files will generate numerous errors. Then there is the loading of the data into a review platform, e.g. moving from Discovery Cracker to RingTail, or from LAW to iCONECT. This also involves a degree of effort, time, and pain, and therefore cost. Early Case Assessment tools offer a simple solution: you just load the data into the ECA tool/platform and review it, then “process” it later, but only the data you need to review in detail. Voila, a cost and time saving already, quick, get an ECA… err… not quite. If data can be searched, filtered, clustered, etc., then it has been processed. The ECA tool has processed the data in almost the same way the traditional processing tools have: it has extracted the metadata and text for searching and filtering. The ECA tools tend not to surface the errors as much; they are simpler to use, more of a black box. The value of that, versus the risk, will depend on the client, vendor, and case. The “lack of processing” appears to be a clever branding/pricing trick. The main exception to this is Nuix, which has managed to process files in a slightly different manner, making it genuinely faster than other processing tools on the market, but that is the exception rather than the rule.

In short, what can be achieved with an Early Case Assessment tool can be achieved with a good quality review platform, assuming it has all the bells and whistles and those using it know how to use it, and how it needs to be used, for the specific case.

This does not mean that dedicated early case assessment tools do not have a place in the market; they will no doubt do particularly well in the internal, corporate market, where companies are trying to get a handle on what they have and what they need.

It’s quite probable that review platforms and ECA platforms will merge into each other’s market spaces, with ECA tools adding complete linear review capability and review platforms adding complete processing capability.

The vendors will, of course, need a new name for these tools, as well as a new look and a new pricing structure.

Electronic Discovery: Francis Bacon and Concept Searching

Human errors and the human state of mind have a big effect on the decision-making process in electronic discovery, but what are these errors?

A few hundred years ago Francis Bacon stated: “The human understanding when it has once adopted an opinion (either as being the received opinion or as being agreeable to itself) draws all things else to support and agree with it. And though there be a greater number and weight of instances to be found on the other side, yet these it either neglects and despises, or else by some distinction sets aside and rejects; in order that by this great and pernicious predetermination the authority of its former conclusions may remain inviolate.”

What Francis Bacon was saying, rather elegantly, is that people stick to their guns. What he believed to be true in the 17th century, psychologists in the 20th and 21st centuries have now shown to be true.

Humans Examining Evidence

Psychologists have demonstrated that people often not only stick to their beliefs but seek out evidence to reinforce their own opinions and reject new evidence that is contrary to their beliefs. Test after test has shown this, and it can be seen in real-life situations, from politicians to generals. In the run-up to Pearl Harbor there were numerous warnings that an attack was about to occur, including a Japanese submarine that was sunk just outside the harbor only an hour before the attack. But the admiral in charge, Admiral Kimmel, believed that Japan would not attack Pearl Harbor, and so he ignored and deliberately misinterpreted information, intelligence, and warnings – he stuck to his guns[1]. He did not even cancel weekend leave for his staff, or ask whether the Army was manning the anti-aircraft guns (which would have required only a single phone call). This is not uncommon; people at all levels do this quite regularly. This state of mind affects scientists and politicians alike.

Humans Making Decisions

Not only do humans often stick to incorrect decisions, but we are easily influenced. For example, people will often follow something known as the “rule of primacy”. This says, in short, that people take the first thing they learn about a subject to be true.

Example: if a person is told by a friend that Car A is fantastic, they will tend to believe it, even if they are later told that Car A is less than good. In fact, they will seek out evidence to support what they already believe.

Another well-known cause of error in humans is the availability error. This means that the stronger and more powerful a memory is, the more likely people are to make decisions based on it. This has been shown in laboratories and in the real world. For example, take-up of earthquake insurance in areas that have earthquakes increases immediately after a quake and decreases the longer it has been since a quake occurred, because the memory of the quake fades. However, the probability of a quake increases the longer the time since the last one, and is lowest just after a quake; i.e. people are buying and not buying insurance at exactly the wrong times. Equally, if people are asked to estimate which is more common, words beginning[2] with the letter “r” or words with “r” as the third letter, they will often say the former, as they can immediately think of words beginning with “r”: rain, rainbow, rivet, red, real, reality, etc. In fact, there are more words with “r” as the third letter: street, care, caring, borrow, etc. But people have words beginning with “r” strongest in their minds, so that is what they believe.
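
You can test this claim for yourself with a few lines of code, assuming a standard Unix word list at /usr/share/dict/words (an assumption; adjust the path for your system). Counts will vary with the word list used, but the comparison is the point.

    def count_r_positions(path: str = "/usr/share/dict/words") -> tuple[int, int]:
        # Count words starting with "r" versus words with "r" as the third letter.
        first = third = 0
        with open(path) as f:
            for line in f:
                word = line.strip().lower()
                if word.startswith("r"):
                    first += 1
                if len(word) >= 3 and word[2] == "r":
                    third += 1
        return first, third

    first, third = count_r_positions()
    print(f"words starting with 'r': {first}, words with 'r' third: {third}")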

Other well known causes of human errors include:

Peer pressure/conformity: People tend to follow the decisions of others, even when those decisions are quite obviously wrong. There are well-known tests of this, such as putting a subject in a room with 5 or 10 other people and asking them to complete simple tasks, such as saying which of three lines is the shortest, or how many drum beats there were. The tasks are simple; the beats would be easy to count, or one of the lines would be obviously shorter, but the other people in the room are actors, paid to deliberately give the same wrong answer. When the answers are read out aloud, the test subject will, more often than not, follow the incorrect answers.

Obedience/management pressure: Following the opinion of a superior (depending on the culture), regardless of whether it is right, is something that often occurs. This was most famously demonstrated in the Stanley Milgram experiments, where volunteers willingly applied what they believed to be lethal voltages to innocent people, simply because they were asked to.

There are many more examples demonstrating these human tendencies, and even more conditions that cause us to make errors on a day-to-day basis – it’s just the nature of the human brain. It is how we work (or don’t).

Electronic Discovery & Psychology

So what has all of this psychology and “soft science” got to do with electronic discovery?

Electronic discovery has historically been driven by humans, from keyword selection to decisions on the relevance of a document. It is the errors identified above, and more, that can come into play during a review.

Below are examples of how these well-known errors can affect a review:

  • Once a person decides on keyword search criteria, once they have put their flag in the ground, they are, statistically, unlikely to be willing to change their mind about the value of those criteria. They may even ignore evidence or documents that prove otherwise. In fact, research has shown that once a person states a decision publicly, or commits it to writing, they are even more likely to stick to their guns than somebody who makes that decision privately.
  • If a person has to review another 500-page document, and it’s late and they want to go home, they may quickly start to believe that the document is not relevant. They may start to scan the document looking for information that demonstrates it is not relevant, rather than looking for evidence that shows it is.
  • A second opinion on a document’s relevance may be sought, from a senior manager or colleague, and that opinion will then be followed throughout the review, regardless of whether it was right, even if the original reviewer believes the opinion to be wrong.
  • If a review platform does not record who decided whether a document is relevant, the reviewer may be less inclined to be diligent, as they are removed from responsibility by anonymity. [Anonymity has also been shown to be an influencing factor in people’s behavior and choices.]

Solutions?

There are methods to try and reduce the number of traps the human brain walks into. Simply being aware of the problems and making a conscious effort to avoid them is one solution, e.g. consciously giving as much weight to the first piece of evidence seen as to the last.

However, we are all fallible, and in large-scale reviews avoiding these errors is going to be very difficult, and possibly expensive in terms of time.

The most obvious solution is automation through concept searching. As previously discussed, concept searching can be of great value during a large document review and, like all systems, it will have errors; but we know it is not susceptible to the human failings discussed in this article.

It doesn’t matter what the system saw first, how strong or visual a document is, or what other reviewers think. Concept searching will only apply repeatable, known logic to a group of documents.
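
Here is a minimal sketch of that repeatability, using TF-IDF and cosine similarity via scikit-learn (an assumption for illustration; real concept-search engines use richer models, such as latent semantic analysis). Given the same documents and the same query, the scores are identical on every run, regardless of review order, fatigue, or peer pressure.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    docs = [
        "minutes of the football club annual meeting",
        "invoice for stadium catering services",
        "footie results and league table discussion",
    ]
    query = "football match discussion"

    # Deterministic pipeline: the same input always yields the same scores.
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(docs + [query])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

    for score, doc in sorted(zip(scores, docs), reverse=True):
        print(f"{score:.3f}  {doc}")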


[1] As a result of this, Admiral Kimmel was later demoted.

[2] This example is lifted directly from the book Irrationality, by Stuart Sutherland.

Electronic Discovery: E-Disclosure Qualification

Guidance Software, the forensics giant that produces EnCase, has announced the release of its e-disclosure qualification, the “EnCase Certified eDiscovery Practitioner” (EnCEP).

The value of the EnCEP certification remains to be seen, but there are already obvious pros and cons.

The Pros

Employers of staff using the EnCase e-discovery tools can bring their staff to a common standard, and employees can work to that standard and demonstrate their competence levels to future employers/clients.

The Cons

E-discovery is a huge area, colossal: concept searching, near de-duping, review platforms, data recovery, backup tapes, project management, consultancy, etc. The certification currently being offered covers a very narrow part of electronic discovery, on a single tool, teaching a methodology that is based on the use of the Guidance Software products.

This in itself is not a problem, as long as people are aware of what the qualification actually means; the concern is that the huge PR machine of Guidance could push the certification forward as a requirement, as a standard in the industry, as the EnCE is becoming.

Increasingly, it is not unusual for clients to ask for staff to be EnCE certified. While there are many good people who are EnCE certified, there are also those whose knowledge of forensics is very limited. On the flip side, there are people who are not EnCE certified and who are fantastically smart; a look at the SANS website and blog will demonstrate this. That site has numerous postings by people with incredible technical knowledge, far, far above that required for the EnCE exam, but whose own qualifications may not be accepted by certain employers/clients. Equally, there are people with no certifications who are not much use.

So where does this leave us? Currently, certification neither proves nor disproves the skill set a client needs, not least because clients’ needs are generally so varied and vast, even on a single project. The idea of certification is a good one, but there is a long way to go before the industry has a reliable standard.

The press release by Guidance Software is below:

“The EnCase Certified eDiscovery Practitioner program was created by industry experts to meet the needs of our EnCase eDiscovery users who are handling electronic evidence in both routine and some of the largest and most complex litigations of our day,” said Al Hobbs, Vice President, Professional Development & Training Operations for Guidance Software. “Candidates who complete the EnCEP program, and earn the designation, will have demonstrated their expertise in the leading edge EnCase technology and methodology for the collection and processing of electronically stored information.”

“Successful litigation depends on good legal scholarship as well as the appropriate technology infrastructure to support e-discovery. We recommend that legal professionals are screened on their understanding of technology and enterprise computing, as well as their comprehension of how technology is deployed,” said John Bace, a research vice president at Gartner, graduate of the John Marshall Law School in Chicago, and Advisory Board member for the Center for Information Technology & Privacy Law at the School. “Certification programs such as these are a step in the right direction toward ensuring that IT professionals are proficient in eDiscovery.”

Over the past eight years, Guidance Software has certified more than 2,100 computer investigative professionals with the globally recognized EnCase(R) Certified Examiner (EnCE(R)) designation. The new EnCEP program will similarly enable eDiscovery practitioners to demonstrate their skills, training and experience in the proper handling of ESI for legal purposes. Information on the requirements for EnCEP candidates, the testing program and certification renewal can be found at http://www.guidancesoftware.com/computer-forensics-training-certifications.htm.