Electronic Discovery: Review Platform – eView

The latest version of Integreon’s eView (Version 3) is discussed below.

The review platform has a clean look to it, unlike more cluttered platforms such as RingTail, which offers very granular detail but can appear busy as a result. In fact eView has an almost “Outlook 2007” look to it, which is almost certainly deliberate, as it means that lawyers looking for emails will feel comfortable in that environment.

It has all the standard features that are expected in a review platform: searching, de-duplication, etc. It also has message threading, which is increasingly common but still not standard in all review platforms.

Like most review platforms it has an SQL back end, but the system is accessed via Citrix, which is less common in the review platform community. Offerings from RingTail, Relativity, Recommind, iCONECT, Documatrix, etc., are all available via a web browser (normally only Internet Explorer). The major exception is Attenex, which also works via Citrix.

eView looks like a standard linear review platform, but it has Content Analyst, a concept searching tool, built in. This means that both a linear and a non-linear review can be conducted in the same review platform. This puts eView in direct competition with Relativity, which has been very well received on both sides of the Atlantic.

The screen shots below show eView being used for concept searching. In the first example a paragraph is highlighted so that documents similar to that paragraph can be searched for.

Integreon eView Screen Shot Concept Searching

Integreon eView Screen Shot Concept Searching (2)

Electronic Discovery: Early Case Assessment v Review Platforms

Early Case Assessment: Why not just use a linear review platform?

Early case assessment is the new buzzword in town. Last year it was concept searching, before that it was near de-duping, and long before that it was simply de-duplication.

A variety of tools on the market, such as Recommind, Clearwell, and Nuix, offer Early Case Assessment features. These are good tools, with lots of R&D and a growing market share, but does this mean that linear review platforms, the RingTails, the iCONECTs, the Relativities, are a thing of the past?

No, and for two reasons.

Firstly, the Early Case Assessment tools are often not geared up for the highly detailed linear review that will be required once the bulk of the documents have been culled. RingTail, for example, has an incredibly granular capability, from using multiple highlighting colors to rotating individual pages in a document that have not been scanned correctly. Most people who are producing early case assessment tools recognize this and currently recommend that their tool is used for a first review (hence the name), and that the detailed work is done in the more heavyweight linear review tools. This alone means that the linear review platforms are here for a while yet. Of course, those building the ECA platforms are no doubt working on producing a more detailed review platform, introducing tiffing, redaction, etc. The ECA camp are, almost certainly, going to move into the linear review market space sooner rather than later.

Secondly, the reverse is also true. Some “linear” review platforms have moved into the market space of the ECA tools. The linear review platforms that have not moved on, the Summations and Concordances, are certainly a thing of the past, but some review platforms have evolved. Relativity and iCONECT probably have a strong future ahead of them, as they have moved with the technology. Internal tools such as Documatrix and KrollOntrack InView have also evolved and developed along the same lines, but are not available for purchase so are not discussed here.

Relativity has taken an approach, much like the iPhone has with Apps, of allowing third party vendors to make software for its product. ContentAnalyst and Equivio are the two big ones. This means that Relativity can leverage the knowledge and experience of other companies, rather than having to build everything itself. RingTail is building a similar package to link it to Attenex.

The net result is that Relativity can allow for a linear review, a non-linear review, or an early case assessment in a single platform. This means that once data has been loaded into the review platform it can be culled, clustered, de-duped, near de-duped, and generally handled as it would be in an early case assessment tool.

Data can be loaded into Relativity and culled down using the methods of an ECA platform.  Then once the set of data has been chosen for review this can simply be released/tagged or otherwise identified for a full scale/detailed linear review. This can be done either by the vendor or the reviewers.

The beauty of this type of solution is that if there is an error in the culling process, i.e. either too much or too little data has been identified for linear review, then the data can easily be untagged or re-tagged and moved between the ECA and linear review phases. The data is not moving between platforms; it’s not moving from a Nuix to a RingTail, or a Recommind to an Introspect, but staying inside Relativity, which is very cheap and almost instant to do. The data is moving between phases, not platforms.

So, why the big issue about Early Case Assessment?

If linear review platforms can also conduct non-linear review, why the big buzz about early case assessment?

There are probably several reasons for this.

  • The Megan Fox effect. Megan Fox is a good looking woman, no doubt, and that may influence our opinion of her as an actress and the quality of her films. Is it any wonder advertisers use good looking men and women to sell products? We associate beauty with quality. Early Case Assessment tools generally look brilliant, with clean simple interfaces, easy to use, intuitive, etc. The new breed of linear review platforms with concept searching may have all the functionality of an ECA tool, but it’s not immediately obvious, those skills are hidden.
  • Pricing: Traditionally the linear review tools have had a high price and been seen to be expensive to review in time and money for lawyers. Early Case Assessment tools have a different pricing model, and their aim is not to make a lawyer review everything, but just an initial look. This is not necessarily cheaper than processing all of the data, putting it into a review platform and then seamlessly moving between linear and non-linear review as needed, but on first look it does appear cheaper.
  • Processing Myth: Processing is hard work, period! Even processing 1,000 files will generate numerous errors. Then there is the loading of the data into a review platform, e.g. moving from Discovery Cracker to RingTail, or LAW to iCONECT. This also involves a degree of effort, time, and pain, and therefore cost. Early Case Assessment tools offer a simple solution: you just load the data into the ECA tool/platform, review it, and then “process” it later, but only process the data you need to review in detail. Voila, there is a cost and time saving already, quick, get an ECA… err… not quite. If data can be searched, filtered, clustered, etc., then it has been processed. The ECA tool has processed the data in almost the same way the traditional processing tools have: it has extracted the metadata and text for searching and filtering. The ECA tools tend not to show the errors as much; they are simpler to use, and more of a black box. The value of that versus the risk will depend on the client, vendor, and case. The lack of processing appears to be a clever branding/pricing trick. The main exception to this is Nuix, which has managed to process files in a slightly different manner, making it genuinely faster than other processing tools on the market, but that is the exception rather than the rule.

In short, what can be achieved through an Early Case Assessment tool can be achieved through a good quality review platform, assuming it has all the bells and whistles and those using it know how to use it, and how it needs to be used for the specific case.

This does not mean that dedicated early case assessment tools do not have a place in the market; they will no doubt grow particularly well in the internal corporate market, where companies are trying to get a handle on what they have and what they need.

It’s quite probable that review platforms and ECA platforms will move into each other’s market space, with ECA tools adding complete linear review capability and review platforms adding complete processing capability.

The vendors will, of course, need a new name for these tools, as well as a new look and a new pricing structure.

Electronic Discovery: Technology and the Market

What is the future of electronic discovery technology?

The recession/credit crunch/depression is apparently easing, with much of Europe reporting growth.  But the e-discovery market is not likely to pick up suddenly. There are also predictions, in a recent ABA report, of mergers and retreats.

If this is right, existing companies are going to be under pressure to keep costs down and new companies are going to have to take a very careful look before entering the market. Salaries could also take a hit, as companies are forced to keep prices low. Overall the market will probably have to become leaner, more effective, and more efficient.

Companies like i-Lit and AllVision[1], which don’t conduct electronic discovery processing and are independent of vendors, may be ideally placed in the market if it changes as some suggest. Their business model is to advise on the best approach to ED projects; they can assist in the management of e-discovery and help keep control of costs.

Technology

But what about the technology? Is the ED technology going to change? It’s impossible to tell what will happen, but we can make guesses about technology based on market forces and current trends.

Culling Strategies

Effective culling strategies (for “effective” read “cheap”) will probably be a big thing. They will have to be deployed earlier on and more consistently. This has started already and will continue. Concept searching and near de-duping are increasingly common, with most major vendors now offering concept searching, and there are numerous concept searching tools on the market.

Early Case Assessment and More Culling

The idea of just ploughing through hundreds of thousands of junk documents is already long outdated, which is why keyword searching and non-linear review tools are so popular. But more culling can, and will, be done to reduce the irrelevant data going into review: concept searching tools are great, but they are not free.

More effective selection of data sets, file types, and custodians will have to be considered. Is there any point in loading large numbers of large PowerPoint files into a review if, following a sampled review, there is no evidence that they are relevant? Early Case Assessment tools and technologies will allow for more effective and better planned culling strategies.

Backup tapes are not going to go away, case law in the UK has seen to this, but the costs will have to come down, even though they are already lower than many people think. Tools like IndexEngines, which have not made any great impact in the UK yet, can offer a great way to cull huge numbers of tapes at relatively low cost.

Processing

Processing data is, in short, a bitch. There is no nice way to say it. Most tools are buggy, have flaws, and produce huge error logs, or no error logs at all (the latter is more worrying). Then there is the compatibility issue.

Data may be collected by FTK, extracted with EnCase, processed and filtered by Discovery Cracker, near de-duped with Equivio, concept searched with ContentAnalyst and reviewed in RingTail.

You don’t need to be involved in the e-discovery industry to see that there are a lot of moving parts involved in collecting and processing data. In the example given six different tools are being used, which means staff need to be trained on six different platforms, and the skills do not all overlap. For example, the skills needed for EnCase are entirely different to those for managing a review platform.

Keeping people trained on so much technology is difficult, expensive and time consuming, for both the employees and their employers. Some firms choose to separate the skill sets amongst their staff, with separate collection, processing and review management teams. This is no cheaper, in fact it’s more expensive for a smaller company, but it’s very scalable and allows the teams to focus on their different areas.

But with separate tools and separate teams there are going to be problems. Getting data from an ED processing tool into a review platform is fraught with problems. Just moving data between tools can have issues.

Technology Consolidation

The report in the ABA Journal, which started this article, talked about merging and consolidation of companies, for economic reasons. It also makes sense, both in terms of economics and day to day e-discovery work,  for technology to be merged. Why have 6 tools when 4 or 3 can do it?

Relativity, a popular and effective review platform, has already taken this approach. It allows plugins to be created for its technology. There are already plugins for ContentAnalyst and Equivio, which allow them to work seamlessly with the review platform. This radically reduces the workload for those in the e-discovery companies, and increases the functionality of Relativity. No doubt there will be more plugins in the future, adding more capability and reducing the workload for e-discovery teams further.

Orange Legal Technologies is another company which has consolidated several tools, but it has taken this a step further. It has combined processing, concept searching, and review platform technology into one product.

This means, as it’s been built from the ground up, it should be able to move data from processing to review with few of the problems that those with separate systems have. If Orange Legal Tech works as well as it should do (the author has not used the tool), this will remove many of the pains that those at the sharp end of electronic discovery feel: keeping track of files, dealing with compound file issues, and the lack of a centralized interface and centralized logging of actions and errors.

Resolving these problems is a huge time and cost saver.

Technically Attenex took this approach several years ago, and it was a success. They combined WorkBench (processing), DocMapper (review) and MatterManager (reporting) into one tool. The problem? Attenex is staggeringly expensive.

If Orange Legal Technologies get their pricing model right, and the tool delivers what they say it does, then this could well be a new market leader.


[1] Both are UK based.

Electronic Discovery: What is DeDuplication

What is de-duplication?

De-duplication is, as the name implies, the removal of duplicate files from a set of data. However, this is not as straightforward as the name suggests.

Why is De-Duplication Conducted?

Within a company’s data set there are many duplicate files, e.g. one file on a desktop, another on a file server, a third in an email. If backup tapes are used, every backup tape will potentially have a huge number of duplicates.

Example: A data set consists of one file server, which has 1,000,000 files on it, and a single backup tape containing 900,000 files. While there is a total data set of 1,900,000 files, it may be that all 900,000 on the tape are duplicates of files among the 1,000,000, as the backup tape was taken from the server. Within the 1 million files on the server there may be 20% duplicates, i.e. there are only 800,000 unique files rather than 1.9 million, and there is no point in looking at the additional 1.1 million identical files. Without de-duplication the cost of reviewing the data would (in this theoretical example) more than double.

 

E-Files

E-Files are the easiest type of file to de-duplicate. The tools involved in this process conduct a “simple” mathematical process, known as a “hash” (usually an MD5 or sometimes a SHA-1). A hash is, for practical purposes, unique to a file; if the file changes at all the hash will change. If a single full stop is added to a 700 page document, the hash value will be completely different. Therefore if two files have the same hash value they are the same, and the duplicates can be removed so that the investigator/reviewer does not need to see the same document twice.
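As a rough illustration of hash-based de-duplication, the sketch below groups files by their MD5 value using Python's standard hashlib module. The file names and contents are made up for the example, and real processing tools will differ in how they read, hash, and report on files.

```python
import hashlib


def file_hash(data: bytes, algorithm: str = "md5") -> str:
    """Return the hex digest of a file's content (MD5 by default, SHA-1 also works)."""
    return hashlib.new(algorithm, data).hexdigest()


def deduplicate(files: dict[str, bytes]) -> dict[str, list[str]]:
    """Group file paths by content hash; each group holds copies of one document."""
    groups: dict[str, list[str]] = {}
    for path, data in files.items():
        groups.setdefault(file_hash(data), []).append(path)
    return groups


# Hypothetical data set: two identical copies and one copy with an extra full stop.
files = {
    "desktop/report.doc": b"Quarterly report",
    "server/report.doc": b"Quarterly report",
    "server/report_v2.doc": b"Quarterly report.",
}

groups = deduplicate(files)
# Keep the first path seen for each hash; the rest are duplicates.
unique_paths = [paths[0] for paths in groups.values()]
print(unique_paths)
```

Only one copy from each group needs to go to review; the other paths can be kept in a log so that it is still possible to say where every copy was found.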

 

E-mails

Emails are not as simple to de-duplicate as one may hope for. The reasons for this include:

  • Person A in Company A sends an email to Person B in Company B. The email sent and the email received are the same email, the data has not changed, and therefore they are duplicates; however, the email in the sent items of A is a physically different file to that in the Inbox of B. One of the reasons for this is that the outgoing message does not have a full message header, while the incoming one does. Once the messages are taken out of their mail boxes and hashed, the hashes will be different as the message files are different, though the email content is the same.
  • The same is true for messages that have been copied to other people.
  • Emails are sent from an Outlook mail box to an Outlook Express mail box; one stores the emails as MSG files, the other as EML. Therefore the files, while containing the same message, are clearly physically different. As a result they will not have the same hash value.

Due to the problems with de-duplication of emails a different approach needs to be taken. One approach is to hash all of the different parts of an email, the date, author, recipients, message body, etc and then combine these hash values together to create a new value. It is this new hash value that is used to measure if an email is unique or not.

 This way if the emails are “the same” they can be de-duped even if they are not identical files, in the computer forensics sense.
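The field-based approach described above might look something like the sketch below. The choice of fields, the normalisation (lower-casing addresses, sorting recipients, collapsing whitespace) and the function name email_dedup_key are all assumptions made for illustration; each vendor will have its own rules.

```python
import hashlib


def email_dedup_key(sent_date: str, author: str, recipients: list[str],
                    subject: str, body: str) -> str:
    """Hash each part of an email, then hash the combined hashes into one value."""
    # Assumed normalisation so that trivial differences do not defeat the match.
    fields = [
        sent_date.strip(),
        author.strip().lower(),
        ";".join(sorted(r.strip().lower() for r in recipients)),
        subject.strip(),
        " ".join(body.split()),
    ]
    field_hashes = [hashlib.md5(f.encode("utf-8")).hexdigest() for f in fields]
    return hashlib.md5("".join(field_hashes).encode("ascii")).hexdigest()


# The copy in A's sent items and the copy in B's inbox (made-up values).
sent = email_dedup_key(
    "2009-06-01 09:15", "a@companya.example",
    ["b@companyb.example", "c@companyb.example"],
    "Contract", "Please see the attached draft.",
)
received = email_dedup_key(
    "2009-06-01 09:15", "A@CompanyA.example ",
    ["c@companyb.example", "B@companyb.example"],
    "Contract", "Please see the\r\nattached draft.",
)
print(sent == received)
```

Because the key is built from the content of the message rather than from the MSG or EML file, the two copies produce the same value even though the files themselves would hash differently.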

 

Attachments

File attachments to an email can cause debate amongst people. The aim of a review platform is to ensure that a client can review all of the data they need to without duplicating the work, but with attachments this starts to become a bit of a grey area.

If two emails are the same, they are treated as duplicates and removed. If two e-files are identical, they are de-duplicated. But what if an e-file is loose in a folder but is also attached to an email elsewhere?  Are both files shown, isn’t that a duplicate? If you remove the file attached to the email then you have broken the “family” of documents.

What if there are two emails which are different but have the same attachment? Are both those files put in for review? If they are, then work effort is duplicated, and people can mark one copy as relevant and the other as non-relevant. Equally, if they are not both brought through to review, it means breaking up the family of documents.

Another option would be to treat them as separate files, but bring their hash values through to the review platform and allow the review platform to recognize that they are duplicates. This way, if one file is marked as relevant, the other copy, attached to a different email, will also be marked as relevant.
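A minimal sketch of that idea is shown below. The ReviewSet class, the document identifiers and the shortened hash values are hypothetical; it is only meant to show a relevance decision being copied to every document that shares a hash, not how any particular review platform stores its tags.

```python
from collections import defaultdict


class ReviewSet:
    """Hypothetical in-memory stand-in for a review platform's tagging store."""

    def __init__(self) -> None:
        self.hash_of: dict[str, str] = {}
        self.docs_by_hash: dict[str, list[str]] = defaultdict(list)
        self.relevance: dict[str, str] = {}

    def load(self, doc_id: str, content_hash: str) -> None:
        """Load a document together with the hash brought through from processing."""
        self.hash_of[doc_id] = content_hash
        self.docs_by_hash[content_hash].append(doc_id)

    def mark(self, doc_id: str, decision: str) -> list[str]:
        """Apply a decision to the document and to every duplicate of it."""
        affected = self.docs_by_hash[self.hash_of[doc_id]]
        for other in affected:
            self.relevance[other] = decision
        return list(affected)


review = ReviewSet()
review.load("email1/attachment.xls", "9e107d9d")
review.load("email2/attachment.xls", "9e107d9d")  # same file, different email
review.load("email2/other.doc", "e4d909c2")

updated = review.mark("email1/attachment.xls", "relevant")
print(updated)
```

Both copies of the attachment stay in their own families, and a decision on one is carried across to the other.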

But this seemingly obvious solution presents a problem: the issue of marking families. E.g. if you mark an email as confidential/privileged, does this mean the rest of the family is? Probably. Therefore if you mark the attachment as privileged in one location, because of the email it’s attached to, it would also be marked privileged where it is attached to another email, where it may not be privileged but relevant, and should be disclosed.

These problems may seem unlikely, and there is a low probability of any one of these complex scenarios occurring. But with document sets in the millions or tens of millions, even if there is only a 1 in 10,000 chance of this occurring there will still be 100 or 1,000 situations where it occurs in one case, let alone across multiple cases. De-duplication: not as simple as the name implies.

Electronic Discovery: What is a review platform?

What is a review platform?

A review platform is a method of allowing people from within an office, or around the world, to view, search and identify  documents as part of an electronic discovery case.

What goes into a review platform?

Data that has been collected, culled and processed is loaded into a review platform, which allows the legal teams or investigators to review the data. The interface for most review platforms is a web browser, with the most well known exception being Attenex, which uses Citrix, due to some of the more advanced functionality of the interface.

What does a review platform do?

A review platform can host all of the documents for a case, and allows some or all  of the following functions:

  1. Keyword searching of documents
  2. Filtering, de-duplication, and ordering documents by dates, file types, location, etc.
  3. Marking of documents for relevance/privilege
  4. Redaction of documents
  5. Document management for disclosure. Tracking the speed and quality of the review, and production for courts

Some review platforms have the ability to conduct near de-duplication, find conversation threads of emails, and perform concept searching, though the traditional “linear” review is simply based on reviewers going through one document at a time.

There are a wide variety of products on the market, but they all serve the same basic functions highlighted above. What distinguishes one product from another is the ease of use of the interface and the ability for a law firm to manage their review. E.g. can searches be saved? Is it clear which documents have been marked relevant or not relevant, and by whom? How are families of documents kept together? Can you redact a document easily? How are spreadsheets handled? Does it have its own viewer? Can you print out the documents required? Is tiffing conducted on the fly or separately?

On face value all review platforms may appear the same, but the devil really is in the detail.

Are review platforms secure?

Review platforms are built to be secure, with a variety of security policies available. For example, in addition to a user name and password being required for access, the site’s access can be restricted by IP address; this means that only certain offices can view the documents. Some review platforms even allow security to be increased by the use of RSA dongles. It’s impossible to make anything online 100% secure, but there have been no reported cases of review platforms being hacked (and there must be a lot of incentive to do so), so something must be working.
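To give a feel for how restricting access by IP address works, here is a small sketch using Python's standard ipaddress module. The network ranges are reserved documentation addresses standing in for two offices, and the function name is_allowed is invented for the example; in practice this kind of rule is usually set in the platform's or the firewall's configuration rather than written by hand.

```python
import ipaddress

# Hypothetical office networks (reserved documentation ranges, not real offices).
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),    # e.g. a London office
    ipaddress.ip_network("198.51.100.16/28"),  # e.g. a Bristol office
]


def is_allowed(client_ip: str) -> bool:
    """Return True if the connecting address falls inside an approved office range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


print(is_allowed("203.0.113.45"))
print(is_allowed("192.0.2.1"))
```

A check like this sits alongside the user name and password, not in place of them.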

How much do review platforms cost?

In the UK there are generally two options available for companies who are keen to have their data on a review platform: they can buy a review platform and host the data themselves, or they can ask a vendor to do this. The first option is expensive, often running into $100,000s once all the servers, training, and rack space have been allowed for.

This option is normally only taken by large law firms. The second option is to ask a vendor to host the data. The prices just for hosting vary from one company to the next, and with the review platform, but there are normally costs per GB per month. The costs can vary from tens to over a hundred pounds per gigabyte per month. It’s also not unusual for companies to charge loading fees, which are also charged per gigabyte.


Electronic Discovery: What is Electronic Discovery?

Electronic discovery is the process of collecting, culling, and reviewing data by lawyers, so that documents relevant to a case can be exchanged between the parties.

The Typical UK Case

Typically there is a case between Party A and Party B; the two parties are in litigation and are required by law to exchange documents relevant to each other’s cases. To do this the parties involved collect their data and then read or “review” it. Each party will normally employ a law firm to conduct the review, for its independence and skills.

Individuals, or teams of lawyers (depending on the amount of data), will look at the documents for their party and decide if the documents are relevant (responsive) or not. Once the parties, and their respective law firms, have reviewed the documents and decided what is and is not relevant, each will produce a bundle of relevant documents and exchange them with the other side.

The entire legal process, from deciding what documents to search/review, to what is relevant and how the documents are exchanged is governed by the Civil Procedure Rules Part 31.

The World of Paper

Traditionally lawyers would collect the company’s paper documents, sort them, and then review them, putting the documents in piles of relevant, non-relevant, and confidential. The ones that were identified as confidential would be photocopied and the critical passages redacted (blacked out), creating more paper. Once all the relevant files were identified these would be photocopied again and exchanged with the other side. This task of reviewing data was a huge process, very manual and prone to errors.

The rise of email

When electronic documents first came into existence the problem did not get any easier as these electronic documents would also be printed out so they could be reviewed as well.

As memos, accounts books, faxes and reports have given way to documents, spreadsheets, emails, and PDFs, the idea of printing large volumes of electronic data became more and more frustrating. In addition to this, the number of documents available to a reviewer grew, as there was no need to dispose of bulky paper on a regular basis.

Example

If an individual writes 50 emails a week, and receives 50 emails each week, it is fair to say that the individual has 100 emails in his mail box after 1 week, and 400 after a 4 week month. Therefore, assuming the rate continues and there is no deletion, there would be 4,000 emails in the individual’s mail box after ten months (assuming 4 weeks per month).

Backups

With the development of backup technology the number of documents available became far greater than ever before. If the user mentioned above has a complete backup of his mail box taken every month then the number of emails available will be truly huge. In fact, if all of the backup tapes were extracted the total data available would be 26,000 emails, rather than 4,000. See the calculations below.

Backup                    Data on Backup
Month 1                      400
Month 2                      800
Month 3                     1200
Month 4                     1600
Month 5                     2000
Month 6                     2400
Month 7                     2800
Month 8                     3200
Month 9                     3600
Month 10                    4000
Total on Backup            22000
Total on Live Email         4000
Total Emails Available     26000

For this reason the volumes of data increased massively and effectively prevented a manual review of data. This resulted in the need for electronic discovery.

(The example given is purely to assist with the calculations; in this scenario processing the backup data would be redundant, as everything is in the live mail box.)
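The backup arithmetic in the example above can be reproduced with a few lines of Python. The function name backup_totals is made up for the illustration; it simply assumes one full backup at the end of each month and no deletion, as in the example.

```python
def backup_totals(emails_per_month: int, months: int) -> tuple[int, int, int]:
    """Return (emails on all backups, emails in the live mail box, total available)."""
    live = emails_per_month * months
    # Each monthly backup holds everything accumulated up to that month.
    on_backup = sum(emails_per_month * month for month in range(1, months + 1))
    return on_backup, live, on_backup + live


on_backup, live, total = backup_totals(emails_per_month=400, months=10)
print(on_backup, live, total)  # 22000 4000 26000
```

The live mail box grows in a straight line, but the volume on tape grows much faster because every backup repeats everything that came before it.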

Paper Discovery is Dead, Long Live Electronic Discovery

Electronic discovery is, simply put, highly detailed document management. It collates all of the documents together, into a single location, known as a “Review Platform”, and then allows one, or many, people to review the documents simultaneously. As the data is held electronically files can be filtered, searched, and culled down in numerous different ways, to allow a rapid review of the documents. In addition to this the documents, and actions on the documents (e.g. who reviewed what, what searches were run, who looked at a document, etc.), can be fully audited. Production and exchange of relevant documents can now be done entirely electronically.

The benefits of reviewing electronically, rather than by paper, are huge: the automatic searching and filtering of documents, working collaboratively from anywhere in the world (most review platforms are online), moving data between teams, and redacting without the need for photocopying.

Culling: Filtering, Searching and De-Duplication

The subject of filtering, searching, and de-duplication is a huge area, and could easily keep a couple of PhD students studying for a while. But, in brief, electronic discovery can allow for the following:

  • Filtering: Electronic documents can be quickly filtered by date. Documents inside (or outside) a date range can be brought into, or taken out of, the review, allowing the reviewers to focus only on what is relevant. On the average PC or server there are thousands of junk files (DLL, EXE, HLP files, etc.), none of which are relevant to the review; these can be filtered out of the data set quickly and easily.
  • Searching: If the review team know what they are looking for they can use keywords to rapidly remove all of the documents without the keywords, and focus the review.
  • De-Duplication: When working with servers and backup tapes there is likely to be a very high number of duplicates, i.e. documents that are the same but stored in different locations. By using de-duplication methods the reviewers only need to see one copy of a file, radically reducing the amount of data to review.
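As a simple illustration of the first two culling steps, the sketch below filters a handful of made-up documents by file type, date range and keyword. The Document class, the cull function and the sample data are assumptions for the example; real tools work on extracted metadata and indexed text, and would normally add hash-based de-duplication as a further step.

```python
import os
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    path: str
    modified: date
    text: str


# File types treated as junk for review purposes (an assumed list).
JUNK_EXTENSIONS = {".dll", ".exe", ".hlp"}


def cull(docs: list[Document], start: date, end: date,
         keywords: list[str]) -> list[Document]:
    """Keep documents that are not junk, fall in the date range, and hit a keyword."""
    lowered = [keyword.lower() for keyword in keywords]
    kept = []
    for doc in docs:
        extension = os.path.splitext(doc.path)[1].lower()
        if extension in JUNK_EXTENSIONS:
            continue  # filtered out by file type
        if not (start <= doc.modified <= end):
            continue  # filtered out by date
        body = doc.text.lower()
        if not any(keyword in body for keyword in lowered):
            continue  # no keyword hit
        kept.append(doc)
    return kept


docs = [
    Document("C:/Windows/system.dll", date(2008, 3, 1), ""),
    Document("mail/contract.msg", date(2008, 3, 5), "Draft contract attached"),
    Document("mail/lunch.msg", date(2008, 3, 6), "Lunch on Friday?"),
    Document("mail/old_contract.msg", date(2005, 1, 10), "Signed contract"),
]

kept = cull(docs, date(2008, 1, 1), date(2008, 12, 31), ["contract"])
print([doc.path for doc in kept])  # ['mail/contract.msg']
```

Four documents go in and one comes out for review, which is the whole point of culling.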

By combining culling and review platforms, the process of discovery is now feasible in the modern world, compared to the old-fashioned method of reviewing by paper.

Electronic Discovery: Reviewing UK data from outside the EU

If data is processed and hosted in the UK, can it be reviewed from outside of the UK? How does the ICO view this? Does the Data Protection Act allow for review of data from outside of the EU?

Review platforms, such as Attenex, Relativity, RingTail, or iCONECT, allow reviewers to plough through very large amounts of documents, usually via a web browser. The reviewers can be anywhere in the world, as long as they have access to the internet. E.g. a Manchester case, with the data hosted in London, can be reviewed by a law firm in Bristol. This example, of 3 UK cities, does not pose any legal problems. However, what if the review is to be conducted outside of the UK? E.g. if the data to be reviewed is from the UK, and is processed and hosted in the UK, but reviewed by a New York law firm, what does the law state about this?

The UK legislation says both a lot and very little about the subject.

The Data Protection Act has 8 core principles; it is the eighth principle which is most relevant in this case. This principle states that “Personal data shall not be transferred to a country or territory outside the European Economic Area unless that country or territory ensures an adequate level of protection for the rights and freedoms of data subjects in relation to the processing of personal data.”

This means that data cannot be transferred out of the EEA, without permission of the custodian/person whose data it is, a safe harbor agreement, consent of the EU, or other acceptable EU security measure.

What does this mean for reviewers? Is the data “transferred” out of the EEA during a review? The term transfer is not defined in the legislation. Tools like Relativity can prevent the physical download of native documents and only allow a “review” of the text or image (TIFF/PDF); this would imply that data was not transferred from the UK to the reviewer’s country (in this example the US), as it has not physically left the UK. Also, the act does not count transit as transfer.

However the ICO takes a different view. The ICO’s opinion on 2nd March 2008, which it had previously implied, is that reviewing data from the US (or any third party country) would effectively come under the eighth principle, as it is a transfer of data under the meaning of the act.

Taken alone, this would imply that reviewing data from a third party country outside of the EEA would be an offence, which the ICO could prosecute. With the ICO gradually gaining more powers to protect data and privacy in the UK, and pushing for more, the threat of a fine to law firms and data processors has to be taken seriously.

However the ICO has stated that this problem can be resolved by a contract with the third party reviewing the data. For example, if Company A was hosting data from Company B, and Law Firm C, based in the US, wanted to review the data, a contract between Company B and Law Firm C guaranteeing the protection of the data, and suitable IT security by Company B and Law Firm C, should resolve the problem and prevent any breach of the eighth principle.

Legal advice from an independent law firm and the ICO should be obtained in relation to transferring data outside of the UK. This article is provided for information purposes only and should not be construed as legal advice.