There are numerous problems with tapes: they are mechanical, they can be old and unusual, and they often have no useful labels. Some tape formats are nearly impossible to get data out of, and even when it is possible, it is too expensive to bother with.
While all of this may be true in some cases, often it's not. Below are the top myths relating to tape processing in the electronic discovery industry.
Tapes: Top Myths
- Only one or two companies can handle backup tapes
- Tapes can only be processed by very specialized companies
- All backup tapes are very expensive to process
- Once the tape data is extracted it's expensive to process that data
- Email backups are very hard to deal with
- You don’t know what data is on a tape until you extract it
Only one or two companies can handle backup tapes
This myth, which still persists, is simply not true. KrollOntrack is the giant of the tape processing world for electronic discovery, and for a long time it dominated the market, both in the UK and worldwide. However, times change, technology moves on, and many more companies have now entered the market. Below is a sample of companies that process tape data:
Tapes can only be processed by very specialized companies
For a long time tapes could only be processed by very specialized companies, such as those listed above. However, new hardware and software solutions have been released and continue to evolve. These tools will eventually change the tape market forever.
The biggest change to the market this decade is probably IndexEngines.
IndexEngines offers the ability not only to support a whole host of different data formats, but also to search the data while it is still on the tape [technically, it searches an index of the tape that it creates offline, but the effect is the same]. This means that tens or hundreds of tapes can be searched, and only documents that are responsive (based on custodian, keyword, or date range) need to be extracted. It can even de-duplicate prior to extraction. Historically, IndexEngines had a highly inflated pricing model, but this now appears to have changed, making it a realistic option.
KrollOntrack also sells a fully licensed version of its tool, PowerControls. This means that tape recovery can be conducted outside of these niche companies. Quest also sells a tape extraction tool.
Law firms would still be well advised to outsource their tape work directly to one of the niche companies listed above, as they get a guaranteed service, often for a fixed price. However, forensics firms and IT companies can now gain a foothold in the market without years of R&D.
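To make "search the index, then extract only what is responsive" concrete, here is a minimal Python sketch. It is not IndexEngines' API or data model; the `IndexEntry` record, its fields, and the `responsive` function are hypothetical, and de-duplication is assumed to work by hashing each document's content so that the same email found on several weekly tapes is extracted only once.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class IndexEntry:
    """One document recorded in a hypothetical offline tape index."""
    tape_id: str
    path: str
    custodian: str
    sent: date
    text: str

    @property
    def content_hash(self) -> str:
        # Identical content on different tapes yields the same digest.
        return hashlib.sha256(self.text.encode("utf-8")).hexdigest()


def responsive(entries, custodians, keywords, start, end):
    """Return responsive index entries, keeping one copy of each duplicate.

    Only the index is searched here; nothing is read from tape, so only
    the returned entries would need to be extracted.
    """
    seen = set()
    hits = []
    for e in entries:
        if e.custodian not in custodians:
            continue  # wrong custodian
        if not (start <= e.sent <= end):
            continue  # outside the date range
        lowered = e.text.lower()
        if not any(k.lower() in lowered for k in keywords):
            continue  # no keyword hit
        if e.content_hash in seen:
            continue  # duplicate of an earlier hit, e.g. a later weekly tape
        seen.add(e.content_hash)
        hits.append(e)
    return hits


if __name__ == "__main__":
    index = [
        IndexEntry("T001", "ceo/inbox/1", "ceo", date(2007, 3, 1), "Merger terms attached"),
        IndexEntry("T002", "ceo/inbox/1", "ceo", date(2007, 3, 1), "Merger terms attached"),
        IndexEntry("T002", "cfo/inbox/9", "cfo", date(2007, 5, 2), "Lunch on Friday?"),
        IndexEntry("T003", "ceo/inbox/7", "ceo", date(2009, 1, 4), "Merger closed"),
    ]
    found = responsive(index, {"ceo", "cfo"}, ["merger"], date(2007, 1, 1), date(2008, 12, 31))
    for e in found:
        print(e.tape_id, e.path)  # prints: T001 ceo/inbox/1
```

Of the four sample entries only one would be extracted: the second is a duplicate, the third has no keyword hit, and the fourth falls outside the date range.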
All backup tapes are very expensive to process
Standard tapes are no longer expensive; in fact, they are often a lot cheaper than any other form of data collection. The reason is that, as the number of companies specializing in tapes and the amount of tape software increase, competition forces the price down.
The emphasis is on the word “Standard”. If the person in charge of the tapes starts using phrases like “tape robot”, “Enterprise Vault” or “Brick Level Backup” then the project quickly moves from standard into the non-standard category and can become expensive.
Tape processing can cost between $150 and $500 per tape for simple extraction, which, given the amount of data involved, is incredibly cheap. For example, if a law firm wants to collect all of the data from a company's server, it could pay for a forensics team to travel to the location, spend hours or even days collecting the data, and then return; or it could simply ask for the last backup tape to be sent to it and then extracted. This simple action takes the cost of data collection from thousands of dollars to hundreds.
It can often be said that the time it takes to reach a decision to process a tape costs more than actually processing the tape, something lawyers should consider.
Once the tape data is extracted it's expensive to process that data
This is not quite a myth, though it should be. Some companies will charge a fortune to process the data from the tapes, but many will not, and there is no reason to. For example, if a company's entire Exchange Server is backed up onto one tape and is 100 GB in size, and the law firm wants to review only the CEO's mailbox, some companies might try to apply their per-GB charge to all of the data; i.e., at $500 per GB they may try to charge $50,000 to fully process the tape and load the data into the review platform. If this happens, change vendors!
The mailbox of a particular custodian can be extracted from the Exchange Server using Paraben NemEx, software priced at $500 (and that’s the cost of a perpetual license). Once the CEO’s data has been extracted, the vendor only needs to apply its per-GB charge to that mailbox, which could be around 1 to 2 GB, so the cost is $500 to $1,000.
There are other easy ways to cull the data before processing it for review, for example filtering by date, by custodian, or by file type. Once again, if a vendor is trying to get a law firm to process all of the data on the tape without good reason, questions need to be asked.
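The arithmetic behind the two quotes can be checked with a short Python sketch. It uses only the figures from the example above (a $500 per-GB rate, a 100 GB tape, a 1 to 2 GB mailbox); the `processing_cost` function is a hypothetical helper, and the vendor's one-off software license is left out, as it is in the text's $500 to $1,000 figure.

```python
def processing_cost(gb: float, price_per_gb: float) -> float:
    """Per-GB processing charge for a given volume of data."""
    return gb * price_per_gb


PRICE_PER_GB = 500       # example per-GB rate from the text
FULL_TAPE_GB = 100       # the whole Exchange Server backup
MAILBOX_GB = (1, 2)      # estimated size range of the CEO's mailbox

full = processing_cost(FULL_TAPE_GB, PRICE_PER_GB)
low, high = (processing_cost(gb, PRICE_PER_GB) for gb in MAILBOX_GB)

print(f"Whole tape: ${full:,.0f}")                      # Whole tape: $50,000
print(f"CEO mailbox only: ${low:,.0f} to ${high:,.0f}")  # CEO mailbox only: $500 to $1,000
```

The per-GB rate is the same in both cases; the difference comes entirely from culling the data to the one mailbox before processing.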
Email backups are very hard to deal with
Most corporate email is stored in the Microsoft Exchange database format, known as an EDB file. Getting an EDB file out of a tape can be tricky, but there are companies and tools for this (see above). Once an EDB file has been extracted, it can be processed with a variety of tools, including:
- Paraben NemEx
You don’t know what data is on a tape until you extract it
“Tapes are a black hole of data, you never know what’s on them.”
A common belief, but no longer true.
If a law firm has a box of 200 tapes with no labels, inaccurate labels, or unhelpful labels (e.g. “Week1 MDB1”), it could put all those tapes in the “too expensive, too time consuming, and too problematic” box and never deal with them again, which appears to have happened in the Hammond case.
Tapes always have dates in them, if not always on them. A professional tape company can scan a tape and say when the tape was formatted, when data was written to it, and the general nature of the data, e.g. email backup, file backup, SQL backup, operating system backup, etc. This information, if used correctly, allows a law firm to plan a course of action.
The law firm with 200 tapes may be looking for email data from between 2007 and 2008. It can send the tapes to a tape firm, which can very quickly examine them and report back information like this:
- 50 tapes are SQL backups
- 10 tapes are operating system backups
- 90 tapes are file server backups
- 50 tapes are email backups
The 50 email backups break down by date as follows:
- 5 are pre-2007
- 15 are between 2007 and 2008
- 30 are post-2008
From this, the law firm can quickly see that only 15 of the 200 tapes are relevant. Furthermore, if the law firm is only after the mailboxes of the CEO, CFO, and COO, it can ask that only those individuals' mailboxes be produced. If the mailboxes average 1.5 GB, that is just 67.5 GB of data (3 mailboxes × 1.5 GB × 15 tapes).
This means that the law firm has reduced 200 tapes containing 40,000 GB of data (assuming 200 GB per tape) to just 67.5 GB, at a cost in the thousands rather than in the hundreds of thousands or millions.
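The triage above can be sketched in Python. The `TapeScan` record and the helper functions are hypothetical stand-ins for whatever report a tape firm actually delivers. The scan results are rebuilt from the counts in the example, with the years 2006, 2007 and 2009 standing in for "pre-2007", "between 2007 and 2008" and "post-2008". Non-email tapes get no year because none was reported. The volume calculation follows the text's arithmetic of one mailbox per custodian per relevant tape.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TapeScan:
    """Hypothetical summary a tape firm might report after scanning one tape."""
    tape_id: int
    backup_type: str          # "sql", "os", "file" or "email"
    year: Optional[int]       # year the backup was written, if reported


def example_scans():
    """Rebuild the 200-tape box from the breakdown in the text."""
    groups = [("sql", None, 50), ("os", None, 10), ("file", None, 90),
              ("email", 2006, 5), ("email", 2007, 15), ("email", 2009, 30)]
    scans = []
    for backup_type, year, count in groups:
        for _ in range(count):
            scans.append(TapeScan(len(scans) + 1, backup_type, year))
    return scans


def relevant(scans, backup_type, first_year, last_year):
    """Keep tapes of the wanted type whose backup date falls in the range."""
    return [s for s in scans
            if s.backup_type == backup_type
            and s.year is not None
            and first_year <= s.year <= last_year]


GB_PER_TAPE = 200                  # assumption stated in the text
MAILBOX_GB = 1.5                   # average mailbox size from the text
CUSTODIANS = ("CEO", "CFO", "COO")

scans = example_scans()
hits = relevant(scans, "email", 2007, 2008)
before_gb = len(scans) * GB_PER_TAPE
after_gb = len(hits) * len(CUSTODIANS) * MAILBOX_GB

print(f"{len(hits)} of {len(scans)} tapes relevant")   # 15 of 200 tapes relevant
print(f"{before_gb:,} GB reduced to {after_gb} GB")     # 40,000 GB reduced to 67.5 GB
```

In practice, consecutive weekly tapes often hold largely overlapping copies of the same mailboxes. De-duplication could therefore shrink the 67.5 GB figure further.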