Electronic Discovery: Dates

Dates, Dates, Dates.

This is a subject that has been covered on this site on numerous occasions, and will be covered again. No apologies are made for this, because of the sheer importance of the subject.

Lawyers, vendors, and groups such as LiST all recognize the importance of dates, but are the “right” dates being produced, reviewed, or understood?

Different people, different tools, and different companies have different interpretations of what the right date is. They can’t all be right. In fact, if 4 different companies each show a different date as the correct one, and there are 6 dates available, then for any single definition of “correct” at best 3 of the companies are “wrong”, and at worst all 4 are. Alternatively, as you will see below, all four could be right.

Firstly, we need to identify the problem. There are lots of dates for a file: created date, modified date, accessed date, file system dates, metadata dates, printed dates, etc. There are dates with strange names like “MFT entry dates” or “registry dates”. There are even dates in other files that tell you about the file you’re looking at, but they are not included in the review. Then there is the whole discussion about what is meant by “created” or “modified”. Does “created” mean the first time the file was created, or the first time it was created on the custodian’s computer, or the first time it was received by the custodian?

A Common Division

Some companies choose to show a particular date in a review platform: the “file system” dates for a document. We will call these companies Company A.

This means that the created date for a file will be the day the file was “created” on the computer the data is being processed from. For example, a file is created on 1st January 2009. It is then emailed to a user on 31st July 2009, who saves it on their computer the next day. Therefore the created date for that file is 1st August 2009. That information is stored in the file system, and it is this information that is pulled out and displayed in the review platform. Therefore, when a lawyer looks for the “created” date of the file, they will see 1st August 2009.
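As a rough illustration of where Company A's dates come from, the sketch below reads the file system timestamps for a path using Python's standard library. The function name `file_system_dates` is hypothetical, and which timestamp counts as "created" varies by platform: this sketch assumes `st_birthtime` where the platform exposes it and falls back to `st_ctime` on Windows only, so on some systems no created date is reported at all.

```python
import os
from datetime import datetime, timezone


def file_system_dates(path):
    """Return the file system timestamps for a path as UTC datetimes."""
    info = os.stat(path)
    dates = {
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "accessed": datetime.fromtimestamp(info.st_atime, tz=timezone.utc),
    }
    # st_birthtime is only present on some platforms and Python versions.
    birth = getattr(info, "st_birthtime", None)
    if birth is None and os.name == "nt":
        # Assumption: on Windows, st_ctime has traditionally held creation time.
        birth = info.st_ctime
    if birth is not None:
        dates["created"] = datetime.fromtimestamp(birth, tz=timezone.utc)
    return dates
```

Note that these values describe the copy of the file on the computer being read, which is exactly why the saved attachment in the example shows 1st August 2009 rather than the day the document was first written.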

“Ahh, but wait, that’s wrong,” cries Company B. Company B believe that the true date is 1st January 2009, which is stored in the metadata of the Word document. So Company B shows their clients 1st January 2009.

This means that two companies will show different dates for the same file.

Company   | Date Created
Company A | 1st August 2009
Company B | 1st January 2009

They can’t both be “right” for a single definition of the created date. The difference here is just seven months, but it could easily be years. If dates are being used as filtering criteria, it is easy to see how critical files can be filtered out by mistake.
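To make the filtering risk concrete, here is a minimal sketch using the two “created” dates from the example. The date range (1st January to 30th June 2009) is a hypothetical filter chosen for illustration, as are the names `created` and `in_range`.

```python
from datetime import date

# The two "created" dates reported for the same Word document.
created = {
    "file_system": date(2009, 8, 1),  # Company A
    "metadata": date(2009, 1, 1),     # Company B
}


def in_range(value, start, end):
    """True when the date falls inside the inclusive filter range."""
    return start <= value <= end


# Hypothetical date filter: first half of 2009.
start, end = date(2009, 1, 1), date(2009, 6, 30)
kept = {source: in_range(value, start, end) for source, value in created.items()}
```

With this filter, `kept` shows the document surviving when the metadata date is used and being dropped when the file system date is used, even though it is the same file.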

Company C

Company C steps in, and says “Look, you’re both wrong. LiST states that you need to put in the date shown on the document”. So Company C manually opens the document and looks at the front of the Word document, which is a report. The report states “Account Report 2008”. The date is coded as 1/1/2008.

The dates available to the client in the different review platforms are now:

Company   | Date Created
Company A | 1st August 2009
Company B | 1st January 2009
Company C | 1st January 2008

It’s the Email, Stupid

Company D, looking on sagely, states: “You’re clearly all wrong. The date you want is the date the file was emailed, and that was 31st July 2009. That’s the ‘created’ date because that is when the file was actually received, and that’s what the client wants to know.”

Now a client, if they choose to shop around, has a variety of dates.

Company   | Date Created
Company A | 1st August 2009
Company B | 1st January 2009
Company C | 1st January 2008
Company D | 31st July 2009

Use ‘em all

Company E has been watching all of this and, with the benefit of hindsight, decides that the best thing for the client is to show all of the dates, and a selection of different modified dates as well, just to cover all of the bases.

File Name | File System Created Date | Metadata Created Date | Email Date | Coded Date | File System Modified Date | Metadata Modified Date
Word Document | 1st August 2009 | 1st January 2009 | 31st July 2009 | 1st January 2008 | 28th August 2009 | 28th August 2009

As the Word document was spell checked by the custodian, on the computer being processed, on 28th August 2009, the file was last modified on this date.

Help!

If this array of dates is provided to a review team, they are unlikely to understand the staggering amount of information available to them, or they may make false assumptions. For example, files can be moved between drives (e.g. thumb drives and desktops) without the created date changing, but sometimes the date does change. The modified date can also be before the created date, which always causes questions.

Just providing a huge range of dates can be problematic, however helpful the intent. Equally, not providing these dates will mean that filtering, searching, and sorting will cause problems.

The problem is very quickly multiplied as the volume of data increases. In the example shown, just one file has been processed and 6 dates produced. If there are 200,000 files, that’s a total of 1.2 million dates. That’s a lot of dates.

Is there a solution?

There is no single solution. There is no review platform that has the perfect solution that the author is aware of; if somebody thinks there is please contact this site, and we will put up the article/information.

For every argument for a “correct” date, there is one against it. E.g. not every file has a metadata date; text files for instance don’t. Equally the file system date is easily altered and not always relevant.

A remedy put forward by some people is to produce all of the dates (there are actually more than just the 6 shown), consult with the client, and then show the dates required and hide the rest. When the project scope changes, or more detailed information is required about a given file, those dates can then be shown as necessary. It’s not perfect, but it’s a method that can help with the quagmire of dates.

What does not help is insisting that there is one correct date, or that one method is right. All dates are important; it just depends on what question is being asked.
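The “produce everything, show only what is agreed” approach can be sketched as follows. This is only an illustration under assumed names: `ALL_DATES` holds the six dates from the example, and `review_view` is a hypothetical helper that returns just the fields agreed with the client, not a feature of any particular review platform.

```python
from datetime import date

# Every date extracted for the example Word document is kept.
ALL_DATES = {
    "File System Created Date": date(2009, 8, 1),
    "Metadata Created Date": date(2009, 1, 1),
    "Email Date": date(2009, 7, 31),
    "Coded Date": date(2008, 1, 1),
    "File System Modified Date": date(2009, 8, 28),
    "Metadata Modified Date": date(2009, 8, 28),
}


def review_view(all_dates, visible):
    """Return only the agreed date fields, in the order requested."""
    missing = [name for name in visible if name not in all_dates]
    if missing:
        raise KeyError(f"unknown date fields: {missing}")
    return {name: all_dates[name] for name in visible}


# Hypothetical agreement with the client: show two dates, hide the rest.
shown = review_view(ALL_DATES, ["Email Date", "Metadata Created Date"])
```

Because nothing is discarded, a change in project scope only means passing a different list of field names.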


Forensics: Signature Searches

Many file types, such as HTML or Word Documents have codes within them that make them unique. These codes are not the file extension (.HTML or .DOC) but rather a code within the actual documents.

These codes are known as a “signature” and are used routinely in computer forensics and electronic discovery.

Example 1: The first few characters of a GIF file are 47 49 46 in hex, which is GIF in ASCII.

Example 2: The first few characters of a Word document are, in hex, “D0 CF 11 E0 A1 B1 1A E1”.

Example 3: The first few characters of a JPEG are, in hex, “FF D8” (“FF D9” marks the end of the file).

These codes can, in theory, be used to identify the start of documents. Some files also have signatures at the end of the document as well as the beginning. These signatures, or headers and footers, can be used for several purposes, including file identification and file carving.

File Identification through signatures:

If a file is called system.dll and is in the system folder, people may well assume it is in fact a system file. But if the first few characters are “D0 CF 11 E0 A1 B1 1A E1” then it would imply that this is a Word document. Simply changing a file extension from .DOC to .DLL and placing the document in the system folder would do this.

Why would anyone do this? To hide a file.

Computer forensics tools can automatically detect signatures and have hundreds of different signatures in their databases. They can automatically compare the signature to the file extension and notice any difference, therefore highlighting files of concern.

This is something that could happen, though from experience of looking at file types it does not appear to happen often, and examples of this are generally confined to the computer forensics lab. This is possibly because it’s impractical for users to work with files that are hard to access.
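The comparison of signature against extension can be sketched in a few lines. This is a toy with only three signatures, not how any particular forensics tool does it, and the names `SIGNATURES`, `EXTENSIONS`, `identify`, and `mismatch` are hypothetical. One assumption is worth flagging: the “D0 CF 11 E0 A1 B1 1A E1” header is shared by other legacy Office formats, so treating it as Word only is a simplification.

```python
import os

# Header bytes mapped to a file kind (a tiny, illustrative database).
SIGNATURES = {
    b"GIF": "gif",                               # 47 49 46
    bytes.fromhex("D0CF11E0A1B11AE1"): "doc",    # legacy Word (simplified)
    bytes.fromhex("FFD8"): "jpg",                # JPEG start marker
}

# Extensions considered consistent with each kind.
EXTENSIONS = {
    "gif": {".gif"},
    "doc": {".doc"},
    "jpg": {".jpg", ".jpeg"},
}


def identify(header):
    """Return the kind whose signature starts the header, or None."""
    for magic, kind in SIGNATURES.items():
        if header.startswith(magic):
            return kind
    return None


def mismatch(filename, header):
    """True when a known signature disagrees with the file extension."""
    kind = identify(header)
    if kind is None:
        return False  # unknown signature: nothing to compare against
    extension = os.path.splitext(filename)[1].lower()
    return extension not in EXTENSIONS[kind]
```

Passing the first bytes of the renamed document from the example, with the name `system.dll`, would flag it, while the same bytes under `report.doc` would not.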

File Carving

This is a function that is used by both computer forensics and data recovery tools. When a file is deleted and cannot be recovered easily (i.e. the pointer to the file, e.g. the MFT entry, is no longer there), then file carving is the next best method of recovering the data.

File carving scans every sector of a hard drive (or a selected space) and looks for headers/signatures for files. When it finds one it will then “carve” the data out until it reaches a footer, or the next header. This means it literally starts copying the data from the first sector, and every sector after that, until it has a reason to stop.

The system is not always going to work, particularly for files that are partially overwritten or were fragmented prior to deletion. Some tools can set a cut-off point for “recovered” file sizes, so that tens of gigabytes of data are not copied out because a fragmented file’s footer is never found.
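The carving loop described above can be sketched on an in-memory buffer. This is a simplified illustration, not a real carver: it assumes JPEG markers only, searches byte by byte rather than sector by sector, and the function name `carve_jpegs` and the `max_size` cut-off are hypothetical.

```python
JPEG_HEADER = bytes.fromhex("FFD8FF")  # start of a JPEG
JPEG_FOOTER = bytes.fromhex("FFD9")    # end of a JPEG


def carve_jpegs(data, max_size=10 * 1024 * 1024):
    """Carve candidate JPEGs: copy from each header to the footer,
    the next header, or the size cut-off, whichever comes first."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    carved = []
    position = 0
    while True:
        start = data.find(JPEG_HEADER, position)
        if start == -1:
            break
        limit = min(len(data), start + max_size)
        body = start + len(JPEG_HEADER)
        footer = data.find(JPEG_FOOTER, body, limit)
        next_header = data.find(JPEG_HEADER, body, limit)
        if footer != -1 and (next_header == -1 or footer < next_header):
            stop = footer + len(JPEG_FOOTER)   # clean end of file
        elif next_header != -1:
            stop = next_header                 # ran into another file
        else:
            stop = limit                       # cut-off reached
        carved.append(data[start:stop])
        position = stop
    return carved
```

A fragmented or partially overwritten file will still come out as a single contiguous run, which is why the results of carving are candidates rather than guaranteed recoveries.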

Is Electronic Discovery Wrong?

The Hypothesis: Simply put electronic discovery is wrong. It is not 100% accurate, far from it. It’s inherently inaccurate. It misses out lots of data. The legal system should find other, more accurate methods.

__________________________________________________________________________________

Out of the tens of thousands of files collected from any one computer, only a fraction are going to be taken off the computer for full ED processing.

Out of the small percentage that do get processed, tens, hundreds, or possibly thousands will fail due to errors, corruption, encryption, or other problems. Eventually, after the initial cull and the ED processing, a small set of data will be loaded into a review platform. Once the data is loaded, even greater cuts will be made, with huge percentages of documents removed through keyword searching.

This means that out of the 10,000s of documents originally available only a few thousand, or a couple of hundred, per computer will even be given the opportunity to be reviewed by teams of lawyers.

This is where the errors really start.

The bulk of reviews are conducted not by partners or senior associates, but by junior staff, sometimes contractors brought in for this purpose. These junior staff are working under pressure, with a requirement to review their assigned documents as quickly as possible. The subject matter will often be new to them, the review platform may also be new to them, and they are possibly tired and working long hours. We don’t let people drive trucks in these conditions, let alone make decisions on multi-million or multi-billion dollar cases.

Even a highly experienced litigator, working short hours on a subject they know, will make errors. It’s just a statistical fact. The junior staff, working in the conditions described, will make a lot more errors, and these errors will not be even. The type of error that a person makes on one day will not be the same on another day, and different people will make different errors. This makes the QC of the errors difficult.

Does this mean that electronic discovery is wrong? Is it fundamentally flawed, so much so that a better method should be used?

No; quite the reverse.

Electronic discovery has errors, but so do most other systems of evidence. Fingerprints are not, as many people think, 100% reliable, but require human interpretation.

Witness statements are staggeringly inaccurate. A person who has witnessed a shooting or a road accident, will be traumatized, upset, angry, and influenced by the police (even accidentally through leading questions “Did you see the suspect get into a green car?”).  The witness will have their own prejudices about what they saw, what they think they saw, and what they think happened.  Witness statements are, in short, not 100% reliable.

Even DNA, the trump card in any investigation, has errors at all levels. The science is sound but humans make mistakes.

Does this mean that DNA, fingerprints and witness statements should not be used because there can be errors? Of course not.

The issue is not that mistakes are made, but that they are understood and accounted for.

Is the electronic discovery process less accurate than DNA or fingerprints? Certainly. But does that matter? The issue of accuracy must also be understood in terms of what is being measured.

If a forensic scientist makes an incorrect decision with their fingerprint or DNA analysis, then they can state that Person A killed Person B, rather than that Person A did not kill Person B (and this has happened, on more than one occasion). This is a massive error.

If ED processing is conducted incorrectly, then a junk document, Document A, may be reviewed by a lawyer when it was supposed to be left out, or a document that was supposed to be reviewed is not.

Incorrect ED processing is very different. It does not change an accounting spreadsheet from profit to a loss, it does not make perfectly legal business transactions into a multinational fraud. ED processing does not interpret the data.

An ED processing guru does not go to court and state “Due to the MD5 value of this document I have  concluded that the money was moved from the US, to Switzerland, then back to BVI, for the purposes of tax avoidance, defrauding the US government of 7.5% of the gross amount in tax, meaning that the suspect only paid 11.5% gross. The net cost to the US government is $3.5 million. With that money the suspect invested in property, making a net loss over the year, for the investment, but he still retains assets of $2.5 million…..”

No, an ED guru should go to court and say “I have provided the documents as best as I can, not 100% of documents, but the best sample I can,  and here is how….”

Everything else is up to the accountants, lawyers, and other expert witnesses. This does not demean what an ED person does, but rather allows them to put what they are doing into perspective and assess the risk.

If they are collecting hundreds of millions of documents (which is entirely possible), and thousands are not processed that is reasonable, expected, and required.

Electronic discovery has errors, but that does not make it wrong.

Forensics Cloners: Talon

The Forensic Talon cloner, by Logicube, has been available for several years now, and at the time of its release it was the fastest on the market.

The Pros:

  • It’s fast. Very fast. It states it can get up to 4 GB a minute, and it can. If both drives are good S-ATA drives you will get those speeds.
  • It has really good logging functionality, and stores everything you need on flash media.

  • It can conduct a “keyword search” across the drive during the imaging (don’t expect the functionality of EnCase or FTK).
  • It has a good level of functionality, with cloning, imaging, and wiping, as well as MD5 or SHA hashing, and verification methods.
  • It can be used as a write blocker as well as a cloning/imaging device.
  • It checks to see if data is moving over the cables correctly.
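The hashing and verification step mentioned in the list above can be illustrated in software. This sketch is not how the Talon implements it; it is a generic example using Python’s standard `hashlib`, and the function names `hash_file` and `verify_copy` are hypothetical.

```python
import hashlib


def hash_file(path, algorithm="md5", chunk_size=1024 * 1024):
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source_path, image_path, algorithm="sha256"):
    """True when the source and the image produce the same hash."""
    return hash_file(source_path, algorithm) == hash_file(image_path, algorithm)
```

If the two hashes match, the image is a bit-for-bit copy of what was read from the source; a mismatch means something changed in between.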

The Cons

  • It’s plastic and feels cheap. It’s not, but it looks and feels it.
  • The logging is fantastic, but if the flash media is not present it will not image, as it cannot log. I.e. if you leave the flash media out of the kit by mistake, you may as well not have the cloner at all.
  • It images to FAT32 only, no NTFS capability, and file names are restricted to 8 characters.
  • If you open the device while the external IDE cables are connected (easy and often done) it can damage the connectors and ruin the cloner.
  • The verification methods, involving data lines, can sometimes raise false alarms. I.e. the system will sometimes state that there is an error with the imaging/cloning, even though the image is good.

How do you destroy a hard drive?

The question of hard drive destruction is often raised, when people want to prevent access to their data, e.g. getting rid of old computers.

Questions include:

  • How do I destroy my hard drive? Will drilling it work?
  • Can I burn my hard drive? Will that work?
  • Can I put it in water?
  • How many times do I need to wipe a hard drive, to get rid of all data?

Often, the answers involve “the only thing that destroys a hard drive is thermite” or “wipe the drive 100 times, then grind it up into a fine dust and then melt the dust”.

These statements almost certainly come from those who have never been in a data recovery clean room, and certainly never worked in one.

Destroying data, on a hard drive, is relatively easy and can be done one of two ways:

1)      Wiping the entire hard drive. Just once. Not 3 or 32 or 320 times.

2)      Destroying the platters. Once the platters are destroyed recovery is impossible.

The latter option can be achieved in a variety of ways, such as drilling the hard drive. In theory “somebody” could read the data around the holes, though no commercial company would ever do that. Governments outsource their major data recovery work to commercial companies, from the NASA Columbia disaster to international terrorist incidents: if it’s very technical and very important, it gets outsourced. Who exactly “somebody” would be is therefore unclear.

The idea that overwritten data on a modern hard drive can be recovered is just fanciful. Nobody has ever recovered data from an overwritten modern drive, and nobody has said they can. It is merely an old theory that was never proved, and when it was put to the test, recovery was not possible.

Remember, wiping data is not formatting or deleting data. It is overwriting every single sector on a hard drive.

In short, there is no evidence for recovering wiped data, but there is evidence showing that wiped data cannot be recovered.
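What a single-pass wipe does, as opposed to deleting or formatting, can be sketched as below. This is an illustration only: it assumes an ordinary disk image file standing in for a drive, the function names `wipe_image` and `is_wiped` are hypothetical, and a real raw device would need its size obtained differently and would normally be wiped with a dedicated tool.

```python
import os


def wipe_image(path, block_size=512):
    """Overwrite every byte of an existing image file with zeros, once."""
    size = os.path.getsize(path)
    zeros = bytes(block_size)
    with open(path, "r+b") as handle:
        remaining = size
        while remaining > 0:
            step = min(block_size, remaining)
            handle.write(zeros[:step])
            remaining -= step
        handle.flush()
        os.fsync(handle.fileno())  # push the writes out to storage
    return size


def is_wiped(path, block_size=512):
    """True when every byte in the file reads back as zero."""
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(block_size), b""):
            if chunk.count(0) != len(chunk):
                return False
    return True
```

The point of the sketch is the contrast: deleting or formatting changes the pointers to the data, whereas a wipe replaces the data itself, block by block, from the first to the last.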

Physical Methods that will not work to destroy data on a hard drive include:

  • Throwing it in the water (this does not do much)
  • Setting it on fire (the temperature is not going to be high enough at home)
  • Throwing it out of the window. Hard drives can take quite a bit of G force.  They are not heavy so the impact of the hard drive on the ground is not likely to destroy the platters.
  • Driving over the hard drive. A car, or even a tank, driving over a hard drive will do nothing, any more than it would driving over a book. Unless the drive is actually flattened, the platters are not going to be destroyed.

Electronic Methods that will not work in destroying data are:

  • Deleting files
  • Formatting files
  • Shredding files/Wiping Files

The whole drive needs to be wiped, not just some of it. Nothing else can guarantee all data is gone.

Forensics: Disk and Disk Geometry Quiz: EnCE & CCE Practice

A new computer forensics quiz has been released; this one is based on disks and disk geometry. The questions are designed to help people practice for their EnCE and CCE style theory exams.

Forensics: New Hardware Quiz: EnCE & CCE Practice

A new computer forensics quiz has been released; this one is based on hardware. The questions are designed to help people practice for their EnCE and CCE style theory exams.

More quizzes, aimed at people revising, will be coming soon.