Forensics: Hashes, do they work?

What’s the big deal about hashes?

This article follows on from the previous article on the myths about “verification” in EnCase.

Hashes are the backbone of computer forensics: they are used to identify and remove junk data with the NSRL/NIST list, to de-duplicate files in computer forensics and electronic discovery, and to de-dupe emails in electronic discovery. But they also form the foundation of evidence security:

  • “If the hash has not been changed the data cannot be changed”
  • “The evidence cannot be tampered with because of the security of hash values”
  • “The evidence is protected by the hash values”

These are just some of the common statements about hash values made by many, if not all, forensic investigators to clients and courts alike.

But is this true?

Subjects such as hash collisions are not relevant to this article; other, far more gifted writers have already ably demonstrated that, for the purposes of computer forensics, the MD5 and SHA-1 hashes are mathematically secure.

It is not the issue of the mathematics and cryptography that is being debated here, but rather the protocol. Can a person stand up in court and state that:

“The evidence cannot have been tampered with because the hash has not been changed”?

The imaging process

Very briefly, this is what happens to an evidence drive during the imaging process, in both criminal and civil cases:

  • A suspect’s hard drive is connected to a computer (a hardware write blocker is normally used, but systems like Linux imaging platforms and software blockers can be used with or without hardware write blockers).
  • An image of the drive is taken
  • A hash value is calculated for the image
  • The hard drive is returned

For the criminal case the hard drive will be returned to a safe/evidence room, until it is required. During the case the defence, in theory, can ask to see the original and check the hash value.

For the civil case the hard drive will normally be returned to the user and, more often than not, the computer will be immediately booted, as the custodian/user/client needs to start working straight away.

This will immediately change the state of the disk; therefore the only evidence of the original hash value is the image. The civil investigator, if challenged, can refer to the original hash value to show the data has not changed since he took the image.

Different Data – Same Hash Value

In both of these scenarios there is one rather obvious problem: the evidence can be tampered with and the hash will remain unchanged. It just depends on when the data is tampered with.

People debate the pros and cons of hash values and protecting media: SHA-1 v MD5, software write blockers v hardware write blockers, Linux v Windows, ICS v Tableau, etc., but they never debate the far more common scenario – bad people.

Note: Before going any further it should be made clear that there is no allegation that this has happened; it is merely a theory.

Let’s take those scenarios again:

  1. Person (A) takes possession of a suspect’s hard drive and then connects it to their forensic computer.
  2. Person (A) then images the data
  3. The computer calculates the hash value = ABC123
  4. Person (A) provides evidence to court that the image at the time had the hash value ABC123 and has the same value now.

But, what if Person (A) was corrupt? What if Person (A) wanted to frame the suspect?

What if the sequence of events occurred as follows:

  1. Person (A) gets hold of a hard drive of a suspect and then connects it to their computer
    1. Person (A) runs a script that dumps illicit material into unallocated space.
  2. Person (A) then images the data
  3. The computer calculates the hash value = ABD123
  4. Person (A) provides evidence to court that the image at the time had the hash value ABD123 and has the same value now.

Item 1.1 would take minutes, if not seconds, to run, and would be undetectable to the naked eye. Once the data was on the computer it could be impossible to prove that it had been added deliberately.

There would be no need for the procedure to change the dates; it is entirely possible to insert data and just lie about it. The hash value of the image will not change, and for a criminal case the hash value of the hard drive in storage would be the same as the image, because the illicit material was added before the hash value was calculated.
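
The timing problem can be shown with a small Python sketch. Everything here is a toy assumption: the “drive” is a short byte string, and `sha1_hex` and `verify` are hypothetical helpers rather than functions of any forensic tool. The point is only that verification compares the image against the hash recorded at imaging time, so it passes whether or not data was added beforehand.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 hex digest of `data`."""
    return hashlib.sha1(data).hexdigest()


def verify(image: bytes, recorded_hash: str) -> bool:
    """Check an image against the hash recorded when it was taken."""
    return sha1_hex(image) == recorded_hash


# Toy stand-in for the suspect's drive: some data plus empty space.
original_drive = b"user documents" + b"\x00" * 32

# Honest sequence: image the drive, then record the hash.
honest_image = bytes(original_drive)
honest_hash = sha1_hex(honest_image)

# Corrupt sequence: overwrite the last 8 bytes first, then image and hash.
tampered_drive = original_drive[:-8] + b"PLANTED!"
corrupt_image = bytes(tampered_drive)
corrupt_hash = sha1_hex(corrupt_image)

print(verify(honest_image, honest_hash))    # True
print(verify(corrupt_image, corrupt_hash))  # True
print(honest_hash == corrupt_hash)          # False
```

Both images verify. The two hashes differ, but in the corrupt sequence the first one was never calculated, so there is nothing to compare against.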

A physical example of this could be a crooked cop planting drugs on a suspect. The drugs would then be found (following the search) and put in a sealed bag, and if the bag was opened later and tested, the contents would be found to be drugs. The evidence of the seals would be used to prove that the drugs had not been tampered with. But this does not make any difference: the drugs were planted, and the seals don’t help the person who has been set up.

Could it happen?

The scenario given has never been reported: there has never been a report of the police or a civil investigator inserting evidence onto a hard drive, nor is there reason to believe it has occurred and gone unreported. But could it happen? It is technically possible, but would people really do this?

Firstly, the police have misused data many times in the UK. Secondly, people have lied on the stand, on more than one occasion, in relation to computer forensics. Thirdly, police officers have been convicted of all sorts of offences, from blackmail through to rape. Why? It’s not because the police are particularly corrupt; it’s just that they are selected from the public, and there are crooks in the public, so there will be some criminally minded individuals in the police, though far fewer than in the public as a whole. If you’re willing to commit rape and blackmail, you’re probably willing to add a few 1s and 0s to a hard drive. During the 1970s and 1980s certain parts of the police (in the UK) had a bad reputation: the law was bent and broken to get convictions of “the bad guys”, and inevitably innocent people were caught up in this and wrongly convicted. In fact, it was such a problem that new laws and procedures were brought in to try to combat it.

But could it really happen now?

But now, in the modern world, could the police really fake evidence? Sadly yes, it could still happen. The most obvious example of this is the fingerprint case involving the Scottish police. In this case numerous fingerprint officers in Scotland went to court and testified that the fingerprints they had obtained from a case proved that two people were guilty, one of murder, the other of perjury; the latter was a fellow police officer. But, on appeal, the world’s fingerprint officers stated that the fingerprints clearly did not belong to the two people in jail – in short, the Scottish police were accused of framing people by lying about fingerprints.

Fingerprints are the closest analogy in the physical world there is to a hash value, and here were criminal investigators lying about fingerprints.  Is it impossible to believe that a person would lie about a hash value, or plant data on a person?

Put it another way, given the nature of human beings – which is more likely:

(a)    A hash collision or

(b)   A forensic investigator doctors evidence to get the result they require, given that the latter has happened throughout human history and the former has never happened  in anything other than a maths paper.

I believe the answer is in the question!


How can forensic investigators combat this? Firstly the concern about hash value security needs to be replaced with concern about procedures and processes that show that a person is being honest. Possibly the best way forward is to follow the example of the traffic police.

The traffic police have their gadgets, lots of gadgets: radar, lasers, speedometers, etc., all of which can be used to catch us (me) going slightly (a lot) faster than we should. But what do they say when they pull you over?

They say, “We observed you travelling at excess speed because [description of the car and movement], and we can provide additional evidence of this through this device [laser, radar, etc.].”

They could have pointed the radar gun at a motorbike doing 110mph and said it was you; but it is their word in court.

The primary evidence is their word, supported by technology.

Therefore, when in court, the primary evidence that the data has not changed should be the word of the person presenting the evidence, supported by hash values, and not the other way around.

It is the word of the investigator that we are all counting on, from the data collection through to the final analysis; investigators should not shy away from this.


2 Responses to “Forensics: Hashes, do they work?”

  1. H. Carvey Says:

    A couple of things that are a bit more than a matter of semantics:

    “…they also form the foundation of evidence security:”

    More correctly, hashes form the foundation of evidence integrity.

    “If the hash has not been changed the data cannot be changed”

    This one is up in the air. What most people don’t remember is that researchers who showed MD5 collisions were using specially-crafted data. From the fact that collisions were found, many in the forensics field now believe that the MD5 algorithm cannot be used to verify the integrity of data. I would suggest that this is incorrect…but if you really need a warm fuzzy, use both the MD5 and SHA-1 algorithms.

    “The evidence cannot be tampered with because of the security of hash values”

    Perhaps more correctly, one could state that if hashes remain the same, then there is an indication that the evidence had not been tampered with.

    “The evidence is protected by the hash values”

    Hashes don’t protect anything. They are used to verify the integrity of files and data.

    On another note, I see where you’re going with the scenario you presented (ie, illicit material added to unallocated space); however, many analysts are aware that having illicit material in unallocated space does not immediately mean that the individual is guilty of anything. Analysts will look for other indicators…for example, are there any indications of means by which the data could have been placed on the system in the first place (ie, web browser, P2P, digital camera, etc.)? Was there any indication that the user viewed the data, or attempted to delete/hide traces of viewing?

    My point is that there needs to be more evidence beyond illicit material in unallocated space to make the individual guilty.

    Returning to point, the difference between the two scenarios is that the hashes will not be the same….flipping a single bit in an arbitrary stream will result in a different MD5 hash. Adding illicit material to unallocated space changes many bits. Given the fact that a variable length input stream results in a fixed length hash, it is still mathematically improbable that two streams of the same length but different content will have the same hash.

    • 585 Says:


      Thanks for the comment; my point has probably not come across as it should have, so I will try to clarify below:

      The comments below are just examples of some of the things people say, I am not saying they are true or not, just examples of common statements made in the industry:

      “If the hash has not been changed the data cannot be changed”
      “The evidence cannot be tampered with because of the security of hash values”

      In the theoretical example given, data was added to unallocated space, but it could have been added to “My Documents”, or somewhere else. Unallocated is, of course, the easiest to add data to. It does depend on what evidence is being searched for and in what case. You are entirely right that there is never going to be a single file that proves every case 100%, but it could prove some cases and it “could” be done. In cases where you are looking to show a chain of email communications, adding data like this is simply not going to work, and I am not suggesting it would.

      But, if the case was to prove the existence of a file on a computer then this method could work – again, it’s a theory. I am not concerned by MD5 hash value collisions, as the probability of that is so staggeringly remote as to be not relevant to account for in procedures, e.g. it is probably more likely that the lab would be hit by an asteroid than have a hash collision in a real scenario, but I have not prepared the lab for “Deep Impact 2” 🙂

      I agree with you that adding data will, of course, change the data and therefore the hash value, but that doesn’t matter, because how would anybody know what the original hash value was? It was never calculated, so how could anybody know it had changed? If you add data, then take the hash value, then “preserve” the evidence, the hash value will not change from when it was calculated.
