What’s the big deal about hashes?
This article follows on from the previous article on the myths about “verification” in EnCase.
Hashes are the backbone of computer forensics. They are used to identify and remove junk data with the NSRL/NIST list, to de-duplicate files in computer forensics and electronic discovery, and to de-dupe emails in electronic discovery. But they also form the foundation of evidence security:
- “If the hash has not been changed the data cannot be changed”
- “The evidence cannot be tampered with because of the security of hash values”
- “The evidence is protected by the hash values”
These are just some of the common statements about hash values made by many, if not all, forensic investigators to clients and courts alike.
But is this true?
Subjects such as hash collisions are not relevant to this article; other, far more gifted writers have already ably demonstrated that, for the purposes of computer forensics, the MD5 and SHA-1 hashes are mathematically secure.
It is not the issue of the mathematics and cryptography that is being debated here, but rather the protocol. Can a person stand up in court and state that:
“The evidence cannot have been tampered with because the hash has not been changed”
The imaging process
Very briefly, this is what happens to an evidence drive during the imaging process, for both criminal and civil cases:
- A suspect’s hard drive is connected to a computer (a hardware write blocker is normally used, but Linux imaging platforms and software blockers can be used with or without hardware write blockers).
- A hash value is calculated for the image
- The hard drive is returned
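The hashing step in the list above can be sketched in a few lines of Python. This is only an illustration, not how EnCase or any particular imaging tool works internally: the function names `hash_image` and `verify_image`, and the chunk size, are assumptions made for this example. It reads an image file in chunks, computes MD5 and SHA-1 in a single pass, and can later compare a fresh SHA-1 against the value recorded at acquisition.

```python
import hashlib
import hmac


def hash_image(path, chunk_size=1024 * 1024):
    """Return (md5_hex, sha1_hex) for the file at `path`, read in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as image:
        # Read fixed-size chunks so a large image does not have to fit in memory.
        for chunk in iter(lambda: image.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def verify_image(path, recorded_sha1):
    """Return True if the image's current SHA-1 equals the recorded value."""
    _, current_sha1 = hash_image(path)
    return hmac.compare_digest(current_sha1, recorded_sha1.lower())
```

Note what `verify_image` can and cannot say: it confirms the file is the same now as when the hash was recorded, and nothing about what happened before that moment.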
For the criminal case the hard drive will be returned to a safe/evidence room, until it is required. During the case the defence, in theory, can ask to see the original and check the hash value.
For the civil case the hard drive will normally be returned to the user and, more often than not, the computer will be immediately booted, as the custodian/user/client needs to start working straight away.
This immediately changes the state of the disk, so the only remaining record of the original state is the image and its hash value. The civil investigator, if challenged, can refer to the original hash value to show the data has not changed since the image was taken.
Different Data – Same Hash Value
In both of these scenarios there is one rather obvious problem: the evidence can be tampered with and the hash will remain unchanged. It just depends on when the data is tampered with.
People debate the pros and cons of hash values and protecting media: SHA-1 v MD5, software write blockers v hardware write blockers, Linux v Windows, ICS v Tableau, etc., but they never debate the far more common scenario – bad people.
Note: Before going any further it should be made clear that there is no allegation that this has happened, it is merely a theory.
Let’s take those scenarios again:
- Person (A) takes possession of a suspect’s hard drive and then connects it to their forensic computer.
- Person (A) then images the data
- The computer calculates the hash value = ABC123
- Person (A) provides evidence to court that the image at the time had the hash value ABC123 and has the same value now.
But, what if Person (A) was corrupt? What if Person (A) wanted to frame the suspect?
What if the sequence of events occurred as follows:
- Person (A) gets hold of a hard drive of a suspect and then connects it to their computer
- Person (A) runs a script that dumps illicit material into unallocated space.
- Person (A) then images the data
- The computer calculates the hash value = ABD123
- Person (A) provides evidence to court that the image at the time had the hash value ABD123 and has the same value now.
The second step, running the script, would take minutes, if not seconds, and would be undetectable to the naked eye. Once the data was on the computer it could be impossible to prove that it had been added deliberately.
There would be no need for the procedure to change any dates; it is entirely possible to insert data and simply lie about it. The hash value of the image will not change afterwards, and in a criminal case the hash value of the hard drive in storage would be the same as that of the image, because the illicit material was added before the hash value was calculated.
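The point can be shown with a toy simulation. The sketch below is purely illustrative: the "disk" is a short byte string, `acquire` is a hypothetical helper standing in for an imaging tool, and a run of zero bytes stands in for unallocated space. It compares an honest acquisition with one where data is planted before the image and hash are taken.

```python
import hashlib


def acquire(disk: bytes):
    """Simulate imaging: copy the disk and hash the copy (hypothetical helper)."""
    image = bytes(disk)
    return image, hashlib.sha1(image).hexdigest()


# 12 bytes of "user data" followed by zeros standing in for unallocated space.
original_disk = b"suspect data" + b"\x00" * 32

# Honest acquisition: the hash is recorded over the untouched disk.
honest_image, honest_hash = acquire(original_disk)

# Corrupt acquisition: data is planted BEFORE the image and hash are taken.
planted = b"PLANTED MATERIAL"
tampered_disk = original_disk[:12] + planted + original_disk[12 + len(planted):]
corrupt_image, corrupt_hash = acquire(tampered_disk)

# Later verification, e.g. in court: each image still matches its recorded hash.
honest_verifies = hashlib.sha1(honest_image).hexdigest() == honest_hash
corrupt_verifies = hashlib.sha1(corrupt_image).hexdigest() == corrupt_hash

print(honest_verifies, corrupt_verifies)  # both verifications succeed
print(honest_hash != corrupt_hash)        # the two recorded hashes differ
```

Both images verify perfectly against their own recorded hash. The hashes differ from each other, but since only one acquisition ever takes place, there is no honest hash to compare the corrupt one against.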
A physical example of this could be a crooked cop planting drugs on a suspect. The drugs would then be found (following the search) and put in a sealed bag, and if the bag was opened later and tested it would be found to contain drugs. The evidence of the seals would be used to prove that the drugs had not been tampered with. But this does not make any difference: the drugs were planted, and the seals don’t help the person who has been set up.
Could it happen?
The scenario given has never been reported. There has never been a report of the police or a civil investigator inserting evidence onto a hard drive, nor is there reason to believe it has occurred and gone unreported. But could it happen? It is technically possible, but would people really do this?
Firstly, the police have misused data many times in the UK. Secondly, people have lied on the stand, on more than one occasion, in relation to computer forensics. Thirdly, police officers have been convicted of all sorts of offences, from blackmail through to rape. Why? It’s not because the police are particularly corrupt; it’s just that they are selected from the public, and there are crooks in the public, so there will be some criminally minded individuals in the police, though far fewer than in the public as a whole. If you’re willing to commit rape and blackmail, you’re probably willing to add a few 1s and 0s to a hard drive. During the 1970s and 1980s certain parts of the police (in the UK) had a bad reputation: the law was bent and broken to get convictions of “the bad guys”, and inevitably innocent people were caught up in this and wrongly convicted. In fact it was such a problem that new laws and procedures were brought in to try to combat it.
But could it really happen now?
But now, in the modern world, could the police really fake evidence? Sadly yes, it could still happen. The most obvious example of this is the fingerprint case involving the Scottish police. In this case numerous fingerprint officers in Scotland went to court and testified that the fingerprints they had obtained from a case proved that two people were guilty, one of murder and the other of perjury; the latter was a fellow police officer. But on appeal the world’s fingerprint officers stated clearly that the fingerprints did not belong to the two people in jail – in short, the Scottish police were accused of framing people by lying about fingerprints.
Fingerprints are the closest analogy in the physical world there is to a hash value, and here were criminal investigators lying about fingerprints. Is it impossible to believe that a person would lie about a hash value, or plant data on a person?
Put it another way, given the nature of human beings – which is more likely:
(a) A hash collision or
(b) A forensic investigator doctoring evidence to get the result they require, given that the latter has happened throughout human history and the former has never happened in anything other than a maths paper.
I believe the answer is in the question!
How can forensic investigators combat this? Firstly the concern about hash value security needs to be replaced with concern about procedures and processes that show that a person is being honest. Possibly the best way forward is to follow the example of the traffic police.
The traffic police have their gadgets, lots of gadgets: radar, lasers, speedometers, etc., all of which can be used to catch us (me) going slightly (a lot) faster than we should. But what do they say when they pull you over?
They say that “We observed you travelling at excess speed because [description of the car and movement], and we can provide additional evidence of this through this device [laser, radar, etc.].”
They could have pointed the radar gun at a motorbike doing 110mph and said it was you; but it is their word in court.
The primary evidence is their word, supported by technology.
Therefore, when in court, the primary evidence that the data has not changed should be the word of the person presenting the evidence, supported by hash values, and not the other way around.
It is the word of the investigator that we are all counting on, from the data collection through to the final analysis; investigators should not shy away from this.