Forensics: Importing Hashsets into EnCase (Part 2)

Part 2 of the series. Part 1, on how to import hashes into EnCase, is available here.


Forensics: Importing Hashsets into EnCase (Part 1)

Forensics: Hashes, do they work?

What’s the big deal about hashes?

This article follows on from the previous article about the myths of “verification” in EnCase.

Hashes are the backbone of computer forensics. They are used to identify and remove junk data with the NSRL/NIST list, to de-duplicate files in computer forensics and electronic discovery, and to de-dupe emails in electronic discovery. But they also form the foundation of evidence security:

  • “If the hash has not been changed the data cannot be changed”
  • “The evidence cannot be tampered with because of the security of hash values”
  • “The evidence is protected by the hash values”

These are just some of the common statements about hash values made by many, if not all, forensic investigators to clients and courts alike.

But is this true?

Subjects such as hash collisions are not relevant for this article; other, far more gifted writers have already ably demonstrated that, for the purposes of computer forensics, the MD5 and SHA-1 hashes are mathematically secure.
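For readers who have not seen the everyday uses of hashes mentioned earlier (known-file filtering and de-duplication), here is a minimal Python sketch. The function names `file_digest` and `triage` are made up for illustration, and `known_hashes` stands in for a set of digests loaded from something like the NSRL list; this is not how EnCase implements it.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def triage(paths, known_hashes):
    """Split files into known (ignorable), duplicate, and unique groups.

    known_hashes is a hypothetical set of hex digests, e.g. loaded
    from a known-file list such as the NSRL.
    """
    seen = {}  # digest -> first path seen with that digest
    known, duplicates, unique = [], [], []
    for path in paths:
        digest = file_digest(path)
        if digest in known_hashes:
            known.append(path)
        elif digest in seen:
            duplicates.append((path, seen[digest]))
        else:
            seen[digest] = path
            unique.append(path)
    return known, duplicates, unique
```

Calling `triage(list_of_paths, known_hashes)` returns three lists: files that can be ignored, files that duplicate an earlier one (paired with the original), and files that still need review.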

It is not the issue of the mathematics and cryptography that is being debated here, but rather the protocol. Can a person stand up in court and state that:

The evidence cannot have been tampered with because the hash has not been changed

The imaging process

Very briefly, this is what happens to an evidence drive during the imaging process, for both criminal and civil cases:

  • A suspect’s hard drive is connected to a computer (a hardware write blocker is normally used, but systems like Linux imaging platforms and software blockers can be used with or without hardware write blockers).
  • The hard drive is imaged
  • A hash value is calculated for the image
  • The hard drive is returned
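The imaging and hashing steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: `io.BytesIO` objects stand in for the suspect drive and the image file, the `acquire` function is hypothetical, and real tools such as EnCase use their own container formats.

```python
import hashlib
import io


def acquire(source, image, chunk_size=64 * 1024):
    """Copy source to image and return the MD5 of the bytes that were read."""
    h = hashlib.md5()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        image.write(chunk)   # write the block to the image
        h.update(chunk)      # hash the same block as it passes through
    return h.hexdigest()


# Stand-ins for the suspect drive and the destination image file.
suspect_drive = io.BytesIO(b"boot sector" + b"\x00" * 4096 + b"user data")
image_file = io.BytesIO()

acquisition_hash = acquire(suspect_drive, image_file)

# The recorded hash describes whatever bytes arrived at the imaging tool.
assert hashlib.md5(image_file.getvalue()).hexdigest() == acquisition_hash
```

Note that the hash is calculated over the bytes the imaging software received, at the moment it received them. It says nothing about what happened to the drive beforehand.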

For the criminal case the hard drive will be returned to a safe/evidence room, until it is required. During the case the defence, in theory, can ask to see the original and check the hash value.

For the civil case the hard drive will normally be returned to the user and, more often than not, the computer will be immediately booted, as the custodian/user/client needs to start working straight away.

This will immediately change the state of the disk; therefore the only evidence of the original hash value is the image. The civil investigator, if challenged, can refer to the original hash value to show the data has not changed since he took the image.

Different Data – Same Hash Value

In both of these scenarios there is one rather obvious problem: the evidence can be tampered with and the hash remain unchanged. It just depends on when the data is tampered with.

People debate the pros and cons of hash values and protecting media: SHA-1 v MD5, software write blockers v hardware write blockers, Linux v Windows, ICS v Tableau, etc., but they never debate the far more common scenario – bad people.

Note: Before going any further it should be made clear that there is no allegation that this has happened, it is merely a theory.

Let’s take those scenarios again:

  1. Person (A) takes possession of a suspect’s hard drive and then connects it to their forensic computer.
  2. Person (A) then images the data
  3. The computer calculates the hash value = ABC123
  4. Person (A) provides evidence to court that the image at the time had the hash value ABC123 and has the same value now.

But, what if Person (A) was corrupt? What if Person (A) wanted to frame the suspect?

What if the sequence of events occurred as follows:

  1. Person (A) gets hold of a hard drive of a suspect and then connects it to their computer
    1. Person (A) runs a script that dumps illicit material into unallocated space.
  2. Person (A) then images the data
  3. The computer calculates the hash value = ABD123
  4. Person (A) provides evidence to court that the image at the time had the hash value ABD123 and has the same value now.

Item 1.1 would take minutes, if not seconds, to run, and would be undetectable to the naked eye. Once the data was on the computer it could be impossible to prove that it had been added deliberately.

There would be no need for the procedure to change the dates, which means it is entirely possible to insert data and just lie about it. The hash value of the image will not change, and for a criminal case the hash value of the hard drive in storage would be the same as that of the image, because the illicit material was added before the hash value was calculated.

A physical example of this could be a crooked cop planting drugs on a suspect. The drugs would then be found (following the search) and put in a sealed bag, and if the bag was opened later and tested it would be found to contain drugs. The evidence of the seals would be used to prove that the drugs had not been tampered with. But this does not make any difference: the drugs were planted, and the seals don’t help the person who has been set up.
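The point can be shown with a toy model. In the sketch below a `bytearray` stands in for the drive, a run of zero bytes stands in for unallocated space, and `image_drive` is a hypothetical helper; none of this reflects a real tool, it only illustrates the order of events.

```python
import hashlib


def md5_hex(data):
    return hashlib.md5(data).hexdigest()


def image_drive(drive, tamper_first=False):
    """Image a toy 'drive' (a bytearray) and return (image, hash)."""
    if tamper_first:
        # Hypothetical step 1.1: overwrite part of the zero-filled
        # "unallocated" region before the image is taken.
        planted = b"PLANTED"
        drive[16:16 + len(planted)] = planted
    image = bytes(drive)
    return image, md5_hex(image)


honest_drive = bytearray(b"FILE-A" + b"\x00" * 32 + b"FILE-B")
framed_drive = bytearray(honest_drive)

honest_image, honest_hash = image_drive(honest_drive)
framed_image, framed_hash = image_drive(framed_drive, tamper_first=True)

# The two hashes differ, but in the second scenario nobody ever
# calculates or records the honest one.
hashes_differ = honest_hash != framed_hash

# Every later check compares like with like, so every check passes.
image_still_verifies = md5_hex(framed_image) == framed_hash
drive_in_storage_matches = md5_hex(bytes(framed_drive)) == framed_hash
```

All three flags come out true: the tampered image has a different hash from the one an honest acquisition would have produced, yet it verifies against its own recorded hash and against the drive in storage.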

Could it happen?

The scenario given has never been reported: there has never been a report of the police or a civil investigator inserting evidence onto a hard drive, nor is there reason to believe it has occurred and gone unreported. But could it happen? It is technically possible, but would people really do this?

Firstly, the police have misused data many times in the UK. Secondly, people have lied on the stand, on more than one occasion, in relation to computer forensics. Thirdly, police officers have been convicted of all sorts of offences, from blackmail through to rape. Why? It’s not because the police are particularly corrupt; it’s just that they are selected from the public, and there are crooks in the public, so there will be some criminally minded individuals in the police, though far fewer than in the public as a whole. If you’re willing to commit rape and blackmail, you’re probably willing to add a few 1s and 0s to a hard drive. During the 1970s and 1980s certain parts of the police (in the UK) had a bad reputation: the law was bent and broken to get convictions of “the bad guys”, and inevitably innocent people were caught up in this and wrongly convicted. In fact, it was such a problem that new laws and procedures were brought in to try to combat it.

But could it really happen now?

But now, in the modern world, could the police really fake evidence? Sadly yes, it could still happen. The most obvious example of this is the fingerprint case involving the Scottish police. In this case numerous fingerprint officers in Scotland went to court and testified that the fingerprints they had obtained from a case proved that two people were guilty, one of murder and the other of perjury; the latter was a fellow police officer. But, on appeal, the world’s fingerprint officers stated that the fingerprints clearly did not belong to the two people in jail. In short, the Scottish police were accused of framing people by lying about fingerprints.

Fingerprints are the closest analogy in the physical world there is to a hash value, and here were criminal investigators lying about fingerprints.  Is it impossible to believe that a person would lie about a hash value, or plant data on a person?

Put it another way, given the nature of human beings – which is more likely:

(a) A hash collision, or

(b) A forensic investigator doctoring evidence to get the result they require,

given that the latter has happened throughout human history and the former has never happened in anything other than a maths paper?

I believe the answer is in the question!


How can forensic investigators combat this? Firstly the concern about hash value security needs to be replaced with concern about procedures and processes that show that a person is being honest. Possibly the best way forward is to follow the example of the traffic police.

The traffic police have their gadgets, lots of gadgets: radar, lasers, speedometers, etc., all of which can be used to catch us (me) going slightly (a lot) faster than we should. But what do they say when they pull you over?

They say that “We observed you travelling at excess speed because [description of the car and movement], and we can provide additional evidence of this through this device [laser, radar, etc.].”

They could have pointed the radar gun at a motorbike doing 110mph and said it was you; but it is their word in court.

The primary evidence is their word, supported by technology.

Therefore, when in court, the primary evidence that the data has not changed should be the word of the person presenting the evidence, supported by hash values, and not the other way around.

It is the word of the investigator that we are all counting on, from the data collection through to the final analysis; investigators should not shy away from this.

Forensic: EnCase Verification, MD5, and Other Myths

EnCase is without doubt the most popular forensics tool on the market. However, due to the name of one of its features, it has also started one of the most common myths: Verification.

When EnCase completes an image it then conducts a “verification”, and when that completes it brings up a variety of hash values and confirms that the data has “verified”. Excellent. Data verified… no, not at all.

The EnCase verification does not check the original data; it checks the destination data. This is an often misunderstood point, but one that can be critical.

A very simple test of this, for the doubting Thomases out there, is to simply disconnect the original drive while the verification is being carried out. The verification will still complete, successfully, despite the fact there is nothing to verify against. The reason is this:

The verification checks the image file. It verifies the integrity of the image files, which is an important process. It does not check whether the data imaged is correct – a very important difference.
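A minimal Python sketch of this behaviour follows. It is a model of the idea only, not of EnCase’s internals: the `acquire` and `verify` functions, the dictionary used as an “image”, and the `source` object are all invented for illustration. The thing to notice is that `verify` never touches the source.

```python
import hashlib

# Toy source drive that can be "unplugged".
source = {"connected": True, "bytes": b"original evidence"}


def read_source():
    if not source["connected"]:
        raise IOError("source drive is not connected")
    return source["bytes"]


def acquire(reader):
    """Read the source once and store the data with its hash."""
    data = reader()
    return {"data": data, "acquisition_md5": hashlib.md5(data).hexdigest()}


def verify(image):
    """Re-hash the stored image and compare with the recorded hash.

    Only the image is read; the source drive is never consulted.
    """
    return hashlib.md5(image["data"]).hexdigest() == image["acquisition_md5"]


image = acquire(read_source)
source["connected"] = False   # pull the cable on the original drive
verified = verify(image)      # still succeeds
```

With the source disconnected, `verified` is still `True`, which mirrors the disconnect test described above.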

Example: Company X is at a client’s site imaging hard drives. They are using Tableau write blockers connected to laptops, and imaging to USB drives from a well-known brand (inside the USB case is a 3.5 inch 500 GB S-ATA drive). The drive to be imaged is an old 2.5 inch IDE drive.

The 2.5 inch drive, an old laptop drive, is taken out of the laptop and connected, via a 2.5 to 3.5 inch converter, to the Tableau write blocker, which is then connected, via USB, to the laptop.

The person imaging selects the 2.5 inch drive as the source and sets the USB drive as the destination. This means that the data takes the following route.

1) It is read from the old, dusty, 2.5 inch hard drive.

2) It goes out of the 2.5 inch pins, into the 3.5 inch converter.

3) From the 3.5 inch converter it goes along an IDE cable.

4) From the IDE cable it goes to the Tableau write blocker.

5) The write blocker, a black box, converts the IDE signal to USB.

6) The Tableau then transmits the data down a USB cable.

7) The USB cable connects to the laptop USB port.

8) The laptop USB port then connects to the motherboard.

9) The data is then transferred internally, and EnCase then “reads” the data.

10) EnCase then “writes” the data out, and it travels along the motherboard to another USB port.

11) From the USB port it goes down a USB cable to the USB drive.

12) The USB drive enclosure then converts from USB to the 3.5 inch S-ATA drive inside.

13) The 3.5 inch S-ATA drive then writes the data.

It is not until step (9) that EnCase reads the data. It is that data that EnCase then writes, and it then verifies what it has written. If it is feasible for an error to occur between 9 and 13, hence the need for the verification, it is also feasible, if not more so, that an error occurs between 1 and 9.

If the hard drive is not working correctly, or the cables are damaged, or the pins are not aligned correctly, or for any of a host of other reasons, the hard drive will not image correctly. 99% of the time this will be a very obvious error, e.g. the hard drive will not spin up, or it cannot be seen, which is a good error to have, as it can be addressed.

Sometimes, very rarely but sometimes, the drive will image, but it will produce junk data, or “skewed” data. While this is rare, it certainly does happen (unlike the theoretical problem of MD5 collisions); i.e. this is a real-world problem, not just one confined to labs and mathematics papers.

In the worst case scenario this means that data will be imaged, and EnCase will read it, write it, and then verify it. The person conducting the image will then leave the scene and state, without intending to lie, that they have a 100% accurate image of the data, when in actual fact they have junk. This can, and does, lead to all sorts of problems.
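This worst case can be simulated in a few lines. The sketch below is purely illustrative: `faulty_path` is an invented stand-in for a bad cable, converter, or bridge somewhere between steps 1 and 9, and the byte strings stand in for the disk and the image.

```python
import hashlib


def md5_hex(data):
    return hashlib.md5(data).hexdigest()


def faulty_path(data, every=512):
    """Hypothetical fault: corrupt the first byte of every 512-byte block."""
    out = bytearray(data)
    for i in range(0, len(out), every):
        out[i] ^= 0xFF   # flipping all bits always changes the byte
    return bytes(out)


true_disk = bytes(range(256)) * 16        # what is really on the platters
received = faulty_path(true_disk)         # what arrives at step 9

image = received                          # steps 10 to 13: written out correctly
acquisition_hash = md5_hex(received)      # hash of what the tool actually read

verified = md5_hex(image) == acquisition_hash   # verification passes
faithful = image == true_disk                   # but the image is wrong
```

Here `verified` is `True` and `faithful` is `False`: the verification confirms that the image matches what was read, not that what was read matches the disk.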

In one case the image of a single hard drive was taken at a “suspect’s” home; the image was verified and then taken back to the office. The image was later investigated, and from the investigation the examiner concluded that the user had wiped their drive with a tool that deliberately made a mess of the MFT.

What had actually happened is that the image of the drive was poor, and much of the MFT was skewed during the imaging process, probably due to bad electronics/electrics somewhere in the imaging chain; i.e. they had not taken a good image. But the person investigating the drive did not know or understand this, and as a result produced a very detailed report explaining how the drive had been deliberately wiped to hide information.

The suspect/victim of this allegation was fortunate in that the computer was working (and shown to be working) before the image was taken and was still working afterwards; this was, oddly, recorded by the person conducting the image. From this alone it was very obvious that the one and only drive in the computer could not have been wiped. But in this case a long and detailed report, accusing the suspect/victim of wiping evidence, was submitted. While there was no evidence of the original allegations, the report stipulated, at great length, that the suspect had wiped their drive, and that conclusions could therefore be drawn from that. The person writing the report was adamant that the image was correct, because it verified when he wrote the report. Even though he was hundreds of miles from the actual hard drive, the myth of EnCase Verification was so strong that he believed the verification guaranteed the quality of the data. A common belief.

A second image was taken, correctly, and the drive examined. From this it could be seen that there was no evidence of wiping, nor evidence of the original allegations. The suspect/victim’s statement that their computer was working was fully corroborated, and they were proved innocent.