Many file types, such as HTML or Word documents, contain codes that identify the type of file. These codes are not the file extension (.HTML or .DOC) but a sequence of bytes within the document itself.
Such a code is known as a “signature”, and signatures are used routinely in computer forensics and electronic discovery.
Example1: The first few bytes of a GIF file are 47 49 46 in hex, which is GIF in ASCII.
Example2: The first few bytes of a Word document are, in hex, “D0 CF 11 E0 A1 B1 1A E1”.
Example3: The first few bytes of a JPEG are, in hex, “FF D8”. (“FF D9” marks the end of a JPEG, not the start.)
These codes can, in theory, be used to identify the start of a document. Some files also have a signature at the end of the document as well as at the beginning. These signatures, or headers and footers, can be used for several purposes, including file identification and file carving.
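As a rough illustration of how a header can be matched, here is a minimal Python sketch. The `SIGNATURES` table and the `identify` function are hypothetical names used only for this example, and the table holds just the three headers mentioned above rather than a real tool's database.

```python
# Hypothetical signature table: only the three headers discussed above,
# not a complete database like the ones forensic tools ship with.
SIGNATURES = {
    "GIF": bytes.fromhex("474946"),
    "OLE2 (legacy Word .DOC)": bytes.fromhex("D0CF11E0A1B11AE1"),
    "JPEG": bytes.fromhex("FFD8"),
}


def identify(data: bytes):
    """Return the name of the first signature matching the start of data, or None."""
    for name, header in SIGNATURES.items():
        if data.startswith(header):
            return name
    return None


print(identify(b"GIF89a" + b"\x00" * 10))  # prints "GIF"
```

Short headers such as the two-byte JPEG one can match unrelated data by chance, which is one reason a match is only an indication of the file type and not proof.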
File Identification through signatures:
If a file is called system.dll and sits in the system folder, people may well assume that it is in fact a system file. But if its first few bytes are “D0 CF 11 E0 A1 B1 1A E1”, that implies it is a Word document. Simply changing the file extension from .DOC to .DLL and placing the document in the system folder would produce this result.
Why would anyone do this? To hide a file.
Computer forensics tools can detect signatures automatically and hold hundreds of different signatures in their databases. They can compare each file's signature to its file extension and flag any difference, thereby highlighting files of concern.
This is something that could happen, though from experience of looking at file types it does not appear to happen often, and examples are generally confined to the computer forensics lab. This is possibly because it is impractical for users to work with files that are hard to access.
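The comparison of signature against extension could be sketched in Python roughly as follows. This is an assumption about how such a check might look, not how any particular forensic tool works; `EXPECTED_EXTENSIONS` and `extension_mismatch` are hypothetical names, and the extension lists are deliberately incomplete.

```python
import os
import tempfile

# Hypothetical mapping: header bytes -> extensions normally seen with that header.
# The OLE2 header is shared by several legacy Office formats, not only .doc.
EXPECTED_EXTENSIONS = {
    bytes.fromhex("D0CF11E0A1B11AE1"): {".doc", ".xls", ".ppt"},
    bytes.fromhex("474946"): {".gif"},
    bytes.fromhex("FFD8"): {".jpg", ".jpeg"},
}


def extension_mismatch(path):
    """Return True if the header is known but the extension is not one expected for it."""
    with open(path, "rb") as f:
        head = f.read(8)  # the longest header in the table is 8 bytes
    ext = os.path.splitext(path)[1].lower()
    for header, extensions in EXPECTED_EXTENSIONS.items():
        if head.startswith(header):
            return ext not in extensions
    return False  # unknown header: nothing to compare against


# Demo: a "Word document" renamed to system.dll is flagged.
with tempfile.TemporaryDirectory() as folder:
    disguised = os.path.join(folder, "system.dll")
    with open(disguised, "wb") as f:
        f.write(bytes.fromhex("D0CF11E0A1B11AE1") + b"document body")
    print(extension_mismatch(disguised))  # prints True
```

A file whose header is not in the table is reported as no mismatch here, so the usefulness of the check depends on the size of the signature database.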
File carving is a function used by both computer forensics and data recovery tools. When a file is deleted and cannot be recovered easily (i.e. the pointer to the file, e.g. the MFT entry, is no longer there), file carving is the next best method of recovering the data.
File carving scans every sector of a hard drive (or a selected area) and looks for file headers/signatures. When it finds one, it “carves” the data out until it reaches a footer or the next header. In other words, it copies the data from the first sector and every sector after that until it has a reason to stop.
This method will not always work, particularly for files that were partially overwritten or were fragmented prior to deletion. Some tools can set a cut-off point for “recovered” file sizes, so that tens of gigabytes of data are not copied out when fragmentation means the expected footer is never found.
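The carving process and the size cut-off can be sketched in Python as below. This is a simplified illustration under several assumptions: the drive image is already loaded into memory as bytes, sectors are 512 bytes, headers are only looked for at the start of a sector, and the names `carve`, `SECTOR` and `max_size` are hypothetical. Real tools handle far more cases.

```python
SECTOR = 512  # assumed sector size


def next_header(image, header, start, limit):
    """Return the offset of the next sector-aligned header in [start, limit), or -1."""
    for offset in range(start, limit, SECTOR):
        if image.startswith(header, offset):
            return offset
    return -1


def carve(image, header, footer, max_size):
    """Return (offset, data) pairs carved from image between header and footer."""
    carved = []
    offset = 0
    while offset < len(image):
        if not image.startswith(header, offset):
            offset += SECTOR  # no header at this sector, try the next one
            continue
        # Cut-off point: never copy more than max_size bytes for one file.
        limit = min(offset + max_size, len(image))
        stop = limit
        end = image.find(footer, offset + len(header), limit)
        if end != -1:
            stop = end + len(footer)  # reason to stop: footer found
        following = next_header(image, header, offset + SECTOR, limit)
        if following != -1:
            stop = min(stop, following)  # reason to stop: next header found
        carved.append((offset, image[offset:stop]))
        # Resume scanning at the first sector boundary at or after the stop point.
        offset = -(-stop // SECTOR) * SECTOR
    return carved


# Demo image: one empty sector, one small complete JPEG-like file,
# then a header whose footer is missing.
complete = (b"\xff\xd8abc\xff\xd9").ljust(SECTOR, b"\x00")
broken = b"\xff\xd8" + b"x" * 1000
image = b"\x00" * SECTOR + complete + broken
for offset, data in carve(image, b"\xff\xd8", b"\xff\xd9", max_size=600):
    print(offset, len(data))
# prints:
# 512 7
# 1024 600
```

The second result shows the cut-off at work: with no footer and no later header, the copy stops at `max_size` bytes instead of running on to the end of the image. Because the sketch copies contiguous sectors, a fragmented file would come out containing data that belongs to other files.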