The following question originally arose over on another list:
"Open source tools such as DCFLdd v1.3.4-1 can usually recover all
data, with exception of the physically damaged sectors. (It is
important that DCFLdd v1.3.4-1 be installed on a FreeBSD operating
system. Studies have shown that the same program installed on a Linux
system produces extra 'bad sectors', resulting in the loss of
information that is actually available.)"
It has since been picked up by a second list that is not public.
Because I think that the question is of general interest, I am
reposting my reply to the second list here so that it may benefit a
broader audience:
There is a related discussion over on the sleuthkit-users mailing
list:
http://www.nabble.com/Best-practices-in-dealing-with-bad-blocks-and-hashes.-td16594673.html
I can't speak specifically to DCFLdd; however, I do have some
familiarity with DD in general. :-) GNU DD reads data in blocks of a
specified size. If an error occurs while reading a block and the
'noerror' conversion was specified, DD skips the whole block and
carries on. If the 'sync' conversion was also specified, a fill
pattern (zeroes) is written to the output in place of the missing
data, so that later data stays at the correct offset. If certain data
cannot be read from a drive then by definition you lose that data.
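
To make the skip-and-fill behaviour concrete, here is a minimal Python sketch. It is not DD's code. SimulatedDrive and copy_with_zero_fill are hypothetical names of my own, and the simulated drive fails any read that touches a bad sector, which is a simplification of what a real device and driver do.

```python
class SimulatedDrive:
    """Hypothetical stand-in for a disk: any read touching a bad sector fails."""

    def __init__(self, data, sector_size, bad_sectors):
        self.data = data
        self.sector_size = sector_size
        self.bad_sectors = set(bad_sectors)

    def read_at(self, offset, length):
        end = min(offset + length, len(self.data))
        first = offset // self.sector_size
        last = (end - 1) // self.sector_size
        if any(s in self.bad_sectors for s in range(first, last + 1)):
            raise OSError("simulated unrecoverable read error")
        return self.data[offset:end]


def copy_with_zero_fill(drive, total_size, block_size):
    """Copy block by block; replace any block that fails to read with zeroes."""
    image = bytearray()
    for offset in range(0, total_size, block_size):
        length = min(block_size, total_size - offset)
        try:
            image += drive.read_at(offset, length)
        except OSError:
            image += bytes(length)  # fill pattern: zeroes
    return bytes(image)


# Eight 512-byte sectors of 0xAA, with sector 2 unreadable.
drive = SimulatedDrive(b"\xaa" * 4096, sector_size=512, bad_sectors=[2])
small = copy_with_zero_fill(drive, 4096, block_size=512)
large = copy_with_zero_fill(drive, 4096, block_size=2048)
print(small.count(b"\x00"), large.count(b"\x00"))  # 512 2048
```

Both images come out the same length, but the copy made with the 2048-byte block size has zeroed three perfectly readable sectors along with the bad one.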
Disk drives are block devices. You read data from a block device in
blocks; either you get the whole block or you get nothing. For disk
devices the block size is equal to the sector size. Historically,
most disk devices have used a block size of 512 bytes. Some high
capacity drives now use a block size of 4096 bytes.
As long as you use a DD block size that is equal to the device sector
size (e.g. 512 bytes) all is good. You shouldn't have any lost
sectors. The problem is that reading from a drive 512 bytes at a
time is REALLY SLOW. A larger block size is preferred for
performance reasons. But DD always skips an entire block based on
DD's block size, not the device's. If DD's block size is larger than
the device sector size then some usable data may be lost.
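
The size of the loss is simple arithmetic. The sketch below (readable_bytes_lost is a hypothetical helper, and it assumes the DD block size is a whole multiple of the sector size and that blocks are aligned to the start of the device) counts how many readable bytes are discarded along with the bad sectors.

```python
def readable_bytes_lost(bad_sectors, sector_size, dd_block_size, total_sectors):
    """Bytes of readable data discarded when whole DD blocks are skipped."""
    if dd_block_size % sector_size != 0:
        raise ValueError("dd_block_size must be a multiple of sector_size")
    sectors_per_block = dd_block_size // sector_size
    bad = set(bad_sectors)
    lost_sectors = 0
    # Every DD block containing at least one bad sector is skipped whole.
    for block in {s // sectors_per_block for s in bad}:
        start = block * sectors_per_block
        end = min(start + sectors_per_block, total_sectors)
        lost_sectors += sum(1 for s in range(start, end) if s not in bad)
    return lost_sectors * sector_size


# One bad sector (number 100) on a drive with 512-byte sectors.
print(readable_bytes_lost([100], 512, 512, 1_000_000))          # 0
print(readable_bytes_lost([100], 512, 1024 * 1024, 1_000_000))  # 1048064
```

With a 1 MiB block size a single bad sector costs 2047 good sectors, i.e. just under 1 MiB of data that the drive could have returned.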
The default block size for GNU DD is 512 bytes. The obsolete FAU-DD
(still available with Helix) uses a default block size of 4096
bytes. The currently supported FAU-DD (available from
http://www.gmgsystemsinc.com/fau/) uses a default block size of 1
MiB, and a block size of 5 MiB or more is recommended for static
(not "live") acquisitions.
Perhaps a related problem is that different operating systems read
from drives in different multiples of the device sector size.
Microsoft Windows reads from disk drives in cluster-sized units
(typically 8 x 512 = 4096 bytes). So there is the potential that
some data may be lost
here, as well as at the application level. Different *nix systems
may use different algorithms. You really have to test the specific
*nix distribution that you are using.
In my experience MS Windows correctly handles "bad blocks" on disk
devices notwithstanding its use of cluster-sized read units. Of
course that could change with the next release of MS Windows. It
also might not be true with different storage architectures (e.g.
flash drives) or devices that use a non-MS device driver. For this
reason we need to constantly test and re-test.
The current released version of FAU-DD (available from
http://www.gmgsystemsinc.com/fau/) uses a slightly different
algorithm from GNU-DD in that it is able to use a relatively large
default block size (1-5 MiB or more) for performance, but will drop
down to the device sector size (usually 512 bytes) when it encounters
a "bad block." Then the larger block size is resumed once the "bad
block" has been passed. Now you no longer need to choose between
performance and reliability when using DD to image a drive. The
current released version of FAU-DD is available exclusively from GMG
Systems, Inc.
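
The general idea of dropping down to the sector size can be sketched as follows. This is only an illustration of the strategy described above, not FAU-DD's actual implementation. SimulatedDrive and image_with_fallback are hypothetical names, and the simulated drive fails any read that touches a bad sector.

```python
class SimulatedDrive:
    """Hypothetical stand-in for a disk: any read touching a bad sector fails."""

    def __init__(self, data, sector_size, bad_sectors):
        self.data = data
        self.sector_size = sector_size
        self.bad_sectors = set(bad_sectors)

    def read_at(self, offset, length):
        end = min(offset + length, len(self.data))
        first = offset // self.sector_size
        last = (end - 1) // self.sector_size
        if any(s in self.bad_sectors for s in range(first, last + 1)):
            raise OSError("simulated unrecoverable read error")
        return self.data[offset:end]


def image_with_fallback(drive, total_size, sector_size, block_size):
    """Read in large blocks; on error, re-read that block sector by sector."""
    image = bytearray()
    offset = 0
    while offset < total_size:
        length = min(block_size, total_size - offset)
        try:
            image += drive.read_at(offset, length)
        except OSError:
            # Drop down to the sector size for this block only.
            for sec_off in range(offset, offset + length, sector_size):
                sec_len = min(sector_size, offset + length - sec_off)
                try:
                    image += drive.read_at(sec_off, sec_len)
                except OSError:
                    image += bytes(sec_len)  # only the bad sector is zeroed
        offset += length  # the large block size resumes with the next block
    return bytes(image)


# Sixteen 512-byte sectors with distinct non-zero contents; sector 5 is bad.
data = b"".join(bytes([i + 1]) * 512 for i in range(16))
drive = SimulatedDrive(data, sector_size=512, bad_sectors=[5])
image = image_with_fallback(drive, len(data), sector_size=512, block_size=4096)
print(image.count(b"\x00"))  # 512
```

Only the one unreadable sector is zero-filled, while every other read still uses the large block size.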
One final problem is that the data read from a failing drive actually
may change from one acquisition to another. If you encounter a "bad
block" that means that the error rate has overwhelmed the error
correction algorithm in use by the drive. A disk drive is not a
paper document. If a drive actually yields different data each time
it is read, is that an acquisition "error"? Or have you accurately
acquired the contents of the drive at that particular moment in
time? Perhaps you have as many originals as acquired "images."
Maybe it is a question of semantics, but it is a semantic that goes
to the heart of DIGITAL forensics.
Remember that hashes do not guarantee that an "image" is accurate.
They prove that it has not changed since it was acquired.
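
A small sketch of that point, using Python's standard hashlib module. The two byte strings are made-up stand-ins for two acquisitions of the same failing drive, and sha256_of_stream is a hypothetical helper name.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=1024 * 1024):
    """Hash a file-like object in chunks so large images need not fit in RAM."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Made-up data: in the first pass the last sector failed and was zero-filled;
# in the second pass the same sector happened to read successfully.
first_pass = b"\xaa" * 1024 + bytes(512)
second_pass = b"\xaa" * 1024 + b"\xaa" * 512

recorded = sha256_of_stream(io.BytesIO(first_pass))
print(recorded == sha256_of_stream(io.BytesIO(first_pass)))   # True
print(recorded == sha256_of_stream(io.BytesIO(second_pass)))  # False
```

The first comparison shows that the stored image has not changed since its hash was recorded. The second shows that a later acquisition of the same drive need not match, and neither hash says which image is "right."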
Regards,
ReC.