A few weeks ago one of my many 300GB hard drives crapped out. The drive didn’t have any important data on it, but I still attempted a recovery for the sake of exercise. It proved difficult and recovered very little of the data I wanted. That got me thinking ‘what if’ about the 3TB of other data I have on the same system, so I started researching how to build a NAS. I planned to use old hardware as a prototype, so I pulled out my original tower. I figured older hardware would be more likely to fault under the testing I was going to perform.
I originally wanted to go with a Linux distro because I am familiar with Linux, but I couldn’t find one that was actively developed, free, and had software RAID support built in. I did, however, stumble onto a NAS-oriented FreeBSD variant called FreeNAS. It is a very impressive OS with lots of features, is really stable, and has a handy web interface.
Once I had FreeNAS installed it was time to start transferring all the data over. Over GbE I was getting transfer rates that peaked at 79.1MB/s but averaged 50-65MB/s, a big improvement over the 100Mbit/s connection I had before. Transferring all my data still took several hours. Afterward I wanted to confirm that everything had been transferred correctly, so I started searching for a way to run MD5 checksums on all the files, recursively. Neither the md5 binary on OS X nor the one on FreeBSD natively supports going through each folder, checksumming each file, and outputting the sums. Sure, there are one-line shell scripts/commands you can run, but I was really looking for a way that would not require hacking about. I finally found what I was looking for in md5deep. The md5deep suite can natively recurse through directories and checksum each file with MD5, SHA-1, SHA-256, Tiger, or Whirlpool hashes. I did have to compile it from source on OS X, which is a piece of cake if you have Xcode installed. On FreeBSD you can install the md5deep binary by issuing the command: “/usr/sbin/pkg_add -r md5deep” as root.
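For reference, the kind of shell one-liner mentioned above might look roughly like the sketch below. It assumes that either md5sum (common on Linux) or the BSD md5 binary with its -r flag is on the PATH. The hash_tree name is made up for illustration, and this is not the method used in the rest of this post.

```shell
# Hypothetical helper: print the MD5 hash followed by the path
# for every regular file under the directory given as $1.
hash_tree() {
    if command -v md5sum >/dev/null 2>&1; then
        # GNU coreutils style
        find "$1" -type f -exec md5sum {} +
    else
        # BSD / OS X style; -r puts the hash before the file name
        find "$1" -type f -exec md5 -r {} +
    fi
}

# Example: hash_tree ./FOLDER_TO_CHECK/ > ./checksums_FOLDER_TO_CHECK_Original.md5
```

Unlike md5deep, this gives no progress estimate and is limited to whichever hash tool happens to be installed.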
The command I ran on the original files:
md5deep -r -e -l ./FOLDER_TO_CHECK/ > ./checksums_FOLDER_TO_CHECK_Original.md5
The command I ran on the transferred files:
md5deep -r -e -l ./FOLDER_TO_CHECK/ > ./checksums_FOLDER_TO_CHECK_R5Array.md5
You may want to omit the -e flag, which displays a progress estimate for each file, as there will be a slight performance increase on slower systems without it. Also, if you want to check the last five files that have been hashed, you can issue this command in a separate terminal window: “tail -n 5 checksums_FOLDER_TO_CHECK_Original.md5”. I then used diff to compare the checksum files:
diff ./checksums_FOLDER_TO_CHECK_Original.md5 ./checksums_FOLDER_TO_CHECK_R5Array.md5
diff will report back only the changes, or in this case the differing hashes. If there are any discrepancies, make sure you didn’t edit the file between running the hashes; if you didn’t, retransmit the file and rehash.
As all the files came from an OS X machine on an HFS partition, they have “.DS_Store” files in practically every directory Finder went through. So I suggest running:
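One thing to watch for: the two checksum files may list the same files in a different order, since the order in which directories are walked can vary between file systems, and diff would then flag lines that actually match. A minimal sketch of a workaround is to sort both lists by path before comparing. The compare_checksums function is a hypothetical helper, and it assumes the “hash, then path” line format that md5deep writes.

```shell
# Hypothetical helper: sort two checksum lists by file path (field 2 onward)
# and diff the sorted copies. Exit status follows diff: 0 if identical, 1 if not.
compare_checksums() {
    sorted_a=$(mktemp) || return 2
    sorted_b=$(mktemp) || return 2
    # LC_ALL=C keeps the sort order byte-wise and identical for both files
    LC_ALL=C sort -k 2 "$1" > "$sorted_a"
    LC_ALL=C sort -k 2 "$2" > "$sorted_b"
    status=0
    diff "$sorted_a" "$sorted_b" || status=$?
    rm -f "$sorted_a" "$sorted_b"
    return "$status"
}

# Example:
# compare_checksums ./checksums_FOLDER_TO_CHECK_Original.md5 ./checksums_FOLDER_TO_CHECK_R5Array.md5
```

For this to work, both md5deep runs need to be started from the same relative location so that the -l paths line up.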
find . -name '*.DS_Store' -type f -delete
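Since -delete removes files without asking, it can be worth listing the matches first. The sketch below wraps the same find expression in a hypothetical purge_ds_store helper (the name and the “delete” argument are made up for illustration) that only prints the matches unless told to delete them.

```shell
# Hypothetical helper, not from the original command.
# purge_ds_store DIR          -> only lists the matching files
# purge_ds_store DIR delete   -> lists them, then deletes them
purge_ds_store() {
    dir=${1:-.}
    # Same pattern as above, with -print so nothing is removed here
    find "$dir" -name '*.DS_Store' -type f -print
    if [ "${2:-}" = "delete" ]; then
        find "$dir" -name '*.DS_Store' -type f -delete
    fi
}

# Example: purge_ds_store ./FOLDER_TO_CHECK/          (dry run)
#          purge_ds_store ./FOLDER_TO_CHECK/ delete   (actually remove)
```

Run it before generating the checksum files on both sides, so the .DS_Store entries don’t show up as differences in the diff.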
Thanks to Danilo Quddus Stern-Sapad at Ariadoss.com for his method of recursively removing all .DS_Store files.
Thanks to Michael Simons at michael-simons.eu for his alternative method of recursively checking MD5 hashes.