The Background
We recently received an inquiry about an HP RAID 5 array that, based on the initial details we were given, seemed nearly impossible to recover:
- The original array was 5 x 300 GB SAS hard drives in RAID 5, plus 1 hot spare (6 drives in total)
- 3 drives reportedly failed
- The 3 failed drives were replaced with 3 new drives
- The RAID was rebuilt with the original configuration
- Backup data was restored to the new array, which still used 3 of the 6 original drives
- After the restore, it was quickly discovered that the VM files containing the most critical MSSQL files were not in the backup
We figured that, between the RAID rebuild with three blank drives and the restore of the backup, most of the data, if not all of it, would have been completely destroyed. That didn’t stop us from giving it a try.
The Recovery Process
After cloning all 6 original drives, we discovered that the HP technician sent on site to replace the 3 failed drives had swapped out only 1 of the right ones. He took 2 good drives with him and left behind the 2 drives with bad sectors, which would likely have led to another crash in the near future.
We then determined which 2 of the 6 original drives were virtually the same. One of the pair was the hot spare, which had kicked in and taken over for a failed drive.
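Finding two near-identical members comes down to comparing drive clones block by block. The following is a minimal sketch of that idea, not the tooling used in this recovery. The function name `image_similarity` and the 1 MiB block size are assumptions made for illustration.

```python
BLOCK_SIZE = 1024 * 1024  # compare in 1 MiB blocks (arbitrary choice for this sketch)


def image_similarity(path_a, path_b, block_size=BLOCK_SIZE):
    """Return the fraction of blocks that are byte-identical in two image files.

    Hypothetical helper: reads both clones in lockstep and counts matching
    blocks. A trailing block present in only one image counts as a mismatch.
    """
    same = 0
    total = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            block_a = a.read(block_size)
            block_b = b.read(block_size)
            if not block_a and not block_b:
                break  # both images exhausted
            total += 1
            if block_a == block_b:
                same += 1
    # Two empty images are treated as identical.
    return same / total if total else 1.0
```

A score close to 1.0 for a pair of clones would suggest that one member was rebuilt as a copy of the other, as happens when a hot spare takes over.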
Then, in order to rebuild the RAID, we needed the 3 replaced drives and 2 of the partially overwritten originals. From here, we expected a huge fight to find and recover the VMDK file which held the required MSSQL data. Fortunately, we got a bit of a break, as our tools were able to rebuild the original NTFS file tree very quickly.
Once we had the file tree, we were able to find the VMDK and recover the files from within. But we weren’t done yet. The recovered MSSQL file was still affected by partial overwriting and missing data blocks. We were able to grab a backup file that was also stored within the VM, and with it our client was able to rebuild and repair the file to a point where they could get their client back online.
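For background on why a RAID 5 set can survive one missing member but not several: the parity block in each stripe is the XOR of that stripe's data blocks, so any single missing block can be recomputed from the rest. The sketch below illustrates that with toy two-byte blocks. It assumes 4 data blocks plus 1 parity block per stripe, matching a 5-drive RAID 5, and the helper name `xor_blocks` is hypothetical. Real controllers also rotate parity across the drives, which is not modelled here.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# One toy stripe: 4 data blocks, as on a 5-member RAID 5 (sample values are made up).
data = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"]
parity = xor_blocks(data)

# Simulate losing the member that held data[2]; the rest of the stripe survives.
survivors = data[:2] + data[3:] + [parity]
rebuilt = xor_blocks(survivors)

# XOR of the surviving blocks and parity yields the missing block.
assert rebuilt == data[2]
```

With two or more blocks of a stripe missing, the XOR no longer has a unique solution, which is why multiple failed or overwritten members make a recovery so much harder.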
The Lesson
Much of this crisis could have been avoided and/or the damage reduced had a few things been done differently.
1. When a RAID fails with multiple drive failures, never replace only the damaged drives. Instead, replace all the drives and set the originals aside.
2. Whether it is a complex RAID or just a single hard drive that has failed, always verify that the backups are good before reusing an original drive.
3. When a RAID reports a failed drive, the report is based on its port number, not its drive number. RAID controllers commonly start counting from “0” instead of “1”. Therefore, when such a controller reports that drive “4” has failed, it is actually the 5th drive that should be replaced.
4. Always test and verify backups on a regular basis.
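As a rough illustration of the backup checks in points 2 and 4, the sketch below compares a test restore against a manifest of critical files and their SHA-256 hashes. It is a minimal example under stated assumptions: the manifest is recorded from the live system ahead of time, and the names `sha256_of` and `verify_restore` are hypothetical rather than part of any backup product.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large VM images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(restore_root, manifest):
    """Check a restored tree against a manifest of {relative path: sha256 hex}.

    Returns two sorted lists: files missing from the restore, and files
    present but with different contents than the manifest expects.
    """
    root = Path(restore_root)
    missing = []
    mismatched = []
    for rel_path, expected in sorted(manifest.items()):
        candidate = root / rel_path
        if not candidate.is_file():
            missing.append(rel_path)
        elif sha256_of(candidate) != expected:
            mismatched.append(rel_path)
    return missing, mismatched
```

Running a check like this against a test restore, before any original drive is reused, would flag a missing VM file while the original data is still intact.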
The Summary
The data was recovered in less than 24 hours for less than $5000 CAD, although it did take another day or two of back-and-forth communication with our client to finish stitching the MSSQL file back together.
While it was unfortunate that the client had to go through the process of having their data professionally recovered, it was fortunate that they did. Had their original backup been done correctly and the restore contained all the files they needed, they would still have been working on a system with 2 failing drives, which would eventually have crashed again, possibly with more severe consequences.
If you are in need of RAID data recovery services, contact the team at Recovery Force and find out how we can recover your data quickly and affordably, without high-pressure sales.
Phone toll free: 866-750-3169
To start a RAID data recovery project with us, create a ticket on our data recovery support site.