DSM-G600, DNS-3xx and NSA-220 Hack Forum

wrlee · 2008-10-30 04:41:57

I understand the basic concept that a dual-disk RAID1 configuration replicates data between the two disks so that, should one disk fail, the drive set will continue to operate. However, after removing a drive from a RAID 1 configuration, the web console prompted me to reformat the remaining drive (rather than continuing to work with the remaining drive). I would have expected the 323 to behave as if nothing had happened. Can someone describe what was happening?

I also found that one of the files is yielding an I/O error (while reading a long way into a particular large file). I might have expected that if an I/O error were incurred in a file, the RAID s/w would pull error free sectors from the other drive (I am assuming that, between the two drives, there is intact data). Is this not how RAID 1 works?

Finally, I want to test each of the drives (potentially destructively), separately, on a different machine. I am thinking that I should be able to pull one of the drives for testing, wipe the initial sectors, and reinstall it in the 323, whereupon it will rebuild the drive. After the drive has synced through a rebuild, I would remove the other drive for testing, wiping and reinstalling, whereupon I would expect it to be synced via another rebuild. Does this sound like it would work?

Thanks for the help on my novice RAID questions.

Bill...

luusac · 2008-10-30 14:17:36

which drive did you pull lhs or rhs?
No, I don't think that is how raid1 works, maybe how raid5 works. The raid on the 323 is not that robust (although I do use it myself) when it comes to this kind of thing; I don't think that it was designed with swapping in mind, so if you use it make sure you have backups. There are lots of posts here around your questions. Use keywords like raid1, degraded, swapping drives, etc. As "fordem", a frequent poster here, will tell you: "RAID1 is for disk redundancy, not data backup, don't confuse the two ..." If you want drive swapping, but don't want to get your hands dirty maintaining things, or don't want the hassle of things going wrong, there are many posts here and pages on the wiki on automated/scripted solutions to copy from one disk to another - as far as the 323 knows you have two independent disks.

wrlee · 2008-10-30 18:12:00

luusac wrote:
which drive did you pull lhs or rhs?

I had pulled the LHS (volume 2). I am using it for redundancy/reliability, using backup software to back up to it. I have seen the posts talking about automatic copying of data from one volume to another... maybe now I'm beginning to see why you'd do that.

Thanks,
Bill...

blahsome · 2008-10-30 19:32:10

What wrlee described sounded reasonable to me. He's not swapping drives if I understand correctly. He would be inserting a "blank" HD into a RAID1 setup, and that's exactly what you are supposed to do if a drive failed. The DNS-323 should support that.

fordem · 2008-10-30 19:48:32

RAID1 or RAID5 (in fact, any RAID implementation other than RAID0) should appear exactly the same to an end user - if a single drive fails or is pulled from the system to simulate a failure (assuming it is safe to do so from a hardware standpoint), the end result should be the same, that data should remain available and there should be some sort of alarm or other error indication that a failure had occurred.

By way of explanation of the "assuming it is safe to do so from a hardware standpoint" caveat - certain physical drive interfaces were not designed for hot plugging and any attempt to pull a drive to simulate a failure could well result in catastrophic failure of the hardware - these include the old ST506/412 & ESDI interfaces, IDE/ATA (also known as parallel ATA) and the 50 & 80 pin SCSI interfaces - these all use ribbon cables with relatively high numbers of connections. On the other hand, SCSI interfaces using the SCA & SCAII connectors, SAS (Serial Attached SCSI) and SATA (Serial ATA) were all designed with hot plugging in mind and it is safe (again from a hardware standpoint) to remove a drive with power applied.

In most of the RAID1 (and RAID5) environments I have worked with it is possible to "yank" a drive without bringing the system down, and to wipe all data & partitions from that drive and re-insert it and allow it to rebuild - please note the rebuild may or may not start automatically - it depends on the RAID controller - and in my initial tests with the DNS-323 this is exactly what I did, and it worked flawlessly.

For those of you who may have noted that I earlier said "most of the RAID1 (and RAID5) environments" - I have discovered that parallel ATA disks do not adapt well to RAID environments - especially if you build a RAID1 array using a master/slave pair on a single cable - oftentimes if the slave fails, the system will freeze momentarily (whilst the controller - which is actually on the "master" drive - figures out that the slave has died) and if the master fails, the system will, more often than not, crash (the disk controller on the master died). I've also seen what are known as "punch through" errors, where errors on one drive can cause errors on another drive - again because of the shared controller.

Now - although the DNS-323 uses SATA disks which support hot plugging, the unit itself does not support hot swap and as a result it will not automatically rebuild a failed disk - in my simulations I inserted the newly cleaned disk and after powering up and logging into the admin interface, the unit prompted me to format the drive, which I did, after which it resynched successfully.

I have seen horror stories from folks who say they lost all their data when the unit formatted the wrong disk, but the only way I was able to provoke this was to use a disk with data on it as my "replacement disk" - please note - this was a deliberate attempt to simulate a data loss scenario, and not something I would normally do, having learned many years ago that it was a sure-fire way to lose data.

It was my original intent when the DNS-323 was released, to use it as a RAID1 box in a number of very small businesses, however, I have not gone that route as the RAID1 implementation does not appear to have the reliability my clients would need.

I agree with blahsome - what wrlee tried to do should have worked - and when I tried it, it did work.

jesbo · 2008-10-30 19:55:26

blahsome wrote:
What wrlee described sounded reasonable to me. He's not swapping drives if I understand correctly. He would be inserting a "blank" HD into a RAID1 setup, and that's exactly what you are supposed to do if a drive failed. The DNS-323 should support that.

Exactly.

When both drives are installed and in sync, file reads will alternate between both drives (i.e. some blocks of the file will come from the lhd, some from the rhd)... You can see this if you copy a large file from the DNS to a local PC. At least mine works this way. When writing files, data is written to both disks.

I would expect to be able to simulate a drive failure by pulling it out, deleting partitions or erasing it on another machine, and reinserting it - at which time it should be rebuilt and brought in sync with the still-good drive. This does not have to be automatic (as in RAID 5), but the unit should still preserve all data on the good disk and provide the facility to re-mirror to the new drive and reestablish the RAID-1 mirror.

If it does not work this way, then I am just kidding myself thinking the RAID-1 is useful at all for redundancy. I might as well go JBOD.

I will say, though... I do back up the DNS to 8mm Tape weekly as a true backup.

Last edited by jesbo (2008-10-30 20:01:21)
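
The behaviour described in this post (writes go to both disks, reads alternate between them, and a blank replacement is re-mirrored from the surviving disk) can be sketched as a toy model. This is only an illustration under stated assumptions: each "disk" is a Python dict, the class and method names are hypothetical, and nothing here reflects how the DNS-323 firmware is actually written.

```python
class Raid1Mirror:
    """Toy two-disk mirror: each 'disk' is a dict mapping block number -> bytes."""

    def __init__(self):
        self.disks = [{}, {}]   # index 0 and 1 stand in for the two drive bays
        self.next_read = 0      # counter used to alternate reads

    def write(self, block, data):
        # A write goes to every disk that is currently present.
        for disk in self.disks:
            if disk is not None:
                disk[block] = data

    def read(self, block):
        # Alternate between the present disks; returns (disk index, data).
        present = [i for i, d in enumerate(self.disks) if d is not None]
        if not present:
            raise IOError("no disks left in the array")
        idx = present[self.next_read % len(present)]
        self.next_read += 1
        return idx, self.disks[idx][block]

    def pull(self, idx):
        # Simulate removing a drive: the array keeps running, degraded.
        self.disks[idx] = None

    def insert_blank_and_resync(self, idx):
        # Re-mirror: copy every block from the surviving disk to a blank one.
        source = next(d for d in self.disks if d is not None)
        self.disks[idx] = dict(source)


mirror = Raid1Mirror()
for n in range(4):
    mirror.write(n, f"block-{n}".encode())

served_by = [mirror.read(n)[0] for n in range(4)]   # disk index used for each read

mirror.pull(1)                                      # "yank" the second drive
degraded_read = mirror.read(2)                      # still served by the remaining disk
mirror.write(4, b"written-while-degraded")
mirror.insert_blank_and_resync(1)                   # blank replacement, then re-mirror
in_sync = mirror.disks[0] == mirror.disks[1]
```

The point of the sketch is the expectation stated above: pulling one disk should leave the data readable, and re-mirroring should copy from the good disk to the blank one, never the other way round.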

wrlee · 2008-10-30 20:12:56

Thank you, fordem, for the explanation. I am not sure what had happened; I wasn't _hot_ swapping the drives (I didn't know it was safe to do that!)... I had shut down and removed the other drive. It could have been a glitch, or maybe I didn't realize that the drive was still rebuilding (although it was the secondary drive I'd removed, not the primary).

Can you elaborate (or refer me to info) on the issues that make the 323 _not suitably reliable_ as a RAID 1 implementation?

Is RAID1 supposed to be able to circumvent I/O errors in a sector of one drive (by reading corresponding sectors of the other)?

Bill...

jesbo · 2008-10-30 20:16:11

In a proper implementation of RAID-1, if an unrecoverable I/O error occurs on one disk, the desired block(s) of the file should be read from the mirror disk and a hardware error should be logged.
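
As a minimal sketch of that expectation (not a claim about what the DNS-323 actually does), the read path below tries one mirror, logs the failure, and falls back to the other. The dict-based "disks", the `BadSector` exception and the function names are all hypothetical stand-ins.

```python
import logging

log = logging.getLogger("raid1-sketch")


class BadSector(Exception):
    """Stands in for an unrecoverable read error reported by a drive."""


def read_from(disk, block):
    # A 'disk' here is a dict of block number -> bytes; None marks a bad sector.
    data = disk.get(block)
    if data is None:
        raise BadSector(block)
    return data


def mirrored_read(disks, block):
    """Try each mirror in turn; log failures and return the first good copy."""
    for idx, disk in enumerate(disks):
        try:
            return read_from(disk, block)
        except BadSector:
            log.error("read error on disk %d, block %d; trying the mirror", idx, block)
    raise IOError(f"block {block} is unreadable on every mirror")


disk_a = {0: b"alpha", 1: None, 2: b"gamma"}   # block 1 is bad on disk A
disk_b = {0: b"alpha", 1: b"beta", 2: None}    # block 2 is bad on disk B

recovered = [mirrored_read([disk_a, disk_b], n) for n in range(3)]
```

Because the bad blocks are at different positions on the two disks, every block is recoverable even though neither disk is fully readable on its own. An error only reaches the caller when the same block is bad on both mirrors.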

fordem · 2008-10-30 21:15:28

jesbo wrote:
I would expect to be able to simulate a drive failure by pulling it out, deleting partitions or erasing it on another machine, and reinserting it - at which time it should be rebuilt and brought in sync with the still-good drive. This does not have to be automatic (as in RAID 5), but the unit should still preserve all data on the good disk and provide the facility to re-mirror to the new drive and reestablish the RAID-1 mirror.

Even in RAID5 a rebuild is not automatic - I have had installations that were RAID5 on LSI Logic MegaRAID controllers where you manually had to select the replacement drive and start a rebuild.

For a RAID array to be deemed functional, all that is required is that the data continue to be available in the event of a failed drive, how the array is restored to an optimal condition is left up to the manufacturer's discretion.

Automatic rebuilding, remirroring or resynching - like hot swap - are highly desirable features but are not mandatory.

bq041 · 2008-10-30 21:24:19

The problem with RAID1 on the DNS is D-Link's implementation of it. It is a software RAID (as are RAID0 and JBOD) set up with mdadm. If you telnet into the device and use mdadm to build, break, add to, remove from, or simulate failures of the array, it works fine. The flaws stem from the D-Link firmware not formatting and setting up the disk(s) correctly. Their error checking of the array is also flawed. If you want to learn more, google mdadm and read up. I do not use the web setup anymore for any disk related activities. I have repeatedly broken and rebuilt arrays on the DNS, easily 50 times, without any data loss. The key to making the D-Link web interface happy with this is proper setting of a few files that D-Link looks at (raidtab, raidtab2web, hd_magic_num).
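
For readers who have not used mdadm, here is a rough sketch of the kind of fail / remove / re-add cycle mentioned above, written as a Python helper that only prints the command lines unless told otherwise. The device names `/dev/md0` and `/dev/sdb2` are assumptions for illustration and are not taken from this thread; check your own layout with `mdadm --detail` first. The sketch also does not touch the D-Link files named above (raidtab, raidtab2web, hd_magic_num).

```python
import subprocess

ARRAY = "/dev/md0"     # assumed array device; verify on your own unit
MEMBER = "/dev/sdb2"   # assumed member partition; hypothetical, not from the thread


def mdadm_cycle(array=ARRAY, member=MEMBER):
    """Command lines for a fail / remove / re-add cycle, then a status check."""
    return [
        ["mdadm", "--manage", array, "--fail", member],     # mark the member faulty
        ["mdadm", "--manage", array, "--remove", member],   # detach it from the array
        ["mdadm", "--manage", array, "--add", member],      # add it back so it can be rebuilt
        ["mdadm", "--detail", array],                       # report array and member state
    ]


def run(commands, dry_run=True):
    """Return the commands as text; execute them only when dry_run is False."""
    lines = []
    for cmd in commands:
        lines.append(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return lines


plan = run(mdadm_cycle())   # dry run: nothing is executed
print("\n".join(plan))
```

Keeping the dry run as the default is deliberate: these commands change the state of a live array, so it is worth reading the printed plan before passing `dry_run=False`.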

jesbo · 2008-10-30 21:38:29

bq041 wrote:
Their error checking of the array is also flawed.... I do not use the web setup anymore for any disk related activities. I have repeatedly broken and rebuilt arrays on the DNS, easily 50 times, without any data loss. The key to making the D-Link web interface happy with this is proper setting of a few files that D-Link looks at (raidtab, raidtab2web, hd_magic_num).

Thanks! I initially created my RAID-1 using the web interface (when the unit was new and I was a DNS-323 noobie). Since then, I've implemented ffp and now have full access and can use mdadm from this point forward. Are there any tweaks to the files you mention that one should make? Pointers / explanations welcome.

bq041 · 2008-10-31 00:17:44

Search the forum for upgrade of f/w 1.03 to 1.05. I wrote some scripts that change the partitions to the newer style, and they have to set these files as part of that. Anyway, download the script and open it in an editor. Browse through it to see what I did. I tried to comment it, so you can tell what is going on at any point. Also on the forum are some scripts I wrote for creating and breaking arrays. These also include sections on making or updating these files.

Also google "man mdadm" and read it. It will show you how it works. Between the 2 things above, you should start to get a good idea of what is going on. If you have specific questions, just ask, but goofing around for yourself is the best way to learn.

Last edited by bq041 (2008-10-31 00:19:14)
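
One low-risk way to start experimenting is simply to watch the array state while breaking and rebuilding it. Linux software RAID reports this in `/proc/mdstat`; the sketch below parses that text and flags arrays that are missing a member. The sample text is made up for illustration (device names and block count are assumptions, not output from a DNS-323), and the function names are hypothetical.

```python
import re

# Illustrative sample only: a two-disk RAID1 with one member missing.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sda2[0]
      486544512 blocks [2/1] [U_]

unused devices: <none>
"""


def array_states(mdstat_text):
    """Map each md device to (expected members, active members, status flags)."""
    states = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        counts = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if current and counts:
            states[current] = (int(counts.group(1)), int(counts.group(2)), counts.group(3))
            current = None
    return states


def is_degraded(state):
    # Fewer active members than the array expects means it is running degraded.
    expected, active, _flags = state
    return active < expected


states = array_states(SAMPLE_MDSTAT)
for name, state in states.items():
    print(name, "degraded" if is_degraded(state) else "all members active", state[2])
```

On a Linux system with md arrays, the same functions could be fed the live text from `open("/proc/mdstat").read()` instead of the sample string.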

Manipulating RAID1 (how does it work)?
