Folks,
My 323 is running firmware version 1.02 with two Samsung 320Gb drives in RAID 1 configuration.
I recently had a drive failure. I replaced the drive and the DNS-323 detected the new drive, formatted it and resynced the data. Everything was going well, or so I thought.
I originally had a data directory in the root which contained two sub-directories for documents and templates. After the resync, the data directory is there, but the documents and templates sub-directories are missing. When I checked each drive individually in the 323, neither of them showed the missing sub-directories. Both drives are still showing the correct amount of used space - about 26Gb.
I tried to create the templates sub-directory and got an error back saying that I can't create a directory as one of the same name already exists.
I have installed Fonz's fun_plug and telnetted into the 323. When I cd to the data directory and run ls, I get a listing back showing the two 'missing' sub-directories. When I try to cd into either of them I get an input/output error. I tried to chmod both to 777 but it came back with an error saying the directory doesn't exist.
I've read through other posts in this forum and it appears that I'm not the only one who has this sort of problem. Can anyone shed any light on how to get my data back? I do have some of it backed up but there is still quite a lot that I would really like to get back.
At present I have removed one of the drives to ensure that I don't end up any worse off regarding my data. As such, since the DNS is running with just one drive in it, can I run fsck while I am telnetted into the unit?
Sorry if this seems stupid, but I am new to Linux and still very much finding my feet.
Thanks in advance
Aidan.
Offline
Just an update on my situation and a warning to others.
I put one of the drives into my XP machine (loaded with EXT2 drivers). I ran DiskInternals recovery software. It found most of the files and I recovered them onto another drive. Problem is that most of them are corrupted. OK, so my data has been lost - just gotta love their RAID controller!!!
I would strongly recommend that if you are buying one of these units, that you think twice before configuring as RAID 1 - it would appear that the RAID controller isn't up to scratch.
My wife wants to take a hammer to the unit - needless to say we aren't going to be using it any more!!
Anybody know of a good use for a DNS-323 with several round dents in it?
Best of luck
A.
Offline
The DNS-323 has no raid controller. It is purely software. Did you power off the unit before you swapped the drive?
Offline
As an old friend of mine likes to say at times like this.....
Now tell me - do you believe in backup?
Offline
@ frodo - yeah I did power off. Didn't realise that it was purely software. In that case the software isn't up to scratch.
@ fordem - I think you should change your name to Captain Backup - that seems to be your response to just about every post that I have looked at in this forum. Most people do know the benefits of backing up. Just not everybody does backups. In this case, you should expect that a unit that sells itself as a RAID should operate properly - it doesn't, it screws up your data - I know, you're just gonna say - "Should have backed up" - Yep, but didn't and now am paying the price and hoping that my experience will warn others of this unit.
Hope you guys have a happy error free Christmas.
A.
Offline
Aidan wrote:
@ frodo - yeah I did power off. Didn't realise that it was purely software. In that case the software isn't up to scratch.
@ fordem - I think you should change your name to Captain Backup - that seems to be your response to just about every post that I have looked at in this forum. Most people do know the benefits of backing up. Just not everybody does backups. In this case, you should expect that a unit that sells itself as a RAID should operate properly - it doesn't, it screws up your data - I know, you're just gonna say - "Should have backed up" - Yep, but didn't and now am paying the price and hoping that my experience will warn others of this unit.
Hope you guys have a happy error free Christmas.
A.
Actually - as long as your data remained available up to the point that you inserted the replacement drive, the RAID array fulfilled its intended purpose - which is to reduce the impact of downtime caused by a failed drive.
SO.....
In a nutshell, your RAID array did what it was designed to do, and if you had done what you were supposed to do, you would still have your data.
And in response to that Captain Backup remark - I'm probably one of the reasons the people who don't back up know that they should have - so like you I'm hoping my experience will warn others - so that they know, even if they have RAID that they still need to back up - because that's not what RAID is for.
Offline
fordem wrote:
Actually - as long as your data remained available up to the point that you inserted the replacement drive, the RAID array fulfilled its intended purpose - which is to reduce the impact of downtime caused by a failed drive.
SO.....
In a nutshell, your RAID array did what it was designed to do, and if you had done what you were supposed to do, you would still have your data.
And in response to that Captain Backup remark - I'm probably one of the reasons the people who don't back up know that they should have - so like you I'm hoping my experience will warn others - so that they know, even if they have RAID that they still need to back up - because that's not what RAID is for.
Okay fordem, I'll bite: what are you supposed to do once a drive has failed?
I'm a cautious engineer. I purchased this box, and I've been running tests on it to determine what the correct use cases are and what I can expect to happen. Using firmware 1.03 I've run into a very similar problem to what Aidan mentioned above, in that given my two disk array, when I remove one disk, make some changes, and then add the disk back in, the changes aren't synced correctly.
So what's the correct use case? Does failure necessitate a full backup to another disk, manual reformat+rebuild, and then a copy back?
Offline
Gambit - if you insert a new or clean (no partitions or data of any sort) drive, I believe you'll find that the unit syncs correctly - at least in my tests, it did.
What are you supposed to do once a drive has failed? That, sir, is entirely your choice, and has nothing to do with the fact that the unit supports RAID.
You and I are having this discussion for no other reason than that I keep on preaching that RAID1 is not a substitute for backup and that even with a RAID1 array, the user still needs to back up their data.
The purpose of RAID1 is to reduce or eliminate the impact of the downtime caused by a disk failure.
Let's look at a scenario where a user has a small business and has a single hard disk.
Being a conscientious user, he backs up at the end of every business day. One day at 9:00 am, whilst opening for the day, he gets a hard drive failure message - the system doesn't boot, he can't work, so he calls his tech support people. They come in, replace the failed disk, reinstall the OS and restore the backup - there is no loss of data - but there is a loss of revenue, which can be quantified as n $/hour x t hours, where n represents the hourly revenue and t the number of hours it takes tech support to get him back up and running.
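(For example, if the business earns $100/hour and it takes tech support 4 hours to get him going again, that single failure cost him $400 in lost revenue - the numbers are just an illustration.)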
Let's now change the scenario - the same small business owner, except, he's learned from his mistake, now he's running RAID1.
He is still a conscientious user and so he still backs up at the end of every business day. As before, one day at 9:00 am whilst opening for the day, he gets a hard drive failure message - this time the system boots, he can still work - he calls his tech support people, and they come in at 6:00 pm - 6:00 pm to allow him to perform his usual end of day routines, including the daily backup - and replace the failed disk. In an ideal situation it would rebuild automatically, and everyone would be happy. Let's make this a "non-ideal" situation - it's not rebuilding automatically and tech support have no option but to reformat, reinstall the OS and restore the backup - there is no loss of data - there is no loss of revenue. The downtime required has been shifted so that it occurred outside of business hours.
Did the RAID array do its job?? Ask the business owner - how much did the downtime cost him?
Just for the heck of it now - let's take a different small business owner - he has RAID1 - he doesn't bother to back up - everything works well - so well in fact that he hires an assistant who, fresh out of computer school, has been told that you must format a disk before you can use it - so he/she dutifully follows the instructions and formats the disk - the businessman notices what he/she's doing just as he/she hits the enter key.
What now ????
The data on that RAID1 array is toast - it's actually still there - just the boot sector on the drive has been overwritten, so how do you get it back? Call a recovery specialist, use a recovery tool, any number of ways - but - none of them is as sure as recovering from the backup would have been, if there was one.
Did the RAID array do its job? Ask the business owner - how much did the downtime cost him? Is he still in business? Chances are, if it was an online business - he's not - not unless he gets that data back.
How you recover from a disk failure with a RAID array is irrelevant to the discussion - some systems - typically the more expensive, hardware RAID with hotswap disks - will automatically rebuild when the failed disk is replaced, others, where the RAID is handled by the OS may require significantly more work - the whole point is that the corrective action - whatever it is - can be scheduled for the most convenient time.
Offline
fordem -
Thanks for the details. I think there are a lot of misunderstandings and erroneous expectations going around about what a RAID1 configuration is useful for. Your points about user error (or, my personal fav example, viruses attacking available mounts) should be taken to heart by everyone.
My question is a little more specific. I was under the impression that making a change on the drive, and then later adding an out-of-date member of the same RAID1 cluster should cause it to re-sync. Is that an erroneous expectation on my part? Or is that a valid use case that the current 1.03 firmware doesn't support (though I'm seeing some hints that 1.04 will)?
Offline
Field support techs are trained NOT to remove disks from functional RAID arrays except under specific circumstances and even then the norm would be to use the RAID management utilities to force the drive offline or "fail" it to reduce the possibility of data loss.
Why would you think - or - where did you get the impression - that you could remove a drive, make a change on the remaining drive and then reintroduce the missing drive and get the system to resync? Why would you expect this to be a normal event?
Offline
Mostly because I was looking at it as a change event. RAID1 sees that the drives are not "in sync" and forces the old one to resync to the master. I guess that's an erroneous way of looking at it.
Offline
Based on my experience with business grade equipment (and I hasten to point out - they usually have hardware RAID controllers - I have precious little software RAID experience) - I'd say it's not the norm - I would expect to have to go into the RAID management utilities and select the drive and initiate the rebuild.
Offline
When I first purchased my DNS-323, I tested the same scenario on a RAID1
DNS-323 as Gambit:
remove a drive
change some data
replace drive to re-sync
and, sadly, I had the same results.
I was quite disappointed with the D-Link firmware and if it wasn't for Fonz's
fun_plug, I would have returned my DNS-323 back then!
The only way I was able to successfully re-sync the removed drive was to use telnet access
and mdadm commands from the command prompt to fail and remove the "bad" drive.
I plan to use this technique if a drive ever fails on my DNS-323. I will not reboot my DNS-323
or allow D-Link's scripts to attempt to rebuild the array.
1) #cat /proc/mdstat (to see which drive, /dev/sda2 or /dev/sdb2, has failed)
2) #mdadm /dev/md0 --fail /dev/sdb2 --remove /dev/sdb2 (if /dev/sdb2 fails)
3) remove the failed drive
4) data changes as I wait for the new drive replacement, daily backups continue as usual
5) add the new drive, make sure it is added as /dev/sdb (I once saw it was added as /dev/sdc??!)
6) use fdisk to examine the partitions on /dev/sda [working disk] and create identical partitions on /dev/sdb [new disk]
7) #mdadm /dev/md0 --add /dev/sdb2 (add the new disk to the array)
8) #cat /proc/mdstat (to see the progress of the re-sync)
Last edited by mig (2008-01-08 07:23:31)
Offline
mig - We're obviously creeping towards a "that's not a Dlink box, that's /my/ box" scenario. While I don't, for one, particularly mind that, it would be nice to have a conclusive mechanism to disable all of Dlink's automatic scripts that /didn't/ also run the risk of bricking the box. I haven't tracked what's update-able on the box (can you write to the root partition?), but fighting the OS seems... unpleasant. If I was in your hypothetical position, I would probably backup, format-and-reinitialize, and then copy over. Yeah, it's a pita, but imagine how much it would suck to do those steps manually and then get the horrible "you must format" prompt from the webui?
fordem - my experience is somewhat similar. Thinking more on it, what really threw me off was that the drive wasn't recorded as failed or anything, AND that data corruption in the form of folders inaccessible through the RAID partition occurred. Now, they might have been available if I'd tried mounting the /dev/sda2 partitions explicitly, but having the RAID go into that untenable state was... disconcerting.
Offline
gambit - You're right, I am SO disappointed with D-Link's firmware, I try not to rely on their
programming as much as possible. I see it as fighting with D-Link's scripts rather than fighting
with the OS (Linux kernel / utilities).
IMHO, the DNS-323 hardware is very impressive... small, quiet, low-power, dual SATAII,
fast and relatively cheap. I think the concept of restoring all the configuration files from
flash on boot-up is brilliant! But the lack of a journaling file system, an old version of samba,
no UPS support, and miserable clock drift make this device a mediocre NAS.
I've been commenting on these issues at the "official" D-Link forum, but none of the moderators
seem to want to post comments about them. This, coupled with the fact that the last firmware
update was over 9 months ago, makes me feel like D-Link doesn't plan to make many
improvements to the firmware.
If it wasn't for the fun_plug exploit and Fonz's software development, I would have given up
on this device long ago. I would love to see a native Debian (or other distribution) kernel
loader. There was some talk about this at http://dns323.kood.org/forum/t347-Loader-DNS-323.html
but no current progress has been reported. I'm afraid kernel cross compilation is a bit beyond
my current Linux skills, but I always learn a lot when playing with these NAS devices (that is,
until I actually use the device to provide a function in my network, then I have to stop tinkering
with it and just let it be).
About the root partition, I believe it exists in RAM, so writing to root is a bit space-limited.
I have created several links (recreated by startup scripts on each boot) from the root that point
to directories on the RAID to keep config files and users' home directories. This seems to work well.
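For example, something along these lines in my fun_plug script - the target paths here are just from my own setup, so treat them as placeholders:
# / lives in RAM, so these links vanish on every reboot and must be recreated
ln -sf /mnt/HD_a2/config/ssh /root/.ssh
ln -sf /mnt/HD_a2/home /home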
Offline
Gambit - I agree with you in that I feel if a drive disappears from the configuration, this action should be recorded as a failure, and reintroducing the drive at a later date/time should not restore the system to apparent normalcy - I have seen this happen during what I will call my RAID testing phase.
It should be possible, and in fact quite a simple task, to write some sort of timestamped configuration file to the disks and look at it on startup to determine what's going on - I believe that is how D-Link presently determines that a drive has failed in order to send the email alerts, which did not work prior to 1.03.
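In fact, I gather the md superblock on each member already records something of the sort - an update time and an event counter - which could be compared at startup. A rough sketch only, assuming the data partitions are /dev/sda2 and /dev/sdb2 as mentioned earlier in this thread:
# compare the RAID superblocks on the two members
mdadm --examine /dev/sda2 | grep -E 'Update Time|Events'
mdadm --examine /dev/sdb2 | grep -E 'Update Time|Events'
# mismatched event counts would mean one disk is stale and needs a rebuild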
mig - Based on what I see at the DLink forum, 1.04 will probably be officially released, but I suspect it will be the final release - I've discovered that the mods will delete your posts simply because you express an opinion challenging theirs - so it's not a forum for open discussion.
Offline
fordem wrote:
I've discovered that the mods will delete your posts simply because you express an opinion challenging theirs - so it's not a forum for open discussion.
Yes, I saw your post disappear on the D-Link forum, too. I thought my post mentioning a
competing product to the DNS-323 in the "Defrag with Firmware 1.03" thread was susceptible
to removal. Either the moderator didn't read it or let it slip by.
By the way, that device is a Buffalo Tech Linkstation Pro Duo, which could prove to be a
good replacement for my DNS-323, since it has several features I'm looking for... XFS filesystem
and UPS support. Although, perterjb seems to be making progress with NUT for UPS support on
the DNS-323.
Offline
Gambit wrote:
Okay fordem, I'll bite: what are you supposed to do once a drive has failed?
My answer would be: thank my lucky stars that RAID allows me to deal with the failure at my convenience rather than at the moment it happens; schedule some downtime, take the device offline, install a new, blank drive, initiate a re-sync, and then put the device back online.
Possibly, the device can be put back online while it is re-syncing, so this might be a 15 minute outage rather than waiting for the hours that it takes for a re-sync.
Offline
Thanks for the interesting discussion guys!
I stumbled on this while searching for the keyword "Input/output Error" when using "ls". The general "net conclusion" is drive failure.
In my case I have a similar problem to another poster on this forum. I have a mysterious "loss of data" in one of my directories, which comes out with an Input/Output Error when doing a listing. I am also running RAID1. I've set up an rsync backup system for my RAID1 DNS to do remote backups to an offsite storage location. In my debug stage I decided to just back up (using rsync) one test directory to another backup directory on the same RAID1 array.
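For reference, the hourly snapshot job is basically the usual rsync hardlink rotation - the directory names below are just placeholders for illustration:
# rotate: the newest snapshot becomes snapshot.1 (older rotations omitted here)
mv /mnt/HD_a2/backup/snapshot.0 /mnt/HD_a2/backup/snapshot.1
# copy only changed files; unchanged files are hardlinked against the previous snapshot
rsync -a --delete --link-dest=/mnt/HD_a2/backup/snapshot.1 /mnt/HD_a2/test/ /mnt/HD_a2/backup/snapshot.0/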
When I do this, the backup goes well and things look really good. I have hourly backups showing up, changes I make are reflected in my most recent snapshot, and older snapshots have hardlinks correctly made. This morning I made an update to my fun_plug and then rebooted the DNS. I checked my "backup" directory and all my snapshots were gone! Totally gone! I did an ls through telnet and it showed me the "input/output" error. Very odd. How does backed up snapshot data just "go missing"?
My only guess is that perhaps the DNS had not finished mirroring the data to both drives when I performed the reboot? Is the D-Link software that dumb to not even clear out any syncing jobs before rebooting through the web interface?
The story gets more interesting...
I went out for about 2 hours and came home. I checked the backup directory and "magically" all my snapshots are back! Was the DNS resyncing? If so, it looked like it took a while! Now I'm not sure if I should trust what I see, but I certainly don't feel too keen on the DNS RAID implementation. I've used hardware RAID at work and the syncing of drives when writing a single 5MB directory is instantaneous. I can't believe it would take more than minutes to do the same on the DNS. Certainly I am still perplexed about what happened here, but it seems like when I write to my RAID1 I should wait a while before rebooting it!
Thought I'd add that in since my "missing directories" ended up showing up much later in time so patience might be a solution. I don't like it though.
Offline
Let's start by clearing up some misconceptions - sync and mirror are two totally different things. There seems to be some degree of confusion in this regard where linux is concerned. Admittedly I haven't come across it here, but in a number of other places I've seen what I will call scheduled synchronizations (a scheme in which the drives or folders are synchronized every x hours using rsync) referred to as mirroring - it's not.
The terms "mirroring" and "RAID1" imply real time or simultaneous, duplicated writes - one to each drive.
The DNS-323 in RAID1 mode does real time, simultaneous, duplicated writes - this is actually visible from the front panel, you will see both drive LEDs flashing as you write to the RAID array - and the data should be on both drives once the write operation is completed, and can be read from either drive - I say should because those are my expectations, and in so far as I have tested it, those were my experiences.
Resyncing in the context of a RAID array - and it's not the term I would choose (I prefer rebuild) - should only be necessary when a member of the array has been replaced, although I have seen it occur occasionally on arrays that have been improperly shut down (not on a DNS-323).
Why did your data disappear and then magically reappear - and where was it in between - I have no idea.
Offline
Thanks for the clarification. My initial assumption of RAID1 (mirror) was that of simultaneous writes, but the DNS status page shows a "Drives last sync'd" status message, so I was wondering if they had some kind of caching going on where the software writes to drive A and then eventually writes to drive B, which would make it one of the worst RAID1 implementations I'd ever seen.
I have noticed from this site that using rsync may have risks on SAMBA-mounted Windows shares:
http://www.mikerubel.org/computers/rsync_snapshots/
"One report came from a user who mounts a windows share via Samba, much as I do, and had files mysteriously being deleted from the backup even when they weren't deleted from the source. Tim Burt also used this technique, and was seeing files copied even when they hadn't changed. He determined that the problem was modification time precision; adding --modify-window=10 caused rsync to behave correctly in both cases. If you are rsync'ing from a SAMBA share, you must add --modify-window=10 or you may get inconsistent results. Update: --modify-window=1 should be sufficient. Yet another update: the problem appears to still be there. Please let me know if you use this method and files which should not be deleted are deleted."
Once I get my mysterious directories sorted out I might look into this further.
Offline
bozilla -
I make a point of examining the results of "mdadm -D /dev/md0" whenever something weird is going on. It's much more useful than looking at the webpage (which is totally useless).
The only way I was able to get the Input/Output error to occur was by manually desynchronizing the disks. It doesn't sound like that's what you did, but mdadm might tell you more about the current state of the cluster.
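If it helps, the fields I keep an eye on are roughly these (from memory, so the exact layout may differ on your firmware):
mdadm -D /dev/md0
# State          : clean (vs. degraded)
# Failed Devices : should be 0
# and the per-device table at the bottom - both sda2 and sdb2 should show "active sync"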
Offline
Gambit wrote:
fordem -
Thanks for the details. I think there are a lot of misunderstandings and erroneous expectations going around about what a RAID1 configuration is useful for. Your points about user error (or, my personal fav example, viruses attacking available mounts) should be taken to heart by everyone.
My question is a little more specific. I was under the impression that making a change on the drive, and then later adding an out-of-date member of the same RAID1 cluster should cause it to re-sync. Is that an erroneous expectation on my part? Or is that a valid use case that the current 1.03 firmware doesn't support (though I'm seeing some hints that 1.04 will)?
Yes, it's an error. None of the RAID levels support this type of synchronization. It's not a valid use-case for RAID at all.
To do that would require some sort of journal to be maintained, or a full comparison of the drives' file systems to be constantly carried out.
In all cases of drive failure, RAID systems assume a fresh, blank drive is inserted/made available. There is never synchronisation from an "outdated" version of a disk from a drive set. That is outside the scope of what RAID provides.
This is the case for all of the RAID configurations (RAID1, RAID5, etc).
Another problem is that the device is using EXT2, which does not use journaling. This means that if the power goes off during a file write, the disk system will be in a bad state, and a tool like e2fsck needs to be run. Normally Linux will do this automatically, but who knows if D-Link have configured this. Hopefully it can be run from Telnet.
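It ought to be something along these lines from the Telnet prompt - just a sketch, mind you; the device and mount point names are the ones mentioned elsewhere in this thread, and the volume has to be unmounted first or you risk making things worse:
# unmount the RAID volume first (stop anything that is using it)
umount /mnt/HD_a2
# force a check and repair of the ext2 filesystem on the RAID device
e2fsck -f /dev/md0
# remount it afterwards
mount /dev/md0 /mnt/HD_a2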
The complication with RAID1 in this power failure situation is that the two disks may be slightly out of sync when the power goes. So that's a bad situation to be in. Read the Wikipedia article, concentrating on the section on atomicity of RAID arrays, here: http://en.wikipedia.org/wiki/RAID
Basically what it says is that for a device like the 323, with software RAID, a small CPU and the likelihood of non-atomic writes, the two disks in a RAID1 array will probably be slightly out of sync with an uncompleted write when the power goes, and that is not great. i.e. both disks will be partway through writing a file, and may be at different points in the write, e.g. disk1 is 50% finished and disk2 is 51% finished.
Most likely it will be cleaned up by e2fsck, but it's worth considering that maybe it won't be. Even if you have EXT3 you will still have this problem, as RAID1 assumes that the disks are perfectly mirrored.
In this scenario, they are not.
Therefore it's essential to use a UPS to minimise this possibility.
Offline
Welcome aboard markchicobaby - it's nice to not be the only voice crying in the wilderness
Offline
fordem wrote:
Welcome aboard markchicobaby - it's nice to not be the only voice crying in the wilderness
You are both not alone on the subject of "what RAID is". I read almost everything, and I am often amazed by various user posts.
Offline