Dear Fordem
fordem wrote:
Do we possibly have a scenario here where the DNS-323 is detecting the data as not being consistent and flagging it as such?
I think that is letting your imagination run a bit too wild. Do we think it would go to this extent when it does not even use S.M.A.R.T.? I don't think so.
Jaya
jayas wrote:
Dear Fordem
fordem wrote:
Do we possibly have a scenario here where the DNS-323 is detecting the data as not being consistent and flagging it as such?
I think that is letting your imagination run a bit too wild. Do we think it would go to this extent when it does not even use S.M.A.R.T.? I don't think so.
Jaya
I'll willingly admit that you may be right about the imagination part - since I have no idea what it does or how it does it - and since mine doesn't do it, it's hard for me to do anything other than theorize - BUT - trust me on this - systems have been doing this for years; my IBM xSeries (Adaptec-based RAID) detects the array as out of sync before the OS loads, as do other hardware controllers.
By the way - S.M.A.R.T. does not and cannot play a part in verifying data integrity at startup - it's simply a threshold-based detection system - a number of different variables are monitored - for example, the number of remapped sectors - and when a count exceeds its predetermined threshold, S.M.A.R.T. will flag it.
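For illustration, here is a toy sketch of that threshold model (purely hypothetical - the attribute names and numbers are made up; real S.M.A.R.T. values come from the drive firmware):

```python
# Hypothetical sketch of threshold-based S.M.A.R.T. flagging: each monitored
# attribute has a normalized value, and the drive is flagged only when a
# value crosses its vendor-defined threshold. No data verification involved.

def smart_flags(attributes):
    """Return the attributes whose normalized value has crossed its threshold."""
    return [name for name, (value, threshold) in attributes.items()
            if value <= threshold]

attributes = {
    # name: (current normalized value, vendor threshold) - invented numbers
    "Reallocated_Sector_Ct": (34, 36),   # has decayed past its threshold
    "Spin_Retry_Count":      (100, 97),  # still healthy
}

print(smart_flags(attributes))  # → ['Reallocated_Sector_Ct']
```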
The consistency detection mechanism needs to be nothing more than a flag being set/reset - set the flag before writing to disk #1, reset the flag upon completion of the duplicate write to disk #2 - a power failure occurring before the completion of the write will leave the flag set, and this can be detected on the next boot up.
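To make the idea concrete, a minimal sketch of that set/reset dirty-flag scheme (an assumption about how it *could* work, not the DNS-323's actual mechanism - the flag path and function names are invented):

```python
# Dirty-flag sketch: set a flag before the mirrored write, clear it only
# after both copies land. A flag that survives to the next boot means a
# write was interrupted and the mirror may be out of sync.
import os

FLAG = "/tmp/raid_dirty.flag"  # hypothetical location for the flag

def mirrored_write(path_a, path_b, data):
    open(FLAG, "w").close()            # set: a write is in flight
    for path in (path_a, path_b):
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())       # make sure this copy really landed
    os.remove(FLAG)                    # reset: both copies are consistent

def array_clean_at_boot():
    # If the flag survived a power failure, schedule a resync.
    return not os.path.exists(FLAG)
```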
Sure it's a theory - but I started my post by making that clear - Feel free to shoot my theories down - but - please, do so constructively - either show me why I'm wrong, or show me a better theory. Better yet - since you are more conversant with linux - run with my theory and either blow it out of the water, or verify that I'm right - right now you're ridiculing my guess based on your guess.
Dear Fordem,
fordem wrote:
By the way - S.M.A.R.T. does not and cannot play a part in verifying data integrity at startup...
All I meant to say was that if the implementors have yet to hook on to S.M.A.R.T. then it is quite unlikely they would have ventured into techniques of the kind you theorise! Didn't mean to solicit a lecture.
Jaya
Last edited by jayas (2008-04-20 13:39:44)
For what it's worth, I've had my DNS-323 for about 3 weeks now. I initially started with a single 500 GB Samsung HD and upgraded from 1.03 to 1.04 at the initial installation.
After a week I added a second Samsung 500 GB and switched to RAID 1.
Though I have had other excitement on my setup (see here), RAID 1 appears to have been fine over this period and I get the normal messages on the Status page. From time to time, I notice a few more clunks and clicks from the new setup than I did from the single disk - I guess RAID 1 is keeping the two disks a little busier as it spreads the data.
Colin
Upgraded to 1.04 (Western Digital 500 GB drives) and had the same issue with the left-hand drive going amber and the RAID 1 pack going degraded. Reformatted the drives as part of the upgrade.
Reformatted (full reformat) the "failed" drive outside of the DNS unit and re-inserted it - the RAID 1 reformat and the recovery worked fine. A couple of days later the right drive "failed" and the pack degraded. Repeated the same remove-and-format, replaced the right drive in the DNS, and it formatted and rebuilt the RAID 1 pack no probs. Worked fine for a few days and now, surprise surprise, the left drive is flagged as "failed".
Both drives are approx 6 months old and have been checked with WD's drive tools and found to be faultless. Both are from different drive batches.
Instability only occurred after upgrade to 1.04. The RAID 1 pack had been stable for 6 months prior to upgrade.
Am going back to 1.03.
Will be waiting for a 1.05. 1.04 looks to be suspect.
Hi Basher,
Basher wrote:
Upgraded to 1.04 ...
[ first left drive had amber light ...]
[ then right drive had amber light ...]
Will be waiting for a 1.05. 1.04 looks to be suspect.
DLINK asked a question as to what kind of drives people who have this problem are using. I wonder how many replied and if DLINK has an idea what the problem is. Otherwise it would not be reasonable to expect this problem to be fixed in the 1.05 firmware.
Is there anything you can tell us DLINK?
Jaya
I've continued testing my DNS-323 to pinpoint exactly when/how the drives fail.
So far I've been able to find out that with 1.04 the DNS-323 in RAID 1 will operate normally after formatting as long as nothing is ever written to it. I can switch it on and off as often and for as long as I want, and the pink/amber light issue does not occur.
However, as soon as I write anything (even something as small as 50k), even if I delete it immediately and turn off the unit, I'll get the light issue soon after the next power up.
Last edited by Megistal (2008-04-23 15:41:29)
Megistal - that's an interesting observation - are you running a fun_plug and are you turning it off using the button?
My experiences have been different - I have the first hardware revision, with fw 1.04 off of the US support site and 2x250 GB Seagate Barracuda 7200.9 in a RAID1.
It runs not quite 24/7, but as close to that as I can manage given our less than reliable utility power - I can read/write/delete - whatever I want and the unit is almost never properly shut down, and the only time I've seen that pink/amber light is back when I had JBOD and hot unplugged a drive.
I suspect I've never seen the problem because my DNS-323 will always be the last thing to shut down and so it's never reading/writing when the power goes - it's fed from a PowerWare UPS that shuts a Windows server down before turning off - I did have a script so that the server would do a proper shutdown on the DNS-323, but I reinstalled the server (larger drives) and haven't put the script back yet.
Hi Fordem
I'm not running anything else besides the bundled functionality of the DNS-323.
I'm using the front square on/off button.
I power it on when Windows boots.
I power it off before shutting down Windows.
I'm doing tests while Windows is running.
For now I've downgraded to firmware 1.03. Most of the issues corrected in 1.04 don't apply to what I need from my DNS-323, so I'll give firmware 1.03 a try.
jayas wrote:
Hi Basher,
Basher wrote:
Upgraded to 1.04 ...
[ first left drive had amber light ...]
[ then right drive had amber light ...]
Will be waiting for a 1.05. 1.04 looks to be suspect.
DLINK asked a question as to what kind of drives people who have this problem are using. I wonder how many replied and if DLINK has an idea what the problem is. Otherwise it would not be reasonable to expect this problem to be fixed in the 1.05 firmware.
Is there anything you can tell us DLINK?
Jaya
In short,
In 1.03 drive failure detection was not working properly. It was fixed in 1.04. A lot of the issues we have seen with the RAID degraded status have been related to the firmware on some WD disks causing critical errors, which cause our device to flag it as a failed drive. Still a work in progress; hoping to have it fixed before 1.05 goes public.
It is difficult to pinpoint these types of issues since we have limited resources for hard drives. We collect as many different sizes/makes/models as possible, but there will always be a few that we do not have access to for testing.
Hi DLink
Could you tell us which functionality was fixed in 1.04 for the RAID 1 setup?
Looking at the release notes on the D-Link web site for the DNS-323, I can't see anything related to RAID 1.
ftp://ftp.dlink.com/Multimedia/dns323/F … es_104.txt
The last RAID 1 fix was made in 1.03, for the e-mail notifications:
http://www.dlink.com/products/support.a … 0#firmware
As for the drives, I do not use WD drives but Seagates:
500 gigs Seagate ST3500320AS
if that can help you
Hi DLINK
Dlink wrote:
In 1.03 drive failure detection was not working properly. It was fixed in 1.04. A lot of the issues we have seen with the RAID degraded status have been related to the firmware on some WD disks causing critical errors, which cause our device to flag it as a failed drive. Still a work in progress; hoping to have it fixed before 1.05 goes public.
I am not sure the problem is restricted to WD disks. At least I have seen it occur on a DNS-323 with Seagate drives, although I suspected (I don't recall my reasons, as it was a while ago) that a power interruption may have caused it.
However, in investigating why the hot-plugging fails, I found issues with the scripts. Is there a channel for issues and suggestions to be relayed to the development team?
Jaya
Hi Jayas
If it helps, I'm using Seagate drives and the DNS-323 is plugged into my UPS. So none of my testing is prone to a power interruption that could go unnoticed, and my UPS has not failed me yet when I needed it.
Megistal wrote:
Hi DLink
Could you tell us which functionality was fixed in 1.04 for the RAID 1 setup?
Looking at the release notes on the D-Link web site for the DNS-323, I can't see anything related to RAID 1.
ftp://ftp.dlink.com/Multimedia/dns323/F … es_104.txt
The last RAID 1 fix was made in 1.03, for the e-mail notifications:
http://www.dlink.com/products/support.a … 0#firmware
As for the drives, I do not use WD drives but Seagates:
500 gigs Seagate ST3500320AS
if that can help you
Are you having a particular problem with RAID 1 other than the drives becoming degraded due to hard drive errors?
jayas wrote:
Hi DLINK
Dlink wrote:
In 1.03 drive failure detection was not working properly. It was fixed in 1.04. A lot of the issues we have seen with the RAID degraded status have been related to the firmware on some WD disks causing critical errors, which cause our device to flag it as a failed drive. Still a work in progress; hoping to have it fixed before 1.05 goes public.
I am not sure the problem is restricted to WD disks. At least I have seen it occur on a DNS-323 with Seagate drives, although I suspected (I don't recall my reasons, as it was a while ago) that a power interruption may have caused it.
However, in investigating why the hot-plugging fails, I found issues with the scripts. Is there a channel for issues and suggestions to be relayed to the development team?
Jaya
Just so there's no misunderstanding: I am not saying it is strictly related to WD disks, just that we have seen this occur more often on WD disks. The closest thing to an "official channel" would be forums.dlink.com, under the DNS-323 section.
Hi DLink
Besides degraded drives, none that I'm aware of. I haven't had any other issue, but I didn't test anything with degraded drives; I tried once in normal mode, which was OK for both drives.
From my preliminary tests on 1.03, I don't have the degraded issue. I need more tests, however.
Last edited by Megistal (2008-04-25 02:18:40)
Hi, I have the same problem as well: the LED of the first HDD is down but the disk is fine. D-Link French support told me to first save the data from both HDDs (of course!), then format them again and see if it works.
If not, send the unit back to them and they will exchange it. I think there is an article on the wiki about this issue.
hope this helps
PS: I have 2 Seagate HDD
Last edited by bodbod (2008-04-25 02:22:37)
After testing, firmware 1.03 does not show the pink/amber light behavior on my DNS-323.
I'll either do as bodbod did and contact D-Link, or wait for 1.05. I believe I'll do the former; as for the latter, we have no idea when it will come out and IF it will solve, or help solve, the light issue.
I have been watching this thread with interest.
I too can confirm that there are serious issues with the 1.04 firmware release.
I am using two matched 500 GB Seagate ST3500641AS drives which are not even one year old. I am using RAID 1 mirroring, and this appears to be the root of the problem.
After the upgrade my DNS-323 “bricked”, as it were. The drives continued to report failures - a bit like a disco really: first one, then the other, then both, and so on after successive reboots.
Upon further investigation it appears the two RAIDed disks seem to get stuck in a synchronisation loop. The “sync time” after upgrade was regularly reporting over 7,000 minutes and sometimes reported over 16,000 minutes! The hard disks just keep thrashing and they never end their sync cycle. They then get hotter and hotter and then simply stop responding.
Once in this cycle it is virtually impossible to stop this because at each successive reboot the drives need to sync before you can access them to format them.
After previous firmware upgrades when using RAID we had been advised to reformat and re-establish the RAID setup. I therefore managed to break the everlasting sync cycle, tried this approach, and can report that it also failed.
All looked well for the first 13 hours as I restored my files, but the restore failed at about 160 GB (I am not exactly sure though), when the system refused to accept data and then went dead (even though the lights were showing OK). The only solution was a hard reboot. After this the DNS-323 again seemed to get stuck in its RAID disk syncing cycle.
Solution – revert back to 1.03.
If ”Dlink” is monitoring, I would recommend that you pull this firmware release until you have solved these issues. As you can see, people are expending huge amounts of effort and time on this issue, and I am sure less capable users are simply losing important data and literally ending up with an expensive brick; you are in danger of finding loads of your product in dumpsters and people going out and buying a similar device from one of your competitors. People do not take data loss kindly.
Also, your firmware, if my experience is anything to go by, is actually causing damage to disks by simply thrashing them to death trying to synchronise the RAID mirror.
Moderators – in view of all the posts and the issues here, could we have this topic as a sticky until D-Link resolves the issue, for the benefit of all concerned.
Slingers
I'm not disputing that there are issues with 1.04, however, I'd like to draw your attention to the following post, by Dlink, who I believe is a DLink employee - whether or not his/her participation here is officially sanctioned is a different matter.
Dlink wrote:
In short,
In 1.03 drive failure detection was not working properly. It was fixed in 1.04. A lot of the issues we have seen with the RAID degraded status have been related to the firmware on some WD disks causing critical errors, which cause our device to flag it as a failed drive. Still a work in progress; hoping to have it fixed before 1.05 goes public.
It is difficult to pinpoint these types of issues since we have limited resources for hard drives. We collect as many different sizes/makes/models as possible, but there will always be a few that we do not have access to for testing.
Personally I have long been aware of serious shortcomings in the "drive failure detection" in 1.03 - as pointed out by Dlink, some of the fixes have caused problems for users of RAID1 and certain newer models of disk - my disks are older Seagates (mid 2006), and I have not seen these errors.
The trade off here is, on the one hand - allow ALL users regardless of disk type and whether or not they use RAID to potentially suffer data loss with 1.03 - or, on the other hand - allow what is likely to be a substantially smaller number of users to experience problems with 1.04.
In my opinion - it would be better to suggest that RAID1 not be used rather than to pull the 1.04 firmware.
Two more comments.
The term "brick" has specific connotations - a bricked DNS-323 will do nothing, period - being unable to use yours with RAID 1 does not prevent its use; it should therefore not be considered bricked.
Also - people do not take data loss kindly - agreed - but if they lose data because they have not backed up their DNS-323 (or other device), then that is not the responsibility of D-Link or the manufacturer of whichever device the data was stored on.
fordem wrote:
Also - people do not take data loss kindly - agreed - but if they lose data because they have not backed up their DNS-323 (or other device), then that is not the responsibility of D-Link or the manufacturer of whichever device the data was stored on.
Thank you for your learned comments....
But surely that is the point of RAID mirroring: you have an inbuilt backup of your system which can be recovered. If the fix trashes both hard drives then this is a serious issue, and I still stand by my original request. You are advocating a backup of the backup - how many backups is it reasonable for people to make?!
The average user, whom we must also think of (and a few have made the same comment here), does not have the capacity to back up another 500 GB of data when they have invested in two 500 GB drives for precisely that purpose. Your definition of "balance" is, in my book, equal to unnecessary risk, and D-Link is risking others' data and hardware on the back of the usual IT community limitation-of-use clause, i.e. "at your own risk".
I am now in the process of restoring my second set of backed-up data - in all, a total of just 30 minutes short of 24 hours' effort without a break, and that is without the time taken to create the original backup copy (which is incremental). Reasonable effort for a firmware upgrade and "balance"? I do not think so!
You are also implying that, because the lights change, the firmware can now be trusted to identify whether your disk is failing - it's currently registering good disks as failed, so my confidence in it performing this task adequately is low; you may be living with a false sense of security.
I agree that if D-Link continues to distribute this grossly faulty firmware then it should at least come with a health warning, but currently we don't have that either!
Lastly, my point was that it seems to be an issue with large-capacity drives using RAID 1, i.e. over the 250-300 GB mark; as stated, my data restoration failed, and this now seems to have occurred at the 245 GB mark... Just an observation, trying to provide more info to be fed back to D-Link themselves.
Last edited by Slingers (2008-04-28 02:27:43)
It seems that you, and many of your average users may be labouring under the misconception that having a pair of disks in a RAID1 configuration will keep your data safe - it does NOT - RAID1 is not intended to be a backup - it is intended to provide redundancy in case of disk failure. Because changes to a RAID1 array are written to both disks simultaneously, it provides no protection against data loss due to user error, a virus, or corruption due to an improper shutdown - a true backup will.
You are looking at one problem and I at another - I accept that at present the DNS-323 is not providing the desired (and promised) redundancy with certain disks and version 1.04 firmware - what you refuse to recognize is that even if it was, the user would still have an obligation to backup his/her data - you and your average user would probably be better served by having those 2 x 500 GB disks in a standard configuration, storing data on one, and then backing that one up to the other at intervals of your choice.
Yes - if you want to put that way, I am advocating a backup of a backup - and it is something that I personally do. I run my own small business, my data is stored on a RAID1 array, and backed up to tape on a weekly basis - if I did not have the RAID1 array a disk failure would mean that my business grinds to a halt until such time as I can replace a disk and restore the backup, if I did not have the tape backup, it would mean that a glitch could wipe my data out, even though it's protected by the RAID1 array. I have yet to lose a disk, but I have had to restore from tape.
Oh - one more thing - disk capacity is cheap, the last price I paid for a 500GB SATA disk was $99 + tax, and that's a heck of a lot less than the DDS4 drive I use for backup (a drive is around $875) and that does not include the equivalent capacity in tape cartridges. I choose tape for personal reasons, you and your average user can make your own choices - BUT - choosing RAID1 and not backing up your RAID1 array to another medium, will, sooner or later, result in your data being lost.
RAID1 is for disk redundancy, not data backup, don't confuse the two.
Slingers wrote:
Upon further investigation it appears the two RAIDed disks seem to get stuck in a synchronisation loop. The “sync time” after upgrade was regularly reporting over 7,000 minutes and sometimes reported over 16,000 minutes! The hard disks just keep thrashing and they never end their sync cycle. They then get hotter and hotter and then simply stop responding.
Slingers, am I correct that you did not have any fun_plug running or telnet access to the DNS-323 while you were testing, and that the "sync time" you are reporting is from the D-Link web GUI?
Perhaps you could repeat the test with fun_plug loaded and view the contents of /proc/mdstat. This might give some more insight into what the underlying RAID software (mdadm/md) is doing.
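For anyone trying this, here is a rough sketch of what a resync line in /proc/mdstat looks like and how you might pull the progress out of it. The sample text follows the usual md driver format, but the device names and numbers are invented:

```python
# Parse the resync/recovery progress line from /proc/mdstat output.
# On the DNS-323 you would read the real file, e.g. via telnet:
#   cat /proc/mdstat
import re

sample = """\
Personalities : [linear] [raid0] [raid1]
md0 : active raid1 sdb2[1] sda2[0]
      486544512 blocks [2/2] [UU]
      [=>...................]  resync = 7.4% (36212352/486544512) finish=7210.5min speed=1040K/sec
"""

def resync_status(mdstat_text):
    """Return (percent_done, minutes_remaining), or None if no resync is running."""
    m = re.search(r"resync\s*=\s*([\d.]+)%.*finish=([\d.]+)min", mdstat_text)
    return (float(m.group(1)), float(m.group(2))) if m else None

print(resync_status(sample))  # → (7.4, 7210.5)
```

A "finish" figure in the thousands of minutes, like the one in this made-up sample, would match the 7,000-16,000 minute sync times reported earlier in the thread.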
I agree with Slingers on the fact that if you pay a company for hardware, you expect it to work correctly (I mean perfectly). It has to do the job it is designed for...
Now, his conception of backing up is maybe not the best, but I don't think we all have the same time, specific knowledge and budget to allocate to something which should work properly. Also, I think we all appreciate having somebody from D-Link participating in this forum, for better communication with his team about fixing problems; we can't slam him for that, as he takes a professional approach to it.
Even if D-Link is not engaging its responsibility from a legal point of view by using its own firmware with its own hardware, it does so for me and all the other customers at the brand level.
So please, D-Link team, continue to dig into the mountain of problems and try to fix them ASAP; this would be much appreciated. I understand this is not a priority for the company as the product is already commercialised (maybe too early), but we would be grateful if we could avoid the after-sales service.
Thx
OK - here's a different theory as to what may cause this repeated random degradation - this theory is born out of reading a post in the D-Link forum (where the user actually linked to this thread). As before this is a theory that I have not been able to test, since I have not personally experienced the problem - at least not with my DNS-323.
Perhaps someone who has, can test it.
The user mentioned that after the firmware upgrade, he formats the drives and everything works normally until he powers the unit down - at reboot it may or may not come up with a degraded indication - BTW, he is using Seagate 7200.11 Barracudas.
First a bit of background - my first 250GB SATA drives were Maxtors, and they worked fine in my desktop, but not in the server I had originally purchased them for - the server never detected them at boot up, and after some searching I discovered that the problem was related to a "delayed spin up" function - the drives did not spin up immediately on power up, but would do so either on a command from the controller or after a random delay - this is done to reduce the startup load on the power supply, by staggering the drive spinup and spreading the current demand over a longer period - in the case of my server, the drives were never ready when the server checked for them, and the solution was a jumper setting that disabled the "delayed spin up" function.
Here's the theory now - is it possible, that, like my server, the DNS-323 is checking for the drive before the drive is operationally ready? It sees one drive, but not the other and so reports a degraded status?
And now - how to test - if you have been experiencing this problem, can you check your drive documentation to see if the drive offers a "delayed spin up" function and if it does, can you disable the function and see if the problem continues.
Hi Fordem,
fordem wrote:
OK - here's a different theory ...
Oh dear, not another theory! Perhaps some hard troubleshooting is the correct way to go. It is quite clear to me (from reading discussions here and from looking at the mount scripts) that the problem IS in the firmware.
It looks like D-LINK is happy to work out what is wrong on its own, without the help offered ... so let's just sit tight and wait to see what 1.05 brings.
Kind regards,
Jaya