Ardjan wrote:
The Raid1 says 'degraded', the JBOD is ok. Both are working, as far as I can see. In fact, the only error I see is the white LED, nothing else yet...
In relation to RAID 1, even when one disk goes down, the mounted shares will function normally. This is after all the core functionality provided by mirroring.
Today I finally managed to test the drive thoroughly with the WD-provided test program. No SMART issues reported, no bad sectors, nothing that points to an error.
This IMO rules out a drive problem. I would be interested to know if you get email alerts, though, when you have the time to set them up. The 'test' button provides a convenient way of checking whether the DNS-323 can email you.
Excerpt of dmesg.out:
Here is my equivalent excerpt (which does not have the amber/white light problem) for comparison purposes:
~ # dmesg <lost lines off the top owing to buffer wraparound> Vendor: SAMSUNG Model: HD501LJ Rev: CR10 Type: Direct-Access ANSI SCSI revision: 03 Vendor: SAMSUNG Model: HD501LJ Rev: CR10 Type: Direct-Access ANSI SCSI revision: 03 Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0, type 0 Attached scsi generic sg1 at scsi1, channel 0, id 0, lun 0, type 0 physmap flash device: 800000 at ff800000 phys_mapped_flash: Found 1 x16 devices at 0x0 in 8-bit bank Amd/Fujitsu Extended Query Table at 0x0040 number of CFI chips: 1 cfi_cmdset_0002: Disabling erase-suspend-program due to code brokenness cmdlinepart partition parsing not available RedBoot partition parsing not available Using physmap partition definition Creating 5 MTD partitions on "phys_mapped_flash": 0x00000000-0x00010000 : "MTD1" 0x00010000-0x00020000 : "MTD2" 0x00020000-0x001a0000 : "Linux Kernel" 0x001a0000-0x007d0000 : "File System" 0x007d0000-0x00800000 : "u-boot" ehci_platform ehci_platform.20865: EHCI Host Controller ehci_platform ehci_platform.20865: new USB bus registered, assigned bus ehci_platform ehci_platform.20865: irq 17, io mem 0x00000000 ehci_platform ehci_platform.20865: park 0 ehci_platform ehci_platform.20865: USB 0.0 initialized, EHCI 1.00, driv 2004 hub 1-0:1.0: USB hub found hub 1-0:1.0: 1 port detected ehci_platform ehci_platform.86401: EHCI Host Controller ehci_platform ehci_platform.86401: new USB bus registered, assigned bus ehci_platform ehci_platform.86401: irq 12, io mem 0x00000000 ehci_platform ehci_platform.86401: park 0 ehci_platform ehci_platform.86401: USB 0.0 initialized, EHCI 1.00, driv 2004 hub 2-0:1.0: USB hub found hub 2-0:1.0: 1 port detected ohci_hcd: 2004 Nov 08 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI usbcore: registered new driver usblp drivers/usb/class/usblp.c: v0.13: USB Printer Device Class driver mice: PS/2 mouse device common for all mice md: linear personality registered as nr 1 md: raid0 personality registered as nr 2 md: raid1 personality registered as nr 3 md: md driver 0.90.1 MAX_MD_DEVS=256, MD_SB_DISKS=27 device-mapper: 4.4.0-ioctl (2005-01-12) initialised: dm-devel@redhat.co NET: Registered protocol family 2 IP: routing cache hash table of 512 buckets, 4Kbytes TCP established hash table entries: 4096 (order: 3, 32768 bytes) TCP bind hash table entries: 4096 (order: 2, 16384 bytes) TCP: Hash tables configured (established 4096 bind 4096) NET: Registered protocol family 1 NET: Registered protocol family 17 md: Autodetecting RAID arrays. md: autorun ... md: ... autorun DONE. RAMDISK: Compressed image found at block 0 EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended VFS: Mounted root (ext2 filesystem). Freeing init memory: 112K MINIX-fs: mounting unchecked file system, running fsck is recommended. MINIX-fs: mounting unchecked file system, running fsck is recommended. MINIX-fs: mounting unchecked file system, running fsck is recommended. 
SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB) SCSI device sda: drive cache: write back SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB) SCSI device sda: drive cache: write back sda: sda1 sda2 Attached scsi disk sda at scsi0, channel 0, id 0, lun 0 SCSI device sdb: 976773168 512-byte hdwr sectors (500108 MB) SCSI device sdb: drive cache: write back SCSI device sdb: 976773168 512-byte hdwr sectors (500108 MB) SCSI device sdb: drive cache: write back sdb: sdb1 sdb2 Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0 egiga0: mac address changed MINIX-fs: mounting unchecked file system, running fsck is recommended. egiga0: link down egiga0: link up<5>, full duplex<5>, speed 100 Mbps<5> Adding 530104k swap on /dev/sda1. Priority:-1 extents:1 Adding 530104k swap on /dev/sdb1. Priority:-2 extents:1 MINIX-fs: mounting unchecked file system, running fsck is recommended. ext3: No journal on filesystem on sda2 EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended ext3: No journal on filesystem on sdb2 EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended MINIX-fs: mounting unchecked file system, running fsck is recommended. MINIX-fs: mounting unchecked file system, running fsck is recommended. md: md0 stopped. md: bind<sdb2> md: bind<sda2> raid1: raid set md0 active with 2 out of 2 mirrors EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended Link Layer Topology Discovery Protocol, version 1.05.1223.2005 dev is <NULL>
It does not look to me like there is any significant difference (any difference, even) between the two logs at startup. Can anyone else see anything of significance?
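If it helps, here is one way to put the two boot logs side by side (just a sketch: the /mnt/HD_a2 mount point and the file names are assumptions, and you will need diff on a PC or from funplug, as I do not think the stock busybox has it):
Code:
~ # dmesg > /mnt/HD_a2/dmesg_good.out     # run on the healthy box
~ # dmesg > /mnt/HD_a2/dmesg_white.out    # run on the box with the white LED
# copy both files onto one machine, then:
diff -u dmesg_good.out dmesg_white.out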
Perhaps we need to look at what happened after this startup? Here is mine ...
dev is <NULL> md: mdadm(pid 1358) used obsolete MD ioctl, upgrade your software to use new icls. <more of the same deleted> md: mdadm(pid 1568) used obsolete MD ioctl, upgrade your software to use new icls. *************************************** * HD1 stand by now! * *************************************** *************************************** * HD0 stand by now! * *************************************** ####################################### # HD0 awake now ! # ####################################### ####################################### # HD1 awake now ! # ####################################### md: mdadm(pid 3580) used obsolete MD ioctl, upgrade your software to use new icls. <more of the same deleted> md: mdadm(pid 3685) used obsolete MD ioctl, upgrade your software to use new icls. *************************************** * HD1 stand by now! * *************************************** *************************************** * HD0 stand by now! * *************************************** ~ #
If you get anything else reported, that could hold the clue as to what is going on. I am quite sure though that in your case, it does not appear to be a drive problem at all.
By the way, what is your hardware revision? I am quite sure the version 1.04 firmware I downloaded was for revision B1 hardware, so I am inferring there is another version for revision A1 hardware?
Having said this, I am running the same firmware on another box with revision A1 hardware; I did it by formatting the disks under 1.03 and then upgrading to 1.04 without reformatting, because I did not like the extra partition. It has Seagate disks and is running A-OK.
Hmm, what do these ext3 references mean in the last few lines? I thought that ext3 was not in the >1.02 firmware anymore?
Looks like they did away with the ext3 partition, but forgot to remove the code that checks for it.
Jaya
jayas wrote:
In relation to RAID 1, even when one disk goes down, the mounted shares will function normally. This is after all the core functionality provided by mirroring.
Of course it does. I just meant that the JBOD is also working normally, which would _not_ be the case if one drive had really failed.
jayas wrote:
This IMO rules out a drive problem. I would be interested to know if you get email alerts, though, when you have the time to set them up.
E-mail works: "Left Hard Drive Has Failed. Sincerely, your DNS-323" :-) (btw: it would be nice if it told me _which_ DNS323 it was. When there are several DNS'es (not in my case), this mail doesn't tell anything. The name and IP would be nice to know...)
jayas wrote:
Here is my equivalent excerpt (which does not have the amber/white light problem) for comparison purposes:
Code:
~ # dmesg
...<snip>...
Adding 530104k swap on /dev/sda1. Priority:-1 extents:1
Adding 530104k swap on /dev/sdb1. Priority:-2 extents:1
MINIX-fs: mounting unchecked file system, running fsck is recommended.
ext3: No journal on filesystem on sda2
EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended
ext3: No journal on filesystem on sdb2
EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended
MINIX-fs: mounting unchecked file system, running fsck is recommended.
MINIX-fs: mounting unchecked file system, running fsck is recommended.
md: md0 stopped.
md: bind<sdb2>
md: bind<sda2>
raid1: raid set md0 active with 2 out of 2 mirrors
...<snip>...

It does not look to me like there is any significant difference (any difference, even) between the two logs at startup. Can anyone else see anything of significance?
Hmm, no mention of 'MINIX' in my dmesg.out, and the RAID is broken due to:
md: md0 stopped.
md: bind<sdb2>
md: bind<sda2>
md: kicking non-fresh sdb2 from array!
md: unbind<sdb2>
md: export_rdev(sdb2)
raid1: raid set md0 active with 1 out of 2 mirrors
jayas wrote:
Perhaps we need to look at what happened after this startup? Here is mine ...
That doesn't look different than mine...
As far as I can see the DNS just kicks the partition from the raid and goes on. That explains why the JBOD keeps working. I think I have to look deeper into this...
jayas wrote:
If you get anything else reported, that could hold the clue as to what is going on. I am quite sure though that in your case, it does not appear to be a drive problem at all.
By the way, what is your hardware revision? I am quite sure the version 1.04 firmware I downloaded was for revision B1 hardware, so I am inferring there is another version for revision A1 hardware?
Having said this, I am running the same firmware on another box with revision A1 hardware; I did it by formatting the disks under 1.03 and then upgrading to 1.04 without reformatting, because I did not like the extra partition. It has Seagate disks and is running A-OK.
Hmm, so the drive seems to be OK. That's a relief.
My Hardware revision is A1.
Before upgrading from 1.03 to 1.04 I copied all data somewhere else, then did the upgrade. All data was still there, and no white light was seen. The white light appeared some days later for the first time. I switch the drive off every night because of the noise; mostly I use the web page controls for the switch-off.
After I noticed the white LED, I updated my external USB HD with the changed data from the NAS and reformatted the drive (I wanted to change the sizes of the RAID 1/JBOD volumes). It was OK for another few days...
Ardjan wrote:
E-mail works: "Left Hard Drive Has Failed. Sincerely, your DNS-323" :-) (btw: it would be nice if it told me _which_ DNS323 it was. When there are several DNS'es (not in my case), this mail doesn't tell anything. The name and IP would be nice to know...)
By making the sender e-mail field unique, you can determine which DNS-323 sent the email alert.
Hi Ardjan.
I must have been half asleep when I responded to your earlier post! You are right, the degrading of the array is logged in dmesg.
Someone here who is familiar with mdadm could perhaps indicate how to follow up and check the status of the drives. I have yet to play with mdadm.
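From the man page, though, it looks like these two commands would give us the array state over TELNET (untested by me so far, so treat it as a sketch):
Code:
~ # cat /proc/mdstat       # [UU] means both mirrors active, [U_] means degraded
~ # mdadm -D /dev/md0      # detailed status, including the state of each member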
As to your other point, if you have multiple DNS-323's, you can tell which one sent you the alert by looking at the subject line. At least this was the case in 1.03 and I have not checked it out on 1.04 yet.
Jaya
jayas wrote:
Someone here who is familiar with mdadm could perhaps indicate how to follow up and check the status of the drives. I have yet to play with mdadm.
That seems to be the best method. So, Volunteers? :-)
jayas wrote:
As to your other point, if you have multiple DNS-323's, you can tell which one sent you the alert by looking at the subject line. At least this was the case in 1.03 and I have not checked it out on 1.04 yet.
ok, you're right. I didn't check the subject-header, the name is there (shame on me...)
Ardjan wrote:
...That seems to be the best method. So, Volunteers? :-)
http://www.linuxdevcenter.com/pub/a/lin … /RAID.html
This is the output of the mdadm examine command on my working FW1.03 RAID1
# mdadm -E /dev/sdb2 /dev/sdb2: Magic : a92b4efc Version : 00.90.00 UUID : 8d42d38f:c4545df1:1258c200:ba97d336 Creation Time : Fri Dec 21 00:46:40 2007 Raid Level : raid1 Device Size : 242227968 (231.01 GiB 248.04 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 0 Update Time : Tue Mar 4 08:50:24 2008 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 Checksum : 657628ed - correct Events : 0.851489 Number Major Minor RaidDevice State this 0 8 18 0 active sync /dev/sdb2 0 0 8 18 0 active sync /dev/sdb2 1 1 8 2 1 active sync /dev/sda2
Last edited by mig (2008-03-04 20:16:05)
Hi Ardjan,
Further to mig's contribution, I found the following link useful:
http://man-wiki.net/index.php/8:mdadm
and for starters, details of my MD device:
~ # mdadm -D /dev/md0 /dev/md0: Version : 00.90.01 Creation Time : Wed Feb 13 17:11:32 2008 Raid Level : raid1 Array Size : 486544512 (464.01 GiB 498.22 GB) Device Size : 486544512 (464.01 GiB 498.22 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 0 Persistence : Superblock is persistent Update Time : Wed Mar 5 00:25:27 2008 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 UUID : db618e2e:36b479ce:7d23417b:cb48b479 Events : 0.210985 Number Major Minor RaidDevice State 0 8 2 0 active sync /dev/sda2 1 8 18 1 active sync /dev/sdb2
Examination of /dev/sd[ab]2:
~ # mdadm -E /dev/sd[ab]2 /dev/sda2: Magic : a92b4efc Version : 00.90.00 UUID : db618e2e:36b479ce:7d23417b:cb48b479 Creation Time : Wed Feb 13 17:11:32 2008 Raid Level : raid1 Device Size : 486544512 (464.01 GiB 498.22 GB) Array Size : 486544512 (464.01 GiB 498.22 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 0 Update Time : Wed Mar 5 00:25:27 2008 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 Checksum : b033c662 - correct Events : 0.210985 Number Major Minor RaidDevice State this 0 8 2 0 active sync /dev/sda2 0 0 8 2 0 active sync /dev/sda2 1 1 8 18 1 active sync /dev/sdb2 /dev/sdb2: Magic : a92b4efc Version : 00.90.00 UUID : db618e2e:36b479ce:7d23417b:cb48b479 Creation Time : Wed Feb 13 17:11:32 2008 Raid Level : raid1 Device Size : 486544512 (464.01 GiB 498.22 GB) Array Size : 486544512 (464.01 GiB 498.22 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 0 Update Time : Wed Mar 5 00:25:27 2008 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 Checksum : b033c674 - correct Events : 0.210985 Number Major Minor RaidDevice State this 1 8 18 1 active sync /dev/sdb2 0 0 8 2 0 active sync /dev/sda2 1 1 8 18 1 active sync /dev/sdb2
If you put up your results, perhaps we can have a go at using some of the other commands to trigger failure and reconstruction to see what happens.
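For example, something along these lines might do it (a sketch only; the device name assumes we pick on /dev/sdb2, and I would not try it on data you cannot afford to lose):
Code:
~ # mdadm /dev/md0 -f /dev/sdb2     # mark one mirror as faulty
~ # mdadm /dev/md0 -r /dev/sdb2     # remove it from the array
~ # mdadm /dev/md0 -a /dev/sdb2     # add it back so raid1 resyncs it
~ # cat /proc/mdstat                # watch the rebuild progress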
Jaya
For now the problem seems to be solved. I copied all data back to the Mac, reformatted the drive as Raid-1 (did not do that after upgrading to 1.04) and copied everything to the DNS again.
Lights are normal now! Hope it will stay this way.
Just curious: for everyone who has the "failed drive" problem: Did you format the drive after the upgrade?
In my case I formatted the drives after upgrading to 1.04 and the array still got degraded.
Speijk wrote:
Just curious: for everyone who has the "failed drive" problem: Did you format the drive after the upgrade?
I have got three boxes, two of them B1 and one A1 hardware revision; the two B1s with Samsung and the A1 with Seagate drives. All of them were formatted using 1.03 and then upgraded to 1.04 without reformatting. None of them has developed the amber/white light or the "failed drive" problem.
Jaya
Last edited by jayas (2008-03-05 12:48:13)
Can anyone experiencing the degraded light please respond with their hard drive information: model, make, version, firmware, etc. Any information would help.
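If you have TELNET access, the model and firmware strings can also be pulled from the shell; a sketch (the second command assumes /proc/scsi/scsi is compiled into the stock kernel, which I have not verified):
Code:
~ # dmesg | grep -i model      # only works if the boot messages are still in the buffer
~ # cat /proc/scsi/scsi        # lists vendor, model and firmware revision per drive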
Cazaril wrote:
Could it be a combination of firmware and HDDs? Note that Ardjan has the white LED and degraded disk after upgrading to 1.04. Also, I recall he had 400 GB WD disks.
As I have been told:
1. Some WD HDDs have a critical bug in their firmware that may cause the HDD to disappear suddenly. A firmware upgrade for some models seems to extend the time before the drive disappears, but the bug still seems to be there.
2. The 1.04 firmware of the DNS-323 added the capability of checking the HDD status in real time, so when it detects the above issue it shows a "DEGRADED" status for the RAID 1 volume and the LED goes AMBER to indicate this situation, that is, that an HDD failure has occurred. With the 1.03 firmware it doesn't have this ability, so even though one of the HDDs has gone wrong, the LED remains unchanged and just shows BLUE. It can still work properly in this situation because a RAID 1 volume keeps working with only one drive even when the other one is gone.
I think, maybe this is a good explanation of this weird problem.
Ardjan wrote:
Hmm, no mention of 'MINIX' in my dmesg.out, and the RAID is broken due to:
Code:
md: md0 stopped.
md: bind<sdb2>
md: bind<sda2>
md: kicking non-fresh sdb2 from array!
md: unbind<sdb2>
md: export_rdev(sdb2)
raid1: raid set md0 active with 1 out of 2 mirrors

According to this message, one of your drives has failed.
Wilson wrote:
2. The 1.04 firmware of the DNS-323 added the capability of checking the HDD status in real time, so when it detects the above issue it shows a "DEGRADED" status for the RAID 1 volume and the LED goes AMBER to indicate this situation, that is, that an HDD failure has occurred. With the 1.03 firmware it doesn't have this ability, so even though one of the HDDs has gone wrong, the LED remains unchanged and just shows BLUE. It can still work properly in this situation because a RAID 1 volume keeps working with only one drive even when the other one is gone.
I think, maybe this is a good explanation of this weird problem.
Hi Wilson,
With 1.04, I get the amber light and email alert after hot-plugging out a drive. I do not believe I got this in 1.03 when I did this test. So what you say about 1.04 "checking the HDD status in real time" makes sense.
Having said this, I believe that they got this status checking wrong, and this is why good drives are being reported as bad by the DNS-323. I don't think the code that does this has been released by D-LINK. If it had been, I am sure it would not be hard to find the culprit and fix the problem.
In the interim, I suggest manually using mdadm to fault the drive and then add the drive back to the array to trigger resync.
Jaya
Last edited by jayas (2008-03-06 15:20:33)
Hello,
Has anyone properly recovered from the amber/white light problem? If not, here is something you could try.
First determine which drive is having the problem. The one on the right is /dev/sda2, and the one on the left is /dev/sdb2. The relevant entries in dmesg will confirm this:
md: md0 stopped.
md: bind<sdb2>
md: bind<sda2>
md: kicking non-fresh sdb2 from array!
md: unbind<sdb2>
md: export_rdev(sdb2)
which in this case is /dev/sdb2.
Now TELNET/SSH to the DNS-323, and then try recovering using the following commands:
mdadm /dev/md0 -f /dev/sdb2    # signal as faulty
mdadm /dev/md0 -r /dev/sdb2    # remove from array
mdadm /dev/md0 -a /dev/sdb2    # add to the array
If you look at the status page after this, you will find that it should say the sync operation is in progress. When it is complete, the status page will show all is okay, but the amber/white light may not go off. Rebooting the DNS-323 gets things back to normal with no more white/amber light.
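While the sync is running you can also watch the raw progress from the shell instead of the status page (assuming the kernel exposes /proc/mdstat, which it does on my box):
Code:
~ # cat /proc/mdstat     # shows percent complete, speed and an estimated finish time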
Hope this helps.
Jaya
jayas wrote:
Now TELNET/SSH to the DNS-323, and then try recovering using the following commands:
Code:
mdadm /dev/md0 -f /dev/sdb2    # signal as faulty
mdadm /dev/md0 -r /dev/sdb2    # remove from array
mdadm /dev/md0 -a /dev/sdb2    # add to the array

If you look at the status page after this, you will find that it should say the sync operation is in progress. When it is complete, the status page will show all is okay, but the amber/white light may not go off. Rebooting the DNS-323 gets things back to normal with no more white/amber light.
I just tried it. It reported on the first two commands (-f and -r) that sdb2 wasn't available, but it started the sync at the -a command. After some 20 minutes it went into 'degraded' again (remaining time at the start was ~120 minutes).
Since I use the DNS only as an asynchronous RAID and file server for my laptop (there is no important data on the DNS that I don't also have on my PC), and my PC has RAID 1 built in anyway, it is not a very big issue. I will wait for a new firmware, I think.
@ardjan - if you still have a bad drive led indication (amber/orange/white) or 'degraded' raid in
the web gui, could you post the output of
mdadm -E /dev/sda2
and
mdadm -E /dev/sdb2
I found some documentation http://linux-raid.osdl.org/index.php/Non-fresh which describes
a "non-fresh" raid array member as a drive which is out of sync with the other(s). The
'Events:" property, listed with the mdadm -E command, would show this status.
mig wrote:
could you post the output of
Here it is:
/ # mdadm -E /dev/sda2
/dev/sda2:
Magic : a92b4efc
Version : 00.90.00
UUID : d84b5882:94bf85bb:aea4d949:610a3a4a
Creation Time : Sat Feb 2 20:49:48 2008
Raid Level : raid1
Device Size : 341838976 (326.00 GiB 350.04 GB)
Array Size : 341838976 (326.00 GiB 350.04 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Update Time : Tue Mar 11 21:43:00 2008
State : clean
Active Devices : 1
Working Devices : 1
Failed Devices : 2
Spare Devices : 0
Checksum : c9c6a2c5 - correct
Events : 0.183884
Number Major Minor RaidDevice State
this 0 8 2 0 active sync /dev/sda2
0 0 8 2 0 active sync /dev/sda2
1 1 0 0 1 faulty removed
and
/ # mdadm -E /dev/sdb2
mdadm: No md superblock detected on /dev/sdb2.
As I said before: the RAID1 volume and the JBOD Volume can be accessed normally over the network...
Hi Ardjan,
Sorry if I am asking you to repeat what you have already done, but your situation is baffling and I would like to have a go at nailing it down. Starting afresh, here is what I would like you to do and report, if you can:
1/ TELNET in and add the drive back to the RAID thus:
mdadm /dev/md0 -a /dev/sdb2
2/ Dump the relevant information using dmesg
3/ Wait for the sync to complete and then dump dmesg again (see the sketch after step 4)
[All of the above should be done without rebooting the device.]
4/ If after step 3 sdb2 does not get ejected from the array, reboot and look at dmesg again.
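Putting steps 1 to 3 together, roughly (a sketch; the dump file names and the /mnt/HD_a2 path are just placeholders for wherever you want to keep the output):
Code:
mdadm /dev/md0 -a /dev/sdb2                  # step 1: add the member back
dmesg > /mnt/HD_a2/dmesg_after_add.out       # step 2: capture what md reports
cat /proc/mdstat                             # repeat until the resync reaches 100%
dmesg > /mnt/HD_a2/dmesg_after_sync.out      # step 3: capture the log again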
Regards,
Jaya
Ardjan wrote:
As I said before: the RAID1 volume and the JBOD Volume can be accessed normally over the network...
Thanks for the output, I was hoping it would show different numbers in the "Events" property,
to correlate that with the 'md: kicking non-fresh sdb2 from array!' error message, but the
output shows another situation.
The /dev/sdb2 partition can not join the raid because it does not have a RAID (md) superblock. The RAID1
array you can access from the network is not a functioning RAID1 at all. You are saving data to
just one disk (/dev/sda2). This is the same as not running a RAID at all.
To write an md superblock, I believe (I'm not a guru on 'md') you need to create the RAID again,
something like: mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2.
WAIT... before you try that command, back up all your data. I don't know if you can create a RAID
and preserve any data that already exists on the disks.
If you are going to have to back up and restore the data anyway, you are probably better off just wiping the disks
and letting the DNS-323 web GUI set up the disks again.
Hi Mig,
mig wrote:
The /dev/sdb2 partition can not join the raid because it does not have a RAID (md) superblock.
I am not sure if this is entirely correct, given that it is possible to add a blank disk to an existing RAID array, and this happens without loss of data.
From memory, I recall I had to fdisk, mke2fs and then use mdadm to add it to the array. Let me verify the procedure and post it here. In my case I did not want the web GUI to do it, because I wanted to partition the disk using the 1.03 scheme (with only two partitions) instead of the 1.04 scheme, which has three (and out-of-sequence) partitions.
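From memory it would be roughly the following; treat it as a sketch until I have verified it (device names assume the blank disk is /dev/sdb, the mkswap applet is assumed to be in the funplug busybox, and mke2fs on the member should not actually be needed, since the RAID 1 resync clones the surviving disk block for block):
Code:
/home/root/busybox fdisk /dev/sdb     # recreate sdb1 (swap) and sdb2 with the same
                                      # start/end cylinders as on /dev/sda
/home/root/busybox mkswap /dev/sdb1   # initialise the swap partition
mdadm /dev/md0 -a /dev/sdb2           # add the partition; md copies the data across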
Regards,
Jaya
Edit PS: In Ardjan's case, the add to RAID seems to work, and it proceeds to sync, but somehow after sync is complete, it drops out again.
Last edited by jayas (2008-03-12 09:27:11)
jayas wrote:
In Ardjan's case, the add to RAID seems to work, and it proceeds to sync, but somehow after sync is complete, it drops out again.
Perhaps there is a problem with the drive that caused the original failure indication; if this is not resolved, I would expect all attempts at resyncing to fail also.
jayas wrote:
I am not sure if this is entirely correct, given that it is possible to add a blank disk to an existing RAID array, and this happens without loss of data.
You're right, I could be mistaken. I guess it depends on understanding where in the mdadm
process of -add | -create | -assemble the md superblock gets created and saved. The other thing
The other thing that complicates this is that Ardjan has a JBOD partition, too, and I'm not exactly clear how
that JBOD configuration is set up on the disks along with the RAID 1.
Last edited by mig (2008-03-12 17:33:53)
Hi Fordem and Mig
fordem wrote:
Perhaps there is a problem with the drive that caused the original failure indication; if this is not resolved, I would expect all attempts at resyncing to fail also.
You are probably right in suspecting a problem with the drive, as in perhaps its partition table being suspect. Ardjan says the physical drive checks out all right with diagnostics, though.
mig wrote:
The other thing that complicates this is that Ardjan has a JBOD partition, too, and I'm not exactly clear how that JBOD configuration is set up on the disks along with the RAID 1.
If we can have the partition info, it would help. For example, here is what I get for my box:
~ # cat /proc/partitions
major minor #blocks name
7 0 5512 loop0
31 0 64 mtdblock0
31 1 64 mtdblock1
31 2 1536 mtdblock2
31 3 6336 mtdblock3
31 4 192 mtdblock4
9 0 486544512 md0
8 0 488386584 sda
8 1 530113 sda1
8 2 486544590 sda2
8 16 488386584 sdb
8 17 530113 sdb1
8 18 486544590 sdb2
Now, I don't like throwing spanners in the works, but here is something worth considering. Installing funplug can cause a few insidious problems with the scripts. For example, I found that scripts on the DNS-323 work with sh from BusyBox v1.00-pre1 (1.04 firmware) but fail with sh from BusyBox v1.8.1 (funplug-0.4).
Thus I have modified my (minimal) TELNET setup to use BusyBox v1.8.1 from /home/root so as not to interfere with /bin/sh or /bin/busybox. I needed this to enable telnetd to use /bin/login and also to get better utilities like fdisk, which for my two drives reports:
~ # /home/root/busybox fdisk -l /dev/sda
Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 1 66 530113+ 82 Linux swap
/dev/sda2 67 60638 486544590 83 Linux
~ # /home/root/busybox fdisk -l /dev/sdb
Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdb1 1 66 530113+ 82 Linux swap
/dev/sdb2 67 60638 486544590 83 Linux
The equivalent results in Ardjan's case may hold the clue to the elusive amber/white light problem.
I do not have the amber/white light problem but I was able to create it and resolve it as follows:
1/ Hot plug out right side drive and wait for amber light to come on.
2/ Hot plug in the drive and wait till the amber light goes off. Yes it does go off.
3/ Reboot DNS-323 and wait for amber light to come back on.
At this point the situation is similar to that for Ardjan, where the system reports that a disk has failed and degrades the array, but does not provide any option to resolve the problem without having to reformat both disks.
4/ Through TELNET, add the drive back using mdadm /dev/md0 -a /dev/sda2 and wait for the restoration to complete.
5/ Reboot the DNS-323, and all went back to normal operation.
Sorry for the long post ... but I hope you can see what I am getting at: it is possible to create, at will, a scenario that the DNS-323 with 1.04 firmware is not able to recover from without losing the data on the surviving member of the RAID array.
Jaya
I am having the same problem today: the amber light is on, but all data is still accessible.
This is the status shown in the web interface:
HARD DRIVE INFO :
Total Drive(s): 2
Volume Name: Volume_1
Volume Type: RAID 1
Sync Time Remaining: Degraded
Total Hard Drive Capacity: 394227 MB
Used Space: 154619 MB
Unused Space: 239607 MB
Volume Name: Volume_2
Volume Type: JBOD
Total Hard Drive Capacity: 193951 MB
Used Space: 84568 MB
Unused Space: 109382 MB
Here is the output of the commands advised above:
/ # mdadm -E /dev/sda2
/dev/sda2:
Magic : a92b4efc
Version : 00.90.00
UUID : 393a5bca:511f540a:1ac12ea8:4a1abb7e
Creation Time : Tue Feb 26 07:51:58 2008
Raid Level : raid1
Device Size : 391126400 (373.01 GiB 400.51 GB)
Array Size : 391126400 (373.01 GiB 400.51 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Update Time : Thu Mar 13 07:11:34 2008
State : clean
Active Devices : 1
Working Devices : 2
Failed Devices : 1
Spare Devices : 1
Checksum : 3f9414dd - correct
Events : 0.2279930
Number Major Minor RaidDevice State
this 0 8 2 0 active sync /dev/sda2
0 0 8 2 0 active sync /dev/sda2
1 1 0 0 1 faulty removed
2 2 8 18 1 spare /dev/sdb2
/ # mdadm -E /dev/sdb2
/dev/sdb2:
Magic : a92b4efc
Version : 00.90.00
UUID : 393a5bca:511f540a:1ac12ea8:4a1abb7e
Creation Time : Tue Feb 26 07:51:58 2008
Raid Level : raid1
Device Size : 391126400 (373.01 GiB 400.51 GB)
Array Size : 391126400 (373.01 GiB 400.51 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Update Time : Thu Mar 13 07:13:26 2008
State : clean
Active Devices : 1
Working Devices : 2
Failed Devices : 1
Spare Devices : 1
Checksum : 3f9417d0 - correct
Events : 0.2280244
Number Major Minor RaidDevice State
this 2 8 18 2 spare /dev/sdb2
0 0 8 2 0 active sync /dev/sda2
1 1 0 0 1 faulty removed
2 2 8 18 2 spare /dev/sdb2
/ # cat /proc/partitions
major minor #blocks name
7 0 5508 loop0
31 0 64 mtdblock0
31 1 64 mtdblock1
31 2 1536 mtdblock2
31 3 6336 mtdblock3
31 4 192 mtdblock4
9 0 391126400 md0
8 0 488386584 sda
8 1 530113 sda1
8 2 391126522 sda2
8 3 96213285 sda3
8 4 514080 sda4
8 16 488386584 sdb
8 17 530113 sdb1
8 18 391126522 sdb2
8 19 96213285 sdb3
8 20 514080 sdb4
9 1 192426368 md1
I have just executed the commands below:
mdadm /dev/md0 -f /dev/sdb2 # signal as faulty
mdadm /dev/md0 -r /dev/sdb2 # remove from array
mdadm /dev/md0 -a /dev/sdb2 # add to the array
Now I see the message below from the web interface:
The RAID volume is synchronizing now. Please wait for 6294.7 minute(s).
I refreshed the webpage and got this:
The RAID volume is synchronizing now. Please wait for 6490.4 minute(s).
Why do the minutes decrease but then increase again? Is this normal?
The RAID volume is synchronizing now. Please wait for 1766.3 minute(s).
The RAID volume is synchronizing now. Please wait for 6476.3 minute(s).
Increased again
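Maybe /proc/mdstat would give a steadier number than the web page? Something like:
Code:
/ # cat /proc/mdstat     # shows percent done, speed and finish estimate for the rebuild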
I have attached dmesg.out from after it turned to the amber light.
Last edited by philipcs (2008-03-13 18:15:35)