Unfortunately no one can be told what fun_plug is - you have to see it for yourself.
Hello,
I have a DNS-323 running firmware 1.05, configured for RAID 1, that reports a degraded RAID 1 status. Both LEDs are blue, but the RHS LED (on the front panel) rarely flashes, if at all, indicating little or no disk access.
I wanted to check the state/health of the hard disks, primarily to ascertain whether the firmware was reporting erroneously or incompletely. So I started googling and installed the basic fun_plug support that lets me ssh into the box. At this point I wanted to examine the disks and realized that I can get a lot of information from /proc. However, I cannot locate procinfo. Anyway, I continued googling and eventually decided to run e2fsck. It reports the filesystem as unhealthy, as shown below.
I am a noob when it comes to Linux, so I would appreciate it if somebody could help by either pointing me to a resource that explains how to get this information on the DNS-323 or advising me how to proceed. I wish to ascertain the state of the disks and then decide what to do in order to return to a healthy RAID 1 state.
Any help is immensely appreciated!
- m
root@AAAdlink-8D50AF:/# e2fsck -n /dev/md0
e2fsck 1.41.0 (10-Jul-2008)
Warning!  /dev/md0 is mounted.
/dev/md0 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
/lost+found not found.  Create? no
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  +537 -8192
Fix? no
Free blocks count wrong (18508082, counted=18501421).
Fix? no
Free inodes count wrong (38783330, counted=38779735).
Fix? no

/dev/md0: ********** WARNING: Filesystem still has errors **********

/dev/md0: 30366/38813696 files (10.5% non-contiguous), 59087854/77595936 blocks
root@AAAdlink-8D50AF:/#
Take a look at http://tldp.org/HOWTO/Software-RAID-HOWTO-6.html for mdadm commands
try:
cat /proc/mdstat
mdadm --detail /dev/md0
mig wrote:
Take a look at http://tldp.org/HOWTO/Software-RAID-HOWTO-6.html for mdadm commands
try:
cat /proc/mdstat
mdadm --detail /dev/md0
mig -- thanks a lot for the help. I tried what you suggested and got the following:
root@AAAdlink-8D50AF:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1]
md0 : active raid1 sdb2[1]
      310383744 blocks [2/1] [_U]
unused devices: <none>
root@AAAdlink-8D50AF:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Sun Mar 23 14:27:32 2008
     Raid Level : raid1
     Array Size : 310383744 (296.01 GiB 317.83 GB)
    Device Size : 310383744 (296.01 GiB 317.83 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Jan 16 17:20:12 2009
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : bab883cd:ead48626:e62ff2d5:b46404b5
         Events : 0.372520

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       18        1      active sync   /dev/sdb2
It seems the system thinks it is configured for two devices but only sees one. I suppose either the disk actually failed or it needs to be reseated (vibrations wiggled it loose? I doubt it). I am thinking of powering down and taking out the supposedly failed disk, which seems to be device number 0. Does anybody know which slot it maps to? I suppose it is likely the RHS one (non-blinking LED), but I would like to be sure.
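For other noobs searching later: the `[2/1] [_U]` line above is what gives this away - two configured members, one active, and the underscore marks the missing slot. A little sketch that flags this automatically (the sample input is the mdstat output copied from this post):

```shell
# Save a copy of /proc/mdstat (sample from this thread), then let awk flag
# any array whose status string contains an underscore (a missing member).
cat > /tmp/mdstat.sample <<'EOF'
Personalities : [linear] [raid0] [raid1]
md0 : active raid1 sdb2[1]
      310383744 blocks [2/1] [_U]
unused devices: <none>
EOF
awk '/^md/ {dev=$1} /\[[U_]+\]/ && /_/ {print dev " is degraded"}' /tmp/mdstat.sample
```

On the unit itself you would of course feed it /proc/mdstat directly instead of a saved sample.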
Thanks so much!
--m
Yes, it looks like mdadm has 'removed' device 0 from your RAID.
Before you do anything else, make a backup of your data (if you don't already have one)!
The RHS is device 0 (the drive on the power-port side of the unit; the drive on the USB side is device 1).
Take a look at this image of the circuit board, HDD0 is device 0 http://dns323.kood.org/_detail/11.jpg?i … ache=cache
Given the status of the LEDs, you could have a dead drive.
The drive manufacturer's web site should have a downloadable diagnostic tool to determine drive health.
The following is based on personal observation ...
A "dead" drive - as in not detected at all - should show either no LED or an amber LED; exactly which one seems to depend on the firmware version and drive configuration - more on this below. A drive that is detected but does not come ready is indicated by a continuously flashing blue LED (the same flashing you see at startup and while the drives spin up). The fact that there is a steady blue LED suggests the drive is good - or at least that it has come ready.
In case anyone needs an explanation: the drive reports a 'ready' status to the controller when it is up to the correct rpm and the heads have moved to the home position. It is entirely possible for a drive to be ready and still be defective - as in having a large enough number of defective sectors that it is unusable.
My interpretation of the LEDs and the degraded status here is that even though the right drive comes ready, the unit "thinks" there is a problem with it, hence the degraded status, and has stopped writing to it, resulting in no disk access.
This can be verified by removing the right drive and attempting to read the data off it using a Linux system, or Windows with the ext2ifs driver. I would expect data to be present, but not current - files written to the unit after the "failure" will be found on the left drive but not on the right one, while files written before the "failure" will be on both.
I believe this condition can be "recreated" at will by forcing a drive failure simulation (hot-unplugging a drive), then powering the unit down and re-inserting the removed drive - so the original poster's theory about it vibrating loose may have some merit.
As mentioned earlier, the "dead" drive indications seem to vary with the firmware version and the drive configuration at the time of failure.
A single volume, two separate volumes, and JBOD will get an amber failure indication only if the drive fails whilst the unit is running - if you reboot the unit, the amber light will go off and stay off. If the drive fails whilst the unit is off, or at power-on, your only indication will be no drive LED and no data.
With a RAID configuration and firmware prior to 1.06, the indications are similar to those above, but with 1.06 this changes - you will get an amber LED on a reboot, rather than just no LED as above and pre-1.06.
Unfortunately I cannot explain the white/pink/purple indications - I will say that any time I have seen them it was through "provocation", as in caused by something I did, such as hot-plugging a drive. My interpretation is that the unit thinks there is a problem with the drive and so turns on the amber LED whilst the blue LED is still on.
If you install the smartmon package, you might also be able to run some SMART tests:
smartctl -d marvell -t short /dev/sda #(a couple of minutes)
smartctl -d marvell -t long /dev/sda #takes a long time
I have never run any of these since mine have passed the quick test (-H), but I'd be running them if I suspected trouble.
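For reading the results back, something like the following should work - the -d marvell flag mirrors the commands above (the DNS-323's SATA controller is a Marvell chip); the device path is whichever disk you want to check:

```shell
# Hedged sketch: read back SMART status after running the self-tests above.
smartctl -d marvell -H /dev/sda            # overall health verdict (PASSED/FAILED)
smartctl -d marvell -l selftest /dev/sda   # log of completed short/long self-tests
smartctl -d marvell -A /dev/sda            # raw attributes; watch Reallocated_Sector_Ct
```

A rising Reallocated_Sector_Ct is usually the first sign of a drive on its way out, even when the quick health check still says PASSED.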
All,
Thank you for your help. After spending hours on this issue, here is my report. I bought a SATA-to-USB bridge, attached the "failed" drive to my Mac, and examined it using Windows XP (running as a Fusion VM on my Mac). Sure enough, as fordem said, the drive itself seemed "good" but the data was not current. I ran some tests on it using the manufacturer's tool and it pretty much seemed fine, so I reformatted the drive.
At this point I was left with the disk marked as "1" in the DNS-323 (firmware 1.05). Having read some horror stories on the net about the DNS-323 firmware erroneously overwriting the data on the good disk when a blank replacement is inserted, I decided to extract the data from the "good" drive and copy it onto a similarly sized external HD. To summarize, I have been completely unsuccessful in doing so.
1. First I decided to attach my external HD (FreeAgent Go) to the DNS-323 and copy locally, to avoid a network transfer. I read the instructions for attaching USB storage, but the DNS-323 could not mount my drive.
2. Then I decided to use the SATA-to-USB bridge on the good drive and avoid a network transfer by attaching both HDs (the DNS-323 one and the FreeAgent Go) to my Mac. I tried Mac OS X 10.5.6, Windows XP, and Ubuntu Linux, but I was unable to see the filesystem on the good DNS-323 disk.
3. Next I decided - what the hell - let's do the network transfer, and proceeded to copy over the network. I mounted Volume_1 on my Mac, mounted the FA (FreeAgent Go), and proceeded to copy files. For some strange reason the DNS-323 fails the copy at some point.
4. Thinking that this might be a Mac Finder issue, I decided to use iBackup. Same failure.
5. Then I decided to just use Cyberduck and ftp the files from the DNS to the FA. Same failure.
I am now stuck. I don't know what to do. I am fairly certain that any network transfer will fail - something related to either the network stack or buffers etc. on the DNS-323. What I would like to do is boot Ubuntu (as a VMware Fusion VM) on my Mac, figure out how to make the DNS-323 disk (attached via the SATA-to-USB adapter) accessible in Ubuntu, and then copy the data over to the FA.
I would greatly appreciate it if somebody could help me understand how to make this disk accessible in Ubuntu. I guess it really should appear as an ext3 volume after the disk has been mounted? Please advise!
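From my googling, I think the mounting steps look roughly like this - the device name /dev/sdc is a guess on my part (check `dmesg | tail` after plugging in the bridge), and I am assuming the data sits on the second partition, as it does on the internal bays (sdb2 in the mdadm output earlier):

```shell
# Guessed device names - confirm with `dmesg | tail` and `fdisk -l` first.
sudo fdisk -l /dev/sdc                        # confirm the partition layout
sudo mkdir -p /mnt/dns
sudo mount -t ext3 -o ro /dev/sdc2 /mnt/dns   # mount read-only, to be safe
ls /mnt/dns                                   # the Volume_1 data should appear here
```

Corrections welcome if I have the partition wrong.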
Thanks,
m
You never mentioned which file system the FreeAgent Go is formatted with.
A lot of portable hard drives are formatted FAT32 for maximum compatibility.
It seems to me the network transfer method (#3) is the simplest (although slowest) method; what exactly was the error message for that attempt?
I can't think of many reasons that method #3 would fail. Are you sure there is enough free space on the FA for all the DNS-323 data? Do you have any really large files on the DNS-323, like a single file greater than 2 GB or 4 GB? Such large files might not be copyable from an EXT (DNS-323) file system to a more limited file system like FAT32, which cannot hold a single file of 4 GB or larger.
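A quick way to check for such files - this demo is self-contained (it creates a sparse 5 GB file in a throwaway directory so anyone can try it); on the real unit you would point `find` at the share's mount point instead:

```shell
# Create a throwaway directory with one file over the FAT32 4 GB limit,
# then use find to flag anything too big to copy to a FAT32 disk.
demo=$(mktemp -d)
truncate -s 5G "$demo/big.iso"     # sparse file: 5 GB apparent size, ~0 bytes on disk
touch "$demo/small.txt"
find "$demo" -type f -size +4G     # lists only big.iso
rm -rf "$demo"
```

Anything that shows up in that listing will fail partway through a copy to a FAT32-formatted drive.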
Last edited by mig (2009-01-28 10:06:39)
If you are sure your HDDs are OK and that the degraded status is an error, you can manually fix the array.
I've described how to do it at my site:
http://www.aroundmyroom.com/2008/01/07/ … id-status/
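In short, it boils down to re-adding the dropped member - a sketch only: the partition name /dev/sda2 is an assumption, so confirm with `cat /proc/mdstat` and `fdisk -l` before touching anything, and have a backup first:

```shell
# Re-add the missing member (partition name is an assumption - verify it!);
# the rebuild starts automatically once the device is accepted.
mdadm /dev/md0 --add /dev/sda2
cat /proc/mdstat                 # the resync progress shows up here
```

The rebuild of a ~300 GB mirror takes a few hours; the unit stays usable meanwhile.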
I'd say none of the reasons that method #3 would fail are "related to either the network stack or buffers etc on the DNS-323" - they are more likely related to the actual network infrastructure, which has not been described.
I've had my DNS-323 for over two years and never once had a file transfer fail over a wired network - file sizes have been mixed, ranging from a few KB up to 60 GB. I have seen many transfers over wireless fail, both to/from the DNS-323 and other network hosts, all related to a specific wireless adapter (an Intel PRO/Wireless 2100a) and caused by a problem that Intel has long acknowledged.
I've also had numerous transfer failures with USB/SATA bridges and ext2/3 formatted disks when running the ext2ifs file system drivers due to "path too long" issues - which may or may not be a factor here, as we are told "For some strange reason the DNS-323 fails the copying at some point"
I have experienced this too.
Check this post out, if you have not solved it already. All I did was put the disk back into the array with a simple Linux command:
http://dns323.kood.org/forum/t2595-(another)-Degraded-Array-Topic-with-mdadm%3F.html
I had two WD disks, and I personally believe this dropped-from-the-array behaviour occurred because normal desktop drives from WD do not have an error-recovery time limit (TLER). The special RAID-edition disks from WD cap error recovery at seven seconds in RAID mode, which keeps the disk from being dropped from the array.
This is another topic and maybe not related to your case...
deurges
Gents,
My apologies for the long absence. I finally got around to working on this. All the things you suggested were useful.
1. The network setup (when I tried copying and FTP) was very simple: both the laptop and the DNS-323 were connected to an isolated wired router - there was nothing else on the network. That led to my suspicion about the network buffers etc., which of course was in error.
2. The FA disk format was the issue. I reformatted the FA as an HFS+ drive (I used a Mac) and that solved all issues.
3. The degraded status was indeed due to disk 0 having been removed from the RAID config.
4. I took the disk out and ran SeaTools DOS on it (it's a Seagate). It revealed a broken disk - which would explain the RAID issue.
I would like to thank you for all your help - I greatly appreciate it.
Best,
m