Hi all, and sorry if this is covered in a previous post. I have done a quick search in the forum, but given my lack of knowledge and fear of destroying my data, I ask for your assistance.
The case is that my array has become degraded, as stated in the web GUI of the DNS-323. I believe it became degraded after I had to power off the unit with the front button because of a thunderstorm (had to be quick about it...).
The front panel does however NOT indicate that any disk is failing; both are lit blue. But when I read from or write to the disk, only the left disk shows activity. Ergo, something is awry with the right drive. I have read a post about a person having a similar problem with the left drive, and someone said he should unmount the drive and remount it so a sync would start. This is probably what I want to do.
The thing is that I do not know what the "right" drive is called (sda2 or sdb2 or whatnot).
I have installed fun_plug 0.4 and ran mdadm; here are the details for you to read:
mdadm --misc -D /dev/md0
/dev/md0:
Version : 00.90.01
Creation Time : Fri Mar 28 20:03:58 2008
Raid Level : raid1
Array Size : 486544512 (464.01 GiB 498.22 GB)
Device Size : 486544512 (464.01 GiB 498.22 GB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Tue Jul 15 21:18:20 2008
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
UUID : 24c3de31:4edac109:96ca5d9b:b16f598e
Events : 0.209688
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 18 1 active sync /dev/sdb2
AND this is what the DNS-323 web console reports:
HARD DRIVE INFO :
Total Drive(s): 2
Volume Name: Volume_1
Volume Type: RAID 1
Sync Time Remaining: Degraded
Total Hard Drive Capacity: 490402 MB
Used Space: 307500 MB
Unused Space: 182901 MB
How do I get the right-hand drive back into the array WITHOUT losing data?
This is NOT the first time this has happened. Last time I removed the "right" disk, connected it to my PC and cleaned all partitions, then put it back in the DNS-323 and set the array up via the web console - success. However, I believe that was a "hard" way of doing it...
So, please give me a hint on what to do, as I believe this can be solved with mdadm quite easily (when you have the knowledge, that is...).
I have FW 1.05 and FP 0.4
Deurges!
Offline
mdadm -add /dev/md0 /dev/sda2
Offline
That will not necessarily work. It makes several assumptions that may or may not be correct. Blindly adding a drive back without removing its superblock data can cause it to sync the wrong way (I have seen this happen). It does appear from the report that the drive has indeed been removed, so you first need to find out why. Shutting down incorrectly will not cause this. Normally a failed drive will be shown as still in the array, but faulty. This one shows removed -- not typical. Anyway, run mount and check that the system did not mount that drive by itself. If it did, the drive will not add back until you unmount it. Once you know that it is not mounted, run:
mdadm --zero-superblock /dev/sda2
This will remove the array information from that drive. Since you had a problem with it, I would recommend reformatting it first, but that is up to you. To do that:
mke2fs -m 0 /dev/sda2
Then you can run the command to add it. Please note the correct syntax:
mdadm /dev/md0 -a /dev/sda2
and immediately run
do_reboot
This will reboot the unit and it will sync following that. You can also choose to let the sync complete before rebooting, but you may get an amber light, which will go away after the reboot. Also, if you end up having to reboot during the sync, it will begin from scratch after the reboot.
All this also assumes you have not moved your drives around, which can cause this type of issue.
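To put the whole sequence in one place, here is a sketch, assuming (as above) that /dev/sda2 is the dropped member, that it is not mounted, and that you choose to reformat it:
Code:
mount | grep sda2                    # confirm the system did not mount it by itself
umount /dev/sda2                     # only needed if it showed up above
mdadm --zero-superblock /dev/sda2    # wipe the old array metadata
mke2fs -m 0 /dev/sda2                # optional reformat
mdadm /dev/md0 -a /dev/sda2          # add it back to the array
do_reboot                            # reboot; the sync runs after it comes back up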
Last edited by bq041 (2008-07-16 06:29:25)
Offline
bq041 wrote:
Please note the correct syntax:
Code:
mdadm /dev/md0 -a /dev/sda2
Both this syntax and the one in my previous reply are correct. For more information, see man mdadm. I personally find the former easier to remember, but it is a matter of preference.
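For reference, here are the two forms side by side, spelled out with the long option (a sketch; see the MANAGE section of man mdadm):
Code:
mdadm --add /dev/md0 /dev/sda2
mdadm /dev/md0 --add /dev/sda2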
Why would you reboot after hot adding the disk? It will interrupt the sync process.
Point well made on the possibility of the disks being physically swapped in the trays, or indeed one of them removed.
I have yet to come across a case where an existing superblock on a hot-added disk would cause corruption of the already running array. The only thing that matters is the partition size, no? Do you have some evidence that would suggest otherwise?
Offline
1) The reason I prefer the latter syntax is that you can combine commands very easily, such as: mdadm /dev/md0 -f /dev/sda2 -r /dev/sda2. As to the point of the syntax being correct, the manual I learned from (albeit an old one) only covers the latter syntax, not the former.
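Spelled out with the long options, that combined example would read (a sketch, with sda2 as the member being failed and removed):
Code:
mdadm /dev/md0 --fail /dev/sda2 --remove /dev/sda2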
2) The reason is that the DNS does not like it when you add the first SCSI device to the array, since that is by default its primary disk. Two things can potentially happen: you may get an amber light on one or both of the drives, and, depending on how the unit originally booted up, the web admin may not like the setup and not allow you to do anything. By rebooting immediately, the DNS boots up in the proper sequence (as far as the stored config files are concerned) and begins the sync process, so the DNS is happy during the whole thing.
3) N/A
4) This is most likely related only to the DNS. It not only uses the superblocks of each drive, but also uses files called raidtab and raidtab2web to let the system know how to configure the RAID and which disk is the primary source of data. If there is no superblock on the disk being added, it will automatically be added as a spare, but it happened to me once while testing on the DNS that I left the superblock intact before adding the disk and it synced the wrong way. It may have been a fluke, but why take the chance? The only explanation I can give is that the information stored in both superblocks was consistent, having come from the same array originally. The DNS sets sda2 as the primary source of data automatically, and when the disk was added, the array "thought" it was the data source and synced the other one to it. It could be due to the version of mdadm in the DNS or something broken in that version, I do not know. As I said, it has only happened to me personally once.
4b) The other reason I zero out the superblock before anything else is that, if you get a corrupted config file (raidtab or raidtab2web), the device can sporadically create the array, or not create it, when it is booted up. I have had multiple successive boots where the array formed on one boot and not on the other. The bad part is that, since the files typically set the first SCSI device to primary, if, for example, the case above is rebooted, there is the possibility of the array reforming and syncing to the first drive, which is his offline drive. That is okay as long as he has not added or changed data. Albeit, in this situation, barring any reboots before attempting to add the drive back, it will probably work with no problem. I just don't like to take the chance.
Offline
Hi again!
First, I am forever grateful for your quick replies and serious responses.
Skydreamer was spot on: mdadm -add /dev/md0 /dev/sda2 did the trick and added the disk back into the array. The sync started immediately. I did not format or remove the superblock, but I will keep this in mind for next time. (I guess syncing the wrong way would not matter, since the data should be identical in my case.)
A colleague of mine with general Linux knowledge helped me discover why the disk was no longer part of the array. I failed to copy the log file, but it said something about sda2 being non-fresh and kicking it out, followed by an unbind. (I believe I used cat /proc/diskstats to see this...)
This made him suggest the same thing skydreamer did.
A follow-up question: would the web GUI have done the same thing if I had chosen to set the RAID level again, or would it have deleted everything?
Deurges!
Offline
...and I must add: the only thing I did was shut the box down from an idle state with the power button, nothing else. I must admit it was slow to react. It would not power off at first, so I had to keep pressing the front button until it eventually shut down... damn "thunder and lightning"...
-D-
Some follow-up info:
/ # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1]
md0 : active raid1 sda2[0] sdb2[1]
486544512 blocks [2/2] [UU]
unused devices: <none>
and
/ # dmesg|grep sd
SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
SCSI device sda: drive cache: write back
sda: sda1 sda2 sda4
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
SCSI device sdb: 976773168 512-byte hdwr sectors (500108 MB)
SCSI device sdb: drive cache: write back
SCSI device sdb: 976773168 512-byte hdwr sectors (500108 MB)
SCSI device sdb: drive cache: write back
sdb: sdb1 sdb2 sdb4
Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
Adding 530104k swap on /dev/sda1. Priority:-1 extents:1
Adding 530104k swap on /dev/sdb1. Priority:-2 extents:1
ext3: No journal on filesystem on sda4
ext3: No journal on filesystem on sdb4
ext3: No journal on filesystem on sda2
ext3: No journal on filesystem on sdb2
md: bind<sda2>
md: bind<sdb2>
md: kicking non-fresh sda2 from array!
md: unbind<sda2>
md: export_rdev(sda2)
ext3: No journal on filesystem on sda4
ext3: No journal on filesystem on sdb4
md: bind<sda2>
disk 0, wo:1, o:1, dev:sda2
disk 1, wo:0, o:1, dev:sdb2
disk 0, wo:0, o:1, dev:sda2
disk 1, wo:0, o:1, dev:sdb2
It still says "md: kicking non-fresh sda2 from array!", but then again "md: bind<sda2>"... hmmm...
-D-
Last edited by deurges (2008-07-17 00:42:46)
Offline
deurges wrote:
A follow-up question: would the web GUI have done the same thing if I had chosen to set the RAID level again, or would it have deleted everything?
I generally avoid RAID operations over the GUI on the DNS-323; it is too unpredictable. The command line is your friend :-)
Offline
From what I am reading about mdadm today, the non-fresh sda might be referring to an out-of-date superblock. If you look at the mdadm manual under the assemble section (used to assemble an already created array, such as at boot up), it briefly mentions that you have to force the array to start minus a disk if one of the superblocks is out of date. Here is the excerpt.
For assemble:
-u, --uuid=
uuid of array to assemble. Devices which don't have this uuid are excluded
-m, --super-minor=
Minor number of device that array was created for. Devices which don't have this minor number are excluded. If you create an array as /dev/md1, then all superblocks will contain the minor number 1, even if the array is later assembled as /dev/md2.
Giving the literal word "dev" for --super-minor will cause mdadm to use the minor number of the md device that is being assembled, e.g. when assembling /dev/md0, mdadm will look for superblocks with a minor number of 0.
-f, --force
Assemble the array even if some superblocks appear out-of-date
-R, --run
Attempt to start the array even if fewer drives were given than are needed for a full array. Normally if not all drives are found and --scan is not used, then the array will be assembled but not started. With --run an attempt will be made to start it anyway.
-a, --auto{=no,yes,md,mdp,part}
See this option under Create and Build options.
-U, --update=
Update the superblock on each device while assembling the array. The argument given to this flag can be one of sparc2.2, summaries, or super-minor.
The sparc2.2 option will adjust the superblock of an array that was created on a Sparc machine running a patched 2.2 Linux kernel. This kernel got the alignment of part of the superblock wrong. You can use the --examine --sparc2.2 option to mdadm to see what effect this would have.
The super-minor option will update the preferred minor field on each superblock to match the minor number of the array being assembled. This is not needed on 2.6 and later kernels as they make this adjustment automatically.
The summaries option will correct the summaries in the superblock. That is the counts of total, working, active, failed, and spare devices.
This may be what is going on, and if it is, you would have wanted to zero the superblock before adding it back to the array.
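For illustration only (not something deurges needs to run at this point), forcing assembly of an array with an out-of-date member would look roughly like this:
Code:
mdadm --assemble --force /dev/md0 /dev/sda2 /dev/sdb2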
Offline
Hi again!
OK, bq041. I understand that your first suggestion, clearing the superblock etc., is indeed the best way. I guess I can do what you suggested earlier in the post now? That is, remove sda2, clear the superblock, etc.?
-D-
Offline
bq041 wrote:
From what I am reading about mdadm today, the non-fresh sda might be referring to an out-of-date superblock. If you look at the mdadm manual under the assemble section (used to assemble an already created array, such as at boot up), it briefly mentions that you have to force the array to start minus a disk if one of the superblocks is out of date. .....
This may be what is going on, and if it is, you would have wanted to zero the superblock before adding it back to the array.
Deurges is not assembling (mdadm --assemble) an array but hot adding a disk (mdadm --add), and this is a very different scenario.
In my experience with over a dozen mdadm RAID arrays, hot adding always works unless the added partition is smaller than the md device itself.
And once a disk is hot added, all superblocks are in sync.
Last edited by skydreamer (2008-07-17 11:03:09)
Offline
I was referring to the DNS starting up, not his hot add. Hence this line:
bq041 wrote:
...such as at boot up.
I am not saying that you are wrong about RAID arrays and mdadm in general. I have also worked with many of them, as far back as 1998. What I am saying is that, in my experience, the DNS does not always act, or react, as somebody would expect a regular Linux / Unix machine to react. D-Link has added some very interesting checks and configs to this unit that directly affect the RAID arrays. I have found they do not always react the way I would expect them to on a full-blown Linux machine. That's all I'm saying.
Last edited by bq041 (2008-07-17 15:01:30)
Offline
bq041 wrote:
I was referring to the DNS starting up, not his hot add. .... I found they do not always react the way I would expect them to on a full-blown Linux machine. .....
No worries, I think poor deurges is probably getting confused by all our academic arguments.
I will hijack this thread a little, if you do not mind, since you might be able to shed some light on one peculiarity of the D-Link's mdadm. When I create a new mdadm array on a Linux machine and insert it into the D-Link, it may take up to 10 minutes for the DNS-323 to boot up the very first time. I suppose it goes through its RAID1 motions, but would you have an idea what exactly it is doing? The RAID1 is then recognized as being in sync, all data is untouched, and subsequent boots take the usual amount of time.
Offline
I guess I was confused to begin with, so no change of state there...
Even if you two disagree about certain aspects, you have both helped me a great deal. For a fact, I now know that this DNS-323 is not set up like a normal Linux box. To be honest, I am glad I asked what to do before I did it.
As for my part, everything is working, and I think I will just let this part be?? :
md: bind<sda2>
md: bind<sdb2>
md: kicking non-fresh sda2 from array!
md: unbind<sda2>
md: export_rdev(sda2)
ext3: No journal on filesystem on sda4
ext3: No journal on filesystem on sdb4
md: bind<sda2>
OR should I mark sda2 as failed, remove it from the array, clear the superblock + format it, and add it back?
deurges!
Offline
Skydreamer,
This is because there are 3 files located in 4 locations that must be set up DNS-specific. If you are using f/w 1.04 or 1.05, then you also need 2 more partitions. This is so the DNS can determine whether it needs to format or not, and also whether the drives are correctly inserted.
Files: hd_magic_num
raidtab
raidtab2web
Locations: /dev/mtdblock0/ (flash) -- Mount this to /sys/mtd1 with minix
/dev/mtdblock1/ (flash) -- Mount this to /sys/mtd2 with minix
/dev/sda4/.systemfile/ (f/w 1.04 and up)
/dev/sdb4/.systemfile/ (f/w 1.04 and up)
/dev/sda2/.systemfile/ (f/w 1.03)
/dev/sdb2/.systemfile/ (f/w 1.03)
The raidtab and raidtab2web files must each be identical across all locations and updated accordingly for the particular setup. (The 2 files differ from each other, but each one is the same in every location.) These files set the environment by telling the DNS how the drives are configured and the particulars of the array. The modes are linear, raid1, raid0, and normal. raidtab also sets up the RAID master disk. If you are interested, I have written a few scripts for setting up and breaking raid1 arrays on the DNS. They are on the forum. I will eventually make them up with full functionality and no rebooting, but that will take a lot of time.
I also made one to convert a f/w 1.03 raid1 array to 1.04. It is more advanced than the make-or-break ones because I added some safeguards to it. In that particular script file, I have subroutines that create the 3 files for you automatically. You may want to look at that part. The link is here: http://dns323.kood.org/forum/t2444-Wiza … -1.05.html post #15.
The MOST important thing to remember is to UNMOUNT THE FLASH! when you are finished. Corrupting this filesystem is a very bad thing.
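A sketch of the mount/unmount sequence, using the mount points named above (the device names and mount points are as described here; verify them on your own unit before editing anything):
Code:
mount -t minix /dev/mtdblock0 /sys/mtd1
mount -t minix /dev/mtdblock1 /sys/mtd2
# ... inspect or edit hd_magic_num, raidtab and raidtab2web ...
umount /sys/mtd1
umount /sys/mtd2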
hd_magic_num is special. The instances on the flash are identical and follow this format for a RAID or multi-drive setup:
<hard drive 0 magic number>
<hard drive 1 magic number>
<hard drive 0 serial number>
<hard drive 1 serial number>
For each of the drive locations, hd_magic_num follows this format:
<hard drive 0 magic number>
<hard drive 1 magic number>
I do not have the algorithm for the magic numbers, but I found that if I generate random numbers when creating the array, it works with no problem. If I remember correctly, they are between 1 and 5 digits. Just make sure to update all 4 locations at the same time and that the numbers match the respective drives.
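So a flash copy of hd_magic_num for a two-drive setup might look like this (every number and serial below is made up purely for illustration, and the path assumes the flash is mounted at /sys/mtd1 as above):
Code:
/ # cat /sys/mtd1/hd_magic_num
12345
54321
WD-WCAS12345678
WD-WCAS87654321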
Ask if you have any more questions.
Last edited by bq041 (2008-07-17 23:15:45)
Offline
deurges,
I would probably do the latter, but that is me. I have not actually ever gotten this error myself, just read about it and heard about it from people.
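In command form, a sketch of that latter option, following the earlier steps in this thread (with sda2 assumed to be the suspect member):
Code:
mdadm /dev/md0 -f /dev/sda2 -r /dev/sda2   # mark it failed and remove it from the array
mdadm --zero-superblock /dev/sda2          # clear the old superblock
mke2fs -m 0 /dev/sda2                      # reformat
mdadm /dev/md0 -a /dev/sda2                # add it back
do_reboot                                  # then let the sync run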
Offline
bq041 wrote:
Skydreamer,
....
Ask if you have any more questions.
Cool, though I actually like the DNS-323 a little less after reading about the peculiar workings of its RAID subsystem.
Yep, I have a million $ question: why does the formatting stop at 94% when creating RAID1 on some large disks, and why does the partition table not utilize the entire disk? I suppose it is a bit remote from the original topic, but this is how it all started....
Offline
That is a tough question. Actually, the web GUI does not always format a drive. If you know Linux, you know that, unlike on DOS-based systems, partitioning the drive does not actually damage any data. An example: take a drive with 500 blocks and 2 partitions. The first uses blocks 1-66 and the second 67-500. If you delete the partitions and create new ones, your data is not gone, it is just not accessible. If you delete the new ones and repartition back to the 2 partitions of 1-66 and 67-500, you can mount the partitions and all your data is there. Up to the point where you format the new partitions, your data is still on the physical disk.
Now, you may ask why I told you this. I found out while doing RAID experiments that after "formatting" with the web GUI to exactly the same size I had before, one of my disks still had all its data on it. When that happened, I tried it again, this time from within a telnet session to see what processes were running. It fdisked the disk, but it never ran mke2fs. I've had other times where it did nothing. I think the system was designed to set up new disks, and if it already knows a disk and that disk has a valid ext2 filesystem, it gets confused. On the flip side, I have had it work sometimes with no problem. One thing I have found consistently is that if you try to "format" disks that have already been in the unit, it seems to do the 94% thing more often than not. Some people try deleting the partitions, and that is iffy. One thing I have done that has worked every time is to blank out the hd_magic_num file in the 2 instances in flash and erase the .systemfile directories on the hard drives. The unit seems to hang up the most when it already has serial numbers set in flash, especially if either of them matches the drive you are attempting to format. I think it is hanging up very early on, during some kind of check, before it actually does anything. Some people claim that switching f/w will make it work. I can see this, because when the firmware is changed, the flash is reset to defaults and no longer has the serial numbers stored.
I also noticed that the % bar has nothing to do with the actual time left in the format. It counts along at its own pace even if the drive is doing nothing. I think it may just use the size you tell it the drives are to figure out the time it is supposed to take. Since binaries take care of that part, I am only theorizing. I also think the 94% hang-up is a glitch in that binary. This goes back to thinking that it is hanging up early, during a check of the config files.
I choose to just do all the disk work myself from the command prompt; that way I know exactly what is going on, and if it gets screwed up, I can only blame myself.
As for not all of the drive being utilized, you need to elaborate a little more. I have several different sized drives that I use and experiment with in mine, and I don't seem to have a utilization issue. One thing that mke2fs does is reserve 5% of the drive space for the superuser, unless otherwise specified. I have not checked whether the GUI does this or not. (I stopped using the GUI format a long time ago.) I set mine to 0% reserved when I format, but it can also be changed later using tune2fs. Actually, I just checked a disk I formatted with f/w 1.03 and it is set to 0%. I do not know about 1.04 or 1.05.
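For example, checking and changing the reserved percentage on an existing filesystem looks like this (a sketch, with sda2 as the example data partition):
Code:
tune2fs -l /dev/sda2 | grep -i "reserved block"   # show the current reserved block count
tune2fs -m 0 /dev/sda2                            # set the reserved space to 0%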
Also, the swap space uses the first 66 blocks of the drive, and the config partition uses the next 69. These will vary in size depending on the architecture of your specific drive, but the number of blocks remains the same.
If you mean that there is unpartitioned space left over, I have not experienced that, so I cannot comment. I have 400GB, 500GB, 750GB, and 1TB drives that I have used in my 2 DNSes, and each one has always used all of the space. The only thing I can think of is if you create a RAID array with 2 disks that are not exactly the same size (this can happen even with 2 identical model drives; ask fordem about it). Other than that, I'm stumped.
Last edited by bq041 (2008-07-18 06:58:37)
Offline
bq041 wrote:
....
As for not all of the drive being utilized, you need to elaborate a little more. ....
Yep, I am quite familiar with Linux handling of disks and fdisk, and I can confirm your findings. I have successfully grown partitions, physical and logical volumes, RAID arrays and ext3 filesystems from the command line, although not on the DNS-323. You can also make a partition bigger without losing any data (i.e. by increasing the value of the last cylinder), provided that you either have some free space after it or do not care about the adjacent partition.
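For instance, growing an md device into newly available space and then the ext3 filesystem on top of it goes roughly like this (a generic sketch, not DNS-specific):
Code:
mdadm --grow /dev/md0 --size=max   # extend the array to the maximum the members allow
resize2fs /dev/md0                 # then grow the filesystem to fill the array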
As for the partitioning gap on the new disks, see my previous thread; at the beginning of it there is a little write-up on what happened after inserting brand new Seagate disks.
http://dns323.kood.org/forum/t2577-DNS- … linux.html
Thanks for your time on this, bq041, you have done some really thorough investigation on this topic, fair play to you!
Offline