DSM-G600, DNS-3xx and NSA-220 Hack Forum

Unfortunately no one can be told what fun_plug is - you have to see it for yourself.


#1 2011-03-12 21:31:09

samfree
Member
Registered: 2011-03-12
Posts: 10

Large RAID1 volume repair / e2fsck issue on DNS323 / Native SCAN DISK

Can someone give me the steps to repair a RAID1 volume?  HW: DNS323 Rev C1 reflashed to FW 1.09 with 2 x WD20EADS (2TB WD Green hard drives - the only ones that D-Link claims to have tested in the unit).

If I can get a clear picture of this, I'll happily document it in the wiki so that everyone doesn't have to keep reinventing the wheel. 

I've spent more than two days digging to get to this point because I'm pretty new to low-level Linux hacking. Here is what I have done and the results so far:

I booted up with a minimal fun_plug that I hacked together from code on this board; it sets up telnet and the e2fsck from fun_plug v0.5 (see below). The result was as follows:

Code:

# killall smbd   
# killall nmbd   
# /bin/umount /dev/md0
# e2fsck -v /dev/md0

e2fsck 1.41.2 (02-Oct-2008)
The filesystem size (according to the superblock) is 488116951 blocks
The physical size of the device is 488116928 blocks
Either the superblock or the partition table is likely to be corrupt!
Abort<y>? yes

I did another test to see if there were any other issues:

Code:

# e2fsck -nv /dev/md0

e2fsck 1.41.2 (02-Oct-2008)
The filesystem size (according to the superblock) is 488116951 blocks
The physical size of the device is 488116928 blocks
Either the superblock or the partition table is likely to be corrupt!
Abort? no

/dev/md0 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
/lost+found not found.  Create? no

Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/md0: ********** WARNING: Filesystem still has errors **********


  149012 inodes used (0.12%)
    4457 non-contiguous inodes (3.0%)
         # of inodes with ind/dind/tind blocks: 51059/8811/3
157183185 blocks used (32.20%)
       0 bad blocks
      24 large files

  138051 regular files
    9299 directories
       0 character device files
       0 block device files
       0 fifos
    1447 links
    1653 symbolic links (1653 fast symbolic links)
       0 sockets
--------
  150450 files

It appears there is other file system corruption as well.


Additional background and a possible hint regarding the outstanding internal SCAN DISK problems
I think there may also be a clue here as to why the native "SCAN DISK" isn't working.  When I run SCAN DISK, the status bar starts at 0, jumps immediately to 100%, and the result shows Volume_1 FAILED, even though the system seems to be operating properly.  I am running a DNS323 Rev C1, FW 1.09, with 2 WD20EADS 2TB drives currently in RAID1 with about 500GB of data on them.  This is my second DNS323; the first one has been running well with 2 WD10EADS in RAID0, and my intent was to get the new unit stable, test the crap out of it, and then migrate the data.

Given the issue with 2TB drives, I started with a single WD20EADS, copied some data, did a few simple checks, and then added the second WD20EADS and had the system create a RAID1 volume by syncing with the first drive.  Everything seemed to work fine.  The only irregular activity in the history of the unit was that while I was copying data from my other DNS323 to this one, the Windows PC crashed (not related to the copy operation).  I'm wondering if mdadm has problems with large drives?  I also found a post at http://pith.org/notes/2005/07/23/how-i-fixed-my-raid-1-partition-size-error/ but, because of my inexperience, I couldn't figure out whether it applied or how to translate it to the DNS323 environment.
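For what it's worth, here are the checks I was trying to piece together from that post before touching anything - just a sketch, assuming the standard DNS323 layout (data partitions /dev/sda2 and /dev/sdb2, array /dev/md0) and that dumpe2fs is available from the same ffp package as e2fsck:

Code:

# compare what the RAID layer and the partition table report
# (/proc/partitions counts 1K blocks; the e2fsck numbers above are 4K
#  filesystem blocks, so divide the 1K counts by 4 to compare)
cat /proc/partitions
mdadm --detail /dev/md0

# the filesystem's own idea of its size, straight from the superblock
dumpe2fs -h /dev/md0 | grep -i 'block count'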

Minimal fun_plug for file system checking (might save someone else some time - working on FW 1.09 with fun_plug 0.5)

Code:

#
# Minimal fun_plug: enables telnet and installs a newer e2fsck
#
# Requires: /mnt/HD_a2/lnx_bin/busybox3
#           /mnt/HD_a2/lnx_bin/utelnetd
#           the ffp e2fsck, badblocks and libraries copied below
#
 
#
# Start the telnet daemon
#

# improved starttelnet.sh, enabling the filesystems to be unmounted and checked, if need be. 
 
# copy the provided components to a directory on the ramdisk
# the ramdisk is regenerated with every boot, so the copy has no lasting effects at all
# simply copy the two files over
cp /mnt/HD_a2/lnx_bin/utelnetd /sbin/utelnetd
cp /mnt/HD_a2/lnx_bin/busybox3 /bin/busybox3

# Copy the newer version of e2fsck and the libraries it needs
cp -f /mnt/HD_a2/ffp/sbin/e2fsck /sbin
cp -f /mnt/HD_a2/ffp/sbin/badblocks /sbin
# remove the firmware's old libext2fs and its symlinks BEFORE copying the
# new library, otherwise the wildcard deletes the fresh copy as well
rm -f /lib/libext2fs.so*
cp -f /mnt/HD_a2/ffp/lib/libext2fs.so.2.4 /lib
ln -s /lib/libext2fs.so.2.4 /lib/libext2fs.so
ln -s /lib/libext2fs.so.2.4 /lib/libext2fs.so.2
cp -f /mnt/HD_a2/ffp/lib/libcom_err.so.2.1 /lib/
ln -s /lib/libcom_err.so.2.1 /lib/libcom_err.so.2
ln -s /lib/libcom_err.so.2.1 /lib/libcom_err.so
 
# create the terminal device as usual
/bin/busybox3 mknod /dev/ptyp0 c 2 0
/bin/busybox3 chmod 0666 /dev/ptyp0
/bin/busybox3 mknod /dev/ttyp0 c 3 0
/bin/busybox3 chmod 0666 /dev/ttyp0

# make a shell link on the ramdisk
mkdir /bin/busybox3.dir/
PATH="$PATH:/bin/busybox3.dir"
 
ln -s /bin/busybox3  /bin/busybox3.dir/sh

# and start the Telnet service from the ramdisk as well
/sbin/utelnetd -l /bin/busybox3.dir/sh -d

Is there an easy way to mod this script so that it preserves all the other commands in the original firmware?
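One approach I'm considering for that - untested, just a sketch using the same paths as the script above, and assuming (as the script does) that libext2fs and libcom_err are the only libraries the ffp e2fsck needs beyond what the firmware already has - is to leave /sbin and /lib alone and run the newer e2fsck out of its own directory on the ramdisk:

Code:

# copy the ffp e2fsck and its two libraries to a private directory on the ramdisk
mkdir -p /tmp/fsck
cp /mnt/HD_a2/ffp/sbin/e2fsck /mnt/HD_a2/ffp/sbin/badblocks /tmp/fsck
cp /mnt/HD_a2/ffp/lib/libext2fs.so.2.4 /mnt/HD_a2/ffp/lib/libcom_err.so.2.1 /tmp/fsck
ln -s libext2fs.so.2.4 /tmp/fsck/libext2fs.so.2
ln -s libcom_err.so.2.1 /tmp/fsck/libcom_err.so.2

# run it without touching the firmware's own binaries or libraries
LD_LIBRARY_PATH=/tmp/fsck /tmp/fsck/e2fsck -nv /dev/md0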


 

#2 2011-03-13 08:43:36

bjby
Member
Registered: 2009-02-22
Posts: 265

Re: Large RAID1 volume repair / e2fsck issue on DNS323 / Native SCAN DISK

Have you read this?

http://www.inreto.de/dns323/fsck/


 

#3 2011-03-15 09:59:49

samfree
Member
Registered: 2011-03-12
Posts: 10

Re: Large RAID1 volume repair / e2fsck issue on DNS323 / Native SCAN DISK

Hi bjby - Thanks for the reply!

Yes, I read it several times, but because of missing/incorrect info I was afraid of trashing my
disks.  After spending about 3 days on this problem, I had finally reached the point where I was
about to reformat the drives and reload from backup.  Your reply inspired me to go for broke,
since I really didn't have much to lose.

The following describes how to clean up a RAID1 file system with the error:
The filesystem size (according to the superblock) is 488116951 blocks
The physical size of the device is 488116928 blocks
Either the superblock or the partition table is likely to be corrupt!
--- and/or general problems that can be corrected with e2fsck.

I've attempted to clarify and add to the documentation for the benefit of anyone else who may
find themselves in the same situation.

There were a number of problems / outstanding questions. Can anyone please help fill in the
missing info, suggest how to fix the problems, or make suggestions on how to improve the
clarity of the procedure?


Outstanding Questions / Problems / Documentation Errors

1. If you have been using SSH to access your DNS323 (as suggested in the installation),
what changes have to be made to the procedure so that you don't end up locking yourself
out?  (This was my biggest concern and why I didn't attempt this sooner.)

(I'm not sure if the installation provided the necessary telnet setup, or if I installed it
when I was setting things up initially.  I ended up doing a full install of all the packages
and busybox3, utelnetd, and usb-storage.ko to do some other things.  This means I have
about 500MB of excess stuff I will likely never use.)

2. Does ./reload.sh establish a completely new telnet environment?  If someone had set
up a password for telnet would it survive or mess things up in any way?  Are there any
other potential traps to watch out for?

3. What needs to be done about the fan / temperature control (if anything)?
(On my system the fan seemed to run on slow mode.  I think the temperature might
have been a bit higher than normal, but I didn't fry anything checking 2 x 2TB in RAID1
with about 600GB of data - which took about 5-6 hours to the best of my recollection.)

The original documentation cites the following:

     echo 150 >/sys/class/i2c-adapter/i2c-0/0-003e/pwm1
  and read current speed with:
     cat /sys/class/i2c-adapter/i2c-0/0-003e/fan1_input
  (100 in pwm1 = 3150 rpm, 150 in pwm1 = 4650 rpm)
  You can read the current temperature with:
     cat /sys/class/i2c-adapter/i2c-0/0-0048/temp1_input

On my DNS323 under fun_plug v0.5, the path /sys/class doesn't exist. (See the sketch after this list of questions for one thing I plan to try.)

4. Does the command:

        mdadm -A /dev/md0 /dev/sd[ab]2

    need to be modified for any reason (e.g. what about JBOD / RAID0)? 

    (When I first saw the command mdadm -A /dev/md0 /dev/sd[ab]2 I wasn't
    sure if [ab]2 should be entered as is, or if it was shorthand for something.
    For the benefit of the uninitiated, you can cut and paste this command
    exactly as is if your system is running RAID1.)

5. Can someone with 2 separate disks please confirm that this part works?
>
>If you have configured 'two separate disks', you can now check
>filesystems with:
>
>    e2fsck /dev/sda2
>    e2fsck /dev/sdb2
>

6. What happens to /lost+found?  I pressed y to create it.  I assumed that I would
find a directory full of fragments (similar to what you get out of CHKDSK in Windows/DOS).
After rebooting, the only /lost+found I could see was under '/', and it existed before I ran
e2fsck and appeared to be empty.  Why did I get "/lost+found not found.  Create<y>?",
answer Y to the question, and then not find a lost+found directory on /dev/md0?
Does /lost+found not survive a reboot?

7. What changes need to be made to the script so that the DNS323 will restart? 
When I ran reboot as described in the procedure the system hung.  The only way I could
get control back was to pull the plug for a few seconds and then switch back on.
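
Re: question 3, the next thing I plan to try - just a sketch, built on the guess that the stock
firmware simply never mounts sysfs, and using the node paths quoted from the original documentation:

Code:

# if /sys is empty, sysfs probably just isn't mounted
mount | grep sysfs || mount -t sysfs sysfs /sys

# then look for the fan / temperature nodes (names may differ by kernel)
find /sys -name 'pwm1' -o -name 'fan1_input' -o -name 'temp1_input' 2>/dev/null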

================== Procedure for File System Repair

Code:

[--- Tested with FW 1.09 on HW Rev C - fun_plug v0.5 on 2011/03/14 ---]

Set up fun_plug and telnet or SSH into your DNS323: 
(See: http://dns323.kood.org/dsmg600/howto:fun_plug )

---- Install
  cd /ffp                                         (or /mnt/HD_a2)
  rsync -av inreto.de::dns323/fsck .

---- Change environment 
  cd /ffp/fsck                          (or /mnt/HD_a2/fsck)
  ./reload.sh

This will stop all processes and initiate a reboot. Your SSH or telnet
session is aborted.  The web interface and SSH are now dead so 
telnet is the only way to access your DNS323.   You will also find 
that the Hard Drive lights don't come on - they are not supported 
in this maintenance environment.

You should be able to establish a new telnet session after 10-20 seconds. 

Login as 'root', there's no password.

After boot, run:

    free

Look for the 'Swap' line to verify that swap has been correctly
enabled.  The 'total' column must be non-zero. This is essential:
e2fsck will fail without sufficient swap space.
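
If the total is zero, the swap partitions probably weren't enabled at boot.
What I would try in that case - a sketch only, assuming the usual DNS323
layout where sda1 and sdb1 are the swap partitions:

    cat /proc/swaps
    swapon /dev/sda1
    swapon /dev/sdb1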

---- Check / Repair File System

> [can someone with 2 separate disks please confirm this works???]  <<<<<<<<<<<
>
>If you have configured 'two separate disks', you can now check
>filesystems with:
>
>    e2fsck /dev/sda2
>    e2fsck /dev/sdb2
>

If you are using RAID, set it up before checking the filesystem:

    mdadm -A /dev/md0 /dev/sd[ab]2
    e2fsck /dev/md0

e2fsck may take some time, and may ask about lost+found:

  /lost+found not found.  Create<y>?

Just press Enter. Answering 'n' is OK as well, but then e2fsck will not
repair your file system, and you will get a 'Filesystem still has errors'
message at the end.

If you get an error that reads something like:
The filesystem size (according to the superblock) is 488116951 blocks
The physical size of the device is 488116928 blocks
Either the superblock or the partition table is likely to be corrupt!

then you will need to use the following commands (resize2fs with no size argument
resizes the filesystem to match the actual size of the device, which clears the mismatch):

   e2fsck -f /dev/md0
   resize2fs /dev/md0
   e2fsck /dev/md0

the last e2fsck should produce output that looks something like:

e2fsck 1.41.0 (10-Jul-2008)
/dev/md0: clean, xxxxxx/xxxxxxxxxx files, xxxxxxxxxxx/xxxxxxxxxxxx blocks


== Rebooting ==

When done, run:

 reboot

(This causes the system to hang - the only way to get control back is to disconnect the
power by pulling the plug for a few seconds and then power back on with the power 
switch on the front.)

---- ADDITIONAL INFO for ADVANCED USERS

By default, Linux 2.6.12.6 will be booted (check with 'uname -r'). 
More recent kernels are also available. To boot them, pass a
zImage file to reload.sh, e.g. 
   ./reload.sh zImage-2.6.25.1
Note that 2.6.25.1 and 2.6_orion will not work on CH3SNAS.

Manual pages for e2fsck can be found at:

  http://linux.die.net/man/8/e2fsck

For Parted documentation, see

  http://www.gnu.org/software/parted/


 

#4 2011-03-15 22:28:14

bjby
Member
Registered: 2009-02-22
Posts: 265

Re: Large RAID1 volume repair / e2fsck issue on DNS323 / Native SCAN DISK

Let's start with 7.

7. What changes need to be made to the script so that the DNS323 will restart?
When I ran reboot as described in the procedure the system hung.  The only way I could
get control back was to pull the plug for a few seconds and then switch back on.


A. Did you run these commands in a telnet (not ssh) connection as the instructions state?



This is what is supposed to happen.

The reload script boots the dns323 box into a minimal state. It doesn't have the web UI, the samba file server, or the software to control the fan. Basically, all you have is a telnet server and tools for disk maintenance.

Once the dns323 is booted into the minimal state, you can reconnect to it via telnet, and only telnet.
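
If you want to double-check that you really are in the minimal state, something like this should come back empty (assuming the usual process names - webs for the web UI, smbd/nmbd for samba):

Code:

ps | grep webs | grep -v grep
ps | grep smbd | grep -v grep
ps | grep nmbd | grep -v grep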


OK, so let's move over to 1.

#sh /ffp/start/telnetd.sh status

will tell you if the default ffp telnet-server is running.

#sh /ffp/start/telnetd.sh restart

To start/restart it.



Good luck.

Last edited by bjby (2011-03-15 23:08:26)


 

#5 2011-03-16 08:36:46

samfree
Member
Registered: 2011-03-12
Posts: 10

Re: Large RAID1 volume repair / e2fsck issue on DNS323 / Native SCAN DISK

Hi bjby - thanks for the reply

Yes, I did run from telnet, and I know that the intent is only a minimal maintenance system--no problem there. 

RE: #7 I was wondering if there is a way to make the return to normal more graceful after maintenance is done.
Having to pull the plug, restart, kill the script, and reboot again is a bit messy. It would be great if there were a
way to at least power right off, or force a proper reboot so the regular fun_plug starts!

RE: #3 The original docs said there was a way to manually set the fan speed, so I assume there is still a way to
do it, just a different one.  I'm hoping somebody who knows can shed some light on it.

I'll check the other stuff in a day or two after the current copy operation is complete... loading about 1TB of data
onto the DNS323 which isn't speedy!!!


 

#6 2011-03-16 17:10:09

dhub
Member
Registered: 2011-01-01
Posts: 112

Re: Large RAID1 volume repair / e2fsck issue on DNS323 / Native SCAN DISK

Making maintenance easier is the primary purpose of the funplug_manager I'm writing (I'm using it to work on getting Debian running reloaded). It starts up a web server on port 8000 for 90 seconds after the NAS boots that lets you select a fun_plug script to run; if one isn't selected, it boots the most recent one.

It also includes a rebootinto script that reboots the NAS and runs a specific fun_plug script. 

See the thread "Grub as a funplug".


 

#7 2011-03-17 05:04:13

samfree
Member
Registered: 2011-03-12
Posts: 10

Re: Large RAID1 volume repair / e2fsck issue on DNS323 / Native SCAN DISK

Thanks for the reply dhub.... your package looks cool and I'll check it out when I get a chance... don't think it will solve the problem though...

I'm wondering if the problem is that the RAID1 volume isn't mounted and a soft reboot using the reboot script doesn't clean things up?  Could that be it?

root@NAS3:/mnt# mount
rootfs on / type rootfs (rw)
proc on /proc type proc (rw,nodiratime)
usbfs on /proc/bus/usb type usbfs (rw)
sysfs on /sys type sysfs (rw)
tmpfs on /dev type tmpfs (rw)
devpts on /dev/pts type devpts (rw)

I tried reboot -f and that hung too....  I'm thinking the easiest way to solve the problem is to power down
the DNS323.  Is there a command to cause the DNS323 to power off?

Then when the power was turned back on the box would boot up into whatever fun_plug was there....
A lot easier than crawling under the desk to unplug and replug the DNS323!
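
Untested guesses I might try first - just a sketch, and I don't know whether the DNS323 power circuitry will honour any of these from the maintenance kernel:

Code:

# stop the array cleanly, flush buffers, then try to power off
mdadm -S /dev/md0
sync
poweroff      # busybox also has 'halt' if poweroff isn't built in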


 

#8 2011-03-17 05:14:13

samfree
Member
Registered: 2011-03-12
Posts: 10

Re: Large RAID1 volume repair / e2fsck issue on DNS323 / Native SCAN DISK

Thanks bjby for the input on #sh /ffp/start/telnetd.sh status.
I can confirm that the maintenance environment starts the default telnet server.

For anyone following the thread... I can confirm that the procedure did fix the disk
array.  I copied another 300 or 400GB to the drive, rechecked, and everything
was still OK.

The only thing is that I got the message "Superblock last mount time is in the future.  Fix<y>?"
I replied Y and e2fsck returned very quickly with "/dev/md0: clean, ...".
Does anyone have any idea about this "future" thing?
  DNS323 bug? (Daylight Savings Time Bug???)
  Maintenance environment bug?
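
One thing I plan to check next time I'm in the maintenance environment - just a sketch, assuming dumpe2fs is available alongside e2fsck - is whether the clock in that environment is simply behind, since e2fsck compares the superblock's last mount time against the current system time:

Code:

date
dumpe2fs -h /dev/md0 | grep -i 'last mount time'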


 
