DSM-G600, DNS-3xx and NSA-220 Hack Forum

Running out of ideas...

nicko · 2010-11-02 14:10:57

Got a DNS-323 V1.08 with two Samsung HD103UJs in RAID 1 - been happily running for ages. Not running hot (40C or less) and off a clean (UPS) supply.

Basic rule, if it ain't broke, don't fix it...

So, I broke the rule - as it was ages since I'd checked it (more than a year), I did a disk check on it from the utilities menu.

When it came back, it didn't show any errors (is there a log file somewhere?) but it did say that the raid set was degraded, so I did a manual rebuild - seemed to complete OK and the "degraded" notice went away...

Now, when it reboots, Windows (XP pro & 7 64bit home professional) can see it, but maybe for only 5 minutes, after which SMB access goes away - web and FTP access is fine - I can still log in via the management interface and look at the status, which says it's fine, and can use WS_FTP to see all the data. No way to restore SMB access without a reboot...

I whipped out one of the drives and stuck it on a Win 7 64bit host in an external USB2/SATA caddy and used Ext2FSD 0.48 to read it and all the data seems to be there fine...

So, am now at a complete loss. Why has this otherwise happy system decided to randomly stop SMB after a few minutes when it never used to?

I do have a fun_plug, but disabling it (i.e. renaming it and rebooting) seems to have no effect at all....

Having only one drive in there now obviously changes the status to degraded, but the DNS still exhibits the same problem...

Thought: Maybe the e2fsck (or whatever is actually run) did something to a system file and broke the installation? Would a V1.08 re-install help?

Sigh...

Thanks

Nick

Last edited by nicko (2010-11-02 14:17:31)

FunFiler · 2010-11-02 14:20:41

Can you still access it via IP address after the 5 minute period? If you enable fun_plug again, do the SMB and NMB processes still run after the 5 minute period?

nicko · 2010-11-02 14:26:15

FunFiler wrote:
Can you still access it via IP address after the 5 minute period? If you enable fun_plug again, do the SMB and NMB processes still run after the 5 minute period?

Yup - the only thing that stops is Windows file access - that includes UNC access, i.e. \\192.168.0.231\Volume_1 also stops - http://192.168.0.231 works fine, as does FTP access (so long as the fun_plug is enabled).

Should have added: The fun_plug I am using is dated 2008-04-13 from tp@fonz.de

Cheers

Last edited by nicko (2010-11-02 14:29:56)

bound4h · 2010-11-02 14:43:29

Might be a question for Fonz, but could it be that the version of fun_plug is outdated compared to the 1.08 FW version the box is using?

nicko · 2010-11-02 14:50:47

bound4h wrote:
Might be a question for Fonz, but could it be that the version of fun_plug is outdated compared to the 1.08 FW version the box is using?

Makes no difference if the fun_plug is enabled or not.

fonz · 2010-11-02 15:04:37

I don't think ffp interferes with the firmware samba, unless you've installed and started a samba ffp package.

nicko · 2010-11-02 15:48:05

More: Tried re-installing V1.08 - no change. However, when the SMB dropped out, the FTP server stopped responding although HTTP was working fine (I could still log in and get to the FTP server pages). The FTP server was showing as "Started", even though it wasn't responding, so I stopped & restarted it, after which it was working again. SMB still not working though (unless I restart the box).

Windows 7 gives the message "Windows cannot access \\CRAYSAN2", Error Code 0x80070035 "The network path was not found" - this is odd as the DNS server and the rest of the network is fine and you can access the box fine with HTTP & FTP. Just to be sure, I did an "ipconfig /flushdns" on each host...

very confused now...

Cheers

Last edited by nicko (2010-11-02 15:50:37)

konke · 2010-11-02 16:33:55

Have the same situation: every time I reboot the 323, SMB stops working.
I "solved" it by going into "network access", editing any user and clicking "modify settings".
Not changing anything, but after this SMB access works for me, until I reboot the 323 again.

karlrado · 2010-11-02 16:36:42

This is all very odd. You're right in that just checking the disks should not have caused this.

Stuff to try:

Try removing your samba shares and add them back via the management interface. This might fix a problem in the config files, but I don't know if it will help or not.

Enable your fun_plug and:
- check to see if the root file system is full - df command. Mine is usually in the mid 80% range. If it is close to 100%, use find to locate big files such as runaway log files.
- try to watch the processes with top or your favorite tool right after a reboot. See if there are too many instances of smbd. There should only be one or two, plus one for any active connections, I think. The point is to look for runaway spawning of additional smbd processes.
- use top, free, or whatever to see if RAM is getting used up. top can tell you how much RAM a process is using.
- after the samba shares fail (5 mins), see if smbd and nmbd are still running.

I can't think of anything else that would cause a disconnect after 5 minutes. That amount of time is suspicious because it might take that long for a runaway log file to fill up the ram disk, for example.

Finally, dig around a bit and figure out how to turn on samba logging and set its verbosity/debug levels to high values. Try to direct the logs to your hard disk. Reboot or restart samba and see if the logs offer any clues.
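The df and process checks in the list above can be wrapped in two small helpers so they are quick to repeat after each reboot. This is only a sketch: it assumes awk is available (for example from ffp) and that df prints its usual six columns. The function names `rootfs_use_pct` and `count_procs` are made up for illustration.

```shell
#!/bin/sh
# Print the Use% figure (without the % sign) from the rootfs line
# of `df` output supplied on stdin.
rootfs_use_pct() {
    awk '$1 == "rootfs" { sub(/%/, "", $5); print $5; exit }'
}

# Count the lines of `ps` output on stdin that mention a daemon name.
count_procs() {
    awk -v name="$1" 'index($0, name) { n++ } END { print n + 0 }'
}

# Typical use on the NAS (capture ps first so the awk helper itself
# is not counted):
#   df | rootfs_use_pct
#   snapshot=$(ps); printf '%s\n' "$snapshot" | count_procs smbd
```

If the first call prints a value close to 100, or the second prints 0 after the shares have failed, that points at the same symptoms described above.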

jamieburchell · 2010-11-02 17:36:12

When you do a scandisk, the drives are unmounted and it's necessary to reboot the DNS-323. I'm not sure what the rebuild would have done without first rebooting after the scandisk

nicko · 2010-11-02 18:58:36

Just updated my ffp and installed smartctl. I'm using "smartctl -d marvell -t long /dev/sda" to check the disks - one of them has a bad read block but it's not the disk that's still in the DNS...

80% in use on /dev/HD_a2 - what else should I look for and where?

Thanks

karlrado · 2010-11-02 20:58:35

It isn't the hard disk you have to worry about filling up for this type of problem. It is the root file system, denoted by rootfs below. It is quite small because it actually resides in RAM. If the rootfs fills up, then things can start failing.

Code:

root@Toaster:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
rootfs                9.7M  8.0M  1.3M  87% /
/dev/root             9.7M  8.0M  1.3M  87% /
/dev/loop0            5.7M  5.7M     0 100% /sys/crfs
/dev/sda2             916G  639G  278G  70% /mnt/HD_a2
/dev/sdb2             916G  507G  410G  56% /mnt/HD_b2
/dev/sda4             487M  2.3M  484M   1% /mnt/HD_a4
/dev/sdb4             487M   16K  487M   1% /mnt/HD_b4
/dev/sda2             916G  639G  278G  70% /opt
root@Toaster:~#

In this case, it is 87% full. Remember that directories like /bin, /etc, /tmp, /var and others are in the rootfs.

If rootfs is full, you can narrow down the offending files by finding the ones that are over a certain size:

Code:

root@Toaster:~# find / -xdev -size +200k
/bin/busybox
/home/root/.authenticate/ca.pem
/home/root/mbox
/image.cfs
/lib/libuClibc-0.9.28.so
root@Toaster:~#

lists all the files that are bigger than 200K. The '-xdev' keeps find from looking for files on the hard disks. The files in my list above are normal and expected on my system, but I should probably do something about the mbox file.

But if you see a large *.log file or something like that, then it may be worth looking into.

nicko · 2010-11-02 23:22:27

I found this file : /mnt/HD_a4/.systemfile/.upnpav-db/upnpav.tmp size 2911232

Can it go, and how to delete it?

rootfs has 0% available...

Cheers

Last edited by nicko (2010-11-02 23:24:27)

karlrado · 2010-11-02 23:32:24

/mnt/HD_a4 is on the hard disk. That's not the problem.

The 0% avail on rootfs is your problem. Can you run the find command I posted in order to list the large files in the rootfs?

find / -xdev -size +200k

I'm not sure if this works on ffp or not.

If not, use

ls -l

to look for big files in /tmp and in /var, for starters.

nicko · 2010-11-03 08:46:12

Hi,

This is the result:

Code:

/ # find / -xdev -size +100k
/bin/busybox
/image.cfs
/lib/libuClibc-0.9.28.so
/mnt/HD_a4/.systemfile/.upnpav-db/upnpav.tmp
/ #

Not much that's exciting there... checked /tmp & /var with ls -alRh /xxx and nothing over 20kb anywhere...

The extended self test on the drive I removed (RH) is showing a consistent read error at LBA 1056010 every time I run an extended test. It's in an external USB 2 box running GSmartControl from a Windows 7 64 bit box. Is this likely to be a real error (does being on USB matter that much)?

The drive is an HD103UJ (Spinpoint 1TB F1 7200rpm) - should I replace it with the current HD103SJ (Spinpoint 1TB F3 7200rpm) - will it matter that the other drive in the RAID 1 set is not *exactly* the same? (i.e. it's an F1 not an F3 - same speed & size though).

Cheers

Last edited by nicko (2010-11-03 14:44:36)

nicko · 2010-11-04 06:55:11

Hi - Any thoughts on this?

Many thanks

karlrado · 2010-11-04 16:26:23

You said that rootfs had 0% remaining and that find command reported no *large* files.

Did you run that find command after smb stopped working?

The other possibility is you may have a huge number of small files in rootfs that should not be there, but that is less likely.

It would be good if you could post the output from the following commands

df -h
find / -xdev -size +100k
ps -eal

AFTER smb stopped working.

The first thing to figure out is why the rootfs is full, if it is full.
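To check the "huge number of small files" possibility, the files on rootfs can be counted per top-level directory. A rough sketch, assuming find supports -xdev and -type and that awk and sort are available; `count_files_by_top_dir` is a made-up name:

```shell
#!/bin/sh
# Count regular files under a directory, grouped by its top-level
# subdirectories, without crossing into other mounted filesystems.
# Files directly in the starting directory are reported under ".".
count_files_by_top_dir() {
    (
        cd "${1:-/}" || exit 1
        find . -xdev -type f |
            awk -F/ '{ key = (NF > 2) ? $2 : "."; n[key]++ }
                     END { for (k in n) print n[k], k }' |
            sort -rn
    )
}

# On the NAS:
#   count_files_by_top_dir /
```

A directory with an unexpectedly large count would be the place to look next.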

nicko · 2010-11-04 16:38:37

karlrado wrote:
You said that rootfs had 0% remaining and that find command reported no *large* files.

Did you run that find command after smb stopped working?

The other possibility is you may have a huge number of small files in rootfs that should not be there, but that is less likely.

It would be good if you could post the output from the following commands

df -h
find / -xdev -size +100k
ps -eal

AFTER smb stopped working.

The first thing to figure out is why the rootfs is full, if it is full.

Thanks for the reply -

Code:

/ # cd /
/ # df -h
Filesystem                Size      Used Availab
rootfs                    9.7M      9.7M
/dev/root                 9.7M      9.7M
/dev/loop0                5.6M      5.6M
/dev/md0                914.4G    775.2G    139.
/ # find / -xdev -size +100k
/bin/busybox
/image.cfs
/lib/libuClibc-0.9.28.so
/mnt/HD_a4/.systemfile/.upnpav-db/upnpav.tmp
/ # ps -eal
PID   USER     COMMAND
    1 root     init
    2 root     [ksoftirqd/0]
    3 root     [events/0]
    4 root     [khelper]
    5 root     [kthread]
   11 root     [kblockd/0]
   14 root     [khubd]
   49 root     [pdflush]
   50 root     [pdflush]
   52 root     [aio/0]
   51 root     [kswapd0]
  190 root     [scsi_eh_0]
  191 root     [scsi_eh_1]
  192 root     [scsi_eh_2]
  193 root     [scsi_eh_3]
  201 root     [mtdblockd]
  215 root     [kcryptd/0]
  216 root     [kmirrord/0]
  227 root     [loop0]
 1159 root     xmldb -n config
 1202 root     [md0_raid1]
 1271 root     chkbutton
 1300 root     /web/webs
 1332 root     fancontrol 0
 1339 root     op_server 3 3 3
 1345 root     -sh
 1366 root     crond
 1444 root     atd
 1501 root     mserver
 1696 root     /ffp/sbin/telnetd -l /ffp/bin/sh
 1702 root     /ffp/bin/sh
 5870 root     /ffp/bin/sh
21533 root     /usr/sbin/samba/nmbd -D
21559 root     ps -eal
/ #

Cheers

karlrado · 2010-11-04 17:33:13

The rootfs looks full, which is bad. nmbd is part of samba and is still running, but there are no samba smbd processes running, which is bad.

Right after a reboot, the rootfs should NOT be full like that. Something is writing something to the rootfs, making it fill up.

I and another poster suggested that you modify some samba settings in the web admin interface and apply the update. It doesn't matter what the change is - you might add or change a share, etc. The idea is to force the admin interface to re-write the samba config files with the hope of fixing any damage. Then reboot and see what happens.

Other than that, I don't know what to suggest, especially since the system used to work as it is.

And having the problem persist with fun_plug disabled sort of eliminates any other system config you may have done.

If it were me, I'd reboot and then quickly open a couple of ssh windows before samba dies. Run top in one and watch to see what processes are busy doing something. Look for smbd processes. Run the df -h command in the other ssh window and see if the rootfs at least starts out at less than full. Try to figure out what is filling up the rootfs.

Next I would modify my fun_plug to stop samba. You can do this by adding

/usr/bin/smb stop

at the bottom of the fun_plug.

Reboot.

Samba will be off, but you can login with ssh and see if the rootfs is OK after the usual five minutes or whatever. If that's OK, then it must be samba.

Next I would figure out how to turn on samba logging or debugging and start samba manually:

/usr/bin/smb start

and see if I can see what is wrong. It would also be good to look at the files in /etc/samba, etc to see if they make sense.

That's all I can think of at the moment, but that's where I would start if it were me. Of course, if you've got a good backup of your data, the last resort is to reset, reformat, and rebuild.
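Rather than re-typing df by hand, the free space on rootfs can be logged in a loop while top runs in the other window. A minimal sketch, assuming `df -k /` reports the root file system with the usual column layout; `avail_kb` and `watch_rootfs` are illustrative names:

```shell
#!/bin/sh
# Print the "Available" column (in KB) from the last line of
# `df -k` output supplied on stdin.
avail_kb() {
    awk 'END { print $(NF - 2) }'
}

# Print a timestamped free-space reading COUNT times, INTERVAL seconds apart.
# Usage: watch_rootfs [INTERVAL] [COUNT]
watch_rootfs() {
    interval=${1:-10}
    count=${2:-60}
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "$(date '+%H:%M:%S') $(df -k / | avail_kb)K free"
        i=$((i + 1))
        sleep "$interval"
    done
}

# On the NAS, right after a reboot:
#   watch_rootfs 10 60
```

The time at which the figure starts dropping can then be matched against what top shows as busy.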

nicko · 2010-11-04 18:52:51

Thanks for the reply,

Well.. I changed a few settings to and fro in the shares & stuff, then restarted.

Ran top in one window, df -h in the other (repeatedly).

top showed prescan always consuming 10% but interestingly 5 instances of mt-daapd, one of which was always taking between 5 & 7%

df -h started out showing 1.7M free on rootfs - this slowly crept down over about 10 minutes to 0 free... at this point the mt-daapd processes exited, followed shortly by the smbd instances (both of them).

Cheers

Last edited by nicko (2010-11-04 18:53:25)

bgravato · 2010-11-04 19:06:04

Regardless of what's causing the samba failure, I think it would be interesting to first find out what's causing the rootfs to get full...

Can you run the command

Code:

du -hsx /*

and post here the output?

For reference, this is what I get:

Code:

/ # du -hsx /*
426.0k  /bin
36.0k   /default
6.0k    /dev
305.0k  /etc
0       /ffp
3.0k    /home
5.6M    /image.cfs
562.0k  /lib
1.0k    /lost+found
3.0k    /mnt
2.6M    /proc
1.0k    /root
4.0k    /sbin
6.0k    /sys
243.0k  /tmp
103.0k  /usr
22.0k   /var
233.0k  /web
1.0k    /welcome.msg
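
Since the rootfs fills up gradually, one way to see which directory is growing is to take two du snapshots a few minutes apart and compare them. A minimal sketch, assuming awk is available and that paths contain no spaces; `du_growth` and the snapshot file names are made up for illustration:

```shell
#!/bin/sh
# Compare two `du -skx` snapshots and print the directories whose
# size changed, with the difference in KB.
# Usage: du_growth BEFORE_FILE AFTER_FILE
du_growth() {
    awk 'NR == FNR { before[$2] = $1; next }
         { delta = $1 - before[$2]
           if (delta != 0) print delta "K", $2 }' "$1" "$2"
}

# On the NAS (keep the snapshots on the data volume, not on the
# nearly full rootfs):
#   du -skx /* > /mnt/HD_a2/du.before
#   sleep 300
#   du -skx /* > /mnt/HD_a2/du.after
#   du_growth /mnt/HD_a2/du.before /mnt/HD_a2/du.after
```

Only directories whose size changed between the two snapshots are printed.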

fonz · 2010-11-04 19:39:12

And how about "ls -l /tmp"?

nicko · 2010-11-04 20:50:58

Thanks for the continuing help...

Code:

/ # ls -lR /tmp
/tmp:
-rw-r--r--    1 root     root            0 Nov  4 16:38 ClientDisplayNonUTF8
-rw-r--r--    1 root     root            5 Nov  4 16:37 CustomID
-rw-r--r--    1 root     root            0 Nov  4 18:12 ErrorDisk
-rw-r--r--    1 root     root            0 Nov  4 16:37 GetTimeServerFinish
-rw-r--r--    1 root     root           91 Nov  4 16:38 QuotaStatus
-rw-r--r--    1 root     root           55 Nov  4 16:38 apkg.xml
-rw-r--r--    1 root     root            0 Nov  4 16:37 boot_finished
-rw-r--r--    1 root     root            0 Nov  4 18:12 email_ok
-rw-r--r--    1 root     root            2 Nov  4 16:38 fan_status
-rw-r--r--    1 root     root           62 Nov  4 16:37 fchmod
-rw-r--r--    1 root     root            0 Nov  4 16:38 hd_wait_format
-rw-rw-rw-    1 root     root           26 Nov  4 16:50 ituneprogbar_result
-rw-r--r--    1 root     root            0 Nov  4 16:38 load_module_finished
-rw-r--r--    1 root     root            0 Nov  4 16:38 log.lock
-rw-r--r--    1 root     root           17 Nov  4 16:37 makaddr
-rw-r--r--    1 root     root            1 Nov  4 16:38 max_dl_num
-rw-r--r--    1 root     root            0 Nov  4 16:37 md0
-rw-r--r--    1 root     root            0 Nov  4 16:37 md0_active
-rw-r--r--    1 root     root          133 Nov  4 18:46 mdstat_file
-rw-r--r--    1 root     root            0 Nov  4 16:37 mount_normal
-rw-r--r--    1 root     root            0 Nov  4 18:12 msmtp_result.txt
-rw-r--r--    1 root     root            0 Nov  4 16:37 opserver_frodo
-rw-r--r--    1 root     root            0 Nov  4 16:38 prescan.result
-rw-r--r--    1 root     root            2 Nov  4 16:50 prescan_result
-rw-r--r--    1 root     root           26 Nov  4 16:50 prescanbar_result
-rw-r--r--    1 root     root            0 Nov  4 16:38 raid_degraded
-rw-r--r--    1 root     root          709 Nov  4 16:38 raidinfo
-rw-r--r--    1 root     root            0 Nov  4 16:37 raidup
-rw-r--r--    1 root     root            0 Nov  4 16:38 re-sch
-rwxr--r--    1 root     root          145 Nov  4 16:38 restartftp.sh
drwxr-xr-x    4 root     root         1024 Nov  4 18:46 samba
-rw-r--r--    1 root     root           10 Nov  4 16:37 scsi_mapping
-rw-r--r--    1 root     root            0 Nov  4 16:37 sda
-rw-r--r--    1 root     root            0 Nov  4 16:37 sda0
-rw-r--r--    1 root     root            0 Nov  4 16:38 system_ready
-rw-r--r--    1 root     root            3 Nov  4 18:46 temper
-rw-r--r--    1 root     root            3 Nov  4 18:46 temper_C
-rw-r--r--    1 root     root            4 Nov  4 18:46 temper_F
-rw-r--r--    1 root     root          133 Nov  4 16:38 tmp_mdstat
-rw-r--r--    1 root     root           54 Nov  4 18:12 tmp_send.mm
-rw-r--r--    1 root     root           20 Nov  4 16:37 uptimes
-rw-r--r--    1 root     root            0 Nov  4 16:38 wgetpage.txt

/tmp/samba:
-rw-------    1 root     root         8192 Nov  4 18:45 account_policy.tdb
-rw-r--r--    1 root     root        40200 Nov  4 18:45 brlock.tdb
-rw-r--r--    1 root     root          231 Nov  4 18:46 browse.dat
-rw-r--r--    1 root     root          696 Nov  4 18:45 connections.tdb
-rw-r--r--    1 root     root          696 Nov  4 18:45 gencache.tdb
-rw-------    1 root     root         8192 Nov  4 18:45 group_mapping.tdb
-rw-r--r--    1 root     root          372 Nov  4 18:45 locking.tdb
-rw-------    1 root     root          696 Nov  4 18:45 messages.tdb
-rw-------    1 root     root            0 Nov  4 18:45 ntdrivers.tdb
drwxr-xr-x    2 root     root         1024 Nov  4 18:45 perfmon
drwxr-xr-x    2 root     root         1024 Nov  4 18:45 printing
-rw-------    1 root     root         8192 Nov  4 18:45 registry.tdb
-rw-------    1 root     root         8192 Nov  4 18:45 secrets.tdb
-rw-r--r--    1 root     root          696 Nov  4 18:45 sessionid.tdb

/tmp/samba/perfmon:

/tmp/samba/printing:
-rw-------    1 root     root        16384 Nov  4 18:45 lp.tdb
-rw-------    1 root     root        24576 Nov  4 18:45 printers.tdb
/ # du -hsx /*
426.0k  /bin
36.0k   /default
6.0k    /dev
306.0k  /etc
0       /ffp
3.0k    /home
5.6M    /image.cfs
562.0k  /lib
1.0k    /lost+found
2.2M    /mnt
790.0k  /proc
1.0k    /root
4.0k    /sbin
3.0k    /sent
6.0k    /sys
171.0k  /tmp
102.0k  /usr
24.0k   /var
230.0k  /web
1.0k    /welcome.msg
/ #

I know you say it's not in the rootfs file system, but notice that /mnt is showing 2.2M, and previously using find / -xdev -size +100k I found 2M of that in:

Code:

/mnt/HD_a4/.systemfile/.upnpav-db:
drwx------    2 root     root         1024 Nov  4 16:38 .
drwxr-xr-x    4 root     root         1024 Nov  4 16:38 ..
-rw-r--r--    1 root     root      2911232 Nov  4 16:50 upnpav.tmp
-rw-r--r--    1 root     root         1024 Nov  4 16:44 upnpav.tmp-journal

Last edited by nicko (2010-11-04 21:07:42)

fonz · 2010-11-04 21:20:22

nicko, your "find" output looks suspicious. Why is it listing /mnt/HD_a4/.systemfile/.upnpav-db/upnpav.tmp?
Unless I'm missing something, the output of "mount", "readlink -f /mnt/HD_a4/.systemfile/.upnpav-db/upnpav.tmp" and "df -h /mnt/HD_a4/.systemfile" might help.

nicko · 2010-11-04 22:04:25

Hi,

fonz wrote:
nicko, your "find" output looks suspicious. Why is it listing /mnt/HD_a4/.systemfile/.upnpav-db/upnpav.tmp?
Unless I'm missing something, the output of "mount", "readlink -f /mnt/HD_a4/.systemfile/.upnpav-db/upnpav.tmp" and "df -h /mnt/HD_a4/.systemfile" might help.

Please note I had to reboot the system again so it's again showing 1.7M free on rootfs...

Code:

/ # mount
rootfs on / type rootfs (rw)
/dev/root on / type ext2 (rw)
proc on /proc type proc (rw,nodiratime)
/dev/loop0 on /sys/crfs type squashfs (ro)
/dev/md0 on /mnt/HD_a2 type ext2 (rw)
none on /proc/bus/usb type usbfs (rw)
devpts on /dev/pts type devpts (rw)
/ # readlink -f /mnt/HD_a4/.systemfile/.upnpav-db/upnpav.tmp
/mnt/HD_a4/.systemfile/.upnpav-db/upnpav.tmp
/ # df -h /mnt/HD_a4/.systemfile
Filesystem                Size      Used Available Use% Mounted on
rootfs                    9.7M      7.5M      1.7M  82% /
/ #
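
In the output above, mount lists no entry for /mnt/HD_a4 and df resolves that path to rootfs, which suggests the directory is sitting on the RAM-based root file system rather than on a disk partition. The same check can be scripted. This is a sketch, assuming mount prints lines of the form "DEVICE on PATH type ..."; `is_mount_point` is a made-up helper name:

```shell
#!/bin/sh
# Succeed if the given path appears as a mount point in `mount`
# output supplied on stdin, fail otherwise.
is_mount_point() {
    awk -v path="$1" '$2 == "on" && $3 == path { found = 1 }
                      END { exit !found }'
}

# On the NAS:
#   mount | is_mount_point /mnt/HD_a4 || echo "/mnt/HD_a4 is not mounted"
```

Anything written below a path that fails this check goes to whichever file system holds the parent directory.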

Cheers

Last edited by nicko (2010-11-04 22:05:10)
