Got a DNS-323 V1.08 with two Samsung HD103UJs in RAID 1 - been happily running for ages. Not running hot (40C or less) and off a clean (UPS) supply.
Basic rule, if it ain't broke, don't fix it...
So, I broke the rule - as it was ages since I'd checked it (more than a year), I did a disk check on it from the utilities menu.
When it came back, it didn't show any errors (is there a log file somewhere?) but it did say that the RAID set was degraded, so I did a manual rebuild - it seemed to complete OK and the "degraded" notice went away...
Now, when it reboots, Windows (XP pro & 7 64bit home professional) can see it, but maybe for only 5 minutes, after which SMB access goes away. Web and FTP access are fine - I can still log in via the management interface and look at the status, which says it's fine, and I can use WS_FTP to see all the data. There is no way to restore SMB access without a reboot...
I whipped out one of the drives and stuck it on a Win 7 64bit host in an external USB2/SATA caddy and used Ext2FSD 0.48 to read it and all the data seems to be there fine...
So, I am now at a complete loss. Why has this otherwise happy system decided to randomly stop SMB after a few minutes when it never used to?
I do have a fun_plug, but disabling it (i.e. renaming it and rebooting) seems to have no effect at all....
Having only one drive in there now obviously changes the status to degraded, but the DNS still exhibits the same problem...
Thought: Maybe the e2fsck (or whatever is actually run) did something to a system file and broke the installation? Would a V1.08 re-install help?
Sigh...
Thanks
Nick
Last edited by nicko (2010-11-02 14:17:31)
Can you still access it via IP address after the 5 minute period? If you enable fun_plug again, do the SMB and NMB processes still run after the 5 minute period?
FunFiler wrote:
Can you still access it via IP address after the 5 minute period? If you enable fun_plug again, do the SMB and NMB processes still run after the 5 minute period?
Yup - the only thing that stops is Windows file access. That includes UNC access, i.e. \\192.168.0.231\Volume_1 also stops - http://192.168.0.231 works fine, as does FTP access (so long as the fun_plug is enabled).
Should have added: The fun_plug I am using is dated 2008-04-13 from tp@fonz.de
Cheers
Last edited by nicko (2010-11-02 14:29:56)
Might be a question for Fonz, but could it be that the version of fun_plug is outdated compared to the 1.08 FW version the box is using?
bound4h wrote:
Might be a question for Fonz, but could it be that the version of fun_plug is outdated compared to the 1.08 FW version the box is using?
Makes no difference if the fun_plug is enabled or not.
More: Tried re-installing V1.08 - no change. However, when the SMB dropped out, the FTP server stopped responding although HTTP was working fine (I could still log in and get to the FTP server pages). The FTP server was showing as "Started", even though it wasn't responding, so I stopped & restarted it, after which it was working again. SMB is still not working though (unless I restart the box).
Windows 7 gives the message "Windows cannot access \\CRAYSAN2", Error Code 0x80070035 "The network path was not found" - this is odd as the DNS server and the rest of the network are fine and you can access the box fine with HTTP & FTP. Just to be sure, I did an "ipconfig /flushdns" on each host...
very confused now...
Cheers
Last edited by nicko (2010-11-02 15:50:37)
I have the same situation: every time I reboot the 323, SMB stops working.
I "solved" it by going into "network access", editing any user and clicking "modify settings".
I don't change anything, but after this the SMB access works for me, until I reboot the 323 again.
This is all very odd. You're right in that just checking the disks should not have caused this.
Stuff to try:
Try removing your samba shares and add them back via the management interface. This might fix a problem in the config files, but I don't know if it will help or not.
Enable your fun_plug and:
- check to see if the root file system is full - df command. Mine is usually in the mid 80% range. If it is close to 100%, use find to locate big files such as runaway log files.
- try to watch the processes with top or your favorite tool right after a reboot. See if there are too many instances of smbd. There should only be one or two, plus one for any active connections, I think. The point is to look for runaway spawning of additional smbd processes.
- use top, free, or whatever to see if RAM is getting used up. top can tell you how much RAM a process is using.
- after the samba shares fail (5 mins), see if smbd and nmbd are still running.
I can't think of anything else that would cause a disconnect after 5 minutes. That amount of time is suspicious because it might take that long for a runaway log file to fill up the ram disk, for example.
Finally, dig around a bit and figure out how to turn on samba logging and set its verbosity/debug levels to high values. Try to direct the logs to your hard disk. Reboot or restart samba and see if the logs offer any clues.
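A rough way to automate the checks above is to log the rootfs fill level and the number of smbd processes every so often, writing the log to the hard disk rather than to rootfs. This is only a sketch: the helper names rootfs_use_pct and count_procs are made up here, the log path under /mnt/HD_a2 is just an example, and it assumes the shell on the box has awk available (e.g. via ffp).

```shell
#!/bin/sh
# Sketch only: rootfs_use_pct and count_procs are hypothetical helper names.

# Read `df` output on stdin and print the Use% number of the filesystem
# mounted on "/". Only the first match is used, because rootfs and
# /dev/root are both listed with "/" as their mount point.
rootfs_use_pct() {
    awk '$NF == "/" { sub(/%/, "", $(NF-1)); print $(NF-1); exit }'
}

# Read `ps` output on stdin and count the lines where one word is the
# given command name, either bare ("smbd") or with a path ("/x/y/smbd").
# Matching whole words keeps this awk process from counting itself.
count_procs() {
    awk -v name="$1" '
        {
            n = length(name)
            for (i = 1; i <= NF; i++) {
                if ($i == name || substr($i, length($i) - n) == "/" name) {
                    count++
                    break
                }
            }
        }
        END { print count + 0 }'
}

# Example loop, to be started right after a reboot (Ctrl-C to stop).
# The log file location is an assumption; any path on the hard disk will do.
#
# while true; do
#     echo "$(date) rootfs=$(df | rootfs_use_pct)% smbd=$(ps | count_procs smbd)"
#     sleep 30
# done >> /mnt/HD_a2/rootfs-watch.log
```

If the rootfs percentage climbs steadily while the smbd count stays at one or two, the disconnect is more likely a full rootfs than runaway smbd spawning.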
When you do a scandisk, the drives are unmounted and it's necessary to reboot the DNS-323 afterwards. I'm not sure what the rebuild would have done without first rebooting after the scandisk.
Just updated my ffp and installed smartctl. I'm using "smartctl -d marvell -t long /dev/sda" to check the disks - one of them has a bad read block but it's not the disk that's still in the DNS...
80% in use on /dev/HD_a2 - what else should I look for and where?
Thanks
It isn't the hard disk you have to worry about filling up for this type of problem. It is the root file system, denoted by rootfs below. It is quite small because it actually resides in RAM. If the rootfs fills up, then things can start failing.
root@Toaster:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
rootfs          9.7M  8.0M  1.3M  87% /
/dev/root       9.7M  8.0M  1.3M  87% /
/dev/loop0      5.7M  5.7M     0 100% /sys/crfs
/dev/sda2       916G  639G  278G  70% /mnt/HD_a2
/dev/sdb2       916G  507G  410G  56% /mnt/HD_b2
/dev/sda4       487M  2.3M  484M   1% /mnt/HD_a4
/dev/sdb4       487M   16K  487M   1% /mnt/HD_b4
/dev/sda2       916G  639G  278G  70% /opt
root@Toaster:~#
In this case, it is 87% full. Remember that directories like /bin, /etc, /tmp, /var and others are in the rootfs.
If rootfs is full, you can narrow down the offending files by finding the ones that are over a certain size:
root@Toaster:~# find / -xdev -size +200k
/bin/busybox
/home/root/.authenticate/ca.pem
/home/root/mbox
/image.cfs
/lib/libuClibc-0.9.28.so
root@Toaster:~#
This lists all the files that are bigger than 200K. The '-xdev' option keeps find from descending into other filesystems, such as the hard disks. The files in my list above are normal and expected on my system, but I should probably do something about the mbox file.
But if you see a large *.log file or something like that, then it may be worth looking into.
I found this file : /mnt/HD_a4/.systemfile/.upnpav-db/upnpav.tmp size 2911232
Can it go, and how to delete it?
rootfs has 0% available...
Cheers
Last edited by nicko (2010-11-02 23:24:27)
/mnt/HD_a4 is on the hard disk. That's not the problem.
The 0% avail on rootfs is your problem. Can you run the find command I posted in order to list the large files in the rootfs?
find / -xdev -size +200k
I'm not sure if this works on ffp or not.
If not, use
ls -l
to look for big files in /tmp and in /var, for starters.
Hi,
This is the result:
/ # find / -xdev -size +100k
/bin/busybox
/image.cfs
/lib/libuClibc-0.9.28.so
/mnt/HD_a4/.systemfile/.upnpav-db/upnpav.tmp
/ #
Not much that's exciting there... checked /tmp & /var with ls -alRh /xxx and nothing over 20kb anywhere...
The extended self test on the drive I removed (RH) shows a consistent read error at LBA 1056010 every time I run it. It's in an external USB 2 box, tested with GSmartControl from a Windows 7 64 bit box. Is this likely to be a real error (does being on USB matter that much)?
The drive is an HD103UJ (Spinpoint 1TB F1 7200rpm) - should I replace it with the current HD103SJ (Spinpoint 1TB F3 7200rpm)? Will it matter that the other drive in the RAID 1 set is not *exactly* the same (i.e. it's an F1 not an F3 - same speed & size though)?
Cheers
Last edited by nicko (2010-11-03 14:44:36)
Hi - Any thoughts on this?
Many thanks
You said that rootfs had 0% remaining and that find command reported no *large* files.
Did you run that find command after smb stopped working?
The other possibility is you may have a huge number of small files in rootfs that should not be there, but that is less likely.
It would be good if you could post the output from the following commands
df -h
find / -xdev -size +100k
ps -eal
AFTER smb stopped working.
The first thing to figure out is why the rootfs is full, if it is full.
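To check the "huge number of small files" possibility, something along these lines could count the files under each top-level directory of the rootfs. It is a sketch under assumptions: count_files_per_dir is a made-up name, and it needs find, awk and sort on the box (e.g. from ffp). File names containing newlines would be miscounted.

```shell
#!/bin/sh
# Sketch only: count_files_per_dir is a hypothetical helper name.

# Print "<file count> <top-level entry>" for everything under $1,
# largest count first. -xdev keeps find on the starting filesystem,
# so mounted hard disks are not descended into. Files sitting directly
# in $1 are grouped under ".".
count_files_per_dir() {
    root=${1%/}
    find "$root/" -xdev -type f |
        awk -v prefix="$root/" '
            {
                rel = substr($0, length(prefix) + 1)   # path relative to $1
                slash = index(rel, "/")
                top = (slash > 0) ? substr(rel, 1, slash - 1) : "."
                count[top]++
            }
            END { for (t in count) print count[t], t }' |
        sort -rn
}

# Example: count_files_per_dir /
```

A directory with thousands of entries where you would expect a few dozen is worth a closer look with ls.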
karlrado wrote:
You said that rootfs had 0% remaining and that find command reported no *large* files.
Did you run that find command after smb stopped working?
The other possibility is you may have a huge number of small files in rootfs that should not be there, but that is less likely.
It would be good if you could post the output from the following commands
df -h
find / -xdev -size +100k
ps -eal
AFTER smb stopped working.
The first thing to figure out is why the rootfs is full, if it is full.
Thanks for the reply -
/ # cd /
/ # df -h
Filesystem Size Used Availab
rootfs 9.7M 9.7M
/dev/root 9.7M 9.7M
/dev/loop0 5.6M 5.6M
/dev/md0 914.4G 775.2G 139.
/ # find / -xdev -size +100k
/bin/busybox
/image.cfs
/lib/libuClibc-0.9.28.so
/mnt/HD_a4/.systemfile/.upnpav-db/upnpav.tmp
/ # ps -eal
PID USER COMMAND
1 root init
2 root [ksoftirqd/0]
3 root [events/0]
4 root [khelper]
5 root [kthread]
11 root [kblockd/0]
14 root [khubd]
49 root [pdflush]
50 root [pdflush]
52 root [aio/0]
51 root [kswapd0]
190 root [scsi_eh_0]
191 root [scsi_eh_1]
192 root [scsi_eh_2]
193 root [scsi_eh_3]
201 root [mtdblockd]
215 root [kcryptd/0]
216 root [kmirrord/0]
227 root [loop0]
1159 root xmldb -n config
1202 root [md0_raid1]
1271 root chkbutton
1300 root /web/webs
1332 root fancontrol 0
1339 root op_server 3 3 3
1345 root -sh
1366 root crond
1444 root atd
1501 root mserver
1696 root /ffp/sbin/telnetd -l /ffp/bin/sh
1702 root /ffp/bin/sh
5870 root /ffp/bin/sh
21533 root /usr/sbin/samba/nmbd -D
21559 root ps -eal
/ #
Cheers
The rootfs looks full, which is bad. nmbd is part of samba and is still running, but there are no samba smbd processes running, which is bad.
Right after a reboot, the rootfs should NOT be full like that. Something is writing something to the rootfs, making it fill up.
I and another poster suggested that you modify some samba settings in the web admin interface and apply the update. It doesn't matter what the change is - you might add or change a share, etc. The idea is to force the admin interface to re-write the samba config files with the hope of fixing any damage. Then reboot and see what happens.
Other than that, I don't know what to suggest, especially since the system used to work as it is.
And having the problem persist with fun_plug disabled sort of eliminates any other system config you may have done.
If it were me, I'd reboot and then quickly open a couple of ssh windows before samba dies. Run top in one and watch to see what processes are busy doing something. Look for smbd processes. Run the df -h command in the other ssh window and see if the rootfs at least starts out at less than full. Try to figure out what is filling up the rootfs.
Next I would modify my fun_plug to stop samba. You can do this by adding
/usr/bin/smb stop
at the bottom of the fun_plug.
Reboot.
Samba will be off, but you can login with ssh and see if the rootfs is OK after the usual five minutes or whatever. If that's OK, then it must be samba.
Next I would figure out how to turn on samba logging or debugging and start samba manually:
/usr/bin/smb start
and see if I can see what is wrong. It would also be good to look at the files in /etc/samba, etc to see if they make sense.
That's all I can think of at the moment, but that's where I would start if it were me. Of course, if you've got a good backup of your data, the last resort is to reset, reformat, and rebuild.
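One way to "figure out what is filling up the rootfs" is to take a du snapshot right after boot, take another a few minutes later, and list what grew. The sketch below assumes the du on the box accepts -a, -k and -x (the common GNU and BusyBox flags) and that awk and sort are available. snapshot and grown_since are made-up names, and the snapshot files should be written to the hard disk, not to rootfs. Paths containing tabs or newlines are not handled.

```shell
#!/bin/sh
# Sketch only: snapshot and grown_since are hypothetical helper names.

# Print "size-in-KB<TAB>path" for every file and directory under $1,
# staying on that filesystem (-x) so mounted disks are skipped.
snapshot() {
    du -akx "$1" 2>/dev/null
}

# Compare two snapshot files and print "growth-in-KB path" for every
# path that is new or larger in the second one, biggest growth first.
# Parent directories of a growing file show up as well.
grown_since() {
    awk -F '\t' '
        NR == FNR { before[$2] = $1; next }
        !($2 in before) || ($1 + 0) > (before[$2] + 0) {
            print $1 - before[$2], $2
        }
    ' "$1" "$2" | sort -rn
}

# Example (file locations on the hard disk are an assumption):
#   snapshot / > /mnt/HD_a2/before.txt
#   sleep 300
#   snapshot / > /mnt/HD_a2/after.txt
#   grown_since /mnt/HD_a2/before.txt /mnt/HD_a2/after.txt
```

The deepest path near the top of the list is the best candidate for whatever is eating the space.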
Thanks for the reply,
Well.. I changed a few settings to and fro in the shares & stuff, then restarted.
Ran top in one window, df -h in the other (repeatedly).
top showed prescan always consuming 10% but, interestingly, also 5 instances of mt-daapd, one of which was always taking between 5 & 7%
df -h started out showing 1.7M free on rootfs - this slowly crept down over about 10 minutes to 0 free... at this point the mt-daapd processes exited, followed shortly by the smbd instances (both of them).
Cheers
Last edited by nicko (2010-11-04 18:53:25)
Regardless of what's causing the samba failure, I think it would be interesting to first find out what's causing the rootfs to get full...
Can you run the command
du -hsx /*
and post here the output?
For reference, this is what I get:
/ # du -hsx /*
426.0k /bin
36.0k /default
6.0k /dev
305.0k /etc
0 /ffp
3.0k /home
5.6M /image.cfs
562.0k /lib
1.0k /lost+found
3.0k /mnt
2.6M /proc
1.0k /root
4.0k /sbin
6.0k /sys
243.0k /tmp
103.0k /usr
22.0k /var
233.0k /web
1.0k /welcome.msg
Thanks for the continuing help...
/ # ls -lR /tmp
/tmp:
-rw-r--r-- 1 root root 0 Nov 4 16:38 ClientDisplayNonUTF8
-rw-r--r-- 1 root root 5 Nov 4 16:37 CustomID
-rw-r--r-- 1 root root 0 Nov 4 18:12 ErrorDisk
-rw-r--r-- 1 root root 0 Nov 4 16:37 GetTimeServerFinish
-rw-r--r-- 1 root root 91 Nov 4 16:38 QuotaStatus
-rw-r--r-- 1 root root 55 Nov 4 16:38 apkg.xml
-rw-r--r-- 1 root root 0 Nov 4 16:37 boot_finished
-rw-r--r-- 1 root root 0 Nov 4 18:12 email_ok
-rw-r--r-- 1 root root 2 Nov 4 16:38 fan_status
-rw-r--r-- 1 root root 62 Nov 4 16:37 fchmod
-rw-r--r-- 1 root root 0 Nov 4 16:38 hd_wait_format
-rw-rw-rw- 1 root root 26 Nov 4 16:50 ituneprogbar_result
-rw-r--r-- 1 root root 0 Nov 4 16:38 load_module_finished
-rw-r--r-- 1 root root 0 Nov 4 16:38 log.lock
-rw-r--r-- 1 root root 17 Nov 4 16:37 makaddr
-rw-r--r-- 1 root root 1 Nov 4 16:38 max_dl_num
-rw-r--r-- 1 root root 0 Nov 4 16:37 md0
-rw-r--r-- 1 root root 0 Nov 4 16:37 md0_active
-rw-r--r-- 1 root root 133 Nov 4 18:46 mdstat_file
-rw-r--r-- 1 root root 0 Nov 4 16:37 mount_normal
-rw-r--r-- 1 root root 0 Nov 4 18:12 msmtp_result.txt
-rw-r--r-- 1 root root 0 Nov 4 16:37 opserver_frodo
-rw-r--r-- 1 root root 0 Nov 4 16:38 prescan.result
-rw-r--r-- 1 root root 2 Nov 4 16:50 prescan_result
-rw-r--r-- 1 root root 26 Nov 4 16:50 prescanbar_result
-rw-r--r-- 1 root root 0 Nov 4 16:38 raid_degraded
-rw-r--r-- 1 root root 709 Nov 4 16:38 raidinfo
-rw-r--r-- 1 root root 0 Nov 4 16:37 raidup
-rw-r--r-- 1 root root 0 Nov 4 16:38 re-sch
-rwxr--r-- 1 root root 145 Nov 4 16:38 restartftp.sh
drwxr-xr-x 4 root root 1024 Nov 4 18:46 samba
-rw-r--r-- 1 root root 10 Nov 4 16:37 scsi_mapping
-rw-r--r-- 1 root root 0 Nov 4 16:37 sda
-rw-r--r-- 1 root root 0 Nov 4 16:37 sda0
-rw-r--r-- 1 root root 0 Nov 4 16:38 system_ready
-rw-r--r-- 1 root root 3 Nov 4 18:46 temper
-rw-r--r-- 1 root root 3 Nov 4 18:46 temper_C
-rw-r--r-- 1 root root 4 Nov 4 18:46 temper_F
-rw-r--r-- 1 root root 133 Nov 4 16:38 tmp_mdstat
-rw-r--r-- 1 root root 54 Nov 4 18:12 tmp_send.mm
-rw-r--r-- 1 root root 20 Nov 4 16:37 uptimes
-rw-r--r-- 1 root root 0 Nov 4 16:38 wgetpage.txt

/tmp/samba:
-rw------- 1 root root 8192 Nov 4 18:45 account_policy.tdb
-rw-r--r-- 1 root root 40200 Nov 4 18:45 brlock.tdb
-rw-r--r-- 1 root root 231 Nov 4 18:46 browse.dat
-rw-r--r-- 1 root root 696 Nov 4 18:45 connections.tdb
-rw-r--r-- 1 root root 696 Nov 4 18:45 gencache.tdb
-rw------- 1 root root 8192 Nov 4 18:45 group_mapping.tdb
-rw-r--r-- 1 root root 372 Nov 4 18:45 locking.tdb
-rw------- 1 root root 696 Nov 4 18:45 messages.tdb
-rw------- 1 root root 0 Nov 4 18:45 ntdrivers.tdb
drwxr-xr-x 2 root root 1024 Nov 4 18:45 perfmon
drwxr-xr-x 2 root root 1024 Nov 4 18:45 printing
-rw------- 1 root root 8192 Nov 4 18:45 registry.tdb
-rw------- 1 root root 8192 Nov 4 18:45 secrets.tdb
-rw-r--r-- 1 root root 696 Nov 4 18:45 sessionid.tdb

/tmp/samba/perfmon:

/tmp/samba/printing:
-rw------- 1 root root 16384 Nov 4 18:45 lp.tdb
-rw------- 1 root root 24576 Nov 4 18:45 printers.tdb
/ # du -hsx /*
426.0k /bin
36.0k /default
6.0k /dev
306.0k /etc
0 /ffp
3.0k /home
5.6M /image.cfs
562.0k /lib
1.0k /lost+found
2.2M /mnt
790.0k /proc
1.0k /root
4.0k /sbin
3.0k /sent
6.0k /sys
171.0k /tmp
102.0k /usr
24.0k /var
230.0k /web
1.0k /welcome.msg
/ #
I know you say it's not in the rootfs file system, but notice that /mnt is showing 2.2M, and previously using find / -xdev -size +100k I found 2M of that in:
/mnt/HD_a4/.systemfile/.upnpav-db:
drwx------ 2 root root 1024 Nov 4 16:38 .
drwxr-xr-x 4 root root 1024 Nov 4 16:38 ..
-rw-r--r-- 1 root root 2911232 Nov 4 16:50 upnpav.tmp
-rw-r--r-- 1 root root 1024 Nov 4 16:44 upnpav.tmp-journal
Last edited by nicko (2010-11-04 21:07:42)
nicko, your "find" output looks suspicious. Why is it listing /mnt/HD_a4/.systemfile/.upnpav-db/upnpav.tmp?
Unless I'm missing something, the output of "mount" and "readlink -f /mnt/HD_a4/.systemfile/.upnpav-db/upnpav.tmp" and "df -h /mnt/HD_a4/.systemfile" might help.
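The point of these checks is that a path under /mnt only lives on a hard disk if something is actually mounted there; if nothing is, whatever gets written to that path ends up in the rootfs. A small sketch of the df part, with made-up helper names (mount_point_from_df, on_rootfs) and assuming awk is available; mount points containing spaces are not handled:

```shell
#!/bin/sh
# Sketch only: mount_point_from_df and on_rootfs are hypothetical names.

# Read the output of `df <path>` on stdin and print the mount point,
# i.e. the last word of the last data line (the header line is skipped).
mount_point_from_df() {
    awk 'NR > 1 { mp = $NF } END { print mp }'
}

# Succeed if df reports that the given path lives on "/", the root
# filesystem, rather than on a separately mounted disk.
on_rootfs() {
    [ "$(df "$1" | mount_point_from_df)" = "/" ]
}

# Example:
#   if on_rootfs /mnt/HD_a4/.systemfile; then
#       echo "this directory is on rootfs, not on the hard disk"
#   fi
```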
Hi,
fonz wrote:
nicko, your "find" output looks suspicious. Why is it listing /mnt/HD_a4/.systemfile/.upnpav-db/upnpav.tmp?
Unless I'm missing something, the output of "mount" and "readlink -f /mnt/HD_a4/.systemfile/.upnpav-db/upnpav.tmp" and "df -h /mnt/HD_a4/.systemfile" might help.
Please note I had to reboot the system again so it's again showing 1.7M free on rootfs...
/ # mount
rootfs on / type rootfs (rw)
/dev/root on / type ext2 (rw)
proc on /proc type proc (rw,nodiratime)
/dev/loop0 on /sys/crfs type squashfs (ro)
/dev/md0 on /mnt/HD_a2 type ext2 (rw)
none on /proc/bus/usb type usbfs (rw)
devpts on /dev/pts type devpts (rw)
/ # readlink -f /mnt/HD_a4/.systemfile/.upnpav-db/upnpav.tmp
/mnt/HD_a4/.systemfile/.upnpav-db/upnpav.tmp
/ # df -h /mnt/HD_a4/.systemfile
Filesystem Size Used Available Use% Mounted on
rootfs 9.7M 7.5M 1.7M 82% /
/ #
Cheers
Last edited by nicko (2010-11-04 22:05:10)