DSM-G600, DNS-3xx and NSA-220 Hack Forum

Unfortunately no one can be told what fun_plug is - you have to see it for yourself.

#1 2009-03-10 08:36:03

jdoering
Member
Registered: 2008-04-10
Posts: 95

Kernel "uptime" bug - monotonic clock broken

I recently stumbled onto the DNS-323 "uptime" bug. See this thread for general discussion and some scripts that attempt to work around the issue: http://dns323.kood.org/forum/t162-Wrong-uptime.html

I was curious about the issue and the root cause so I dug around a bit. I'm posting in this forum area since the details are gory and the real fix would be in a custom kernel.

Symptoms:

1) uptime is broken (reports ridiculously high values); this is due to bad data in /proc/uptime; 14313+ days on my machine
2) /proc/stat btime is similarly broken (reported as 0 - 2 on my machine with ntpd running)
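
To check a box for both symptoms, the two values can be pulled out of the /proc text and compared against common sense. This is only a minimal sketch: the function names are made up for illustration, and the functions work on strings already read from /proc/uptime and /proc/stat rather than opening the files themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Parse the first field of /proc/uptime ("<uptime> <idle>") and
 * convert it to days. Returns -1.0 if the text cannot be parsed. */
double uptime_days(const char *proc_uptime_text)
{
    double seconds;

    assert(proc_uptime_text != NULL);
    if (sscanf(proc_uptime_text, "%lf", &seconds) != 1)
        return -1.0;
    return seconds / 86400.0;
}

/* Scan /proc/stat text line by line for "btime <seconds>".
 * Returns -1 if no such line is present. */
long stat_btime(const char *proc_stat_text)
{
    const char *p = proc_stat_text;

    assert(proc_stat_text != NULL);
    while (p != NULL && *p != '\0') {
        long value;

        if (sscanf(p, "btime %ld", &value) == 1)
            return value;
        p = strchr(p, '\n');   /* move to the start of the next line */
        if (p != NULL)
            p++;
    }
    return -1;
}
```

An uptime of many thousands of days together with a btime close to zero matches the two symptoms listed above.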

Background:

The Linux 2.6 kernel (at least 2.6.12.6) internally keeps a monotonic clock as well as the current system time in nanoseconds. The monotonic clock is supposed to be initialized to zero at system boot, while the system time is set based on the RTC. Later, programs like ntpd may adjust the system time to match it with "true" time.

Two key kernel variables xtime and wall_to_monotonic are used for maintaining the system time and the monotonic clock. Basically xtime keeps track of the system time and wall_to_monotonic is an offset to convert system time to monotonic time. wall_to_monotonic should be negative as the offset is added roughly as follows (nanosecond precision details are ignored here):

MONOTONIC_TIME = xtime + wall_to_monotonic; // Zero at system boot

An examination of the procfs code shows that the two problematic values noted in /proc are computed roughly as follows:

btime = -wall_to_monotonic;
uptime = xtime + wall_to_monotonic;

These details indicate that btime is being set very low, which in turn results in uptime roughly equaling system time. So the DNS-323 thinks it has been up since 1970!
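
The two formulas can be tried out in a small user-space model. This is a sketch only: it keeps whole seconds, the struct and function names are invented for illustration, and it is not the procfs code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: whole seconds only, nanoseconds ignored. */
struct clock_model {
    long xtime;             /* system (wall) time, seconds since 1970 */
    long wall_to_monotonic; /* offset added to xtime to get monotonic time */
};

/* Boot time as reported in /proc/stat (btime). */
long model_btime(const struct clock_model *c)
{
    assert(c != NULL);
    return -c->wall_to_monotonic;
}

/* Uptime as reported in /proc/uptime. */
long model_uptime(const struct clock_model *c)
{
    assert(c != NULL);
    return c->xtime + c->wall_to_monotonic;
}
```

With xtime = 1236674163 (an illustrative timestamp from March 2009) and wall_to_monotonic = 0, model_btime gives 0 and model_uptime gives 1236674163 seconds, which is a little over 14313 days. With wall_to_monotonic = -1236670563 the same xtime gives an uptime of 3600 seconds.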

One interesting point is that system time can change while a monotonic clock cannot. Tracing kernel code reveals that flows that update the system clock make corresponding adjustments to wall_to_monotonic to ensure that it remains correct with regard to its original reference point. Monotonic time is also used to track the start time (as shown by ps or /proc/<pid>/stat field #22) of processes. So you'll notice very large values in /proc/<pid>/stat too. However general use of the monotonic clock is not broken; just cases that assume boot time = zero on the monotonic clock (there are a few places in the kernel that look at this stuff).
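
The compensating adjustment described above can be modeled as follows. This is a sketch in whole seconds with invented names; the real kernel paths work on timespec values under a lock.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: whole seconds only. */
struct clock_model {
    long xtime;             /* system (wall) time */
    long wall_to_monotonic; /* offset: monotonic = xtime + wall_to_monotonic */
};

long model_monotonic(const struct clock_model *c)
{
    assert(c != NULL);
    return c->xtime + c->wall_to_monotonic;
}

/* Step the wall clock the way the normal set-time paths are described
 * above: whatever is added to xtime is subtracted from
 * wall_to_monotonic, so the monotonic reading does not move. */
void model_set_time(struct clock_model *c, long new_xtime)
{
    long delta;

    assert(c != NULL);
    delta = new_xtime - c->xtime;
    c->xtime += delta;
    c->wall_to_monotonic -= delta;
}
```

Stepping a healthy clock from 1000 to 5000 leaves model_monotonic unchanged. The same holds for a clock whose offset started out wrong: the error is preserved just as faithfully as a correct value would be.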

The Bug:

Given that xtime and wall_to_monotonic are kept in sync, everything looks fine. It turns out that the problem occurs during kernel initialization. Both values initialize to zero (which is fine if your computer is really starting on Jan 1, 1970). General references describe utilities like hwclock running early in the boot process (I've seen some kernels like this, where logging starts near 1970 and then suddenly jumps forward). hwclock would certainly go through calls that update both xtime and wall_to_monotonic. However, the Marvell kernel for the DNS-323 doesn't work like this.

A quick grep of 'arch/arm/mach-mv88fxx81' for "xtime" turns up: /linux-2.6.12.6/arch/arm/mach-mv88fxx81/LSP/time.c


Code:

static int mv_rtc_init(void)
{
        MV_RTC_TIME time;
        mvRtcDS1339TimeGet(&time);

        /* same as in the U-Boot we use the year for century 20 only */
        xtime.tv_sec = mktime ( time.year + 2000, time.month,
                                time.date, time.hours,
                                time.minutes, time.seconds);

        to_tm(xtime.tv_sec, &time);
        set_rtc = mv_set_rtc;
        register_rtc(&rtc_ops);
        return 0;
}

This code appears broken because it will update the xtime value (presumably from near zero) to the current time read from the RTC but it does NOT adjust wall_to_monotonic.

This results in wall_to_monotonic staying at zero (until subsequent time updates by ntpd and such may move it a small amount depending on how far off the RTC is). Once this damage is done there is no clean way to fix it since all of the normal kernel functions for adjusting the time deliberately correct the delta between wall_to_monotonic and xtime.

It looks like this would be very easy to fix in a custom kernel. Just set wall_to_monotonic = -xtime above. Boot time would then be fixed at the time the RTC was read above, presumably very early in kernel initialization. I have not pursued this approach as I'm not set up for testing custom kernels (and I have B1 hardware, so I think there are issues with ffp-reload; I haven't looked much though).
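
A user-space sketch of that one-line fix, including the nanosecond normalization that a plain negation would skip. The struct and function names here are stand-ins for illustration. In the kernel the assignment would presumably go through a helper such as set_normalized_timespec(); whether that helper is usable at that point in time.c, and what locking is needed, is an assumption and untested.

```c
#include <assert.h>
#include <stddef.h>

#define NSEC_PER_SEC 1000000000L

/* Stand-in for the kernel's timespec (seconds plus nanoseconds). */
struct ts_model {
    long tv_sec;
    long tv_nsec;
};

/* Store sec/nsec so that 0 <= tv_nsec < NSEC_PER_SEC. */
void normalize_ts(struct ts_model *ts, long sec, long nsec)
{
    assert(ts != NULL);
    while (nsec >= NSEC_PER_SEC) {
        nsec -= NSEC_PER_SEC;
        sec++;
    }
    while (nsec < 0) {
        nsec += NSEC_PER_SEC;
        sec--;
    }
    ts->tv_sec = sec;
    ts->tv_nsec = nsec;
}

/* After xtime has been loaded from the RTC, set the offset to -xtime
 * so that xtime + wall_to_monotonic is zero at that instant. */
void reset_monotonic_origin(const struct ts_model *xtime,
                            struct ts_model *wall_to_monotonic)
{
    assert(xtime != NULL);
    normalize_ts(wall_to_monotonic, -xtime->tv_sec, -xtime->tv_nsec);
}
```

mv_rtc_init() only sets xtime.tv_sec, so in practice the nanosecond part is likely to be zero there and the normalization is just a safety net.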

Experimental Workaround:

I decided to experiment with a workaround just "for fun". I wrote the attached kernel module which adjusts the wall_to_monotonic value back to a reasonable boot time.

There are a few issues:

1) wall_to_monotonic is not exported; the workaround uses a nasty hack that assumes the address is fixed for the stock kernel (and relative to the exported xtime address)
2) The actual boot time is not known. The workaround adds up current CPU time (not sure if that code is right) to make a guess.
3) This hack makes the monotonic clock jump WAY back. Not so good for a monotonic clock.
3a) This makes process start times occur way in the future. The module corrects for this.
3b) The module does NOT handle existing kernel timers (like clock_was_set() does). This could be really bad. I don't know the Linux kernel enough to really know. It might be possible to replicate more code from clock_was_set; but likely more hacking around not-exported things.
3c) Any "other" userland, etc consumers of the monotonic clock may not like it jumping backward. Risk on DNS-323 unknown.
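
The boot-time guess in issue 2 can be sketched like this. It is not the code from the attached module; it only assumes the guess is made by summing CPU tick counters of the kind shown on the "cpu" line of /proc/stat, and all names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Tick counters in the style of the "cpu" line of /proc/stat. */
struct cpu_ticks {
    unsigned long user;
    unsigned long nice;
    unsigned long system;
    unsigned long idle;
};

/* Guess seconds since boot by summing all accounted ticks.
 * hz is the tick rate in ticks per second. On a single-CPU box the
 * sum approximates the number of timer ticks since boot. */
long guess_uptime_seconds(const struct cpu_ticks *t, long hz)
{
    unsigned long total;

    assert(t != NULL);
    assert(hz > 0);
    total = t->user + t->nice + t->system + t->idle;
    return (long)(total / (unsigned long)hz);
}

/* Offset that places boot at (now - guessed uptime), so that
 * now + offset equals the guessed uptime. */
long corrected_wall_to_monotonic(long now, long uptime_guess)
{
    return -(now - uptime_guess);
}
```

For example, 360000 ticks at 100 ticks per second gives a guess of 3600 seconds. Ticks that were never accounted to any of the counters would make the guess come out low.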

Anyway; coding the module was an interesting learning experience and it seems to work in practice. No crashes and no observable bad symptoms. Uptime works as expected, process start times look good, etc.

I wrote this up to share the info and also help anyone who might be patching the stock kernel; a patch for time.c above looks pretty simple. This should also dispel some of the confusion on exactly what is wrong with uptime.

I'm thinking of an alternative, less intrusive fix but haven't investigated it yet. It probably wouldn't be clean either, but a kernel module that replaced the handling of /proc/uptime and /proc/stat could fix the user-mode cosmetic issues without causing the monotonic clock to jump backward. While I think kernel modules can add to /proc, I doubt it's clean to replace existing handlers.

-Jeff


Attachments:
fixdns323uptime.c, Size: 4,110 bytes, Downloads: 644


#2 2009-03-10 09:44:14

fonz
Member / Developer
From: Berlin
Registered: 2007-02-06
Posts: 1716

Re: Kernel "uptime" bug - monotonic clock broken

jdoering wrote:

It looks like this would be very easy to fix in a custom kernel.

Just for the record: a kernel patch for the uptime bug has existed since 2007 (unfortunately, I don't remember who actually wrote it): http://www.inreto.de/dns323/kernel/patches-2.6.12.6/


#3 2009-03-11 02:10:31

jdoering
Member
Registered: 2008-04-10
Posts: 95

Re: Kernel "uptime" bug - monotonic clock broken

Interesting. I hadn't seen that mentioned in the threads I saw on here... (I probably didn't dig through all of them though).

Not much else in that directory; the kernel is otherwise bug-free, huh? :) I see that it can be done by hand, but it doesn't look like there's much interest in fully packaged custom DNS-323 firmware. Something like what the NSLU2 had, with the stock features plus custom additions, kernel, etc., would be very cool.

It seems like a lot of the fudgy behaviors (RAID format handling, etc.) on the DNS-323 are unfortunately in proprietary compiled utilities, which diminishes the value of fixing stuff in the stock functionality versus starting from scratch.


#4 2009-05-01 13:33:45

mastervol
Member
Registered: 2008-09-06
Posts: 81

Re: Kernel "uptime" bug - monotonic clock broken

How would I apply the patch for the uptime bug?


DNS-323     F/W: 1.06  H/W: ??  ffp: 0.5  Drives (normal mode): 1 x 1,5 TB Seagate SATA II ST31500341AS, 1 x 250 GB Western Digital SATA I


#5 2010-05-02 23:25:02

Darkman
Member
From: Sonoran Desert
Registered: 2010-04-23
Posts: 30

Re: Kernel "uptime" bug - monotonic clock broken

Bump.
I too would like to know how to fix the uptime bug.
As in: has anyone figured out how to apply the patch?
Does it work on the DNS-343?


DNS-343 1.04b03
4x 1.0T Seagate Barracuda 7200.11 RAID5 + 3TB external USB storage for occasional rsync backup.
Transmission-2.22-1 Automatic-0.6.4-1 netatalk-2.1.3-1 rsync-3.0.7-1 nano curl fortune screen SSH SSL SMBget wget
...and the all-powerful foo.sh :)


#6 2010-05-19 05:58:03

zuluwalker
Member
Registered: 2009-11-09
Posts: 27

Re: Kernel "uptime" bug - monotonic clock broken

Also interested in getting the uptime stats fixed. Any updates on this?
