[lm-sensors] [PATCH] w83793 watchdog support.

Hans de Goede hdegoede at redhat.com
Sat Jan 23 17:22:45 CET 2010


On 01/21/2010 10:47 AM, Sven Anders wrote:
> Hello!
> Thanks for you last comments on the code. Now after the holidays, I found
> time to write you an answer and sent you an updated patch as well.

You're welcome, thanks for being persistent, so this can get into the kernel


>> Now the following can happen:
>> 1) Driver attaches to w83793 i2c device
>> 2) app opens /dev/watchdog
>> 3) Driver detaches from w83793 i2c device
>>      (for example because the i2c master module
>>       gets unloaded)
> Is it really possible to unload the i2c master module without
> unloading any dependend module first? I thought this isn't possible...

Yes, their is no dependency relation between the modules, only a driver
binding, when the i2c master goes away, the i2c driver will simply have
its detach function called.

> I assume the already running watchdog app can still access the opened
> file even if the /dev/watchdog node does not exist anymore. So this
> should be a problem.

Correct, this is exactly why the ref counting is there, and why
the watchdog functions called through the device check if client
has not disappeared.

> There is one small problem left:
> If the watchdog_open() functions failes with EBUSY, we must not
> increase the counter.

Oh, good catch that bug is present in the fschmd.c code too. Note that the
way you've fixed this with an unlock in the error path is sort of
frowned up on. It is correct, but we usually try to keep locks and unlocks
in balanced pairs, so that it is easy to check for missing unlocks. See the
patch I've done to fix this same issue for fschmd (attached).

>>> On the other hand, I think you are missing the 'reboot notifier' in
>>> your fschmd driver code.
>> The fsc chips are all part of Fujitsu Siemens Computers, iow the manufacturer
>> of the motherboard == the manufacturer of the watchdog. As such the watchdog
>> always gets initialized by the BIOS, so there is no reason to disable
>> it before shutdown / reboot.
> But you will need it, to prevent the watchdog to reboot the machine, if the
> shutdown sequence needs more time to shutdown than the watchdog is set to.
> Consider the following szenario:
> 1) Watchdog timeout is set to 1 Minute.
> 2) Watchdog helper tool resets the watchdog timer every 30 seconds.
> 3) User executes "shutdown" binary.

Shutdown is pure userspaces, and does not do much more then signal
init to switch runlevel.

> 4) Watchdog timer is currently at 31 seconds and the watchdog helper
>     tool is stopped first.
> 5) Watchdog timer has now about 30 seconds left.
> 6) Heavy loaded database needs much time to write it's caches or perform any
>     other tasks during shutdown. For instance it will need about 2 minutes.

That would be an initscripts bug then, the watchdog tool should be stopped
as one of the last tasks in the runlevel.

> 7) After 30 seconds the hardware watchdog kicks in and reboots the machine
>     leaving the database in an inconsistent state.

This will happen even with a reboot/halt notifier. That won't get called until
the kernel actually is going to do the reboot (as in make the BIOS call to do
a soft reset).

> Please notify me, if I need to make some more changes or if you sent the patch
> upstream.

Well, I don't have any path for sending patches directly upstream, Jean Delvare
usually does that for hwmon tree patches. I can ack this though, telling Jean
it has been reviewed by me and I don't see any more issues.

But atm I still do see a few issues:

Should return data->watchdog_timeout, not the register value, as that register
actually counts down the minutes till it will reboot. IOW it does not contain
the counter reload value (which is what we want to return), but it contains
the actual counter.

+       case WDIOC_SETTIMEOUT:
+               if (get_user(val, (int __user *)arg)) {
+                       ret = -EFAULT;
+                       break;
+               }
+               data->watchdog_timeout = val;
+               ret = watchdog_set_timeout(data, val);
+               if (ret > 0)
+                       ret = put_user(ret, (int __user *)arg);
+               break;

Note that here you store val, which is in seconds! into data->watchdog_timeout,
and then on the next trigger you will write that to W83793_REG_WDT_TIMEOUT,
resulting in a timeout of as many minutes as the user asked seconds.
I think it would best to remove the setting of data->watchdog_timeout
here (esp as no rounding and bounds checking has been done yet), and
set it to stimeout in  watchdog_set_timeout() after all error checking
has been done there.

And I think Jean might fall over the balanced lock unlock thingie, Jean ?


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: hwmon-fschmd-fix-memleak.patch
URL: <http://lists.lm-sensors.org/pipermail/lm-sensors/attachments/20100123/05909e7c/attachment.ksh>

More information about the lm-sensors mailing list