Linux Watchdog Daemon - Installation



Installation

In most Linux distributions you can install it from the package manager (as root, or using sudo). For example, on Ubuntu/Debian-based systems:

apt-get install watchdog

However, you should also consider installing the lm-sensors package and running its sensors-detect script, as that can help identify what hardware your machine has. You can then check whether there is a watchdog driver module that supports it. Next, look in files such as /etc/modprobe.d/blacklist-watchdog.conf to see if the same or a similar chip is mentioned. If so, add that driver's name from /etc/modprobe.d/blacklist-watchdog.conf to /etc/modules (or load it with modprobe) and you should then have hardware support. Better still is to add it to /etc/default/watchdog by editing the line watchdog_module="none", as it is then loaded on demand. This gets round the buggy behaviour of systemd not loading modules that are explicitly listed in /etc/modules when they are also blacklisted for auto-loading.
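As a rough sketch of those two steps, the shell functions below list the module names blacklisted in a modprobe file and then set watchdog_module in the defaults file. The function names list_blacklisted_wdt and set_watchdog_module are hypothetical, and they assume the Debian/Ubuntu file layout ("blacklist NAME" lines, and a watchdog_module= line in /etc/default/watchdog) plus GNU sed for the in-place edit, so check the files on your own system first.

```shell
# Hypothetical helper: print the module names from "blacklist NAME" lines
# in a modprobe blacklist file (default path is the Debian/Ubuntu one).
list_blacklisted_wdt() {
    file=${1:-/etc/modprobe.d/blacklist-watchdog.conf}
    # Keep only lines whose first field is "blacklist"; print the module name.
    awk '$1 == "blacklist" { print $2 }' "$file"
}

# Hypothetical helper: replace the value on the watchdog_module= line
# (e.g. "none") with the given module name. Needs root for the real file.
set_watchdog_module() {
    module=$1
    file=${2:-/etc/default/watchdog}
    # GNU sed in-place edit; only the watchdog_module= line is changed.
    sed -i "s/^watchdog_module=.*/watchdog_module=\"$module\"/" "$file"
}
```

For example, 'list_blacklisted_wdt | grep -i tco' would show any TCO-type drivers, and running 'set_watchdog_module iTCO_wdt' as root would select one of them (iTCO_wdt is only an illustration here; pick whatever matches your chip).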

NOTE: To replace the daemon with any special build you should stop it first, as described below. It is also a good idea to rename the original binary and keep it, so you can revert if anything goes wrong when testing V6.0.

Another way of identifying any hardware watchdog options is to use the 'lshw' command (as root, or using sudo) to find the chipset(s) used, then search for drivers or documentation for those chips to establish whether they have watchdog timers and, if so, which driver should work.
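To match the chip names that lshw reports (for example from 'sudo lshw -short') against the drivers available for your kernel, something like the following sketch can help. The function name list_wdt_modules is hypothetical, and it assumes the usual /lib/modules/<version>/kernel/drivers/watchdog directory, where modules may be plain .ko files or compressed ones such as .ko.xz or .ko.zst.

```shell
# Hypothetical helper: list the watchdog driver modules shipped for the
# running kernel, so chip names reported by lshw can be compared with them.
list_wdt_modules() {
    dir=${1:-/lib/modules/$(uname -r)/kernel/drivers/watchdog}
    # Module files may be plain .ko or compressed (.ko.xz, .ko.zst, .ko.gz).
    for f in "$dir"/*.ko*; do
        [ -e "$f" ] || continue        # no match: the glob stays literal
        name=${f##*/}                  # strip the directory part
        printf '%s\n' "${name%%.ko*}"  # strip .ko and any compression suffix
    done
}
```

Piping the result through grep, e.g. 'list_wdt_modules | grep -i tco', narrows the list down to likely candidates for a given chipset.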

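If all else fails, consider loading the 'softdog' module to emulate the hardware. It is not nearly as good, since it is a kernel timer rather than independent hardware and so cannot recover from a kernel lock-up that stops timers running, but it is better than nothing.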
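After loading a driver (e.g. 'sudo modprobe softdog') you can check that a watchdog device has actually been registered. This sketch assumes your kernel was built with CONFIG_WATCHDOG_SYSFS, so that each device appears under /sys/class/watchdog with an identity file; the function name show_watchdogs is hypothetical.

```shell
# Hypothetical helper: print "device: identity" for each registered
# watchdog, using the sysfs watchdog class (needs CONFIG_WATCHDOG_SYSFS).
show_watchdogs() {
    root=${1:-/sys/class/watchdog}
    found=0
    for d in "$root"/watchdog*; do
        [ -r "$d/identity" ] || continue   # skip non-matches and unreadable entries
        printf '%s: %s\n' "${d##*/}" "$(cat "$d/identity")"
        found=1
    done
    [ "$found" -eq 1 ]   # non-zero exit status if no watchdog was found
}
```

With softdog loaded it would typically print a line such as 'watchdog0: Software Watchdog'.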

Starting and Stopping

Normally the daemons are started and stopped by system scripts such as /etc/init.d/watchdog, but the usual command to do this by hand (as root, or using sudo) is:

service watchdog start
service watchdog stop

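However, this script actually swaps between running 'watchdog' and 'wd_keepalive', in the same way as happens when the system starts up and shuts down: stopping 'watchdog' starts 'wd_keepalive' in its place, and starting 'watchdog' stops 'wd_keepalive' first.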
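A quick way to see which of the two daemons is running at a given moment, for instance just after 'service watchdog stop', is sketched below. The function name which_wd_daemon is hypothetical; it relies on pgrep from the procps package, whose -x option matches the process name exactly.

```shell
# Hypothetical helper: report which of the two daemons is currently
# running, matching exact process names with pgrep -x.
which_wd_daemon() {
    if pgrep -x watchdog >/dev/null; then
        echo watchdog
    elif pgrep -x wd_keepalive >/dev/null; then
        echo wd_keepalive
    else
        echo none
    fi
}
```

If it reports 'none' while a hardware watchdog is active, nothing is keeping the timer alive and a reboot can be expected.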

NOTE: This swapping behaviour is essential in the unlikely case that your kernel was compiled with the option CONFIG_WATCHDOG_NOWAYOUT, or a WDT module was loaded with that option, as then you cannot turn the watchdog off after starting it, so you always need to be running something to stop a reboot. Swapping daemons is then a way of allowing you to replace the binary and/or change the configuration file safely.
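To check whether your running kernel was built with CONFIG_WATCHDOG_NOWAYOUT you can search its configuration file. This sketch assumes the config is installed as /boot/config-<version> (as Debian and Ubuntu do); the function name kernel_has_nowayout is hypothetical, and it does not detect a driver module that was loaded with its own nowayout option.

```shell
# Hypothetical helper: succeed if the kernel config file enables
# CONFIG_WATCHDOG_NOWAYOUT (Debian/Ubuntu install it as /boot/config-<version>).
kernel_has_nowayout() {
    config=${1:-/boot/config-$(uname -r)}
    grep -q '^CONFIG_WATCHDOG_NOWAYOUT=y' "$config"
}
```

Used as 'kernel_has_nowayout && echo "nowayout is set"', it prints nothing when the option is disabled ("# CONFIG_WATCHDOG_NOWAYOUT is not set" in the file).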

The daemons are actually stopped by sending them the signal SIGTERM (usually 15), which is the "polite" way of requesting a program to terminate. They trap this signal and, when it is detected, break out of the main polling loop and exit cleanly (closing the watchdog hardware). Thus it can take up to the configured polling interval for the process to stop.

NOTE: If you kill the daemon with another signal, such as SIGKILL (usually 9, which cannot be caught or ignored) or SIGINT (usually 2, typically sent by Ctrl+C when running in the foreground), then it will not close the watchdog device cleanly and you can expect a hardware reboot to occur shortly unless the daemon is restarted!
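The stop mechanism described above can be illustrated with a small shell sketch. The function name keepalive_loop and its arguments are hypothetical, and this is not the daemon's own code. It traps SIGTERM but only notices it once the current sleep has finished, which is why stopping can take up to one polling interval, and then writes the 'V' "magic close" character documented in the Linux kernel watchdog API before closing the device, so that a driver supporting magic close disarms the timer. SIGKILL cannot be trapped, so in that case the kernel closes the device without the 'V' and such a driver leaves the timer running. Try it against an ordinary file only: opening the real /dev/watchdog arms the hardware timer, and it cannot be opened while the daemon holds it anyway.

```shell
# Hypothetical sketch of the SIGTERM-driven keep-alive loop. DEVICE would be
# a watchdog node such as /dev/watchdog; use an ordinary file to experiment,
# because opening the real device arms the hardware timer.
keepalive_loop() {
    device=$1
    interval=${2:-1}
    running=1
    # SIGTERM only clears a flag; the loop notices it on its next check.
    trap 'running=0' TERM
    exec 3>"$device"                 # open the device once and keep it open
    while [ "$running" -eq 1 ]; do
        printf '\000' >&3            # any write counts as a keep-alive
        sleep "$interval"            # a trapped signal waits for this to end
    done
    printf 'V' >&3                   # magic close: ask the driver to disarm
    exec 3>&-                        # close the device
    trap - TERM
}
```

Run it in the background, e.g. 'keepalive_loop /tmp/fake-wdt 5 &', then 'kill -TERM' its PID and note that it can take up to 5 seconds to exit.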

To really stop the daemon (and not just have wd_keepalive run in its place), you can use pkill (again as root or using sudo):

pkill watchdog

By default pkill sends SIGTERM, which is what you normally want.

You can also start the daemon from the command line for testing. If you want to see the output of the daemon and any child test/repair process in real time (rather than looking in log files such as syslog), you can use the foreground option, for example:

watchdog --foreground

This stops it from becoming a background daemon, so it runs like a normal foreground process. To stop it, open another terminal window and send it SIGTERM using pkill (i.e. don't use Ctrl+C, which sends SIGINT).
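If you want a script to wait until the daemon has really gone (which, as noted above, can take up to one polling interval), a sketch along these lines would do. The function name stop_watchdog_and_wait and its timeout argument are hypothetical. It uses pkill's -x option so that only a process named exactly 'watchdog' is signalled, and it deliberately never escalates to SIGKILL, since that would leave the hardware timer running.

```shell
# Hypothetical helper: send SIGTERM to the daemon and wait up to TIMEOUT
# seconds for it to finish its current polling interval and exit.
stop_watchdog_and_wait() {
    timeout=${1:-30}
    pkill -TERM -x watchdog            # SIGTERM is the "polite" request
    while pgrep -x watchdog >/dev/null; do
        if [ "$timeout" -le 0 ]; then
            return 1                   # still running; do NOT fall back to SIGKILL
        fi
        sleep 1
        timeout=$((timeout - 1))
    done
    return 0                           # no process named "watchdog" remains
}
```

A non-zero exit status simply means the daemon is still finishing up; try again or increase the timeout rather than killing it harder.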

Last Updated on 26-Aug-2019 by Paul Crawford
Copyright (c) 2014-19 by Paul S. Crawford. All rights reserved.
Email psc(at)sat(dot)dundee(dot)ac(dot)uk
Absolutely no warranty, use this information at your own risk.