wq: handle VM suspension in stall detection
[ Upstream commit 940d71c6462e8151c78f28e4919aa8882ff2054e ]

If a VCPU is suspended (VM suspend) while in wq_watchdog_timer_fn(),
then once it resumes it sees the new jiffies value, while it may
take a while before an IRQ detects PVCLOCK_GUEST_STOPPED on this
VCPU and updates all the watchdogs via pvclock_touch_watchdogs().
In the meantime there is a small chance of misreported WQ stalls,
because the new jiffies value is time_after() the old 'ts + thresh'.

wq_watchdog_timer_fn()
{
	for_each_pool(pool, pi) {
		if (time_after(jiffies, ts + thresh)) {
			pr_emerg("BUG: workqueue lockup - pool");
		}
	}
}

Save jiffies at the beginning of this function and use that value
for stall detection. If the VM gets suspended, we continue using the
"old" jiffies value and the old WQ touch timestamps. If an IRQ at some
point restarts the stall detection cycle (pvclock_touch_watchdogs()),
then the old jiffies value will always be before the new 'ts + thresh'.
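
The effect of the snapshot can be sketched in userspace C. This is only an
illustration under stated assumptions, not kernel code: fake_jiffies,
simulate_vm_suspend(), stall_reported() and time_after_ul() are hypothetical
names, and the suspend is modelled as a jump of the counter between function
entry and the stall check.

```c
#include <stdbool.h>

/* Wraparound-safe "a is after b", the same idea as the kernel's time_after(). */
static bool time_after_ul(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical stand-in for the global jiffies counter. */
static unsigned long fake_jiffies;

/* Hypothetical model of a VM suspend: the counter jumps while we are paused. */
static void simulate_vm_suspend(unsigned long ticks)
{
	fake_jiffies += ticks;
}

/*
 * Returns true if a stall would be reported for a pool last touched at ts.
 * use_snapshot selects between the patched behaviour (compare against the
 * value saved at entry) and the old one (re-read the live counter).
 */
static bool stall_reported(unsigned long ts, unsigned long thresh,
			   bool use_snapshot, unsigned long suspend_ticks)
{
	unsigned long now = fake_jiffies;	/* snapshot at function entry */

	/* the VCPU is suspended somewhere between entry and the check */
	simulate_vm_suspend(suspend_ticks);

	return time_after_ul(use_snapshot ? now : fake_jiffies, ts + thresh);
}
```

With fake_jiffies set to 1000, a pool touched at 900, a threshold of 7500
ticks and a suspend of 100000 ticks, stall_reported() returns true when it
re-reads the live counter and false when it uses the snapshot, which matches
the false positive and the fix described above.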

Signed-off-by: Sergey Senozhatsky <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
sergey-senozhatsky authored and gregkh committed Jun 16, 2021
1 parent 3fd1a1a commit a8f2c7b
Showing 1 changed file with 10 additions and 2 deletions.
12 changes: 10 additions & 2 deletions kernel/workqueue.c
@@ -49,6 +49,7 @@
 #include <linux/moduleparam.h>
 #include <linux/uaccess.h>
 #include <linux/nmi.h>
+#include <linux/kvm_para.h>

 #include "workqueue_internal.h"

@@ -5387,6 +5388,7 @@ static void wq_watchdog_timer_fn(unsigned long data)
 {
 	unsigned long thresh = READ_ONCE(wq_watchdog_thresh) * HZ;
 	bool lockup_detected = false;
+	unsigned long now = jiffies;
 	struct worker_pool *pool;
 	int pi;

@@ -5401,6 +5403,12 @@ static void wq_watchdog_timer_fn(unsigned long data)
 		if (list_empty(&pool->worklist))
 			continue;

+		/*
+		 * If a virtual machine is stopped by the host it can look to
+		 * the watchdog like a stall.
+		 */
+		kvm_check_and_clear_guest_paused();
+
 		/* get the latest of pool and touched timestamps */
 		pool_ts = READ_ONCE(pool->watchdog_ts);
 		touched = READ_ONCE(wq_watchdog_touched);
@@ -5419,12 +5427,12 @@ static void wq_watchdog_timer_fn(unsigned long data)
 		}

 		/* did we stall? */
-		if (time_after(jiffies, ts + thresh)) {
+		if (time_after(now, ts + thresh)) {
 			lockup_detected = true;
 			pr_emerg("BUG: workqueue lockup - pool");
 			pr_cont_pool_info(pool);
 			pr_cont(" stuck for %us!\n",
-				jiffies_to_msecs(jiffies - pool_ts) / 1000);
+				jiffies_to_msecs(now - pool_ts) / 1000);
 		}
 	}
