FROMGIT drm/msm/a5xx: fix races in preemption evaluation stage
On A5XX GPUs, when preemption is used it is inevitable that the GPU
eventually enters a soft lock-up state in which it sits on an empty
ring-buffer doing nothing. This appears as a full UI lockup and is not
detected as a GPU hang (because it isn't one). It happens because
preemption is not triggered when it was needed. Occasionally a new
submit can recover the state, but usually it won't, because
applications are waiting for their old submits to retire.

One of the reasons this happens is a race between a5xx_submit and
a5xx_preempt_trigger, the latter called from the IRQ path during submit
retire. The former updates ring->cur of a previously empty, non-current
ring right after the latter has checked that ring for emptiness. Both
threads can then simply exit: for the first one preempt_state was not
NONE yet, and for the second one all rings appeared to be empty.
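
To make the interleaving concrete, here is a minimal userspace sketch
that replays the lost trigger sequentially. This is a hypothetical
model, not driver code: preempt_state, rptr/cur and try_preempt_state
merely mirror the driver's names, and the preemption state machine is
reduced to two values.

/* Hypothetical model of the race; compile with -std=c11. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

enum { PREEMPT_NONE, PREEMPT_START };

static _Atomic int preempt_state = PREEMPT_NONE;
static int rptr, cur;			/* ring is empty while rptr == cur */

static bool try_preempt_state(int old, int new)
{
	/* stands in for the driver's atomic_cmpxchg() on preempt_state */
	return atomic_compare_exchange_strong(&preempt_state, &old, new);
}

int main(void)
{
	/* retire (IRQ) thread: wins NONE -> START, then scans the rings */
	bool scanning = try_preempt_state(PREEMPT_NONE, PREEMPT_START);
	bool empty = (rptr == cur);	/* sees this ring still empty */

	/* submit thread: publishes commands, then tries to trigger */
	cur++;				/* ring->cur moves; ring not empty */
	bool started = try_preempt_state(PREEMPT_NONE, PREEMPT_START);

	/* retire thread: nothing looked runnable, so it backs off */
	preempt_state = PREEMPT_NONE;

	if (scanning && empty && !started)
		puts("lost trigger: work is queued but nobody preempts");
	return 0;
}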

To prevent such situations, we have to guarantee that preempt_trigger
makes its decision after every submit or retire. To implement this,
preemption initiation is serialized with a spinlock; if a switch is
already in progress, preemption is re-triggered once it finishes.
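
Condensed, the fix has the shape of the sketch below: the NONE -> START
transition and the ring scan happen under one lock, and the IRQ path
re-runs the trigger after dropping back to NONE. This is again a
hypothetical userspace model, with a pthreads mutex standing in for the
kernel spinlock and start_ring_switch() as a made-up stand-in for
programming the actual context switch.

/* Hypothetical model of the fix; compile with -std=c11 -pthread. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { PREEMPT_NONE, PREEMPT_START };

static _Atomic int preempt_state = PREEMPT_NONE;
static int rptr, cur;			/* ring is empty while rptr == cur */
static pthread_mutex_t preempt_start_lock = PTHREAD_MUTEX_INITIALIZER;

static bool try_preempt_state(int old, int new)
{
	return atomic_compare_exchange_strong(&preempt_state, &old, new);
}

static void start_ring_switch(void)
{
	/* stand-in for writing the context-switch register */
}

static void preempt_trigger(void)
{
	/* the decision is serialized against every submit and retire */
	pthread_mutex_lock(&preempt_start_lock);
	if (!try_preempt_state(PREEMPT_NONE, PREEMPT_START)) {
		/* a switch is in flight; preempt_irq() will re-trigger */
		pthread_mutex_unlock(&preempt_start_lock);
		return;
	}
	if (rptr == cur) {
		/* no runnable ring: abort back to NONE under the lock */
		preempt_state = PREEMPT_NONE;
		pthread_mutex_unlock(&preempt_start_lock);
		return;
	}
	pthread_mutex_unlock(&preempt_start_lock);
	start_ring_switch();		/* preempt_irq() fires when done */
}

static void preempt_irq(void)		/* switch-done interrupt */
{
	preempt_state = PREEMPT_NONE;
	/* catch a submit or retire that raced with the ring switch */
	preempt_trigger();
}

int main(void)
{
	cur++;			/* a submit publishes new work */
	preempt_trigger();	/* starts a switch to the busy ring */
	rptr = cur;		/* the switch/retire consumes it */
	preempt_irq();		/* completion re-checks for new work */
	return 0;
}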

Fixes: b1fc283 ("drm/msm: Implement preemption for A5XX targets")
Signed-off-by: Vladimir Lypak <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/612045/
Signed-off-by: Rob Clark <[email protected]>
vldly authored and barni2000 committed Sep 17, 2024
1 parent 4810de3 commit 4a9e298
Showing 2 changed files with 23 additions and 2 deletions.
1 change: 1 addition & 0 deletions drivers/gpu/drm/msm/adreno/a5xx_gpu.h
@@ -36,6 +36,7 @@ struct a5xx_gpu {
 	uint64_t preempt_iova[MSM_GPU_MAX_RINGS];
 
 	atomic_t preempt_state;
+	spinlock_t preempt_start_lock;
 	struct timer_list preempt_timer;
 
 	struct drm_gem_object *shadow_bo;
24 changes: 22 additions & 2 deletions drivers/gpu/drm/msm/adreno/a5xx_preempt.c
@@ -97,12 +97,19 @@ void a5xx_preempt_trigger(struct msm_gpu *gpu)
 	if (gpu->nr_rings == 1)
 		return;
 
+	/*
+	 * Serialize preemption start to ensure that we always make
+	 * decision on latest state. Otherwise we can get stuck in
+	 * lower priority or empty ring.
+	 */
+	spin_lock_irqsave(&a5xx_gpu->preempt_start_lock, flags);
+
 	/*
 	 * Try to start preemption by moving from NONE to START. If
 	 * unsuccessful, a preemption is already in flight
 	 */
 	if (!try_preempt_state(a5xx_gpu, PREEMPT_NONE, PREEMPT_START))
-		return;
+		goto out;
 
 	/* Get the next ring to preempt to */
 	ring = get_next_ring(gpu);
@@ -127,9 +134,11 @@ void a5xx_preempt_trigger(struct msm_gpu *gpu)
 		set_preempt_state(a5xx_gpu, PREEMPT_ABORT);
 		update_wptr(gpu, a5xx_gpu->cur_ring);
 		set_preempt_state(a5xx_gpu, PREEMPT_NONE);
-		return;
+		goto out;
 	}
 
+	spin_unlock_irqrestore(&a5xx_gpu->preempt_start_lock, flags);
+
 	/* Make sure the wptr doesn't update while we're in motion */
 	spin_lock_irqsave(&ring->preempt_lock, flags);
 	a5xx_gpu->preempt[ring->id]->wptr = get_wptr(ring);
@@ -152,6 +161,10 @@ void a5xx_preempt_trigger(struct msm_gpu *gpu)
 
 	/* And actually start the preemption */
 	gpu_write(gpu, REG_A5XX_CP_CONTEXT_SWITCH_CNTL, 1);
+	return;
+
+out:
+	spin_unlock_irqrestore(&a5xx_gpu->preempt_start_lock, flags);
 }
 
 void a5xx_preempt_irq(struct msm_gpu *gpu)
@@ -188,6 +201,12 @@ void a5xx_preempt_irq(struct msm_gpu *gpu)
 	update_wptr(gpu, a5xx_gpu->cur_ring);
 
 	set_preempt_state(a5xx_gpu, PREEMPT_NONE);
+
+	/*
+	 * Try to trigger preemption again in case there was a submit or
+	 * retire during ring switch
+	 */
+	a5xx_preempt_trigger(gpu);
 }
 
 void a5xx_preempt_hw_init(struct msm_gpu *gpu)
@@ -300,5 +319,6 @@ void a5xx_preempt_init(struct msm_gpu *gpu)
 		}
 	}
 
+	spin_lock_init(&a5xx_gpu->preempt_start_lock);
 	timer_setup(&a5xx_gpu->preempt_timer, a5xx_preempt_timer, 0);
 }
