`fn decomp_tx`: Do not zero `txa` before initialization #1265

rinon · 2024-06-28T02:27:50Z

This adds some complexity, but it improves performance significantly, especially on Intel. Zeroing an entire page of stack with `memset` (which is what was happening previously) is expensive.
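A minimal sketch of the idea, assuming a hypothetical page-sized byte buffer in place of the real `txa` (its actual type and initializer are not shown in this thread). The "before" version zero-initializes the array, which can compile to a `memset` of the whole buffer before the loop runs. The "after" version builds the array from `MaybeUninit` slots, writes every element exactly once, and only then treats it as initialized. `LEN`, `init_with_zeroing`, and `init_without_zeroing` are illustrative names, not rav1d code.

```rust
use std::mem::{self, MaybeUninit};

/// Hypothetical size standing in for the page-sized `txa` buffer.
const LEN: usize = 4096;

/// "Before": the array is zeroed up front, then every element is overwritten anyway.
fn init_with_zeroing(seed: u8) -> [u8; LEN] {
    let mut buf = [0u8; LEN]; // may become a memset of all LEN bytes
    for (i, b) in buf.iter_mut().enumerate() {
        *b = seed.wrapping_add(i as u8);
    }
    buf
}

/// "After": no zeroing; each slot is written exactly once before use.
fn init_without_zeroing(seed: u8) -> [u8; LEN] {
    // An array of `MaybeUninit` needs no initialization itself; this is the
    // pattern the `MaybeUninit` docs use for element-by-element array setup.
    let mut buf: [MaybeUninit<u8>; LEN] = unsafe { MaybeUninit::uninit().assume_init() };

    for (i, slot) in buf.iter_mut().enumerate() {
        slot.write(seed.wrapping_add(i as u8));
    }

    // SAFETY: the loop above wrote every element, and
    // `[MaybeUninit<u8>; LEN]` has the same layout as `[u8; LEN]`.
    unsafe { mem::transmute::<[MaybeUninit<u8>; LEN], [u8; LEN]>(buf) }
}
```

The extra complexity mentioned above is the `unsafe` bookkeeping: the code is only sound if every element really is written before the final conversion.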

kkysen

Why do we need so many pointer casts here? If it's to go from a reference to a pointer, can we use `ptr::from_{ref,mut}`? They're clearer to read: a manual `as` cast could also be doing other conversions, so it's harder to tell what it's for.
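For comparison, a small sketch of the two spellings; the helper names are hypothetical, not from the PR. `std::ptr::from_ref` and `std::ptr::from_mut` (stable since Rust 1.76) only turn a reference into a raw pointer, whereas `as` is also used for pointer casts that change the pointee type or mutability, so the reader has to check which conversion is meant.

```rust
use std::ptr;

/// Writes `value` through a raw pointer obtained without an `as` cast.
fn write_via_from_mut(slot: &mut u32, value: u32) {
    let p: *mut u32 = ptr::from_mut(slot);
    // SAFETY: `p` comes from a live `&mut u32`, so it is valid and unaliased here.
    unsafe { p.write(value) };
}

/// Reads through a `*const` pointer obtained with `ptr::from_ref`.
fn read_via_from_ref(slot: &u32) -> u32 {
    let p: *const u32 = ptr::from_ref(slot);
    // SAFETY: `p` comes from a live `&u32`.
    unsafe { p.read() }
}

/// Both spellings yield the same pointer; only the intent is more explicit.
fn same_address(slot: &mut u32) -> bool {
    let a = ptr::from_mut(slot); // can only be a reference-to-pointer conversion
    let b = slot as *mut u32; // `as` could also change type or mutability elsewhere
    a == b
}
```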

src/ctx.rs

src/lf_mask.rs

nnethercote · 2024-06-30T22:15:07Z

AFAICT this is the single biggest potential perf win in rav1d.

src/ctx.rs

src/lf_mask.rs

rinon · 2024-07-01T23:06:50Z

In my testing, the latest version with less `unsafe` performs the same as the `unsafe` pointer writes with the bounds-check `assert`. Commenting that `assert` out gains a tiny bit on my i7-1260P, but seems equivalent on my Zen 4 7700X.
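A sketch of the two variants being compared, under the assumption that the hot path is a run of writes into a slice; `fill_row_safe` and `fill_row_unsafe` are hypothetical and do not mirror the actual `lf_mask.rs` code. The unsafe variant does one `assert!` up front and then writes through a raw pointer; the safe variant lets slice indexing do the bounds check.

```rust
/// Safe variant: slicing checks bounds once, then `fill` writes the range.
fn fill_row_safe(buf: &mut [u8], start: usize, len: usize, value: u8) {
    buf[start..start + len].fill(value);
}

/// Unsafe variant: one explicit bounds-check `assert!`, then raw pointer writes.
fn fill_row_unsafe(buf: &mut [u8], start: usize, len: usize, value: u8) {
    // Removing this assert skips the check but makes the function unsound
    // for out-of-range `start`/`len`.
    assert!(start <= buf.len() && len <= buf.len() - start);
    let p = buf.as_mut_ptr();
    for i in 0..len {
        // SAFETY: the assert above guarantees `start + i < buf.len()`.
        unsafe { p.add(start + i).write(value) };
    }
}
```

Both variants panic on out-of-range input, which is why their performance can end up equivalent once the compiler hoists the check out of the loop.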

src/ctx.rs

src/lf_mask.rs

kkysen

I'm not sure what the `use std::ptr` in `ctx.rs` is for, but the `lf_mask.rs` changes all LGTM now.

src/ctx.rs

rinon requested review from randomPoison and kkysen June 28, 2024 02:27

kkysen reviewed Jun 28, 2024

kkysen self-assigned this Jun 28, 2024

kkysen requested changes Jun 28, 2024

src/ctx.rs (outdated, resolved)

src/ctx.rs (outdated, resolved)

src/lf_mask.rs (outdated, resolved)

src/lf_mask.rs (resolved)

rinon force-pushed the sjc/performance branch from 431fcb7 to 2f3437d June 28, 2024 06:21

rinon changed the base branch from sjc/get_lo_ctx/simplify to sjc/tilestatecontext_locking June 28, 2024 06:22

kkysen reviewed Jun 28, 2024

src/lf_mask.rs (outdated, resolved)

kkysen reviewed Jun 28, 2024

src/lf_mask.rs (resolved)

rinon force-pushed the sjc/performance branch 2 times, most recently from 293dfe8 to 240c45b June 29, 2024 00:43

rinon force-pushed the sjc/tilestatecontext_locking branch from 3f37ee7 to 9d18014 June 29, 2024 00:43

rinon force-pushed the sjc/performance branch 2 times, most recently from c13ec52 to c241b2d June 29, 2024 00:58

rinon requested a review from kkysen June 29, 2024 00:59

rinon force-pushed the sjc/tilestatecontext_locking branch from 9d18014 to c71a595 June 29, 2024 00:59

rinon force-pushed the sjc/performance branch from c241b2d to 032977b June 29, 2024 01:00

rinon force-pushed the sjc/tilestatecontext_locking branch from c71a595 to bd8ac77 June 29, 2024 04:17

rinon force-pushed the sjc/performance branch from 032977b to dac6767 June 29, 2024 04:17

rinon added the performance label Jun 29, 2024

kkysen requested changes Jun 30, 2024

src/ctx.rs (outdated, resolved)

src/ctx.rs (outdated, resolved)

src/ctx.rs (outdated, resolved)

src/ctx.rs (outdated, resolved)

src/lf_mask.rs (outdated, resolved)

src/lf_mask.rs (resolved)

Remove mutability on read-only reference

5c9673c

rinon force-pushed the sjc/tilestatecontext_locking branch from bd8ac77 to 2506ce8 July 1, 2024 21:17

rinon force-pushed the sjc/performance branch 2 times, most recently from d46f282 to 8b45876 July 1, 2024 22:05

Base automatically changed from sjc/tilestatecontext_locking to main July 1, 2024 23:07

randomPoison reviewed Jul 1, 2024

src/ctx.rs (outdated, resolved)

rinon requested a review from kkysen July 2, 2024 17:44

rinon force-pushed the sjc/performance branch from 8b45876 to 1eeab46 July 2, 2024 23:00

kkysen requested changes Jul 2, 2024

src/lf_mask.rs (outdated, resolved)

src/lf_mask.rs (outdated, resolved)

rinon force-pushed the sjc/performance branch from 1eeab46 to f9ab319 July 5, 2024 21:36

rinon requested a review from kkysen July 5, 2024 21:37

kkysen approved these changes Jul 8, 2024

src/ctx.rs (outdated, resolved)

Do not zero txa before initializing it

49aaf1b

rinon force-pushed the sjc/performance branch from f9ab319 to 49aaf1b July 8, 2024 20:58

rinon merged commit 412cd4c into main Jul 8, 2024
27 checks passed

rinon deleted the sjc/performance branch July 8, 2024 20:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`fn decomp_tx`: Do not zero `txa` before initialization #1265

`fn decomp_tx`: Do not zero `txa` before initialization #1265

rinon commented Jun 28, 2024

kkysen left a comment

nnethercote commented Jun 30, 2024

rinon commented Jul 1, 2024

kkysen left a comment

fn decomp_tx: Do not zero txa before initialization #1265

fn decomp_tx: Do not zero txa before initialization #1265

Conversation

rinon commented Jun 28, 2024

kkysen left a comment

Choose a reason for hiding this comment

nnethercote commented Jun 30, 2024

rinon commented Jul 1, 2024

kkysen left a comment

Choose a reason for hiding this comment

`fn decomp_tx`: Do not zero `txa` before initialization #1265

`fn decomp_tx`: Do not zero `txa` before initialization #1265