cranelift: Truly dynamic vector types #4418

Open
Tracked by #9464
sparker-arm opened this issue Jul 8, 2022 · 0 comments
Labels
cranelift:area:clif cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:regalloc Issues related to register allocation. cranelift Issues related to the Cranelift code generator


@sparker-arm
Contributor

Current Status

Basic support for dynamic vector types has been committed in preparation for supporting the Wasm Flexible Vectors extension. The RFC discussion around the design is here. While we now have support for dynamic/flexible/scalable vectors, the implementation doesn't yet support truly dynamic types. This is a top-level issue for what needs to be done to get there.

Cranelift's type system now contains a dynamic vector type for each corresponding fixed-width type, with the dynamic type being a fixed-width type scaled by a target-defined factor. Currently, this factor (dyn_scale_target_const) is legalized to a constant. This makes lowering and stack layout simple. Below is an example CLIF function.

function %i8x16_splat_add(i8, i8) -> i8x16 {
  gv0 = dyn_scale_target_const.i8x16
  dt0 = i8x16*gv0

block0(v0: i8, v1: i8):
  v2 = splat.dt0 v0
  v3 = splat.dt0 v1
  v4 = iadd v2, v3
  v5 = extract_vector v4, 0
  return v5
}
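As a rough illustration of the scaling described above, the Rust sketch below models a dynamic type as its fixed-width base type multiplied by the scale that `gv0` evaluates to. `FixedVecType` and `DynVecType` are hypothetical names, not Cranelift's `Type` API. The point is only that once the scale is legalized to a constant, byte sizes and lane counts become compile-time values.

```rust
/// A fixed-width SIMD type such as `i8x16` (hypothetical model).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct FixedVecType {
    lane_bits: u32,
    lanes: u32,
}

impl FixedVecType {
    /// Size of the fixed-width type in bytes.
    fn bytes(self) -> u32 {
        self.lane_bits * self.lanes / 8
    }
}

/// A dynamic type `base * scale`, mirroring `dt0 = i8x16*gv0`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct DynVecType {
    base: FixedVecType,
}

impl DynVecType {
    /// Byte size once the scale (the value of `gv0`) is known.
    /// If the scale is a target constant, this is known at compile time.
    fn bytes_with_scale(self, scale: u32) -> u32 {
        self.base.bytes() * scale
    }

    /// Lane count once the scale is known.
    fn lanes_with_scale(self, scale: u32) -> u32 {
        self.base.lanes * scale
    }
}
```

For example, with a hypothetical target constant of 4, `dt0` above would hold 64 lanes of `i8` and occupy 64 bytes.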

Next Steps

The two main areas that need more work are the IR and the MachInst ABI layer. The first step is to modify an existing GlobalValue, or introduce a new one, that will be legalized and lowered to a runtime scaling value, perhaps in a similar way to how global values are currently used in the ABI to generate the stack limit.
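One possible shape for this, sketched in Rust under loose assumptions: the scale comes either from a target constant (today's behaviour) or from a value loaded from `vmctx` at run time, analogous to the stack-limit global value. `ScaleSource`, `resolve_scale` and `static_scale` are hypothetical names, and the `vmctx` byte slice merely stands in for the load a backend would actually emit.

```rust
/// Hypothetical source of a dynamic type's scaling factor.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ScaleSource {
    /// Current behaviour: `dyn_scale_target_const` legalized to a constant.
    TargetConst(u32),
    /// Proposed: a value loaded from `vmctx` at run time, much like the
    /// global value used for the stack limit.
    VmctxLoad { offset: usize },
}

/// Evaluates the scale. `vmctx` stands in for the runtime context memory;
/// a backend would emit a load instruction instead of reading a slice.
fn resolve_scale(src: &ScaleSource, vmctx: &[u8]) -> u32 {
    match *src {
        ScaleSource::TargetConst(k) => k,
        ScaleSource::VmctxLoad { offset } => {
            // Read a little-endian u32 at `offset` (the encoding is an assumption).
            let mut buf = [0u8; 4];
            buf.copy_from_slice(&vmctx[offset..offset + 4]);
            u32::from_le_bytes(buf)
        }
    }
}

/// The scale if it is known at compile time, otherwise `None`.
fn static_scale(src: &ScaleSource) -> Option<u32> {
    match *src {
        ScaleSource::TargetConst(k) => Some(k),
        ScaleSource::VmctxLoad { .. } => None,
    }
}
```

The useful distinction is `static_scale`: only the constant case lets lowering and frame layout fold sizes into immediates.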

The second, and bigger, issue is in the ABI layer, where we determine the stack layout. A complicating factor here is that the 'Vanilla' layer is largely shared between the backends and, of course, everything is designed around sizes that are known constants. However, I think the biggest challenge is the interface with the register allocator.

Spill Slots
The register allocator currently supports only two register classes and doesn't track types, so a target with wide vector support could use far more stack than necessary. For example, with the current implementation, a target using AVX-512 would need 64 bytes to spill a single-precision float.
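The cost can be sketched with a little arithmetic. The Rust below is modelled loosely on a two-class allocator; it assumes 8-byte integer spill slots and that the float class also holds vectors, so a float-class slot must fit the widest vector register (64 bytes for AVX-512). The function names are hypothetical.

```rust
/// The two register classes, modelled loosely on the current allocator.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RegClass {
    /// General-purpose registers (assumed 64-bit here).
    Int,
    /// Float registers, which also hold vectors.
    Float,
}

/// Spill-slot size when only the class is known.
fn spill_bytes_by_class(class: RegClass, widest_vector_bytes: u32) -> u32 {
    match class {
        RegClass::Int => 8,
        RegClass::Float => widest_vector_bytes,
    }
}

/// Spill-slot size if the allocator tracked the value's type.
fn spill_bytes_by_type(type_bytes: u32) -> u32 {
    type_bytes
}

/// Bytes wasted by sizing the slot per class instead of per type.
fn wasted_bytes(class: RegClass, widest_vector_bytes: u32, type_bytes: u32) -> u32 {
    spill_bytes_by_class(class, widest_vector_bytes)
        .saturating_sub(spill_bytes_by_type(type_bytes))
}
```

Spilling a 4-byte `f32` through a 64-byte float-class slot wastes 60 bytes per spill.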

To enable truly dynamic types, the interface with the register allocator will need to change. If it continues to return a constant spill-slot size, that size will have to accommodate the maximum possible register size (2048 bits, or 256 bytes, in the case of SVE), which is a prohibitive cost for most CPUs.
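A similar hedged sketch for SVE, whose vector length can be anything from 128 to 2048 bits depending on the implementation. It compares sizing every vector spill slot for the 2048-bit maximum with scaling slots by the vector length found at run time; the function names and spill counts are illustrative only.

```rust
/// Largest SVE vector length, 2048 bits, expressed in bytes.
const SVE_MAX_VL_BYTES: u32 = 2048 / 8;

/// Spill area needed if every vector spill slot is sized for the maximum.
fn spill_area_fixed_max(vector_spills: u32) -> u32 {
    vector_spills * SVE_MAX_VL_BYTES
}

/// Spill area needed if each slot is scaled by the run-time vector length.
fn spill_area_runtime(vector_spills: u32, vl_bits: u32) -> u32 {
    // SVE vector lengths range from 128 to 2048 bits.
    assert!((128..=2048).contains(&vl_bits), "vector length out of range");
    vector_spills * (vl_bits / 8)
}
```

On a 128-bit implementation, ten vector spills would take 2560 bytes under the fixed-maximum scheme but only 160 bytes when scaled at run time.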

Also, wider vectors usually come with predication, and predicate registers are unlikely to map to either of the existing regalloc classes.

Stack Slots
Along with spill slots, our frame layout will need to handle dynamically sized stack slots, both those defined in the IR and, possibly, arguments passed on the stack. The current stack layout is as follows (there's currently no distinction between spill slots for fixed and dynamic types):

//!   (high address)
//!
//!                              +---------------------------+
//!                              |          ...              |
//!                              | stack args                |
//!                              | (accessed via FP)         |
//!                              +---------------------------+
//! SP at function entry ----->  | return address            |
//!                              +---------------------------+
//! FP after prologue -------->  | FP (pushed by prologue)   |
//!                              +---------------------------+
//!                              |          ...              |
//!                              | clobbered callee-saves    |
//! unwind-frame base     ---->  | (pushed by prologue)      |
//!                              +---------------------------+
//!                              |          ...              |
//!                              | spill slots               |
//!                              | (accessed via nominal SP) |
//!                              |          ...              |
//!                              | sized stack slots         |
//!                              | dynamic stack slots       |
//!                              | (accessed via nominal SP) |
//! nominal SP --------------->  | (alloc'd by prologue)     |
//! (SP at end of prologue)      +---------------------------+
//!                              | [alignment as needed]     |
//!                              |          ...              |
//!                              | args for call             |
//! SP before making a call -->  | (pushed at callsite)      |
//!                              +---------------------------+
//!
//!   (low address)

We likely want to collect all the dynamically sized stack values, move them up the stack and introduce a new StackAMode that addresses them via FP. Spill and stack slots whose sizes are known at compile time can be accessed as they are now, but the calculation performed at the end of the prologue will need to be modified. The current implementation allows the TargetIsa to report a fixed size for each dynamic type, so stack offsets can be calculated at compile time. For dynamically sized objects, I think we'll want to use vmctx to generate our scaling factor from a GlobalValue and then multiply a slot index by the scaling factor to get our address.
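The addressing scheme described above might look roughly like this, under several assumptions not stated in the issue: the dynamic area sits directly below the clobbered callee-saves, every dynamic slot has the same base size, slots grow downward from FP, and alignment is ignored. `dyn_slot_fp_offset` and `frame_bytes` are hypothetical helpers, not existing Cranelift functions.

```rust
/// FP-relative offset (negative = below FP) of dynamic slot `index`.
/// `callee_save_bytes` is the distance from FP to the start of the
/// dynamic area; `base_bytes` is one slot's size when the scale is 1.
fn dyn_slot_fp_offset(callee_save_bytes: i64, index: i64, base_bytes: i64, scale: i64) -> i64 {
    // The slot size is only known once the run-time scale is available.
    let slot_bytes = base_bytes * scale;
    // Slot `index` starts after `index + 1` slots below the callee-saves.
    -(callee_save_bytes + (index + 1) * slot_bytes)
}

/// Bytes the prologue must allocate: the fixed-size part of the frame
/// plus `count` dynamic slots scaled at run time.
fn frame_bytes(fixed_bytes: i64, count: i64, base_bytes: i64, scale: i64) -> i64 {
    fixed_bytes + count * base_bytes * scale
}
```

Because `scale` is only known at run time, both results would have to be computed with instructions (a multiply by the loaded scale) rather than folded into immediate offsets.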

It could be that a target wants to specify multiple scaling values though, depending on the type/register that will be used. So, we could group the values so that each group is using the same scale value. The awkward part here is that we won't have a uniform space to scale across, and so we'll need a method to 'jump' over groups.
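Grouping could work along these lines: each group has its own run-time scale, and a slot's offset is found by jumping over the full size of every earlier group. This is a hypothetical sketch; `SlotGroup` and `grouped_slot_offset` are illustrative names, and groups are assumed to be laid out consecutively in increasing address order from the base of the dynamic area.

```rust
/// A group of dynamic slots that share one scaling factor (hypothetical).
struct SlotGroup {
    /// Size of one slot when the scale is 1.
    base_bytes: i64,
    /// Number of slots in the group.
    count: i64,
}

/// Offset from the base of the dynamic area of slot `index` in group
/// `group`, where `scales[g]` is the run-time scale for group `g`.
fn grouped_slot_offset(groups: &[SlotGroup], scales: &[i64], group: usize, index: i64) -> i64 {
    // 'Jump' over every earlier group: each spans count * base * scale bytes.
    let skipped: i64 = groups[..group]
        .iter()
        .zip(scales)
        .map(|(g, s)| g.count * g.base_bytes * s)
        .sum();
    let g = &groups[group];
    skipped + index * g.base_bytes * scales[group]
}
```

The cost of the jump grows with the number of preceding groups, which is one argument for keeping the number of distinct scaling values small.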

@sparker-arm sparker-arm added cranelift:area:regalloc Issues related to register allocation. cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:clif labels Jul 8, 2022
@akirilov-arm akirilov-arm added the cranelift Issues related to the Cranelift code generator label Jul 11, 2022

2 participants