cranelift: Truly dynamic vector types #4418
Labels: cranelift:area:clif; cranelift:area:machinst (issues related to instruction selection and the new MachInst backend); cranelift:area:regalloc (issues related to register allocation); cranelift (issues related to the Cranelift code generator)
Current Status
Basic support for dynamic vector types has been committed in preparation for supporting the Wasm Flexible Vectors extension. The RFC discussion around the design is here. But while we now support dynamic (also called flexible or scalable) vectors, their sizes are not yet truly dynamic: the scaling factor is still resolved to a constant at compile time. This is a top-level issue tracking what needs to be done to get there.
Cranelift's type system now contains a dynamic vector type for each fixed-width vector type, where the dynamic type is the fixed-width type scaled by a target-defined factor. Currently, this factor (`dyn_scale_target_const`) is legalized to a constant, which keeps lowering and stack layout simple. Below is an example CLIF function.

Next Steps
The two main areas that need more work are the IR and the MachInst ABI layer. The first is to modify an existing `GlobalValue` kind, or introduce a new one, so that it is legalized and lowered to a runtime scaling value, perhaps in a similar way to how global values are currently used in the ABI to generate the stack limit.

The second, and bigger, issue is in the ABI layer, where we determine the stack layout. A complicating factor is that the 'Vanilla' ABI layer is mostly shared between the backends and, of course, everything in it is designed around constant sizes known at compile time. However, I think the biggest challenge is the interface with the register allocator.
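To make the difference between the current and the desired behaviour concrete, here is a minimal Rust sketch. It assumes a simplified model in which a dynamic type's size is its fixed-width base size multiplied by a scale factor; `Scale`, `compile_time_size` and `runtime_size` are hypothetical names for illustration, not Cranelift APIs.

```rust
/// How the scale factor for a dynamic vector type is obtained.
/// Hypothetical model for illustration; not Cranelift's actual types.
#[derive(Clone, Copy)]
enum Scale {
    /// Today: `dyn_scale_target_const` is legalized to a constant
    /// reported by the target.
    TargetConst(u32),
    /// Proposed: the scale is only known at run time, e.g. loaded
    /// through a `GlobalValue`.
    Runtime,
}

/// Size in bytes of a dynamic vector type: the fixed-width base type
/// (e.g. 16 bytes for `i32x4`) multiplied by the scale factor.
/// Returns `None` when the size cannot be folded at compile time.
fn compile_time_size(base_bytes: u32, scale: Scale) -> Option<u32> {
    match scale {
        Scale::TargetConst(factor) => Some(base_bytes * factor),
        Scale::Runtime => None,
    }
}

/// What the emitted code would compute once the runtime scale is loaded.
fn runtime_size(base_bytes: u32, loaded_scale: u32) -> u32 {
    base_bytes * loaded_scale
}

fn main() {
    // A target that legalizes the scale to 1 (128-bit vectors).
    println!("{:?}", compile_time_size(16, Scale::TargetConst(1)));
    // With a runtime scale, only the generated code knows the size.
    println!("{:?}", compile_time_size(16, Scale::Runtime));
    println!("{}", runtime_size(16, 4));
}
```

With a constant scale, every size and stack offset folds at compile time; with a runtime scale, the size becomes a value the generated code has to compute, which is what the ABI layer below would need to accommodate.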
Spill Slots
The register allocator currently supports only two register classes, and value types aren't tracked, so a spill slot must be large enough to hold the widest register in its class. As a result, a target with wide vector support can use far more stack than necessary. For example, with the current implementation, a target using AVX-512 would need 64 bytes to spill a single-precision float.
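The cost described above can be sketched as follows. This is a simplified model, not the register allocator's real interface: `RegClass`, `spill_bytes_by_class` and `spill_bytes_by_type` are hypothetical, and the 8-byte integer slot assumes a 64-bit target.

```rust
/// Hypothetical register classes mirroring the two classes the
/// register allocator supports: integer and float/vector.
#[derive(Clone, Copy)]
enum RegClass {
    Int,
    Float,
}

/// Spill slot size when only the register class is known: the slot must
/// hold the widest register in the class. `vector_bytes` is the width of
/// the target's vector registers (64 for AVX-512 ZMM registers).
fn spill_bytes_by_class(class: RegClass, vector_bytes: u32) -> u32 {
    match class {
        RegClass::Int => 8, // assumes a 64-bit target
        RegClass::Float => vector_bytes,
    }
}

/// Spill slot size when the value's type is tracked: just the type's size.
fn spill_bytes_by_type(type_bytes: u32) -> u32 {
    type_bytes
}

fn main() {
    let avx512 = 64;
    // Spilling a single-precision float (4 bytes) on an AVX-512 target.
    let by_class = spill_bytes_by_class(RegClass::Float, avx512);
    let by_type = spill_bytes_by_type(4);
    println!("by class: {by_class} bytes, by type: {by_type} bytes");
    println!("waste factor: {}x", by_class / by_type);
    println!("int spill: {} bytes", spill_bytes_by_class(RegClass::Int, avx512));
}
```

Under these assumptions, class-based sizing spends 16 times more stack on each spilled `f32` than type-based sizing would.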
To enable truly dynamic types, the interface with the register allocator will need to change. If the spill slot size remains a compile-time constant, it must accommodate the maximum possible register size (2048 bits, i.e. 256 bytes, in the case of SVE), which is a prohibitive cost for most CPUs.
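A small worked comparison, assuming a function with eight vector spills running on a CPU whose SVE vector length is 128 bits (16 bytes). The function names are hypothetical and only illustrate the arithmetic.

```rust
/// Maximum SVE vector length: 2048 bits, i.e. 256 bytes.
const SVE_MAX_VL_BYTES: u32 = 256;

/// Bytes reserved for `n` vector spill slots if the slot size must be a
/// compile-time constant: every slot assumes the architectural maximum.
fn worst_case_spill_area(n: u32) -> u32 {
    n * SVE_MAX_VL_BYTES
}

/// Bytes actually needed when the slot size is the vector length of the
/// CPU the code runs on (`vl_bytes`), known only at run time.
fn runtime_spill_area(n: u32, vl_bytes: u32) -> u32 {
    n * vl_bytes
}

fn main() {
    // Eight vector spills on a CPU with 128-bit (16-byte) SVE registers.
    let n = 8;
    let vl = 16;
    println!("constant: {} bytes", worst_case_spill_area(n)); // 2048
    println!("runtime:  {} bytes", runtime_spill_area(n, vl)); // 128
}
```

The constant scheme reserves 16 times the stack the runtime-sized scheme needs on such a CPU, and the gap grows with the number of spills.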
Wider vectors also usually come with predication, and predicate registers are unlikely to map to either of the existing register allocator classes.
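As an illustration of why predicates don't fit the existing classes, the sketch below models a hypothetical third register class. It uses the SVE rule that a predicate register holds one bit per byte of a vector register, so its size is one eighth of the vector length; the enum and function are illustrative only, and the 8-byte integer size assumes a 64-bit target.

```rust
/// Hypothetical register classes: the two existing ones plus a separate
/// class for predicate registers.
#[derive(Clone, Copy)]
enum RegClass {
    Int,
    Vector,
    Predicate,
}

/// Spill size in bytes for each class, given the vector length in bytes.
/// SVE predicate registers hold one bit per vector byte, so they are
/// one eighth of the vector length.
fn spill_bytes(class: RegClass, vl_bytes: u32) -> u32 {
    match class {
        RegClass::Int => 8, // assumes a 64-bit target
        RegClass::Vector => vl_bytes,
        RegClass::Predicate => vl_bytes / 8,
    }
}

fn main() {
    for vl in [16, 64, 256] {
        println!(
            "VL {vl:>3} bytes: int {}, vector {}, predicate {}",
            spill_bytes(RegClass::Int, vl),
            spill_bytes(RegClass::Vector, vl),
            spill_bytes(RegClass::Predicate, vl),
        );
    }
}
```

Predicate sizes scale with the vector length like vector registers do, but at a different ratio, so they would need their own class (and their own scaling value) rather than sharing the integer or vector class.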
Stack Slots
Along with spill slots, our frame layout will need to handle dynamically sized stack slots, which are defined in the IR, and possibly arguments passed on the stack. The current stack layout is as follows (there is currently no distinction between spill slots for fixed and dynamic types):
We likely want to collect all the dynamically sized stack values, move them up the stack, and introduce a new `StackAMode` so that they can be addressed relative to FP. Spill and stack slots whose sizes are known at compile time can be accessed as they are now, but the way we calculate the stack pointer adjustment at the end of the prologue will need to change. The current implementation allows the `TargetIsa` to report a fixed size for each dynamic type, so stack offsets can be calculated at compile time. For dynamically sized objects, I think we'll want to use `vmctx` to generate our scaling factor from a `GlobalValue` and then multiply a slot index by the scaling factor to get the slot's address.

A target may, however, want to specify multiple scaling values, depending on the type or register that will be used. In that case, we could group the values so that every value in a group uses the same scaling value. The awkward part is that we won't have a uniform space to scale across, so we'll need a way to 'jump' over groups.
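The grouping idea can be sketched as offset arithmetic. This is a minimal model under several assumptions not taken from the issue: the dynamic area sits directly below a fixed amount of saved data under FP, groups are laid out in order going down, alignment is ignored, and each group's slot size is its runtime scale already multiplied by the base type size. `Group`, `dyn_area_bytes` and `dyn_slot_offset_from_fp` are hypothetical names.

```rust
/// One group of dynamically sized slots that share a scaling value.
/// Hypothetical model; field names are illustrative only.
struct Group {
    /// Number of slots in the group.
    slots: u64,
    /// Size of one slot in bytes (runtime scale times base type size).
    slot_bytes: u64,
}

/// Total size of the dynamic area; the prologue would have to subtract
/// this runtime value from SP in addition to the fixed frame size.
fn dyn_area_bytes(groups: &[Group]) -> u64 {
    groups.iter().map(|g| g.slots * g.slot_bytes).sum()
}

/// Offset below FP of the lowest byte of slot `index` in group `group`.
/// `fp_base_offset` is the fixed data between FP and the dynamic area.
fn dyn_slot_offset_from_fp(groups: &[Group], fp_base_offset: u64, group: usize, index: u64) -> u64 {
    // 'Jump' over all earlier groups; each one's size is only known at
    // run time, so this sum is computed by the generated code.
    let skipped = dyn_area_bytes(&groups[..group]);
    let g = &groups[group];
    assert!(index < g.slots, "slot index out of range");
    // Within a group, the slot index is simply multiplied by the slot size.
    fp_base_offset + skipped + (index + 1) * g.slot_bytes
}

fn main() {
    // Two groups on a CPU with a 256-bit vector length: vector slots of
    // 32 bytes and predicate slots of 32 / 8 = 4 bytes.
    let groups = [
        Group { slots: 3, slot_bytes: 32 },
        Group { slots: 2, slot_bytes: 4 },
    ];
    let saved = 16; // hypothetical: saved FP and LR
    println!("dynamic area: {} bytes", dyn_area_bytes(&groups)); // 104
    println!("vector slot 1 at FP - {}", dyn_slot_offset_from_fp(&groups, saved, 0, 1)); // 80
    println!("predicate slot 0 at FP - {}", dyn_slot_offset_from_fp(&groups, saved, 1, 0)); // 116
}
```

Within one group the address is a single multiply of the slot index by the scale, but reaching any later group requires first summing the runtime sizes of the groups before it, which is the non-uniform 'jump' described above.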