Add feature knobs to FBGEMM #3290
base: main
Conversation
This pull request was exported from Phabricator. Differential Revision: D65174905
✅ Deploy Preview for pytorch-fbgemm-docs ready!
Summary: X-link: facebookresearch/FBGEMM#219

- Use `new[]` and `std::unique_ptr<float[]>` instead of `std::vector`. This avoids an unnecessary indirection and leaves less work for the compiler, which would otherwise have to drop all the dynamic resizing code that isn't used here anyway.
- Also allocate a stack buffer of size 256 and use it instead of a heap allocation when block sizes are small enough. I hope to add specialization mechanisms for common block sizes later. When specialized for a fixed block_size, a stack allocation is very helpful because it allows the compiler to promote `buf` completely into vector registers (and skip all the related memory operations).

Differential Revision: D62591607
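The small-buffer pattern described above can be sketched as follows. This is an illustrative example, not FBGEMM's actual code; the names (`kStackSize`, `sum_block`) and the per-element work are placeholders:

```cpp
#include <cstddef>
#include <memory>

// Use a fixed-size stack array when block_size fits, and fall back to a
// heap allocation via std::unique_ptr<float[]> (not std::vector) otherwise.
constexpr std::size_t kStackSize = 256;

float sum_block(const float* input, std::size_t block_size) {
  float stack_buf[kStackSize];
  std::unique_ptr<float[]> heap_buf;
  float* buf = stack_buf;
  if (block_size > kStackSize) {
    heap_buf.reset(new float[block_size]);  // new[] instead of std::vector
    buf = heap_buf.get();
  }
  float acc = 0.0f;
  for (std::size_t i = 0; i < block_size; ++i) {
    buf[i] = input[i] * 2.0f;  // stand-in for the real per-element work
    acc += buf[i];
  }
  return acc;
}
```

When `block_size` is a compile-time constant, the common (stack) path contains no allocation at all, which is what lets the compiler keep `buf` in registers.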
Summary: X-link: facebookresearch/FBGEMM#221 Checking that `data_size >= 0` early outside the loops means that the compiler can use a single (unsigned) comparison for the `if (idx < 0 || idx >= data_size)` checks within the loops (instead of 2 signed comparisons). Differential Revision: D62591655
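The bounds-check strengthening described above relies on a standard trick, sketched here (illustrative, not FBGEMM's exact code): once `data_size >= 0` is established outside the loop, casting both sides to an unsigned type folds `idx < 0 || idx >= data_size` into one comparison, since a negative `idx` becomes a huge unsigned value and fails the same test.

```cpp
#include <cstdint>

// Caller guarantees data_size >= 0 (checked once, outside the loop).
bool in_bounds(std::int64_t idx, std::int64_t data_size) {
  return static_cast<std::uint64_t>(idx) <
         static_cast<std::uint64_t>(data_size);
}
```

Compilers can often derive this themselves, but only if they can prove `data_size` is non-negative, which is why hoisting the check helps.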
Summary: X-link: facebookresearch/FBGEMM#222

Cleanup/rework of the `EmbeddingSpMDMAutovec.cc` code:

- Fix strict-aliasing violations: when dealing with a `uint8_t *input` pointer, we have to use `memcpy` to extract float/float16 values for "bias" and "scale", as direct pointer casts violate strict-aliasing rules. That is undefined behavior and can lead to misoptimization or lost performance (compilers may be more conservative in alias analysis if they detect violations). Note that the additional `memcpy`s are optimized away by any modern compiler, but they provide enough semantic information that the compiler knows about the potentially unaligned, aliasing memory addresses involved.
- Drop most `#pragma omp` annotations. In most places the patterns are so obvious that the compiler vectorizes the code anyway when beneficial.
- Make sure loop iteration variables have the same type as their upper bounds. This way the compiler doesn't need to prove that a loop variable with a smaller bit width does not overflow.
- Factor out address calculation performed on "input" and consistently store addresses in `uint8_t*`. This deduplicates some common address arithmetic and occasionally made the compiler generate slightly better address operations when I inspected the assembly.
- Harmonize the handling of weights across the implementations.

Differential Revision: D61048245
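The strict-aliasing-safe extraction described in the first bullet can be sketched like this. The row layout (scale then bias packed at the end of a `uint8_t` row) is illustrative; the point is reading floats out of byte storage with `memcpy` rather than a `float*` cast, which modern compilers compile down to a plain load:

```cpp
#include <cstdint>
#include <cstring>

// Read the trailing scale/bias floats out of a quantized uint8_t row
// without violating strict aliasing: memcpy instead of pointer casts.
void read_scale_bias(const std::uint8_t* row, std::size_t row_bytes,
                     float& scale, float& bias) {
  const std::uint8_t* p = row + row_bytes - 2 * sizeof(float);
  std::memcpy(&scale, p, sizeof(float));
  std::memcpy(&bias, p + sizeof(float), sizeof(float));
}
```

A cast like `*reinterpret_cast<const float*>(p)` would read the same bytes but tells the compiler nothing about aliasing or alignment, and is undefined behavior under the C++ object model.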
Summary: Some refactoring and cleanup for the `GenerateEmbeddingXXX` functions:

- Drop the unnecessary `else` for `if`s that end in a return statement, as in `if (...) { return; } else if (...) { return; }`.
- Consistently surround asmjit usage with `#if CPUINFO_ARCH_X86 || CPUINFO_ARCH_X86_64`.
- Introduce `FBGEMM_AUTOVEC_AVAILABLE` and consistently surround `xxx_autovec` usage with it.
- Avoid some cases that duplicated autovec/ref kernel calls.

Differential Revision: D62982082
Summary: Change the API in `EmbeddingSpMDMAutovec.h` to expose `GenerateEmbeddingXXX_autovec` functions to mirror the style used in `FbgemmEmbedding.h`. This is in preparation to allow the autovec code to select among multiple specialized functions ahead of time. Differential Revision: D62984078
Summary: For auto-vectorized code to be competitive with asmjit, we need to specialize the generic code for some fixed parameters. We cannot specialize at runtime, so this introduces a framework to specialize for a given set of parameters at compile time and choose between the existing specializations at runtime. The framework added here allows specifying lines like the following for a given function. Each parameter can be set to `var` to leave it unspecialized, or to `fixed(C)` to create a specialized version with that parameter set to the constant value `C`. Example: ``` SPECIALIZE( /*BIT_RATE=*/fixed(2), /*BLOCK_SIZE=*/var, /*HAS_WEIGHT=*/fixed(true), /*NORMALIZE_BY_LENGTHS=*/var, /*PREFETCH=*/var, /*IS_WEIGHT_POSITIONAL=*/var, /*USE_OFFSETS=*/var, /*OUTPUT_STRIDE=*/fixed(int64_t{-1}), /*INPUT_STRIDE=*/fixed(int64_t{-1}), /*SCALE_BIAS_LAST=*/fixed(true), /*NO_BAG=*/fixed(false), /*IS_BF16_OUT=*/var, /*IS_BF16_IN=*/var) ``` This diff introduces some exemplary specializations for `GenerateEmbeddingSpMDMWithStrides_autovec` and `GenerateEmbeddingSpMDMNBitWithStrides_autovec`, specializing them for bit rates 2 and 4 and block sizes 32, 64, and 128. This framework should make it easy to tune for common use cases in production by specializing the commonly used parameters, or to conserve code size by removing specializations. Differential Revision: D62984408
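The underlying technique can be sketched as follows (the `SPECIALIZE` macro itself is FBGEMM-internal; this hypothetical example shows only the compile-time-specialize, runtime-dispatch idea, with a sentinel standing in for `var`):

```cpp
#include <cstdint>

// A kernel templated on BLOCK_SIZE lets the compiler unroll and vectorize
// for a compile-time constant trip count; BLOCK_SIZE == -1 plays the role
// of "var" and falls back to the runtime value.
template <std::int64_t BLOCK_SIZE>
float kernel(const float* in, std::int64_t block_size) {
  const std::int64_t n = BLOCK_SIZE >= 0 ? BLOCK_SIZE : block_size;
  float acc = 0.0f;
  for (std::int64_t i = 0; i < n; ++i) acc += in[i];
  return acc;
}

// Runtime dispatcher: pick a specialization when the actual block_size
// matches one of the pre-built constants, else use the generic version.
float dispatch(const float* in, std::int64_t block_size) {
  switch (block_size) {
    case 32:  return kernel<32>(in, block_size);
    case 64:  return kernel<64>(in, block_size);
    case 128: return kernel<128>(in, block_size);
    default:  return kernel<-1>(in, block_size);  // "var": generic fallback
  }
}
```

Each added `case` trades code size for speed on that shape, which matches the tuning trade-off described above.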
Summary: X-link: facebookresearch/FBGEMM#389 Add feature knobs to FBGEMM Differential Revision: D65174905
Hi @q10! Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention. You currently have a record in our system, but the CLA is no longer valid and will need to be resubmitted.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations, and afterwards the pull request will be tagged accordingly. If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
Summary: Add feature knobs to FBGEMM
Differential Revision: D65174905