perf: add experimental support for using mimalloc allocator #404

Open
wants to merge 7 commits into
base: main

wincent
Owner

wincent commented Sep 2, 2022

Vendoring from microsoft/mimalloc, specifically the v2.1.7 tag (originally v2.0.6).

mimalloc is a simple allocator focused on performance, and it is easy to drop in as a replacement for malloc() and friends, as described in its README. To avoid bringing in a dependency on CMake, we build only the static.c version. Sadly, the performance delta (see numbers below) is not a clear win; the numbers are a bit all over the place. This probably isn't surprising, because most of the heavy memory allocation in Command-T is already micro-managed internally (simply, and with little overhead) using big slabs allocated with mmap(). Nevertheless, parking this here as a possible idea.

I added a script to pull down the release archive and dump it into a directory, because I don't want to use a submodule for this (people installing a Vim plugin from a Git repo shouldn't have to know/worry about whether it needs or uses submodules). Space on disk for this set of files (some of which are obviously redundant in our context) is:

du -sh lua/wincent/commandt/lib/vendor/github/microsoft
4.8M    lua/wincent/commandt/lib/vendor/github/microsoft

Because it is not clear whether this is a good idea, it only takes effect if you call make with USE_MIMALLOC set. You can verify that it really is overriding the standard malloc() family of calls by running a command with MIMALLOC_VERBOSE set, which makes mimalloc print some extra info:

env MIMALLOC_VERBOSE=1 TIMES=1 bin/benchmarks/scanner.lua

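Impact (unfortunately, a bit inconclusive) on scanner and matcher benchmarks follows. For each machine, the first table is the scanner benchmark and the second is the matcher benchmark; unparenthesized columns are CPU time and parenthesized columns are wall time. Note that numbers shouldn't be compared across machines, because they were produced at different times (for example, the M3 numbers come from a different version of the OS and from a rebased branch, compared with the other machines).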

On mid-2015 MacBook Pro

These numbers are all over the map due to thermal throttling.

           best    avg      sd     +/-      p     (best)    (avg)      (sd)     +/-     p
  buffer 0.04094 0.04178 0.00278 [-0.6%]        (0.04100) (0.04186) (0.00287) [-0.6%]
    file 0.30707 0.31436 0.02486 [-1.0%]   0.05 (0.30735) (0.31473) (0.02499) [-1.0%]  0.05
    find 0.05827 0.06678 0.01162 [+1.5%]   0.05 (0.92013) (0.93752) (0.04453) [-1.0%] 0.025
     git 0.05163 0.06000 0.01115 [+3.3%] 0.0005 (1.00993) (1.02469) (0.04072) [-0.7%] 0.025
      rg 0.06419 0.07229 0.01203 [+3.8%]  0.005 (1.61018) (1.66326) (0.08803) [+0.3%]
watchman 0.01095 0.01121 0.00068 [+0.2%]        (1.16830) (1.17605) (0.01835) [+0.6%] 0.005
   total 0.54387 0.56643 0.04391 [+0.4%]        (5.09873) (5.15811) (0.15328) [-0.1%]

                    best      avg      sd      +/-     p     (best)    (avg)      (sd)      +/-     p
     pathological  0.44648  0.48275 0.19826 [-10.0%]  0.01 (0.44705) (0.48350) (0.19793) [-10.0%]  0.01
        command-t  0.41205  0.44292 0.21658  [+3.8%] 0.005 (0.41255) (0.44364) (0.21681)  [+3.8%] 0.005
chromium (subset)  2.75724  2.99017 0.47925  [-1.3%]       (0.51232) (0.55960) (0.17228)  [-1.5%]
 chromium (whole)  3.18933  3.63241 0.64392  [-0.7%]       (0.41821) (0.49571) (0.14853)  [-0.3%]  0.05
       big (400k)  4.90155  5.51271 1.20748  [-1.0%]       (0.65297) (0.74723) (0.23045)  [-4.5%]  0.05
            total 11.74815 13.06097 2.16866  [-1.2%]       (2.47007) (2.72968) (0.54795)  [-2.8%] 0.025

M1 MacBook Pro

           best    avg      sd     +/-     p     (best)    (avg)      (sd)     +/-     p
  buffer 0.04407 0.05368 0.01123 [-1.4%] 0.025 (0.04433) (0.05413) (0.01150) [-1.6%] 0.025
    file 0.20902 0.21428 0.01060 [+1.0%]  0.01 (0.20902) (0.21511) (0.01219) [+1.1%] 0.005
    find 0.02687 0.03006 0.01015 [+3.9%]  0.05 (0.63141) (0.64156) (0.03483) [+0.7%]  0.05
     git 0.02693 0.02995 0.00980 [+2.2%]       (0.71734) (0.72825) (0.04266) [-0.4%]
      rg 0.02916 0.03318 0.01136 [+2.9%]       (0.90193) (0.91710) (0.07157) [+1.4%] 0.005
watchman 0.01100 0.01156 0.00165 [-0.7%]       (1.18802) (1.21274) (0.13422) [+1.5%] 0.005
   total 0.36119 0.37272 0.03632 [+1.1%]       (3.71713) (3.76889) (0.18577) [+0.9%] 0.005

                    best    avg      sd     +/-     p     (best)    (avg)      (sd)     +/-     p
     pathological 0.28526 0.29636 0.08356 [-4.0%] 0.025 (0.28527) (0.29647) (0.08343) [-4.0%] 0.025
        command-t 0.23759 0.24616 0.07356 [+1.6%]       (0.23760) (0.24618) (0.07354) [+1.6%]
chromium (subset) 1.56761 1.58469 0.03655 [-0.3%]       (0.41376) (0.42040) (0.02032) [-0.4%]
 chromium (whole) 1.87180 1.88726 0.06174 [-0.4%] 0.025 (0.31695) (0.32809) (0.03497) [+0.4%]
       big (400k) 2.90455 2.92204 0.07185 [-0.2%]       (0.48384) (0.50533) (0.07608) [-0.0%]
            total 6.88851 6.93650 0.15002 [-0.4%] 0.025 (1.74550) (1.79647) (0.14517) [-0.5%]

M3 MacBook Pro

           best    avg      sd      +/-      p     (best)    (avg)      (sd)      +/-      p
  buffer 0.01255 0.01400 0.00409  [+2.0%]        (0.01260) (0.01447) (0.00635)  [-3.3%]
    file 0.14749 0.15026 0.00629 [+38.1%] 0.0005 (0.14843) (0.15115) (0.00626) [+37.9%] 0.0005
    find 0.20783 0.27306 0.12796 [+15.8%] 0.0005 (1.13360) (1.38588) (0.55490) [+15.3%] 0.0005
     git 0.21748 0.25155 0.10398 [+13.0%] 0.0005 (1.17693) (1.40937) (0.54965)  [+9.1%] 0.0005
      rg 0.20640 0.26983 0.12977 [+12.2%] 0.0005 (1.55310) (1.78037) (0.55921)  [+6.9%] 0.0005
watchman 0.01813 0.01980 0.00287  [+6.1%] 0.0005 (1.19740) (1.21007) (0.02198)  [-0.2%]
   total 0.81542 0.97850 0.33560 [+17.1%] 0.0005 (5.23262) (5.95132) (1.66475)  [+8.7%] 0.0005

                    best    avg      sd     +/-      p     (best)    (avg)      (sd)     +/-     p
     pathological 0.21079 0.22604 0.10943 [+4.8%]  0.025 (0.21107) (0.22640) (0.10972) [+4.7%] 0.025
        command-t 0.16694 0.17164 0.04923 [-0.6%]        (0.16716) (0.17228) (0.05253) [-0.5%]
chromium (subset) 1.35310 1.36239 0.02010 [+0.1%]        (0.28797) (0.29255) (0.01108) [+0.3%]
 chromium (whole) 1.11148 1.11599 0.01258 [+0.3%]   0.01 (0.12167) (0.12478) (0.00828) [-0.2%]
       big (400k) 1.67454 1.68249 0.05630 [+0.6%] 0.0005 (0.18195) (0.18487) (0.00876) [+0.0%]
            total 4.52863 4.55855 0.15573 [+0.5%]   0.01 (0.97644) (1.00087) (0.12712) [+1.0%]

Ryzen 5950X Arch Linux

           best    avg      sd     +/-   p   (best)    (avg)      (sd)      +/-     p
  buffer 0.02465 0.02544 0.01098 [-0.4%]   (0.02467) (0.02546) (0.01099)  [-0.5%]
    file 0.09906 0.09948 0.00124 [-0.1%]   (0.09943) (0.09995) (0.00130)  [-0.2%]
    find 0.01852 0.01885 0.00084 [+0.5%]   (0.25137) (0.25430) (0.00762)  [+0.1%]
     git 0.01718 0.01811 0.00210 [+0.6%]   (0.22095) (0.22468) (0.01156)  [-0.6%]
      rg 0.01748 0.01792 0.00105 [+0.5%]   (0.60575) (0.61077) (0.01562)  [-0.1%]
watchman 0.00178 0.00186 0.00033 [-5.6%]   (0.02282) (0.02717) (0.02826) [-11.5%]
   total 0.17975 0.18165 0.01018 [-0.0%]   (1.23025) (1.24233) (0.04061)  [-0.4%] 0.05

                    best    avg      sd     +/-      p     (best)    (avg)      (sd)      +/-      p
     pathological 0.26186 0.27703 0.10940 [-4.4%] 0.0005 (0.26196) (0.27715) (0.10946)  [-4.4%] 0.0005
        command-t 0.19271 0.20058 0.05044 [-3.0%] 0.0005 (0.19279) (0.20065) (0.05047)  [-3.0%] 0.0005
chromium (subset) 1.83627 1.89158 0.25631 [-3.8%]   0.01 (0.45977) (0.49985) (0.21028) [-15.7%]  0.005
 chromium (whole) 1.36877 1.38916 0.06031 [+2.6%] 0.0005 (0.12129) (0.12530) (0.01659)  [-0.4%]
       big (400k) 2.39053 2.43636 0.11813 [+1.8%] 0.0005 (0.19600) (0.20396) (0.02644)  [-0.1%]
            total 6.09256 6.19472 0.33431 [-0.2%]        (1.24139) (1.30690) (0.25114)  [-7.5%]  0.005
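
For readers unfamiliar with this style of summary, the following C sketch shows one plausible way the best, avg, and +/- columns in the tables above could be derived from raw timings. It rests on assumptions: that +/- is the relative change in the average against a baseline run, and that sd is the square root of the sample variance. The names summary_t, summarize, and percent_change are hypothetical, and the p column (a significance value) is not modelled at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one benchmark's timing samples. */
typedef struct {
    double best;     /* smallest (fastest) sample */
    double avg;      /* arithmetic mean */
    double variance; /* sample variance; sd would be its square root */
} summary_t;

/* Two-pass summary: mean first, then squared deviations from the mean. */
summary_t summarize(const double *samples, size_t n) {
    summary_t s = {0.0, 0.0, 0.0};
    if (n == 0) {
        return s;
    }
    double sum = 0.0;
    s.best = samples[0];
    for (size_t i = 0; i < n; i++) {
        sum += samples[i];
        if (samples[i] < s.best) {
            s.best = samples[i];
        }
    }
    s.avg = sum / (double)n;
    if (n > 1) {
        double squares = 0.0;
        for (size_t i = 0; i < n; i++) {
            double d = samples[i] - s.avg;
            squares += d * d;
        }
        s.variance = squares / (double)(n - 1);
    }
    return s;
}

/* Assumed meaning of the +/- column: positive means the candidate is slower. */
double percent_change(double baseline_avg, double candidate_avg) {
    return (candidate_avg - baseline_avg) / baseline_avg * 100.0;
}
```

Under that reading, a row showing [+3.8%] would mean the average time with mimalloc was 3.8% higher than the baseline average, but the benchmark scripts themselves are the authority on the exact definition.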

wincent and others added 7 commits August 13, 2024 18:19
Fixes:

```
luajit: ...and-t/bin/benchmarks/../../lua/wincent/commandt/init.lua:199: attempt to call field 'nvim_buf_is_valid' (a nil value)
```

Fixes:

```
luajit: ...and-t/bin/benchmarks/../../lua/wincent/commandt/init.lua:244: attempt to index field 'scanners' (a nil value)
```

Vendoring from:

- https://github.com/microsoft/mimalloc

and specifically:

- https://github.com/microsoft/mimalloc/releases/tag/v2.0.6

The .prettierignore change is because there are a couple of things in
the Markdown files that Prettier doesn't like.

The clang-format thing comes from a tip here:

- https://stackoverflow.com/a/57272592/2103996

Should prevent CI failures like this one:

- https://github.com/wincent/command-t/actions/runs/2979207632

Wasn't needed on clang, but is needed with gcc:

    /usr/bin/ld: mimalloc-override.o: relocation R_X86_64_TPOFF32
    against `recurse' can not be used when making a shared object;
    recompile with -fPIC
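
For context on that error: R_X86_64_TPOFF32 is a relocation used for thread-local variables accessed with the local-exec TLS model, which is valid in an executable but not in a shared object. The C sketch below shows the kind of code that produces such an access. It is a hypothetical stand-in (the functions guard_enter and guard_leave are invented here), not mimalloc's actual source; only the variable name is borrowed from the linker message.

```c
#include <assert.h>
#include <stdbool.h>

/* Thread-local flag with one copy per thread. Hypothetical stand-in for
 * the `recurse` symbol named in the linker error. */
static _Thread_local bool recurse = false;

/* Returns false if this thread is already inside the guarded region. */
bool guard_enter(void) {
    if (recurse) {
        return false;
    }
    recurse = true;
    return true;
}

/* Marks the guarded region as exited for the calling thread. */
void guard_leave(void) {
    recurse = false;
}
```

Compiling a file like this with -fPIC makes the compiler pick a TLS access model that is usable from a shared object, which is the fix the linker message suggests.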

I can't see a changelog or release notes in the repo, so here is the
diff:

- microsoft/mimalloc@v2.0.6...v2.1.7
Owner Author

wincent commented Aug 13, 2024

Quick test of Hoard, for comparison:

brew tap emeryberger/hoard
brew install --HEAD emeryberger/hoard/libhoard
make clean
make
hoard bin/benchmarks/matcher.lua

Results (relative to wincent/mimalloc branch) on M3:

Summary of cpu time and (wall time):

                    best    avg      sd     +/-      p     (best)    (avg)      (sd)     +/-     p
     pathological 0.20645 0.21815 0.07995 [-3.6%]  0.025 (0.20715) (0.21876) (0.08035) [-3.5%] 0.025
        command-t 0.16663 0.17294 0.05643 [+0.7%]        (0.16724) (0.17352) (0.05677) [+0.7%]
chromium (subset) 1.34275 1.35172 0.02076 [-0.8%] 0.0005 (0.28418) (0.28908) (0.01675) [-1.2%] 0.005
 chromium (whole) 1.10651 1.11530 0.02674 [-0.1%]        (0.12181) (0.12475) (0.01076) [-0.0%]
       big (400k) 1.66873 1.68029 0.03942 [-0.1%]        (0.18046) (0.18403) (0.01414) [-0.5%]
            total 4.49797 4.53841 0.14236 [-0.4%]   0.05 (0.96567) (0.99015) (0.11602) [-1.1%]  0.05
