Add specialized copy method #143

Merged: odow merged 9 commits into master from jg/perf on Sep 14, 2020
@joaquimg (Member)

This follows the lines of jump-dev/Clp.jl#94.

It adds a batch copy. So far, the batch copy does not seem to be much better than adding one-by-one for GLPK.

The results below come from running the runbench.jl file in the perf folder.

Before this PR we had:

 ──────────────────────────────────────────────────────────────────
                           Time                   Allocations
                   ──────────────────────   ───────────────────────
 Tot / % measured:       143s / 100%            30.4GiB / 100%

 Section   ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────
 bcs + v      100    32.9s  23.0%   329ms   7.23GiB  23.8%  74.0MiB
   build      100    18.8s  13.1%   188ms   7.22GiB  23.7%  73.9MiB
   opt        100    13.8s  9.66%   138ms   7.64MiB  0.02%  78.2KiB
 cs           100    29.0s  20.3%   290ms   5.28GiB  17.4%  54.1MiB
   build      100    15.0s  10.5%   150ms   5.28GiB  17.3%  54.0MiB
   opt        100    13.7s  9.61%   137ms   7.64MiB  0.02%  78.2KiB
 bc + s       100    27.8s  19.4%   278ms   5.83GiB  19.2%  59.7MiB
   opt        100    13.6s  9.53%   136ms     0.00B  0.00%    0.00B
   copy       100    10.8s  7.52%   108ms   3.81GiB  12.5%  39.1MiB
   build      100    3.12s  2.19%  31.2ms   2.01GiB  6.62%  20.6MiB
 bcs          100    26.5s  18.6%   265ms   5.28GiB  17.4%  54.1MiB
   opt        100    13.8s  9.63%   138ms   7.64MiB  0.02%  78.2KiB
   build      100    12.5s  8.72%   125ms   5.28GiB  17.3%  54.0MiB
 c + s        100    24.8s  17.3%   248ms   5.27GiB  17.3%  53.9MiB
   opt        100    13.6s  9.50%   136ms     0.00B  0.00%    0.00B
   copy       100    9.07s  6.34%  90.7ms   3.25GiB  10.7%  33.3MiB
   build      100    1.88s  1.32%  18.8ms   2.01GiB  6.62%  20.6MiB
 data         100    2.00s  1.40%  20.0ms   1.54GiB  5.06%  15.8MiB
 ──────────────────────────────────────────────────────────────────

After:

 ──────────────────────────────────────────────────────────────────
                           Time                   Allocations
                   ──────────────────────   ───────────────────────
 Tot / % measured:       138s / 100%            28.7GiB / 100%

 Section   ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────
 bcs + v      100    31.0s  22.4%   310ms   7.23GiB  25.2%  74.0MiB
   build      100    16.9s  12.3%   169ms   7.22GiB  25.2%  73.9MiB
   opt        100    13.7s  9.94%   137ms   7.64MiB  0.03%  78.2KiB
 cs           100    30.6s  22.2%   306ms   5.28GiB  18.4%  54.1MiB
   build      100    16.6s  12.0%   166ms   5.28GiB  18.4%  54.0MiB
   opt        100    13.7s  9.91%   137ms   7.64MiB  0.03%  78.2KiB
 bcs          100    26.4s  19.1%   264ms   5.28GiB  18.4%  54.1MiB
   opt        100    13.7s  9.93%   137ms   7.64MiB  0.03%  78.2KiB
   build      100    12.3s  8.94%   123ms   5.28GiB  18.4%  54.0MiB
 bc + s       100    24.8s  18.0%   248ms   4.82GiB  16.8%  49.3MiB
   opt        100    13.1s  9.51%   131ms     0.00B  0.00%    0.00B
   copy       100    9.08s  6.58%  90.8ms   2.80GiB  9.78%  28.7MiB
   build      100    2.34s  1.69%  23.4ms   2.01GiB  7.03%  20.6MiB
 c + s        100    22.7s  16.5%   227ms   4.50GiB  15.7%  46.1MiB
   opt        100    13.1s  9.51%   131ms     0.00B  0.00%    0.00B
   copy       100    7.29s  5.28%  72.9ms   2.49GiB  8.69%  25.5MiB
   build      100    2.07s  1.50%  20.7ms   2.01GiB  7.03%  20.6MiB
 data         100    2.58s  1.87%  25.8ms   1.54GiB  5.37%  15.8MiB
 ──────────────────────────────────────────────────────────────────

I haven't used the profiler yet, though, so there might be further gains.
Moreover, jump-dev/MathOptInterface.jl#1122 could help here.

@joaquimg (Member, Author)

If we skip MOIU.canonical, we get:

 ──────────────────────────────────────────────────────────────────
                           Time                   Allocations
                   ──────────────────────   ───────────────────────
 Tot / % measured:       137s / 100%            26.9GiB / 100%

 Section   ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────
 bcs + v      100    31.5s  23.1%   315ms   7.23GiB  26.9%  74.0MiB
   build      100    17.5s  12.8%   175ms   7.22GiB  26.9%  73.9MiB
   opt        100    13.7s  10.0%   137ms   7.64MiB  0.03%  78.2KiB
 cs           100    28.5s  20.9%   285ms   5.28GiB  19.6%  54.1MiB
   build      100    14.6s  10.7%   146ms   5.28GiB  19.6%  54.0MiB
   opt        100    13.6s  10.0%   136ms   7.64MiB  0.03%  78.2KiB
 bcs          100    26.4s  19.3%   264ms   5.28GiB  19.6%  54.1MiB
   opt        100    13.7s  10.0%   137ms   7.64MiB  0.03%  78.2KiB
   build      100    12.4s  9.07%   124ms   5.28GiB  19.6%  54.0MiB
 bc + s       100    26.2s  19.2%   262ms   3.93GiB  14.6%  40.3MiB
   opt        100    13.1s  9.61%   131ms     0.00B  0.00%    0.00B
   copy       100    8.48s  6.21%  84.8ms   1.92GiB  7.13%  19.6MiB
   build      100    4.30s  3.15%  43.0ms   2.01GiB  7.49%  20.6MiB
 c + s        100    21.9s  16.0%   219ms   3.62GiB  13.5%  37.1MiB
   opt        100    13.1s  9.59%   131ms     0.00B  0.00%    0.00B
   copy       100    6.70s  4.90%  67.0ms   1.61GiB  5.97%  16.4MiB
   build      100    1.80s  1.32%  18.0ms   2.01GiB  7.49%  20.6MiB
 data         100    2.17s  1.59%  21.7ms   1.54GiB  5.73%  15.8MiB
 ──────────────────────────────────────────────────────────────────

@codecov-commenter commented Jul 11, 2020

Codecov Report

Merging #143 into master will increase coverage by 1.32%.
The diff coverage is 95.09%.

@@            Coverage Diff             @@
##           master     #143      +/-   ##
==========================================
+ Coverage   84.07%   85.39%   +1.32%     
==========================================
  Files           7        8       +1     
  Lines        1237     1390     +153     
==========================================
+ Hits         1040     1187     +147     
- Misses        197      203       +6     

Impacted Files                    Coverage Δ
src/GLPK.jl                       100.00% <ø> (ø)
src/MOI_wrapper/MOI_wrapper.jl    89.62% <89.47%> (+0.10%) ⬆️
src/MOI_wrapper/MOI_copy.jl       95.83% <95.83%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 57e08b7...ce678aa.

@odow (Member) left a comment

Is it worth adding this for the minor speedup then?

@joaquimg (Member, Author)

So there are a few caveats.
As it stands, this is not very useful.
However, taking contiguous indexing into account might speed this up a lot.
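
To illustrate why contiguous indexing can matter, here is a minimal Python sketch of an index map that stays on a plain list while keys arrive as 1, 2, 3, ... and only falls back to a dictionary once they stop being contiguous. `ContiguousMap` and its methods are hypothetical names made up for this example; it is not the CleverDict implementation in MathOptInterface, only the general idea.

```python
from typing import Dict, List, Optional


class ContiguousMap:
    """Hypothetical index -> value map; illustration only, not MOI's CleverDict."""

    def __init__(self) -> None:
        # Fast path: position k-1 of the list holds the value for key k.
        self._dense: Optional[List[float]] = []
        # Fallback storage, used only after a non-contiguous key shows up.
        self._sparse: Dict[int, float] = {}

    def __setitem__(self, key: int, value: float) -> None:
        if self._dense is not None:
            if key == len(self._dense) + 1:
                self._dense.append(value)  # next contiguous key: plain append
                return
            if 1 <= key <= len(self._dense):
                self._dense[key - 1] = value  # overwrite an existing key
                return
            # Non-contiguous key: migrate to a dict and abandon the list.
            self._sparse = {i + 1: v for i, v in enumerate(self._dense)}
            self._dense = None
        self._sparse[key] = value

    def __getitem__(self, key: int) -> float:
        if self._dense is not None:
            if not 1 <= key <= len(self._dense):
                raise KeyError(key)
            return self._dense[key - 1]
        return self._sparse[key]

    def is_dense(self) -> bool:
        """True while lookups are still served from the list."""
        return self._dense is not None
```

In a batch copy the destination indices are typically created in order, so a structure like this would stay on the list path for the whole copy and never pay for hashing.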

@odow (Member) commented Jul 11, 2020

One difference is that this is allocating way more than the Clp version. Do we know why?

@joaquimg (Member, Author)

My current bet is that Clp does not have a CleverDict to keep track of the data afterwards.

@joaquimg (Member, Author)

I modified the comparison scripts.
Before:

 ──────────────────────────────────────────────────────────────────
                           Time                   Allocations
                   ──────────────────────   ───────────────────────
 Tot / % measured:      36.9s / 49.4%           5.79GiB / 100%

 Section   ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────
 bcs + v       20    4.96s  27.2%   248ms   1.44GiB  25.0%  74.0MiB
   build       20    3.98s  21.9%   199ms   1.44GiB  24.9%  73.9MiB
   opt         20    916ms  5.03%  45.8ms   1.53MiB  0.03%  78.2KiB
 bcs           20    3.53s  19.4%   176ms   1.06GiB  18.2%  54.1MiB
   build       20    2.54s  14.0%   127ms   1.05GiB  18.2%  54.0MiB
   opt         20    914ms  5.02%  45.7ms   1.53MiB  0.03%  78.2KiB
 cs            20    3.44s  18.9%   172ms   1.06GiB  18.2%  54.1MiB
   build       20    2.46s  13.5%   123ms   1.05GiB  18.2%  54.0MiB
   opt         20    912ms  5.01%  45.6ms   1.53MiB  0.03%  78.2KiB
 bc + s        20    3.19s  17.5%   159ms   1.17GiB  20.1%  59.7MiB
   copy        20    2.10s  11.5%   105ms    781MiB  13.2%  39.0MiB
   opt         20    843ms  4.63%  42.1ms     0.00B  0.00%    0.00B
   build       20    183ms  1.01%  9.17ms    412MiB  6.95%  20.6MiB
 c + s         20    3.08s  16.9%   154ms   1.05GiB  18.2%  53.9MiB
   copy        20    1.92s  10.6%  96.1ms    666MiB  11.2%  33.3MiB
   opt         20    917ms  5.03%  45.8ms     0.00B  0.00%    0.00B
   build       20    185ms  1.02%  9.25ms    412MiB  6.95%  20.6MiB
 data           1   13.4ms  0.07%  13.4ms   15.8MiB  0.27%  15.8MiB
 ──────────────────────────────────────────────────────────────────

After:

 ──────────────────────────────────────────────────────────────────
                           Time                   Allocations
                   ──────────────────────   ───────────────────────
 Tot / % measured:      36.2s / 47.3%           5.45GiB / 100%

 Section   ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────
 bcs + v       20    5.08s  29.6%   254ms   1.44GiB  26.5%  74.0MiB
   build       20    4.07s  23.8%   203ms   1.44GiB  26.5%  73.9MiB
   opt         20    933ms  5.45%  46.7ms   1.53MiB  0.03%  78.2KiB
 cs            20    3.59s  21.0%   179ms   1.06GiB  19.4%  54.1MiB
   build       20    2.58s  15.1%   129ms   1.05GiB  19.4%  54.0MiB
   opt         20    946ms  5.52%  47.3ms   1.53MiB  0.03%  78.2KiB
 bcs           20    3.52s  20.6%   176ms   1.06GiB  19.4%  54.1MiB
   build       20    2.51s  14.6%   125ms   1.05GiB  19.4%  54.0MiB
   opt         20    948ms  5.53%  47.4ms   1.53MiB  0.03%  78.2KiB
 bc + s        20    2.71s  15.8%   136ms   0.97GiB  17.8%  49.7MiB
   copy        20    1.64s  9.57%  81.9ms    582MiB  10.4%  29.1MiB
   opt         20    818ms  4.77%  40.9ms     0.00B  0.00%    0.00B
   build       20    194ms  1.13%  9.71ms    412MiB  7.39%  20.6MiB
 c + s         20    2.21s  12.9%   110ms    924MiB  16.6%  46.2MiB
   copy        20    1.15s  6.73%  57.6ms    512MiB  9.18%  25.6MiB
   opt         20    794ms  4.64%  39.7ms     0.00B  0.00%    0.00B
   build       20    201ms  1.17%  10.0ms    412MiB  7.39%  20.6MiB
 data           1   13.6ms  0.08%  13.6ms   15.8MiB  0.28%  15.8MiB
 ──────────────────────────────────────────────────────────────────

Timings are better because I forced a GC run between tests.
The main comparison here is the copy lines.
When the solver is integrated with the cache, it is not using the new copy_to, probably passing the data directly...
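
As a rough illustration of the "force GC between tests" idea, the sketch below times named sections and triggers a collection before each measurement, so garbage left by one run is not collected during the next timed run. It is written in Python, where `gc.collect()` plays that role; `time_sections` is a hypothetical helper for this example and is not taken from runbench.jl.

```python
import gc
import time
from typing import Callable, Dict, List


def time_sections(
    sections: Dict[str, Callable[[], object]], repeats: int = 3
) -> Dict[str, float]:
    """Return the mean wall-clock seconds per call for each named section."""
    means: Dict[str, float] = {}
    for name, fn in sections.items():
        samples: List[float] = []
        for _ in range(repeats):
            gc.collect()  # clean up leftovers from the previous run before timing
            start = time.perf_counter()
            fn()
            samples.append(time.perf_counter() - start)
        means[name] = sum(samples) / len(samples)
    return means
```

Calling it with something like `time_sections({"build": build_fn, "copy": copy_fn})` (both callables are placeholders) returns one mean per section, which is roughly what the `avg` column in the tables reports.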

@joaquimg (Member, Author) commented Jul 11, 2020

After the PR we have the following:
[image]

Rough estimates:
43% glp_load_matrix
12% canonical (jump-dev/MathOptInterface.jl#1118)
20% adding constraint index to conmap (similar to jump-dev/MathOptInterface.jl#1122)
9% pass constraint attributes (note that there is nothing to pass in this test) (jump-dev/MathOptInterface.jl#1121)

Basically, there is still about 40% of the time where we are doing badly.
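
The arithmetic behind that figure can be checked quickly. The sketch below assumes the 40% refers to the three items other than glp_load_matrix; that reading is an assumption, not something stated in the thread.

```python
# Rough profile shares (percent of copy time) as listed in the comment above.
profile_share = {
    "glp_load_matrix": 43,
    "canonical": 12,
    "adding constraint index to conmap": 20,
    "pass constraint attributes": 9,
}

accounted = sum(profile_share.values())
# Assumption: "doing badly" means everything listed except the GLPK call itself.
overhead = accounted - profile_share["glp_load_matrix"]
# Share of the profile that the rough estimates do not attribute to anything.
unlisted = 100 - accounted

print(f"accounted: {accounted}%, overhead: {overhead}%, unlisted: {unlisted}%")
```

Under that reading the overhead comes to 41%, in line with the quoted 40%, and about 16% of the profile is not covered by the listed items.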

@joaquimg (Member, Author) commented Jul 12, 2020

Master:

 ──────────────────────────────────────────────────────────────────
                           Time                   Allocations
                   ──────────────────────   ───────────────────────
 Tot / % measured:      27.6s / 46.0%           5.79GiB / 100%

 Section   ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────
 bcs + v       20    3.48s  27.4%   174ms   1.44GiB  25.0%  74.0MiB
   build       20    2.74s  21.6%   137ms   1.44GiB  24.9%  73.9MiB
   opt         20    674ms  5.31%  33.7ms   1.53MiB  0.03%  78.2KiB
 bcs           20    2.38s  18.8%   119ms   1.06GiB  18.2%  54.1MiB
   build       20    1.65s  13.0%  82.4ms   1.05GiB  18.2%  54.0MiB
   opt         20    676ms  5.32%  33.8ms   1.53MiB  0.03%  78.2KiB
 cs            20    2.36s  18.6%   118ms   1.06GiB  18.2%  54.1MiB
   build       20    1.65s  13.0%  82.4ms   1.05GiB  18.2%  54.0MiB
   opt         20    654ms  5.15%  32.7ms   1.53MiB  0.03%  78.2KiB
 bc + s        20    2.30s  18.1%   115ms   1.17GiB  20.1%  59.7MiB
   copy        20    1.46s  11.5%  73.1ms    781MiB  13.2%  39.0MiB
   opt         20    625ms  4.92%  31.3ms     0.00B  0.00%    0.00B
   build       20    160ms  1.26%  8.02ms    412MiB  6.95%  20.6MiB
 c + s         20    2.16s  17.0%   108ms   1.05GiB  18.2%  53.9MiB
   copy        20    1.33s  10.5%  66.5ms    666MiB  11.2%  33.3MiB
   opt         20    631ms  4.97%  31.6ms     0.00B  0.00%    0.00B
   build       20    145ms  1.15%  7.27ms    412MiB  6.95%  20.6MiB
 data           1   12.5ms  0.10%  12.5ms   15.8MiB  0.27%  15.8MiB
 ──────────────────────────────────────────────────────────────────

After last commit:

 ──────────────────────────────────────────────────────────────────
                           Time                   Allocations
                   ──────────────────────   ───────────────────────
 Tot / % measured:      25.8s / 44.0%           4.98GiB / 100%

 Section   ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────
 bcs + v       20    3.45s  30.4%   173ms   1.44GiB  29.0%  74.0MiB
   build       20    2.73s  24.0%   136ms   1.44GiB  29.0%  73.9MiB
   opt         20    670ms  5.89%  33.5ms   1.53MiB  0.03%  78.2KiB
 cs            20    2.39s  21.0%   119ms   1.06GiB  21.2%  54.1MiB
   build       20    1.66s  14.6%  83.1ms   1.05GiB  21.2%  54.0MiB
   opt         20    665ms  5.86%  33.3ms   1.53MiB  0.03%  78.2KiB
 bcs           20    2.34s  20.6%   117ms   1.06GiB  21.2%  54.1MiB
   build       20    1.64s  14.4%  81.8ms   1.05GiB  21.2%  54.0MiB
   opt         20    644ms  5.67%  32.2ms   1.53MiB  0.03%  78.2KiB
 bc + s        20    1.79s  15.7%  89.3ms    754MiB  14.8%  37.7MiB
   copy        20    1.05s  9.25%  52.5ms    341MiB  6.70%  17.1MiB
   opt         20    519ms  4.57%  26.0ms     0.00B  0.00%    0.00B
   build       20    162ms  1.43%  8.10ms    412MiB  8.09%  20.6MiB
 c + s         20    1.38s  12.1%  68.8ms    684MiB  13.4%  34.2MiB
   copy        20    655ms  5.77%  32.8ms    272MiB  5.33%  13.6MiB
   opt         20    525ms  4.62%  26.3ms     0.00B  0.00%    0.00B
   build       20    142ms  1.25%  7.09ms    412MiB  8.09%  20.6MiB
 data           1   13.2ms  0.12%  13.2ms   15.8MiB  0.31%  15.8MiB
 ──────────────────────────────────────────────────────────────────

So we can double the speed in the c + s case. I will look into the other cases and open a PR adding the DoubleDict to MOI.
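
For reference, the ratios behind that statement can be recomputed from the `avg` and `alloc` columns of the "Master" and "After last commit" tables. The snippet below is only a convenience for reading the tables; the dictionary layout and the `ratio` helper are made up for this example.

```python
# (master, after last commit) values copied from the two tables above.
copy_avg_ms = {"bc + s": (73.1, 52.5), "c + s": (66.5, 32.8)}
section_avg_ms = {"bc + s": (115.0, 89.3), "c + s": (108.0, 68.8)}
copy_alloc_mib = {"bc + s": (781.0, 341.0), "c + s": (666.0, 272.0)}


def ratio(before: float, after: float) -> float:
    """How many times smaller the 'after' value is than the 'before' value."""
    return before / after


for case in ("bc + s", "c + s"):
    print(
        f"{case}: copy time x{ratio(*copy_avg_ms[case]):.2f}, "
        f"whole section x{ratio(*section_avg_ms[case]):.2f}, "
        f"copy allocations x{ratio(*copy_alloc_mib[case]):.2f}"
    )
```

The roughly 2x figure applies to the copy step of c + s; the whole c + s section, which also includes opt and build, improves by a smaller factor.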

@joaquimg (Member, Author)

Depends on jump-dev/MathOptInterface.jl#1126

@odow odow merged commit f3a3d0e into master Sep 14, 2020
@odow odow deleted the jg/perf branch September 14, 2020 07:53