Yet another vector rewrite #53

JoeyBF · 2021-11-26T08:44:25Z

Here's a reworking of how we handle vectors, to set the stage for matrices later. I abstracted away the vector-like behavior that FpVector, Slice and SliceMut shared and I put it in traits, BaseVector for immutable methods and BaseVectorMut for mutable ones. That lets us introduce some more vector-like structs down the line, like Row for a row of a matrix or ArchivedFpVector if we start using rkyv. I'm opening this as a draft because:

The ext library works but the rest of the repo has to be fixed still (mostly just importing those two traits).
It's still missing almost all the documentation.
There's a non-trivial performance hit when using this new module, and I would like to find why and fix it if possible.

dalcde · 2021-11-26T11:26:12Z

Do you have concrete numbers RE the performance hit?

JoeyBF · 2021-11-26T20:42:22Z

The ext library works

Actually I take that back. The ext library compiles, and all fp and algebra tests pass, but the resolution starts getting messed up at stem 19. So much so that the computation up to t = 40 already basically takes forever. The fp bench results are sane though, I can still share those if you want.

dalcde · 2021-11-26T21:19:34Z

By "messed up" do you mean you get wrong results? In my experience that tends to make later calculations compute something ridiculous and take forever

JoeyBF · 2021-11-26T21:41:37Z

Yeah, the rank of each bidegree explodes. Here's a sample output

Module (default: S_2):
Load from save? (optional):
Max s (default: 15):
Max t (default: 30):
Save file (optional):
·
·
·
·
·                                     :
·                                   · : ·
·                                 ·   : · ⠿
·                             ·   ·   · · ⠿ *
·                     ·       · · ·   · · ⠿ * *
·                   · ·     · · · ·   · : ⡿ * * *
·                 ·   ·     · :   · · · ∴ ⣿ * * * *
·             ·   ·         · ·   · : · ∴ ⡿ * * * * *
·     ·       · · ·         · ·   · · · · ⁙ ⁘ ⣿ * * * *
·   · ·     · · ·           · · ·   ·         : : : : * *
· ·   ·       ·               ·
·

JoeyBF · 2021-11-27T04:39:01Z

Update: I tracked the error down to the debug_assert at matrix_inner.rs:747. The first time the inside of the for loop is executed is at t = 24, s = 3, and that's where the assert fails and our problems begin. Now I just need to check why it fails, since my reimplementation should be semantically equivalent to the current one.

Edit: Nevermind that, my debugger was acting weird. (t,s) = (24,3) is not the first bidegree where the loop is executed, but it is the bidegree where the assert is violated. The column 28 is a pivot but shouldn't be.

Edit 2: I tracked down the bug to bidegree t = 21, s = 3. The new implementation has a few bits flipped in rows 46 and 48 of the matrix in step_resolution, and the error propagates to the later bidegrees. Now I just need to figure out why those bits aren't set properly. This is starting to sound like something goes wrong in the algebra subcrate, which I'm the least familiar with.

ext/crates/fp/src/limb.rs

ext/crates/fp/src/vector/generic.rs

dalcde · 2021-11-29T13:36:28Z

Here is a thought --- how different is BaseVector from Into<Slice> in the old world?

dalcde · 2021-12-01T01:14:41Z

The correct trait bounds are really for<'a> &'a T: Into<Slice<'a>> and for<'a> &'a mut T: Into<SliceMut<'a>>. I used them here:

sseq/ext/crates/fp/src/matrix/quasi_inverse.rs

Lines 75 to 83 in 874fb2d

    
               pub fn stream_quasi_inverse<T, S>( 
        
                   p: ValidPrime, 
        
                   data: &mut impl Read, 
        
                   results: &mut [T], 
        
                   inputs: &[S], 
        
               ) -> std::io::Result<()> 
        
               where 
        
                   for<'a> &'a mut T: Into<SliceMut<'a>>, 
        
                   for<'a> &'a S: Into<Slice<'a>>,

with implementations

sseq/ext/crates/fp/src/vector.rs

Lines 427 to 455 in 874fb2d

    
           impl<'a, 'b> From<&'a mut SliceMut<'b>> for SliceMut<'a> { 
        
               fn from(slice: &'a mut SliceMut<'b>) -> SliceMut<'a> { 
        
                   slice.copy() 
        
               } 
        
           } 
        
           impl<'a, 'b> From<&'a Slice<'b>> for Slice<'a> { 
        
               fn from(slice: &'a Slice<'b>) -> Slice<'a> { 
        
                   *slice 
        
               } 
        
           } 
        
           impl<'a, 'b> From<&'a SliceMut<'b>> for Slice<'a> { 
        
               fn from(slice: &'a SliceMut<'b>) -> Slice<'a> { 
        
                   slice.as_slice() 
        
               } 
        
           } 
        
           impl<'a> From<&'a FpVector> for Slice<'a> { 
        
               fn from(v: &'a FpVector) -> Slice<'a> { 
        
                   v.as_slice() 
        
               } 
        
           } 
        
           impl<'a> From<&'a mut FpVector> for SliceMut<'a> { 
        
               fn from(v: &'a mut FpVector) -> SliceMut<'a> { 
        
                   v.as_slice_mut() 
        
               } 
        
           }

JoeyBF · 2021-12-05T11:15:16Z

Here is a thought --- how different is BaseVector from Into<Slice> in the old world?

Ideally we would want something like impl<T: BaseVector> From<T> for Slice but that would imply a conflicting implementation of From<Slice> on Slice. I still think it's important to distinguish the two, at least semantically. For example, this allows FpVector to have much more optimized implementations of several methods.

dalcde · 2021-12-05T11:54:58Z

On Sun, Dec 05, 2021 at 03:08:41AM -0800, Joey Beauvais-Feisthauer wrote: That's a good point. I was ~~stealing~~ using the code for the `rulinalg` crate, and it uses raw pointers. I'll try to replace them by slices and see if it goes wrong. I'm thinking Rust might complain that we can't store a reference to a slice that we just created ourselves. I think storing a slice directly makes the struct a DST, which is awful. Maybe a `Box<[Limb]>`?

Storing [Limb] is what makes it a DST. A &[Limb] is fine because it is a pointer to a DST, and has a known size (namely 2 * 64 bits).

dalcde · 2021-12-05T12:07:35Z

Why would there be a conflict? I would be surprised if the FpVector improvement would be significant, especially if the matrix slices are going to be slices anyway. (otoh, an AlignedSlice struct would be nice and good for semantics). For maintenance puprposes, I think there is definitely a benifit to having a smaller code footprint if possible.

JoeyBF · 2021-12-06T06:36:11Z

Storing [Limb] is what makes it a DST. A &[Limb] is fine because it is a pointer to a DST, and has a known size (namely 2 * 64 bits).

Of course. I was thinking that std::slice::from_raw_parts returned a raw slice instead of a reference; since it's just a reference, there should be no issue storing it.

Why would there be a conflict? I would be surprised if the FpVector improvement would be significant, especially if the matrix slices are going to be slices anyway. (otoh, an AlignedSlice struct would be nice and good for semantics). For maintenance puprposes, I think there is definitely a benifit to having a smaller code footprint if possible.

Any T implements From<T>, but since Slice also implements BaseVector that would give us a second implementation of From<Slice> on Slice. I haven't tried yanking out the special FpVector methods but my intuition would tell me that they give a decent speedup.

What did you have in mind for an AlignedSlice struct? Something like a struct that is guaranteed to begin/end on a limb boundary? Logically that would be what is returned by some split_borrow method on FpVector.

dalcde · 2021-12-06T14:58:57Z

On Sun, Dec 05, 2021 at 10:36:22PM -0800, Joey Beauvais-Feisthauer wrote: > Why would there be a conflict? I would be surprised if the FpVector improvement would be significant, especially if the matrix slices are going to be slices anyway. (otoh, an AlignedSlice struct would be nice and good for semantics). For maintenance puprposes, I think there is definitely a benifit to having a smaller code footprint if possible. Any `T` implements `From<T>`, but since `Slice` also implements `BaseVector` that would give us a second implementation of `From<Slice>` on `Slice`. I haven't tried yanking out the special `FpVector` methods but my intuition would tell me that they give a decent speedup.

I was thinking about *defining* `BaseVector` to mean `Into<Slice>`. Alternatively, `BaseVector` can auto-implement all other methods based on the `as_slice` method, which leaves room for more optimized versions.

What did you have in mind for an `AlignedSlice` struct? Something like a struct that is guaranteed to begin/end on a limb boundary? Logically that would be what is returned by some `split_borrow` method on `FpVector`.

Yeah

dalcde · 2022-03-16T03:08:39Z

Just as a note, after #80 we cannot make e.g. Module::act take in impl BaseVector since it would no longer be object safe.

JoeyBF · 2022-05-31T05:47:13Z

All tests pass except the selenium ones, and they don't look like the same flakey ones we used to have. I'm not sure how to fix them, since at first glance they don't seem related to fp

dalcde · 2022-05-31T06:51:40Z

empirically, does the web interface work?

JoeyBF · 2022-05-31T07:06:30Z

I'm not sure how I should check that locally. How do I start the webserver?

dalcde · 2022-05-31T15:36:35Z

cargo run in the directory should do the job

JoeyBF · 2022-06-01T03:59:57Z

It seems adding a differential called a code path in fp that wasn't tested, but it's fixed now

dalcde · 2022-06-19T03:01:05Z

Do you mind rebasing and resolving merge conflicts first?

ext/crates/fp/src/limb.rs

ext/crates/fp/src/vector/internal/mod.rs

JoeyBF · 2022-06-26T04:04:00Z

How does this rewrite affect performance?

I'm still running benchmarks but I'll comment on that shortly

JoeyBF · 2022-06-26T08:41:14Z

Here are the benchmarks, compared using critcmp

group                             master                                 new_vector
-----                             ------                                 ----------
row_reduce_2/row_reduce_2_10      1.00   565.5±27.64ns        ? ?/sec    1.11   627.9±33.43ns        ? ?/sec
row_reduce_2/row_reduce_2_100     1.00     15.9±0.67µs        ? ?/sec    1.03     16.3±1.35µs        ? ?/sec
row_reduce_2/row_reduce_2_1000    1.00  1163.2±31.80µs        ? ?/sec    1.05  1224.1±76.33µs        ? ?/sec
row_reduce_2/row_reduce_2_20      1.00  1334.2±45.95ns        ? ?/sec    1.26  1674.6±272.55ns        ? ?/sec
row_reduce_2/row_reduce_2_2000    1.00      5.0±0.13ms        ? ?/sec    1.05      5.2±0.13ms        ? ?/sec
row_reduce_2/row_reduce_2_4000    1.00     29.5±0.93ms        ? ?/sec    1.02     29.9±1.57ms        ? ?/sec
row_reduce_2/row_reduce_2_420     1.00    205.2±5.12µs        ? ?/sec    1.17   241.1±47.42µs        ? ?/sec
row_reduce_2/row_reduce_2_69      1.00      8.6±0.23µs        ? ?/sec    1.09      9.3±1.15µs        ? ?/sec
row_reduce_3/row_reduce_3_10      1.00   971.7±41.19ns        ? ?/sec    1.35  1313.6±52.65ns        ? ?/sec
row_reduce_3/row_reduce_3_100     1.00    110.5±3.71µs        ? ?/sec    1.38    152.5±1.55µs        ? ?/sec
row_reduce_3/row_reduce_3_1000    1.13     35.2±0.93ms        ? ?/sec    1.00     31.1±1.02ms        ? ?/sec
row_reduce_3/row_reduce_3_20      1.00      3.5±0.10µs        ? ?/sec    1.46      5.1±0.18µs        ? ?/sec
row_reduce_3/row_reduce_3_420     1.00      3.4±0.07ms        ? ?/sec    1.12      3.8±0.43ms        ? ?/sec
row_reduce_3/row_reduce_3_69      1.00     48.1±1.12µs        ? ?/sec    1.45     69.8±1.76µs        ? ?/sec
row_reduce_5/row_reduce_5_10      1.00   990.6±26.66ns        ? ?/sec    1.22  1213.4±61.51ns        ? ?/sec
row_reduce_5/row_reduce_5_100     1.00    153.9±3.17µs        ? ?/sec    1.15   177.2±24.86µs        ? ?/sec
row_reduce_5/row_reduce_5_1000    1.30     65.6±2.12ms        ? ?/sec    1.00     50.6±1.57ms        ? ?/sec
row_reduce_5/row_reduce_5_20      1.00      3.8±0.04µs        ? ?/sec    1.32      5.0±0.20µs        ? ?/sec
row_reduce_5/row_reduce_5_420     1.18      6.2±0.05ms        ? ?/sec    1.00      5.3±0.38ms        ? ?/sec
row_reduce_5/row_reduce_5_69      1.00     59.9±1.93µs        ? ?/sec    1.22     72.8±2.89µs        ? ?/sec
row_reduce_7/row_reduce_7_10      1.00  1810.6±76.17ns        ? ?/sec    1.19      2.2±0.05µs        ? ?/sec
row_reduce_7/row_reduce_7_100     1.00   486.1±49.60µs        ? ?/sec    1.12   542.8±70.90µs        ? ?/sec
row_reduce_7/row_reduce_7_1000    1.07   318.7±10.00ms        ? ?/sec    1.00   296.8±14.54ms        ? ?/sec
row_reduce_7/row_reduce_7_20      1.00      9.1±0.15µs        ? ?/sec    1.23     11.2±0.16µs        ? ?/sec
row_reduce_7/row_reduce_7_420     1.06     25.4±1.29ms        ? ?/sec    1.00     23.9±0.58ms        ? ?/sec
row_reduce_7/row_reduce_7_69      1.00    177.7±4.62µs        ? ?/sec    1.13    201.0±6.65µs        ? ?/sec

It looks like it tends to be better for bigger matrices, but worse otherwise. Also not great for p = 2. I was expecting something roughly like this, once new matrices will be in place the performance should go back up with interest. I didn't bench all of ext however.

dalcde · 2022-06-26T16:12:50Z

Do you know/have an idea where the new overhead comes from?

JoeyBF · 2022-06-26T20:07:26Z

I'm trying to run iai to figure it out but cachegrind keeps crashing. Could you confirm that you also get Failed to run benchmark in cachegrind. Exit code: signal: 4 when running cargo bench --bench iai in the ext folder on the master branch?

JoeyBF · 2022-06-26T21:08:59Z

Actually, even compiling cargo build --release --example resolve and trying valgrind --tool=cachegrind target/release/examples/resolve gives me an illegal instruction error

JoeyBF · 2022-06-28T08:16:10Z

I figured it was a recent instruction set that was tripping up valgrind so I ran iai with target CPU x86-64. I'm having trouble reading the results, but it seems that cache locality is a bit worse and there are more instructions in general when the dimensions are small, but fewer than master when the dimensions get bigger. Here's the comparison; master on the left, this PR on the right. That's probably all I can say without diving into the asm

dalcde · 2022-07-03T07:53:10Z

I don't think that's the right direction to do concurrency in. Have you benchmarked how long each write_qi takes? My guess would be that each call is very fast, but we call it many many times.

JoeyBF · 2022-07-03T21:27:44Z

What would you mean by very fast? Looking at top I would guess something like 30s-1m maybe. That's also a big part of why I think #101 would be useful

`BTreeSet` is already allocated on the heap, so it shouldn't make a difference.

JoeyBF force-pushed the new_vector branch 3 times, most recently from c1930a9 to 373e919 Compare November 28, 2021 08:53

dalcde reviewed Nov 28, 2021

View reviewed changes

ext/crates/fp/src/limb.rs Outdated Show resolved Hide resolved

ext/crates/fp/src/vector/generic.rs Outdated Show resolved Hide resolved

JoeyBF force-pushed the new_vector branch from 373e919 to b54141a Compare November 30, 2021 16:15

JoeyBF mentioned this pull request Dec 5, 2021

Idea: FpVectorP<2> with concurrent write access #55

Closed

JoeyBF force-pushed the new_vector branch from b54141a to c1783f7 Compare May 20, 2022 05:38

JoeyBF marked this pull request as ready for review May 31, 2022 05:45

JoeyBF requested a review from dalcde June 17, 2022 04:58

JoeyBF mentioned this pull request Jun 17, 2022

Use databases for saving #99

Open

JoeyBF force-pushed the new_vector branch from d1408c1 to 65d570c Compare June 19, 2022 03:49

dalcde reviewed Jun 25, 2022

View reviewed changes

ext/crates/fp/src/limb.rs Outdated Show resolved Hide resolved

dalcde reviewed Jun 26, 2022

View reviewed changes

ext/crates/fp/src/vector/internal/mod.rs Outdated Show resolved Hide resolved

JoeyBF mentioned this pull request Jun 29, 2022

Yet another matrix rewrite #100

Draft

4 tasks

JoeyBF added 12 commits July 4, 2022 00:50

Vector rewrite

a61f4c4

Fix prime.rs tests

46932ee

Rework limb module

d4a958d

Correct misplaced docs

4bbcd39

Reorder mod

5fa420a

Fix LimbLength docs

e6143f9

Replace leading underscore by #[allow(dead_code)]

b28f5a1

Rework is_zero and add tests

16b0088

Fixup documentation

7ac4700

Implement add_truncate for FpVector

5039b4c

Make benchmarks work without odd-primes

97fdd77

Add SIMD to add_shift_none

84965b5

JoeyBF force-pushed the new_vector branch from bc22ca9 to 84965b5 Compare July 4, 2022 04:51

JoeyBF added 5 commits July 4, 2022 21:57

More opinionated rustfmt

9196f10

Fix critical performance regression

1da48f6

Fix clippy::box-collection

746266b

`BTreeSet` is already allocated on the heap, so it shouldn't make a difference.

Generate macros using build script

cbe3804

Remove hardcoding of max prime = 7

b00576a

JoeyBF force-pushed the new_vector branch from 74d5744 to b00576a Compare March 29, 2023 20:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Yet another vector rewrite #53

Yet another vector rewrite #53

JoeyBF commented Nov 26, 2021

dalcde commented Nov 26, 2021 via email

JoeyBF commented Nov 26, 2021

dalcde commented Nov 26, 2021 via email

JoeyBF commented Nov 26, 2021

JoeyBF commented Nov 27, 2021 •

edited

Loading

dalcde commented Nov 29, 2021

dalcde commented Dec 1, 2021

JoeyBF commented Dec 5, 2021

dalcde commented Dec 5, 2021 via email

dalcde commented Dec 5, 2021 via email

JoeyBF commented Dec 6, 2021

dalcde commented Dec 6, 2021 via email

dalcde commented Mar 16, 2022

JoeyBF commented May 31, 2022

dalcde commented May 31, 2022 via email

JoeyBF commented May 31, 2022

dalcde commented May 31, 2022 via email

JoeyBF commented Jun 1, 2022

dalcde commented Jun 19, 2022

JoeyBF commented Jun 26, 2022

JoeyBF commented Jun 26, 2022

dalcde commented Jun 26, 2022 via email

JoeyBF commented Jun 26, 2022

JoeyBF commented Jun 26, 2022

JoeyBF commented Jun 28, 2022

dalcde commented Jul 3, 2022 via email

JoeyBF commented Jul 3, 2022

Yet another vector rewrite #53

Are you sure you want to change the base?

Yet another vector rewrite #53

Conversation

JoeyBF commented Nov 26, 2021

dalcde commented Nov 26, 2021 via email

JoeyBF commented Nov 26, 2021

dalcde commented Nov 26, 2021 via email

JoeyBF commented Nov 26, 2021

JoeyBF commented Nov 27, 2021 • edited Loading

dalcde commented Nov 29, 2021

dalcde commented Dec 1, 2021

JoeyBF commented Dec 5, 2021

dalcde commented Dec 5, 2021 via email

dalcde commented Dec 5, 2021 via email

JoeyBF commented Dec 6, 2021

dalcde commented Dec 6, 2021 via email

dalcde commented Mar 16, 2022

JoeyBF commented May 31, 2022

dalcde commented May 31, 2022 via email

JoeyBF commented May 31, 2022

dalcde commented May 31, 2022 via email

JoeyBF commented Jun 1, 2022

dalcde commented Jun 19, 2022

JoeyBF commented Jun 26, 2022

JoeyBF commented Jun 26, 2022

dalcde commented Jun 26, 2022 via email

JoeyBF commented Jun 26, 2022

JoeyBF commented Jun 26, 2022

JoeyBF commented Jun 28, 2022

dalcde commented Jul 3, 2022 via email

JoeyBF commented Jul 3, 2022

JoeyBF commented Nov 27, 2021 •

edited

Loading