Yet another vector rewrite #53
base: master
Do you have concrete numbers RE the performance hit?
Actually I take that back. The ext library compiles, and all |
By "messed up" do you mean you get wrong results? In my experience, that tends to make later calculations compute something ridiculous and take forever.
Yeah, the rank of each bidegree explodes. Here's a sample output
Update: I tracked the error down to the `debug_assert` at matrix_inner.rs:747. The body of the for loop first executes at t = 24, s = 3, and that's where the assert fails and our problems begin. Now I just need to check why it fails, since my reimplementation should be semantically equivalent to the current one. Edit: Never mind that, my debugger was acting weird. (t, s) = (24, 3) is not the first bidegree where the loop is executed, but it is the bidegree where the assert is violated. Column 28 is a pivot but shouldn't be. Edit 2: I tracked down the bug to bidegree t = 21, s = 3. The new implementation has a few bits flipped in rows 46 and 48 of the matrix in |
c1930a9 to 373e919
Here is a thought --- how different is BaseVector from |
The correct trait bounds are really those in `sseq/ext/crates/fp/src/matrix/quasi_inverse.rs` (lines 75 to 83 in 874fb2d), with implementations in `sseq/ext/crates/fp/src/vector.rs` (lines 427 to 455 in 874fb2d).
Ideally we would want something like |
On Sun, Dec 05, 2021 at 03:08:41AM -0800, Joey Beauvais-Feisthauer wrote:
> That's a good point. I was ~~stealing~~ using the code from the `rulinalg` crate, and it uses raw pointers. I'll try to replace them with slices and see if it goes wrong. I'm thinking Rust might complain that we can't store a reference to a slice that we just created ourselves. I think storing a slice directly makes the struct a DST, which is awful. Maybe a `Box<[Limb]>`?
Storing `[Limb]` is what makes it a DST. A `&[Limb]` is fine because it is a
pointer to a DST, and has a known size (namely 2 * 64 bits).
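To make the size argument concrete, here is a minimal sketch assuming a 64-bit `Limb`; the type alias and both struct names are hypothetical, not fp's actual definitions. A borrowed `&[Limb]` and an owned `Box<[Limb]>` are both (pointer, length) pairs, so a struct holding either one is `Sized`; only a struct that stores `[Limb]` inline becomes a DST.

```rust
use std::mem::size_of;

// Hypothetical limb type for illustration (assumed to be a 64-bit word).
type Limb = u64;

// Borrowed view: `&[Limb]` is a fat pointer (data pointer + length),
// so this struct is `Sized` and can be stored, moved and returned freely.
struct SliceView<'a> {
    limbs: &'a [Limb],
}

// Owned alternative: `Box<[Limb]>` is also a (pointer, length) pair.
struct OwnedLimbs {
    limbs: Box<[Limb]>,
}

// By contrast, a struct whose last field is `[Limb]` is itself a DST:
//     struct Unsized { limbs: [Limb] }
// It compiles, but values of it can only be used behind a pointer.

fn main() {
    let data: Vec<Limb> = vec![1, 2, 3];
    let view = SliceView { limbs: &data[1..] };
    let owned = OwnedLimbs { limbs: data.clone().into_boxed_slice() };

    // All three are two machine words, however many limbs they cover.
    println!("&[Limb]: {} bytes", size_of::<&[Limb]>());
    println!("SliceView: {} bytes", size_of::<SliceView<'static>>());
    println!("OwnedLimbs: {} bytes", size_of::<OwnedLimbs>());
    println!("view covers {} limbs, owned holds {}", view.limbs.len(), owned.limbs.len());
}
```

So `Box<[Limb]>` works when the struct should own its limbs, and `&[Limb]` when it borrows them; neither forces the struct itself to be unsized.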
Why would there be a conflict? I would be surprised if the FpVector
improvement would be significant, especially if the matrix slices are
going to be slices anyway. (otoh, an AlignedSlice struct would be nice
and good for semantics).
For maintenance purposes, I think there is definitely a benefit to
having a smaller code footprint if possible.
Of course. I was thinking that
On Sun, Dec 05, 2021 at 10:36:22PM -0800, Joey Beauvais-Feisthauer wrote:
> Why would there be a conflict? I would be surprised if the FpVector improvement would be significant, especially if the matrix slices are going to be slices anyway. (otoh, an AlignedSlice struct would be nice and good for semantics). For maintenance purposes, I think there is definitely a benefit to having a smaller code footprint if possible.
Any `T` implements `From<T>`, but since `Slice` also implements `BaseVector` that would give us a second implementation of `From<Slice>` on `Slice`. I haven't tried yanking out the special `FpVector` methods but my intuition would tell me that they give a decent speedup.
I was thinking about *defining* `BaseVector` to mean `Into<Slice>`.
Alternatively, `BaseVector` can auto-implement all other methods based
on the `as_slice` method, which leaves room for more optimized versions.
What did you have in mind for an `AlignedSlice` struct? Something like a struct that is guaranteed to begin/end on a limb boundary? Logically that would be what is returned by some `split_borrow` method on `FpVector`.
Yeah
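A minimal sketch of the second option: `BaseVector` has a single required `as_slice` method and derives everything else from it, while `FpVector` overrides one method with a faster version. Because no blanket `From` impl is involved, the clash with `impl<T> From<T> for T` never comes up. Everything here is a simplified assumption rather than fp's code: only p = 2 is handled, `Slice` is modeled as a bit range over a limb buffer, and the names `entry`, `is_zero`, `entry_at` and `count_nonzero` are illustrative.

```rust
// Simplified model (assumption): p = 2 only, one bit per entry.
type Limb = u64;
const BITS: usize = Limb::BITS as usize;

/// Read-only view of entries `start..end` of a limb buffer.
#[derive(Clone, Copy)]
struct Slice<'a> {
    limbs: &'a [Limb],
    start: usize,
    end: usize,
}

impl Slice<'_> {
    /// Bit-level access that works for any start offset.
    fn entry_at(&self, i: usize) -> u32 {
        assert!(i < self.end - self.start, "index out of bounds");
        let bit = self.start + i;
        ((self.limbs[bit / BITS] >> (bit % BITS)) & 1) as u32
    }
}

/// Only `as_slice` is required; every other method has a default built on it.
trait BaseVector {
    fn as_slice(&self) -> Slice<'_>;

    fn len(&self) -> usize {
        let s = self.as_slice();
        s.end - s.start
    }

    fn entry(&self, i: usize) -> u32 {
        self.as_slice().entry_at(i)
    }

    fn is_zero(&self) -> bool {
        (0..self.len()).all(|i| self.entry(i) == 0)
    }
}

impl BaseVector for Slice<'_> {
    fn as_slice(&self) -> Slice<'_> {
        *self
    }
}

/// Owned vector: starts on a limb boundary and keeps its padding bits zero.
struct FpVector {
    limbs: Vec<Limb>,
    len: usize,
}

impl FpVector {
    fn new(len: usize) -> Self {
        FpVector { limbs: vec![0; (len + BITS - 1) / BITS], len }
    }

    fn set_entry(&mut self, i: usize, value: u32) {
        assert!(i < self.len, "index out of bounds");
        let mask: Limb = 1 << (i % BITS);
        if value % 2 == 1 {
            self.limbs[i / BITS] |= mask;
        } else {
            self.limbs[i / BITS] &= !mask;
        }
    }
}

impl BaseVector for FpVector {
    fn as_slice(&self) -> Slice<'_> {
        Slice { limbs: &self.limbs, start: 0, end: self.len }
    }

    // Optimized override: the invariants above let us compare whole limbs.
    fn is_zero(&self) -> bool {
        self.limbs.iter().all(|&limb| limb == 0)
    }
}

/// Generic code only needs the trait.
fn count_nonzero<V: BaseVector>(v: &V) -> usize {
    (0..v.len()).filter(|&i| v.entry(i) != 0).count()
}

fn main() {
    let mut v = FpVector::new(100);
    v.set_entry(70, 1);
    let s = v.as_slice();
    println!(
        "len = {}, entry 70 = {}, nonzero = {}, zero? {} / {}",
        s.len(), s.entry(70), count_nonzero(&s), v.is_zero(), s.is_zero()
    );
}
```

With this shape, a new vector-like type (a matrix `Row`, say) gets every method by writing `as_slice`, and types with stronger guarantees, such as limb alignment in the spirit of an `AlignedSlice`, can override individual methods.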
Just as a note, after #80 we cannot make e.g. Module::act take in |
All tests pass except the selenium ones, and they don't look like the same flaky ones we used to have. I'm not sure how to fix them, since at first glance they don't seem related to |
Empirically, does the web interface work?
I'm not sure how I should check that locally. How do I start the webserver? |
Running `cargo run` in the directory should do the job.
It seems adding a differential hit a code path in fp that wasn't tested, but it's fixed now |
Do you mind rebasing and resolving merge conflicts first? |
I'm still running benchmarks but I'll comment on that shortly |
Here are the benchmarks, compared using critcmp
It looks like it tends to be better for bigger matrices, but worse otherwise. It's also not great for p = 2. I was expecting something roughly like this; once the new matrices are in place, performance should go back up with interest. I didn't bench all of ext, however. |
Do you know/have an idea where the new overhead comes from?
I'm trying to run iai to figure it out but cachegrind keeps crashing. Could you confirm that you also get |
Actually, even compiling |
I figured a recent instruction set was tripping up valgrind, so I ran iai with target CPU x86-64. I'm having trouble reading the results, but it seems that cache locality is a bit worse and there are more instructions in general when the dimensions are small, but fewer than on master when the dimensions get bigger. Here's the comparison, with master on the left and this PR on the right. That's probably all I can say without diving into the asm |
I don't think that's the right direction to do concurrency in. Have you benchmarked how long each `write_qi` call takes? My guess would be that each call is very fast, but we call it many, many times.
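One way to answer the per-call question is to time each call individually and compare the mean with the total. This is a minimal sketch using `std::time::Instant`; `fake_write_qi` is a hypothetical placeholder for the real `write_qi` call, and a criterion benchmark would give more stable numbers than ad-hoc timing.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical stand-in for one `write_qi` call; substitute the real work.
fn fake_write_qi(n: u64) -> u64 {
    (0..n).fold(0u64, |acc, x| acc.wrapping_add(x * x))
}

// Time each call separately so per-call cost and total cost can be compared.
fn time_each<F: FnMut()>(calls: usize, mut f: F) -> Vec<Duration> {
    (0..calls)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect()
}

fn main() {
    let times = time_each(1000, || {
        // `black_box` keeps the optimizer from deleting the measured work.
        black_box(fake_write_qi(black_box(10_000)));
    });
    let total: Duration = times.iter().sum();
    let max = times.iter().max().copied().unwrap_or_default();
    println!(
        "calls: {}, total: {:?}, mean: {:?}, max: {:?}",
        times.len(),
        total,
        total / times.len() as u32,
        max
    );
}
```

If the mean is tiny but the total is large, parallelizing inside a single call won't help much; the number of calls is what dominates.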
What do you mean by very fast? Looking at `top`, I would guess something like 30s to 1min, maybe. That's also a big part of why I think #101 would be useful |
`BTreeSet` is already allocated on the heap, so it shouldn't make a difference.
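A quick check of that claim, as a sketch: the `BTreeSet` value is a small fixed-size handle whose nodes are heap-allocated, so its inline size does not grow with the element type or the number of elements, and boxing it only adds one more pointer indirection. The exact handle size is a standard-library implementation detail, so the code only compares sizes.

```rust
use std::collections::BTreeSet;
use std::mem::{size_of, size_of_val};

fn main() {
    // In the current std implementation the handle is a few words (root
    // pointer, height, length); the element type does not change its size.
    let small = size_of::<BTreeSet<u8>>();
    let large = size_of::<BTreeSet<[u64; 1024]>>();
    println!("BTreeSet<u8>: {small} bytes, BTreeSet<[u64; 1024]>: {large} bytes");

    let mut set = BTreeSet::new();
    for i in 0..10_000u32 {
        set.insert(i);
    }
    // `size_of_val` reports only the inline handle, not the heap-allocated nodes.
    println!("handle for {} elements: {} bytes", set.len(), size_of_val(&set));

    // A Box is a single pointer to that same handle: one extra indirection.
    println!("Box<BTreeSet<u32>>: {} bytes", size_of::<Box<BTreeSet<u32>>>());
}
```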
Here's a reworking of how we handle vectors, to set the stage for matrices later. I abstracted away the vector-like behavior that `FpVector`, `Slice` and `SliceMut` shared and I put it in traits, `BaseVector` for immutable methods and `BaseVectorMut` for mutable ones. That lets us introduce some more vector-like structs down the line, like `Row` for a row of a matrix or `ArchivedFpVector` if we start using rkyv. I'm opening this as a draft because: