-
-
Notifications
You must be signed in to change notification settings - Fork 883
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Posts List Performance + cursor-based pagination #3872
Conversation
Fixed the relevant issues. ready for review |
In theory, this PR should work with the legacy OFFSET based-pagination ( |
This comment was marked as abuse.
This comment was marked as abuse.
With my approach (as detailed above), this shouldn't be an issue. |
crates/db_views/src/post_view.rs
Outdated
@@ -185,6 +186,34 @@ fn queries<'a>() -> Queries< | |||
}; | |||
|
|||
let list = move |mut conn: DbConn<'a>, options: PostQuery<'a>| async move { | |||
macro_rules! order_and_page_filter_desc { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i've tried to do this without a macro and without copy-pasting this code 10 times, but I couldn't figure out the types for diesel if this was a lambda or function. I'm not sure if it would even be possible, even with BoxedQuery..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd almost rather this be copy-pasted around, if even some parts of it could be extracted into functions, rather than using macros. @dullbananas might be able to help with diesel function return types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type of query
should be inferrable if a closure is used.
The type of the column is tricky because columns with multiple different sql types are used. If only one sql type was needed, then BoxableExpression
could be used. And closures can't be generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried using a closure, but even then you need to be able to name the return type and some arg types and I couldn't manage to figure that out..
Also I think it has to be generic which as you said isn't possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A function like this could probably be used
fn order_and_page_filter_desc<Q, C, T>(
query: &mut Q,
column: C,
options: &PostQuery,
getter: impl Fn(&PostAggregates) -> T,
)
where
Q: ThenOrderDsl<dsl::Desc<C>, Output = Q> + FilterDsl<dsl::GtEq<C, T>, Output = Q> + FilterDsl<dsl::GtEq<C, T>, Output = Q>,
C: Expression,
C::SqlType: SingleValue + SqlType,
T: AsExpression<C::SqlType>,
Usage:
order_and_page_filter_desc(&mut query, post_aggregates::featured_local, &options, |a| a.featured_local);
The ascending version would use dsl::Asc
instead of dsl::Desc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*query = query.then_order_by(column.desc());
seems to not work because of cannot move out of
*querywhich is behind a mutable reference move occurs because
*queryhas type
Q, which does not implement the
Copy` trait
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the &mut
and make the function return the same type used for the query
parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This'll be a great addition, a few thoughts tho:
- Should this just work based on a simple post_id number, rather than easily reverse engineerable hash functions. See comment below.
- If cursor based pagination works as simply as
where id > cursor and (more filters) LIMIT X
, I don't see why specifying upper bounds are necessary. - Needs some tests to make sure pagination is working correctly.
- Should just be a breaking change, rather than supporting the old method.
crates/db_views/src/post_view.rs
Outdated
@@ -185,6 +186,34 @@ fn queries<'a>() -> Queries< | |||
}; | |||
|
|||
let list = move |mut conn: DbConn<'a>, options: PostQuery<'a>| async move { | |||
macro_rules! order_and_page_filter_desc { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd almost rather this be copy-pasted around, if even some parts of it could be extracted into functions, rather than using macros. @dullbananas might be able to help with diesel function return types.
I'll test this with some production data now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its testing out fine, but the upper-bound stuff is really difficult to wrap my head around, as I'm sure it will be for anyone looking at this code.
We desperately need more DB experts to look at this to make sure that its the best way of doing this, but I won't let that hold up this PR; it'll have to come at a later date. I especially wish @johanndt could have a look, but they don't seem to be around.
I'd also really prefer that the filtering and sorting macro be removed, so @dullbananas if you could attempt that, before we merge, it'd be really helpful. Otherwise, I spose we have no choice.
|
||
CREATE INDEX idx_post_aggregates_community_hot ON post_aggregates (community_id, featured_local DESC, hot_rank DESC, published DESC); | ||
|
||
CREATE INDEX idx_post_aggregates_community_scaled ON post_aggregates (community_id, featured_local DESC, scaled_rank DESC, published DESC); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx for adding these ones too.
diesel migration redo
worked well also.
I've managed to remove the macro with the help of what dullbananas wrote above. |
c9b9259
to
f3afbb2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works well with local production data. The only minor issue I see is that the deprecated page doesn't work with subscribed, but that's NBD.
Is this good to merge? |
It is from my side |
I'll start to add all these new features into lemmy-ui in the next few days. |
This PR implements two things:
Cursor-based pagination for the /posts/list API request. A new opaque parameter page_v2 is added that fetches the next page based on the previous one.
The request now looks like
curl 'http://domain/api/v3/post/list?page_v2=Pf54e&limit=20&sort=Hot&...
The response now looks like this:
Cursor-based pagination is optimized so each page only fetches ~LIMIT number of posts from the database. This is not possible with OFFSET-based pagination. Right now this works alongside the
page=
parameter, but this pagination method should (imo) completely replace the existing OFFSET-based pagination (with thepage=
parameter being deprecated). There's many articles on the internet about why offset-based pagination is bad, notably this one referencing multiple posts from MariaDB, Slack, Shopify engineering. Right now it only supports forward-paging. It is possible to implement backwards pagination like this but I don't see a reason to right now.Performant ordering and limiting for the subscribed page (see Slow SQL queries #2877 (comment)).
This works as follows: First, fetch a page for a very active followed community. Then, use that to add a limit to the real suscribed fetch. More detail about why this works is in a code comment. This should hopefully remove all slowness and weird query plans from the subscribed post list query.
Current issues:
Off-by-one-error: page_after actually includes one post from the previous pagethe estimation query currently uses wrong filters (featured_community instead of featured_local)code is a bit uglyFuture work: