Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The query is slow if cursor_fields contains the primary key #198

Open
cytim opened this issue Apr 30, 2024 · 1 comment
Open

The query is slow if cursor_fields contains the primary key #198

cytim opened this issue Apr 30, 2024 · 1 comment

Comments

@cytim
Copy link

cytim commented Apr 30, 2024

Problem

Hi there :) I found a problem that the pagination query is slow if cursor_fields contains the primary key.
Or more specifically, it is slow if cursor_fields hits an index which null values are excluded.

Investigation

Take the below query as an example.

from(
  u in User,
  order_by: [asc: :id]
)
|> Repo.paginate(
  cursor_fields: [id: :asc],
  after: "g3QAAAABdwJpZGIAAYag",
  limit: 1
)

It will generate the following SQL.

SELECT ...
  FROM "users" AS u0
  WHERE ((u0."id" > 100000) OR (u0."id" IS NULL))
  ORDER BY u0."id"
  LIMIT 10

This is the Query Plan for the SQL.

Limit  (cost=0.42..20.12 rows=10 width=2022) (actual time=1942.708..1942.741 rows=10 loops=1)
  ->  Index Scan using users_pkey on users u0  (cost=0.42..60184.00 rows=30541 width=2022) (actual time=1942.706..1942.738 rows=10 loops=1)
        Filter: ((id > 100000) OR (id IS NULL))
        Rows Removed by Filter: 89971
Planning Time: 0.599 ms
Execution Time: 1942.772 ms

From the Query Plan, the users_pkey index is used, but Filter is applied instead of Index Cond.

I customised the SQL by removing id IS NULL from it. Here's the new Query Plan.

Limit  (cost=0.42..12.14 rows=10 width=2022) (actual time=0.065..0.106 rows=10 loops=1)
  ->  Index Scan using users_pkey on users u0  (cost=0.42..35789.25 rows=30541 width=2022) (actual time=0.064..0.094 rows=10 loops=1)
        Index Cond: (id > 100000)
Planning Time: 0.588 ms
Execution Time: 0.129 ms

This time Index Cond is applied and the Execution Time improves by a lot.

I guess that the primary index does not index (and expect) any null values. Therefore, the null-checking will not hit the index condition and have to be done by filtering, which is slow.

Suggestion

One possible fix is to allow skipping the null-checking conditionally. For example, we can add a nullable? option to cursor_fields.

from(
  u in User,
  order_by: [asc: :id]
)
|> Repo.paginate(
  cursor_fields: [id: [order: :asc, nullable?: false]],
  after: "g3QAAAABdwJpZGIAAYag",
  limit: 1
)
@bwan-nan
Copy link

bwan-nan commented May 10, 2024

Hello!

I had the same issue and if like me you know you can't have null values in your cursor fields, try to use the option sort_direction: :asc_nulls_first to avoid making unnecessary queries. If this option is not passed, it defaults to :asc_nulls_last.

What you should try

from(
  u in User,
  order_by: [asc: :id]
)
|> Repo.paginate(
  cursor_fields: [:id],
  sort_direction: :asc_nulls_first,
  after: "g3QAAAABdwJpZGIAAYag",
  limit: 1
)

Difference on the same function between AscNullsLast and AscNullsFirst modules

defmodule Paginator.Ecto.Query.AscNullsLast do
  ...
  def build_dynamic_filter(args = %{direction: :after}) do
    dynamic(
      [{query, args.entity_position}],
      (^field_or_expr_equal(args) and ^args.next_filters) or
        ^field_or_expr_greater(args) or
        ^field_or_expr_is_nil(args)
    )
  end

  ...
end
  
defmodule Paginator.Ecto.Query.AscNullsFirst do
  ...
  def build_dynamic_filter(args = %{direction: :after}) do
    dynamic(
      [{query, args.entity_position}],
      (^field_or_expr_equal(args) and ^args.next_filters) or
        ^field_or_expr_greater(args)
    )
  end
  ...
end

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants