-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hasura's SQL is 3x slower than hand-written SQL #6448
Comments
I have also tried a GraphQL query more similar to the handwritten SQL query. The execution time doesn't differ much, but planning time is shorter. It still doesn't do a hash join: query MyQuery {
random_media(args: {n: 10}) {
filename
tags {
tag
}
}
} SELECT
coalesce(json_agg("root"), '[]') AS "root"
FROM
(
SELECT
row_to_json(
(
SELECT
"_6_e"
FROM
(
SELECT
"_1_root.base"."filename" AS "filename",
"_5_root.ar.root.tags"."tags" AS "tags"
) AS "_6_e"
)
) AS "root"
FROM
(
SELECT
*
FROM
"public"."random_media"(('10') :: integer) AS "_0_random_media"
WHERE
('true')
) AS "_1_root.base"
LEFT OUTER JOIN LATERAL (
SELECT
coalesce(json_agg("tags"), '[]') AS "tags"
FROM
(
SELECT
row_to_json(
(
SELECT
"_3_e"
FROM
(
SELECT
"_2_root.ar.root.tags.base"."tag" AS "tag"
) AS "_3_e"
)
) AS "tags"
FROM
(
SELECT
*
FROM
"public"."media_tags"
WHERE
(("_1_root.base"."id") = ("media"))
) AS "_2_root.ar.root.tags.base"
) AS "_4_root.ar.root.tags"
) AS "_5_root.ar.root.tags" ON ('true')
) AS "_7_root"
|
Hasura's SQL was scaling terribly on larger datasets (20ms vs 4ms hand-written). Postgresql seems to get confused and does a seqscan on media_tags instead of using an index. With However, after adding a hash index on media_tags.media, postgresql started using it instead of seqscan. The difference is only 2x now (0.2ms vs 0.1ms hand-written). CREATE INDEX ON media_tags USING HASH (media); Are there any plans on simplifying the queries that are sent to postgres? Maybe doing the serialization to JSON on graphql-engine's side? |
In my experience attempts to rewrite hasura SQL by hand almost always gave some results but not very significant. Change of index options (especially INCLUDE-ing fields) - got much significant results. Which is similar with what you achieved with HASH index. Another point of optimization is rewriting by hand object/array relations using knowledge about pagination. So generally I gave up on attempts rewriting But your experiments would be very intersting to see on biggger samples of data. |
@mlvzk For this query, doing an query MyQuery {
random_media(args: {n: 10}) {
filename
tags {
name
}
}
} Unless there is sample data for these tables, it is hard to see what plans postgres generates and what indexes should help. However, to answer some of the questions:
json aggregation on Postgres is much faster than anything that we can accomplish on the Haskell side (we have done a fair bit of testing on this front a while ago) so it is not fair to compare the following query against hasura generated query: SELECT filename, tag FROM random_media(10) AS m INNER JOIN media_tags ON media = m.id; At the very least, you should be comparing it with this query: explain analyze SELECT filename, tags FROM random_media(10) AS m left outer JOIN (select media, json_agg(mt.*) as tags from media_tags mt group by media) mt ON mt.media = m.id; But graphql-engine generates something along these lines SELECT filename, tags FROM random_media(10) AS m left outer JOIN lateral (select json_agg(mt.*) as tags from media_tags mt where mt.media = m.id) s ON true; to support queries such as this: query MyQuery {
random_media(args: {n: 10}) {
filename
tags(limit: 10 order_by: {asc: desc}) {
name
}
}
} Typically graphql-engine emits a fairly well constructed query that can be optimized by Postgres by adding the right indexes. There is always more scope for some subtle optimizations and we hope to attend to some of these in the nearby future. |
I too am having trouble with Postgres choosing a nested loop join instead of a hash join with Hasura's generated SQL. A foreign table is involved, and the loops mean that we're incurring network calls out to the foreign server for every row: 7 seconds total to make 164 calls of ~43ms each. (Removing the foreign table from the query does not avoid the outermost nested loop join)
Table structure:
A hand-written query without the lateral join uses hash joins and only makes a single call to the foreign server, with good performance:
Adding the lateral join, as the hasura-generated query does, switches to the nested loop join and makes the repeated calls out to the foreign server:
|
Hello. I've noticed that the SQL Hasura generates is a lot slower than what it could be. The main difference is that it's using Nested Loop Left Join instead of Hash Join (see EXPLAIN ANALYZE below). This is on a very small dataset (<200 rows), I assume the performance difference would be even bigger on a larger dataset.
Here's some data:
Tables + function SQL
Function:
Tables:
Hasura
GraphQL query:
SQL:
EXPLAIN ANALYZE:
Hand-written SQL:
EXPLAIN ANALYZE:
The text was updated successfully, but these errors were encountered: