Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query performance issues on Observations with plugins_coreModel_idType LONG #1923

Open
DanielBertocci opened this issue May 7, 2024 · 3 comments

Comments

@DanielBertocci
Copy link

DanielBertocci commented May 7, 2024

Issue Description

I use plugins_coreModel_idType set to LONG for the system and FROST correctly creates tables with IDs bigint.
I recently faced a performance issue on query observations, the system turned to be really slow, it could be one of the new updates of FROST but I am not sure.

What causes the issue

I have enabled query logging and I identified the query causing the issue.
When observations are queried with the query Observations?$filter=Datastream/@iot.id eq '1' FROST generates the following query:

select "e0"."RESULT_BOOLEAN", "e0"."RESULT_QUALITY", "e0"."PHENOMENON_TIME_START", "e0"."PARAMETERS", "e0"."DATASTREAM_ID", "e0"."RESULT_STRING", "e0"."RESULT_TYPE", "e0"."VALID_TIME_END", "e0"."PHENOMENON_TIME_END", "e0"."FEATURE_ID", "e0"."ID", "e0"."RESULT_JSON", "e0"."RESULT_TIME", "e0"."RESULT_NUMBER", "e0"."VALID_TIME_START"
from "OBSERVATIONS" as "e0"
left outer join "DATASTREAMS" as "e1" on "e1"."ID" = "e0"."DATASTREAM_ID" where cast("e0"."DATASTREAM_ID" as varchar) = $1
order by "e0"."PHENOMENON_TIME_START" asc, "e0"."PHENOMENON_TIME_END" asc, "e0"."ID" asc offset $2 rows fetch next $3 rows only

The key of the issue is

cast("e0"."DATASTREAM_ID" as varchar) = $1

Using plugins_coreModel_idType set to LONG, DATASTREAM_ID is a bigint and this cast messes up with the indexing and partitioning in the table, I imagine it cannot perform a proper comparison because of the different type. If I remove that cast, the query becomes more than 20 times faster (4 seconds -> 0.2 seconds)

Expected behavior:

What I would except is that this cast is dependent on the setting of idType, and in my specific case the cast to bigint.

How to reproduce

I have used the docker compose on FROST scripts with plugins_coreModel_idType set to LONG.

docker-compose.yaml
  version: "3"
  services:
    web:
      image: fraunhoferiosb/frost-server:latest
      environment:
        - serviceRootUrl=http://localhost:8080/FROST-Server
        - plugins_multiDatastream.enable=false
        - http_cors_enable=true
        - http_cors_allowed_origins=*
        - persistence_db_driver=org.postgresql.Driver
        - persistence_db_url=jdbc:postgresql://database:5432/sensorthings
        - persistence_db_username=sensorthings
        - persistence_db_password=ChangeMe
        - persistence_autoUpdateDatabase=true
        - plugins_coreModel_idType=LONG
      ports:
        - 8080:8080
        - 1883:1883
      depends_on:
        database:
          condition: service_healthy
    database:
      image: postgis/postgis:14-3.2-alpine
      environment:
        - POSTGRES_DB=sensorthings
        - POSTGRES_USER=sensorthings
        - POSTGRES_PASSWORD=ChangeMe
      command: ["postgres", "-c", "log_statement=all"]
      volumes:
        - postgis_volume:/var/lib/postgresql/data
      healthcheck:
        test: ["CMD-SHELL", "pg_isready -d sensorthings -U sensorthings "]
        interval: 10s
        timeout: 5s
        retries: 5
  volumes:
    postgis_volume:

Additional notes

I see also a JOIN that is not needed, only with $expand and accessing nested fields but it is not the case for the proposed query, I wonder if there is a way to optimize that too.

@hylkevds
Copy link
Member

hylkevds commented May 7, 2024

Yes, if you do @iot.id eq '1' you explicitly tell FROST to compare the id with a string, so it has to cast the id to a string.
You should use @iot.id eq 1

Or, even simpler, just GET v1.1/Datastreams(1)/Observations

I have done some experiments with that join, and it makes no significant difference in recent versions of PostgreSQL. Though recent versions of FROST will try to optimise it out.

@DanielBertocci
Copy link
Author

First of all, thank you because you gave me an immediate way to solve the issue we had.

This issue was really hard to find out, it needed some steps to realize that with that query, we didn't take advantage of the indexing.

I wonder if FROST could take special care about IDs, especially with plugins_coreModel_idType set: it knows what is the expected type for the IDs and I would expect the cast of the query parameter to the idType more than the current behavior.
A more explicit behavior could be to throw an error, claiming that it is not possible to use a string value when the id is bigint.

I see a risk that, for a matter of quotes in OGC STA query, FROST doesn't take of the advantage of the DB optimizations.

What do you think?

@hylkevds
Copy link
Member

hylkevds commented May 7, 2024

This problem is not limited to ID fields, it can happen any time a comparison is made between different types. One of them will have to be cast to the other, and since the only cast that is guaranteed to succeed is the cast to string, that's what FROST does. The same thing happens when filtering on one of the times, and quoting the time-constant in the URL.

The fact that in this specific case that means an index can't be used is something FROST doesn't know, since FROST doesn't know which fields have indices.

In theory, since the "other" field is a constant, FROST could try to see if that string constant could be converted to the non-string type, before taking the route that is guaranteed to work. That may be worth looking into...
But this will lead to confusing behaviour when comparing result values to a quoted number, since result has a string-representation, so no casting is needed. Thus comparing IDs to a falsely quoted number (using gt or lt) would result in the behaviour that the user expects, but comparing results to a falsely quoted number would not... For IDs '2' would be smaller than 11, but for results '2' would be larger than 11...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants