Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add JSON Support to V2 Engine (#217) #1464

Closed

Conversation

GumpacG
Copy link
Collaborator

@GumpacG GumpacG commented Mar 23, 2023

Description

Phase 1 of adding JSON Support to V2 Engine
For example: The following request was handled by V2 engine:

curl --location 'http://localhost:9200/_plugins/_sql?format=json' --header 'Content-Type: application/json' --data '{ 
    "query": "SELECT * FROM calcs LIMIT 1;"
}'
{
    "took": 13,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 17,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "calcs",
                "_id": "1",
                "_score": 1.0,
                "_source": {
                    "int0": 1,
                    "bool3": true,
                    "time1": "19:36:22",
                    "bool2": false,
                    "int2": 5,
                    "int1": -3,
                    "str3": "e",
                    "int3": 8,
                    "str1": "CLAMP ON LAMPS",
                    "str2": "one",
                    "time0": "1899-12-30T21:07:32Z",
                    "num1": 8.42,
                    "datetime0": "2004-07-09T10:17:35Z",
                    "num0": 12.3,
                    "datetime1": null,
                    "num4": null,
                    "key": "key00",
                    "num3": -11.52,
                    "bool1": true,
                    "num2": 17.86,
                    "bool0": true,
                    "str0": "FURNITURE",
                    "date3": "1986-03-20",
                    "date2": "1977-04-20",
                    "date1": "2004-04-01",
                    "date0": "2004-04-15",
                    "zzz": "a"
                }
            }
        ]
    }
}

Returning the raw response from OpenSearch made IT tests fail due to missing in-memory operations, difference in aggregation type (V2 is composite_bucket), and some bugs were discovered. The solution for these failing IT tests were to fall back to legacy for all functions, expect failures for IT tests with a comment for issues related.

List of expected failures in legacy IT:

  • DateFormatIT.and (1436)

  • DateFormatIT.equalTo (1436)

  • DateFormatIT.greaterThan (1436)

  • DateFormatIT.greaterThanOrEqualTo (1436)

  • DateFormatIT.lessThan (1436)

  • DateFormatIT.lessThanOrEqualTo (1436)

  • DateFormatIT.or (1436)

  • DateFormatIT.sortByDateFormat (1436)

  • QueryFunctionsIT.wildcardQuery (Dependent on PR #1314)

  • QueryIT.selectAllWithFieldAndOrderBy (785)

  • QueryIT.selectAllWithFieldReturnsAll (785)

  • QueryIT.selectAllWithFieldReverseOrder (785)

  • QueryIT.selectAllWithMultipleFields (785)

  • QueryIT.notLikeTests (1425)

Proposal for phase 2 of JSON support for V2

#1450

Issues Resolved

#1317

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

* Implement json support for V2 engine

Signed-off-by: Margarit Hakobyan <[email protected]>

* Reverted some changes

Signed-off-by: Margarit Hakobyan <[email protected]>

* Removed some fields

Signed-off-by: Margarit Hakobyan <[email protected]>

* minor fix

Signed-off-by: Margarit Hakobyan <[email protected]>

* Added a unit test, cleaned up

Signed-off-by: Margarit Hakobyan <[email protected]>

* Returning raw OpenSearch response when type is json

Signed-off-by: Margarit Hakobyan <[email protected]>

* Add an integration test, fix checkstyle errors

Signed-off-by: Margarit Hakobyan <[email protected]>

* Added constructor for empty rawResponse

Signed-off-by: Guian Gumpac <[email protected]>

* Added constant for supported formats

Signed-off-by: Guian Gumpac <[email protected]>

* Added unit test

Signed-off-by: Guian Gumpac <[email protected]>

* Addressed PR comments

Signed-off-by: Guian Gumpac <[email protected]>

* Addressed PR comments:

Signed-off-by: Guian Gumpac <[email protected]>

* Fixed issue

Signed-off-by: Guian Gumpac <[email protected]>

* Added getter for rawResponse in PhysicalPlan

Signed-off-by: Guian Gumpac <[email protected]>

* Legacy fall back with JSON format (#237)

* Implement json support for V2 engine

Signed-off-by: Margarit Hakobyan <[email protected]>

* Reverted some changes

Signed-off-by: Margarit Hakobyan <[email protected]>

* Removed some fields

Signed-off-by: Margarit Hakobyan <[email protected]>

* minor fix

Signed-off-by: Margarit Hakobyan <[email protected]>

* Added a unit test, cleaned up

Signed-off-by: Margarit Hakobyan <[email protected]>

* Returning raw OpenSearch response when type is json

Signed-off-by: Margarit Hakobyan <[email protected]>

* Add an integration test, fix checkstyle errors

Signed-off-by: Margarit Hakobyan <[email protected]>

* Made new engine fallback to legacy for in memory operations for json format

Signed-off-by: Guian Gumpac <[email protected]>

* Address build failures

Signed-off-by: MaxKsyunz <[email protected]>

* Added legacy fall back

Signed-off-by: Guian Gumpac <[email protected]>

* Refactored fall back logic to use visitor design pattern

Signed-off-by: Guian Gumpac <[email protected]>

* Added unit tests

Signed-off-by: Guian Gumpac <[email protected]>

* Removed unnecessary IT

Signed-off-by: Guian Gumpac <[email protected]>

* Addressed PR feedback

Signed-off-by: Guian Gumpac <[email protected]>

* Removed unnecessary context

Signed-off-by: Guian Gumpac <[email protected]>

* Added fall back for Filter functions

Signed-off-by: Guian Gumpac <[email protected]>

* Made new engine fallback to legacy for in memory operations for json format

Signed-off-by: Guian Gumpac <[email protected]>

* Address build failures

Signed-off-by: MaxKsyunz <[email protected]>

* Added legacy fall back

Signed-off-by: Guian Gumpac <[email protected]>

* Refactored fall back logic to use visitor design pattern

Signed-off-by: Guian Gumpac <[email protected]>

* Added unit tests

Signed-off-by: Guian Gumpac <[email protected]>

* Removed unnecessary IT

Signed-off-by: Guian Gumpac <[email protected]>

* Addressed PR feedback

Signed-off-by: Guian Gumpac <[email protected]>

* Removed unnecessary context

Signed-off-by: Guian Gumpac <[email protected]>

* Added fall back for Filter functions

Signed-off-by: Guian Gumpac <[email protected]>


---------

Signed-off-by: MaxKsyunz <[email protected]>

* Fixed checkstyle errors

Signed-off-by: Guian Gumpac <[email protected]>

* Addressed PR comments and fixed the visitor

Signed-off-by: Guian Gumpac <[email protected]>

* Added comment to visitor class

Signed-off-by: Guian Gumpac <[email protected]>

* Addressed PR comments to improve visitor class

Signed-off-by: Guian Gumpac <[email protected]>

* Added unit tests for JsonSupportVisitor

Signed-off-by: Guian Gumpac <[email protected]>

* Added helper function for SQLServiceTest

Signed-off-by: Guian Gumpac <[email protected]>

* Added expected failures

Signed-off-by: Guian Gumpac <[email protected]>

* Reworked the visitor class to have type Void instead of Boolean

Signed-off-by: Guian Gumpac <[email protected]>

* Fixed typo

Signed-off-by: Guian Gumpac <[email protected]>

* Added github link for tracking issue

Signed-off-by: Guian Gumpac <[email protected]>

---------

Signed-off-by: Margarit Hakobyan <[email protected]>
Signed-off-by: Guian Gumpac <[email protected]>
Signed-off-by: MaxKsyunz <[email protected]>
Co-authored-by: Margarit Hakobyan <[email protected]>
Co-authored-by: MaxKsyunz <[email protected]>

* Reverted OpenSearchIndexScan changes as it broke IT

Signed-off-by: Guian Gumpac <[email protected]>

* Added unit test

Signed-off-by: Guian Gumpac <[email protected]>

* Removed unused Mock variable

Signed-off-by: Guian Gumpac <[email protected]>

---------

Signed-off-by: Margarit Hakobyan <[email protected]>
Signed-off-by: Guian Gumpac <[email protected]>
Signed-off-by: MaxKsyunz <[email protected]>
Co-authored-by: Guian Gumpac <[email protected]>
Co-authored-by: MaxKsyunz <[email protected]>
@codecov-commenter
Copy link

codecov-commenter commented Mar 23, 2023

Codecov Report

Merging #1464 (4e692fe) into main (bc39346) will increase coverage by 0.04%.
The diff coverage is 100.00%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@             Coverage Diff              @@
##               main    #1464      +/-   ##
============================================
+ Coverage     98.43%   98.47%   +0.04%     
- Complexity     3775     3894     +119     
============================================
  Files           343      348       +5     
  Lines          9364     9663     +299     
  Branches        599      620      +21     
============================================
+ Hits           9217     9516     +299     
  Misses          142      142              
  Partials          5        5              
Flag Coverage Δ
sql-engine 98.47% <100.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...rg/opensearch/sql/analysis/JsonSupportVisitor.java 100.00% <100.00%> (ø)
...search/sql/analysis/JsonSupportVisitorContext.java 100.00% <100.00%> (ø)
...a/org/opensearch/sql/executor/ExecutionEngine.java 100.00% <100.00%> (ø)
.../opensearch/sql/planner/physical/PhysicalPlan.java 100.00% <100.00%> (ø)
...opensearch/executor/OpenSearchExecutionEngine.java 100.00% <100.00%> (ø)
...ch/sql/opensearch/response/OpenSearchResponse.java 100.00% <100.00%> (ø)
...ch/sql/opensearch/storage/OpenSearchIndexScan.java 100.00% <100.00%> (ø)
...pensearch/sql/protocol/response/format/Format.java 100.00% <100.00%> (ø)
...c/main/java/org/opensearch/sql/sql/SQLService.java 100.00% <100.00%> (ø)
.../org/opensearch/sql/sql/antlr/SQLSyntaxParser.java 100.00% <100.00%> (ø)
... and 1 more

... and 32 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@Override
public Void visit(Node node, JsonSupportVisitorContext context) {
visitChildren(node, context);
return null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this code using Void, return null and raising exceptions instead of just returning boolean?

The visitor answers a yes or no question. Using boolean is simpler and more natural here.

Using exceptions for flow control is not a recommended practice. SQL plugin does that with fallback to legacy, but that's no great -- let's not add more.

The only thing it would lose is knowing which element caused validation to fail. That seems ok in this case -- the error message will be slightly worse.

If you do want to maintain the error message, you could return Optional<String>. Then for supported queries JsonSupportVisitor.visit will return Optional.empty(). Otherwise it returns the label of the first element that caused validation to fail. That's enough information to throw exception with the same message. As a bonus, throw will happen in one place now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Boolean return type is a good change.
I'm not convinced that removing the Exception throws in this instance is the right way. Try...Catch blocks are slightly more expensive in processing time, but then we have the satisfaction of knowing that the NodeVisitor is interrupted.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 4e692fe


public String getRawResponse() {
return getChild().stream().map(PhysicalPlan::getRawResponse)
.filter(r -> r != null && !r.isEmpty()).findFirst().orElse("");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.filter(r -> r != null && !r.isEmpty()).findFirst().orElse("");
.filter(StringUtils::isNotEmpty).findFirst().orElse("");

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 4e692fe

#255)

* Changed visitor type to Boolean and added LIMIT to fall back to legacy

Signed-off-by: Guian Gumpac <[email protected]>

* Addressed PR comment

Signed-off-by: Guian Gumpac <[email protected]>

---------

Signed-off-by: Guian Gumpac <[email protected]>
Copy link
Collaborator

@dai-chen dai-chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a high level question about the implementation which I asked earlier on PoC PR too:

For now this JSON format only makes senses for OpenSearch storage. However, special logic are added in many places, QueryService, PhysicalPlan, QueryResponse, ExecutionEngine etc. Many of them are very fundamental and on critical code path of our core engine. Can we think about if any more extensible way:

  1. Either make this JSON format and code changes work for all other storage;
  2. Or minimize the special logic, ex. use _raw meta field and only use a special JSON response formatter to work with it.

Thanks!

* Unsupported features in V2 are ones the produce results that differ from
* legacy results.
*/
public class JsonSupportVisitor extends AbstractNodeVisitor<Boolean, JsonSupportVisitorContext> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking if any easy way to do this. As I understand, JSON format only makes sense if the physical query plan for a query only has a Project (item is regular field) and OpenSearchIndexScan. If so, can we do this validation later?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, As I understand it, all unsupported cases will still fallback to the V1 engine, so there won't be a significant difference, right?

@dai-chen dai-chen added the enhancement New feature or request label Mar 28, 2023
@Yury-Fridlyand
Copy link
Collaborator

Yury-Fridlyand commented Mar 29, 2023

This PR is a first phase of JSON support implementation. Second phase assumes support of JSON response format for all datasources and queries. We have a PoC on test-json-pagination branch, it keep no raw response from OpenSearch/Prometheus/whatever, but it generates a response which looks 1:1 like returned by OpenSearch with resultset of the SQL plugin.
See example:

$ curl -s -XPOST http://localhost:9200/_plugins/_sql?format=json -H 'Content-Type: application/json' -d '{"query": "SELECT int0 + 1, 2, now() FROM calcs limit 2"}' | jq
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": null,
        "_id": null,
        "_score": 0,
        "fields": {
          "2": 2,
          "int0 + 1": 2,
          "now()": "2023-03-17 13:07:47"
        }
      },
      {
        "_index": null,
        "_id": null,
        "_score": 0,
        "fields": {
          "2": 2,
          "int0 + 1": null,
          "now()": "2023-03-17 13:07:47"
        }
      }
    ]
  }
}

It is not completed yet (_id, _index fields are empty - depends on #1456, took is always 0), but it works fine. For another datasource, if fills meta info (e.g. _shards) with default values.
Once we will complete the fix and resolve all dependencies, we will publish Phase 2 implementation of JSON response format.

UPD
Phase 2 is tracked in #1450

Copy link
Collaborator

@penghuo penghuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the long-term plan, do we intend to fully deprecate the legacy engine? I believe that the legacy engine excels in supporting the JSON format, correct? What we do eventually is translate SQL statement to DSL as much as possible.

@@ -49,7 +49,8 @@ public void execute(PhysicalPlan physicalPlan, ExecutionContext context,
result.add(plan.next());
}

QueryResponse response = new QueryResponse(physicalPlan.schema(), result);
String rawResponse = physicalPlan.getRawResponse();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is json format, result is not been used, only rawResponse will be used in response, right? Does it means the query plan should be different, if the query plan to fetch raw data?

* Unsupported features in V2 are ones the produce results that differ from
* legacy results.
*/
public class JsonSupportVisitor extends AbstractNodeVisitor<Boolean, JsonSupportVisitorContext> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, As I understand it, all unsupported cases will still fallback to the V1 engine, so there won't be a significant difference, right?

@penghuo
Copy link
Collaborator

penghuo commented Mar 29, 2023

This PR is a first phase of JSON support implementation. Second phase assumes support of JSON response format for all datasources and queries. We have a PoC on test-json-pagination branch, it keep no raw response from OpenSearch/Prometheus/whatever, but it generates a response which looks 1:1 like returned by OpenSearch with resultset of the SQL plugin. See example:

$ curl -s -XPOST http://localhost:9200/_plugins/_sql?format=json -H 'Content-Type: application/json' -d '{"query": "SELECT int0 + 1, 2, now() FROM calcs limit 2"}' | jq
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": null,
        "_id": null,
        "_score": 0,
        "fields": {
          "2": 2,
          "int0 + 1": 2,
          "now()": "2023-03-17 13:07:47"
        }
      },
      {
        "_index": null,
        "_id": null,
        "_score": 0,
        "fields": {
          "2": 2,
          "int0 + 1": null,
          "now()": "2023-03-17 13:07:47"
        }
      }
    ]
  }
}

It is not completed yet (_id, _index fields are empty - depends on #1456, took is always 0), but it works fine. For another datasource, if fills meta info (e.g. _shards) with default values. Once we will complete the fix and resolve all dependencies, we will publish Phase 2 implementation of JSON response format.

UPD Phase 2 is tracked in #1450

I believe that what we are targeting here is specific to the OpenSearch raw data response. Do other data sources have similar requirements?

@Yury-Fridlyand
Copy link
Collaborator

I believe that what we are targeting here is specific to the OpenSearch raw data response. Do other data sources have similar requirements?

Not yet, but you're welcome to define them in #1450. It is possible to generate OpenSearch-like JSON using result set from Prometheus. Or create OpenSearch response for OpenSearch queries and Prometheus-like for Prometheus.

@dai-chen
Copy link
Collaborator

dai-chen commented Mar 30, 2023

I believe that what we are targeting here is specific to the OpenSearch raw data response. Do other data sources have similar requirements?

Not yet, but you're welcome to define them in #1450. It is possible to generate OpenSearch-like JSON using result set from Prometheus. Or create OpenSearch response for OpenSearch queries and Prometheus-like for Prometheus.

IMO, if we touch core logic as what we're doing in this PR, we should clearly define what's the meaning of JSON format for all data sources. Otherwise, we may want to minimize such special logic that only work for OpenSearch.

@acarbonetto @Yury-Fridlyand
I expressed the similar concern in meta field PR: #1456 (comment). We've put lots of efforts to separate modules and abstraction since the beginning of V2 engine. It's true that OpenSearch is the primary storage and we're prioritizing legacy engine deprecate. However, if we keep adding special OpenSearch logic to core, I'm afraid V2 engine would be as hard to maintain as legacy in future and all our previous efforts would be in vain.

@Yury-Fridlyand Yury-Fridlyand marked this pull request as draft April 6, 2023 21:40
@acarbonetto
Copy link
Collaborator

Closing this PR because JSON format doesn't currently have a place in the V2 engine.

@acarbonetto acarbonetto closed this Aug 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants