GH#2256 Introduce a query optimizer concept #2257
base: main
The first implementation makes it more likely that VALUES is used on the left side of a join. Added a test, but the optimizer is not invoked by default at this time.
rdflib/plugins/sparql/processor.py (outdated)
@@ -63,8 +64,9 @@ def update(self, strOrQuery, initBindings={}, initNs={}):
 class SPARQLProcessor(Processor):
-    def __init__(self, graph):
+    def __init__(self, graph, optimizers: List[SPARQLOptimizer] = None):
I think it would be best to just accept any callable that maps a query to a query:
So somewhere before this:
_QueryTranslatorType = Callable[[Query],Query]
Then:
-    def __init__(self, graph, optimizers: List[SPARQLOptimizer] = None):
+    def __init__(self, graph, query_translators: Optional[List[_QueryTranslatorType]] = None):
That way, users can pass methods or free functions, and even have multiple different translator methods on one class.
Your tests should still work fine with that.
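To illustrate the suggestion above, here is a minimal sketch of what accepting "any callable that maps a query to a query" allows. It does not use rdflib: the `Query` dataclass, `strip_whitespace`, `Rewrites`, and `apply_translators` are hypothetical stand-ins invented for this example, and only the `_QueryTranslatorType` alias and the `query_translators` parameter name come from the review comment.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Query:
    """Stand-in for rdflib's Query, only here to keep the sketch self-contained."""

    algebra: str


# The alias suggested in the review: any callable that maps a query to a query.
_QueryTranslatorType = Callable[[Query], Query]


def strip_whitespace(query: Query) -> Query:
    """A free function used as a translator (hypothetical)."""
    return replace(query, algebra=query.algebra.strip())


class Rewrites:
    """One class can expose several translator methods (hypothetical)."""

    def upper(self, query: Query) -> Query:
        return replace(query, algebra=query.algebra.upper())

    def tag(self, query: Query) -> Query:
        return replace(query, algebra=f"[{query.algebra}]")


def apply_translators(
    query: Query,
    query_translators: Optional[List[_QueryTranslatorType]] = None,
) -> Query:
    """Apply each translator in list order; None means no translation."""
    for translate in query_translators or []:
        query = translate(query)
    return query


rewrites = Rewrites()
result = apply_translators(
    Query("  select  "),
    [strip_whitespace, rewrites.upper, rewrites.tag],
)
```

A free function and two bound methods of the same object all satisfy the alias, so none of them needs to subclass a common optimizer base class.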
rdflib/plugins/sparql/optimizer.py (outdated)
        return query


class ValuesToTheLeftOfTheJoin(SPARQLOptimizer):
While this is valuable, I think it may be better to keep it in rdflib._contrib, as we don't necessarily want to offer the same level of compatibility guarantees as we do for other code.
rdflib/plugins/sparql/optimizer.py (outdated)

class ValuesToTheLeftOfTheJoin(SPARQLOptimizer):

    def optimize(self, query: Query) -> Query:
-    def optimize(self, query: Query) -> Query:
+    @classmethod
+    def optimize(cls, query: Query) -> Query:
As these methods don't use the class state and are side-effect free, it is best to make them class methods; that makes it clearer to users that they don't have to be concerned with concurrency issues.
rdflib/plugins/sparql/optimizer.py (outdated)
        query.algebra = self._optimize_node(main)
        return query

    def _optimize_node(self, cv: Any) -> Any:
-    def _optimize_node(self, cv: Any) -> Any:
+    @classmethod
+    def _optimize_node(cls, cv: Any) -> Any:
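As a rough sketch of the classmethod style suggested in these review comments, the example below rewrites a toy algebra tree so that a VALUES-like node ends up on the left side of a join. This is not the code from the PR: the nested-dict tree, the node names ("Join", "ToMultiSet", "BGP", "Project"), the keys "p1"/"p2", and the `_is_values` helper are assumptions made for illustration, and the stand-in `SPARQLOptimizer` base class is defined locally rather than imported.

```python
from typing import Any


class SPARQLOptimizer:
    """Local stand-in base class (hypothetical, not imported from rdflib)."""

    @classmethod
    def optimize(cls, query: Any) -> Any:
        return query


class ValuesToTheLeftOfTheJoin(SPARQLOptimizer):
    """Toy rewrite on nested dicts; uses no instance or class state."""

    @classmethod
    def optimize(cls, query: Any) -> Any:
        return cls._optimize_node(query)

    @staticmethod
    def _is_values(node: Any) -> bool:
        # Assumption: a VALUES block is represented by a "ToMultiSet" node.
        return isinstance(node, dict) and node.get("name") == "ToMultiSet"

    @classmethod
    def _optimize_node(cls, cv: Any) -> Any:
        if not isinstance(cv, dict):
            return cv
        # Rebuild the node from optimized children; the input is not mutated.
        node = {key: cls._optimize_node(value) for key, value in cv.items()}
        if node.get("name") == "Join":
            left, right = node["p1"], node["p2"]
            if cls._is_values(right) and not cls._is_values(left):
                node["p1"], node["p2"] = right, left
        return node


tree = {
    "name": "Project",
    "p": {"name": "Join", "p1": {"name": "BGP"}, "p2": {"name": "ToMultiSet"}},
}
optimized = ValuesToTheLeftOfTheJoin.optimize(tree)
```

Because `optimize` is called on the class and returns a new tree, no optimizer instance is shared between callers.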
Big doubt about the rdflib._contrib location.
@JervenBolleman I made a PR against your branch with some changes: JervenBolleman#5
One problem with taking translators as an argument to
There may also be reasons not to do this, and to keep it in the constructor, but I'm not sure that there are.
CC: @niklasl @westurner @RDFLib/core-reviewers
@aucampia thank you for helping with this. The reason I did it via the constructor is that stores normally know best how to optimize queries. And if you just want to modify a query before sending it for evaluation, there is no need for the store to have any knowledge of the modification, or to do it for the user. My natural habitat being RDF4J, I think that the approach of query optimizer pipelines is natural, and they are store-constructor based.
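The constructor-based pipeline described here can be sketched roughly as follows. Everything in this block is hypothetical: `PipelineProcessor`, its `evaluate` callback, and the use of plain strings as queries are stand-ins chosen to keep the example self-contained, not rdflib's or RDF4J's API.

```python
from typing import Callable, List, Optional, Sequence

QueryText = str  # stand-in for a parsed query object
Translator = Callable[[QueryText], QueryText]


class PipelineProcessor:
    """Hypothetical processor whose optimizer pipeline is fixed at construction."""

    def __init__(
        self,
        evaluate: Callable[[QueryText], List[str]],
        translators: Optional[Sequence[Translator]] = None,
    ) -> None:
        self._evaluate = evaluate
        # Copy into a tuple so later changes to the caller's list have no effect.
        self._translators = tuple(translators or ())

    def query(self, query: QueryText) -> List[str]:
        # Run the pipeline in order, then hand the result to the evaluator.
        for translate in self._translators:
            query = translate(query)
        return self._evaluate(query)


seen: List[QueryText] = []


def fake_evaluate(query: QueryText) -> List[str]:
    """Records the query it receives instead of evaluating it."""
    seen.append(query)
    return ["row"]


processor = PipelineProcessor(fake_evaluate, translators=[str.strip, str.lower])
rows = processor.query("  SELECT ?x  ")
```

The caller of `query()` passes only the query; which translators run, and in which order, is decided once by whoever constructs the processor.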
If this is a config file parameter, constructor args may be easier than passing those through for each .query() call.
I don't know whether it makes a difference for testing whether constructor args create a default SPARQLProcessor/Optimizer on a public attr, or the/*every* store just has extra private methods and args/kwargs?
I will have to double-check how exactly this will then be called by passing it to the store. Using just a graph with the default store, however, the current approach requires this, as far as I can tell:

    def test_graph_query(rdfs_graph: Graph):
        requested_query_string = """
        PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?x {
            ?x rdfs:label "subClassOf".
        }
        """
        translated_query_string = """
        PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?x {
            ?x rdfs:label "subPropertyOf".
        }
        """
        requested_query = _prepare_query(requested_query_string)
        translated_query = _prepare_query(translated_query_string)
        # The mock translator swaps the requested query for the translated one.
        translate = Mock(return_value=translated_query)
        processor = SPARQLProcessor(rdfs_graph, translators=[translate])
        result = rdfs_graph.query(requested_query, processor)
        rows = []
        for row in result:
            assert isinstance(row, ResultRow)
            rows.append(row.asdict())
        # The result reflects the translated query, not the requested one.
        assert rows == [{"x": RDFS.subPropertyOf}]
        translate.assert_called_once_with(requested_query)

It may be worth adding an example of how users should be using this, and also adding tests for the idiomatic end-to-end usage.

@JervenBolleman when you have a moment do check JervenBolleman#5 - happy to update it if there is something you want different.
To not have to pass the processor as a positional arg on every query:

    result = rdfs_graph.query(requested_query, processor)

(If it isn't already) couldn't it be:

    Store.__init__(*, processor: SPARQLProcessor = None)
    store = Store(..., processor=processor)
    # ...
    result = rdfs_graph.query(requested_query)
Will check this weekend if this will work or can be made to work.
PRs to V6 are closed until further notice. See this for more details:
We will be open for PRs again once this is resolved:
Summary of changes
Introduced a way to add new generic query optimizers that work on the query algebra.