Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TPF algorithm for index data source is incorrect if a requested triple pattern contains a variable multiple times #25

Open
hartig opened this issue Dec 25, 2015 · 13 comments
Labels

Comments

@hartig
Copy link
Contributor

hartig commented Dec 25, 2015

The algorithm that generates TPF responses for the index data source is incorrect for cases in which the requested triple pattern contains a specific variable multiple times. For instance, suppose the client requests triple pattern (?x, foaf:knows, ?x) with variable ?x explicitly mentioned in the HTTP request (using both, the subject parameter and the object parameter). In response to this request, the existing algorithm would return all triples (from the dataset) that have foaf:knows in the predicate position; hence, the algorithm ignores the additional constraint that subject and object must be equivalent.

@mielvds
Copy link
Contributor

mielvds commented Dec 28, 2015

Depends on the semantics of TPF i presume. Did we include this in the formalization?

@hartig
Copy link
Contributor Author

hartig commented Dec 28, 2015

Yes we did. In the ISWC paper we (i) defined the notion of matching triples for a given triple pattern and (ii) formalized triple pattern selectors in terms of this notion. By these definitions, a triple (ex:Bob,foaf:knows,ex:Alice) is not a matching triple for the triple pattern (?x,foaf:knows,?x) and, thus, this triple must not be included in the data of the TPF of that triple pattern.

BTW, what we did not capture formally is the possibility of omitting elements of a triple pattern in a request.

@RubenVerborgh

@mielvds
Copy link
Contributor

mielvds commented Dec 28, 2015

Yeah, that's what I was wondering about. {null, p, null} =?= {?x, p, ?x}

@RubenVerborgh
Copy link
Member

Triple patterns with multiple occurrences of the same variable are part of TPF. They can have the following forms in theory:

  • ?s ?p/<p> ?s (plausible)
  • ?s ?s ?o/<o> (implausible)
  • ?s/<s> ?p ?p (implausible)
  • ?s ?s ?s (extremely implausible)

It is not decided yet how this will be encoded at the HTTP interface level. Currently, an empty or non-specified value for any of the components means “variable”. Hence, omissions are the same as (unique) variables.

However, this mechanism is evidently not sufficient to say that two variables are the same. For this, we could either use explicit variables (?x) or blank nodes (_:x).

Note that, in any case, the server is free to return more data than its selector prescribes, so even though (ex:Bob,foaf:knows,ex:Alice) is not a match of (?x,foaf:knows,?x), the server is free to include it in any fragment (as long as the triple is part of the dataset). This allows us to, for instance, also return labels.

@hartig
Copy link
Contributor Author

hartig commented Dec 28, 2015

@mielvds (null,p,null) is to be interpreted as (?x,p,?y) where ?x and ?y are two arbitrary variables such that ?x != ?y.

@RubenVerborgh Regarding the encoding of variables that are specified explicitly at the HTTP interface level, the original code of Java.Server assumed they are given as strings that start with a question mark (e.g., see line 196 in

// variable or blank node indicates an unknown
) and turned them silently into unspecified variables. The code in my recent refactoring assumes the same encoding (strings that start with a question mark), but interprets such parameters as explicitly specified variables (see line 33 in https://github.com/LinkedDataFragments/Server.Java/blob/master/src/org/linkeddatafragments/util/TriplePatternElementParser.java#L33).

@RubenVerborgh Regarding your comment that "the server is free to return more data than its selector prescribes," please note that, by the formalization in our ISWC paper, adding more data is not possible.

@RubenVerborgh
Copy link
Member

Regarding encoding, I think that the _ and ? cases should be the same. Blank variables are scoped to the entire request (IRI), so their name matters as well.

Agree on the formalization and still stand by that theoretical definition of TPF. Strictly speaking, a TPF with more data isn't a TPF. However, in practice, clients can (and should) be relaxed about this. From the formalization viewpoint, the extra data could just be metadata.

@hartig
Copy link
Contributor Author

hartig commented Dec 28, 2015

I agree that blank nodes in TPF requests should be dealt with in the same way as (specific, named) variables are dealt with. Therefore, I would extend the TriplePatternElement interface by adding the following three methods:

  • boolean isNamedVariable(): returns true if the element is a specific variable that has a name (i.e., it is denoted by a string that begins with a question mark)
  • boolean isUnnamedVariable(): returns true if the element is a specific variable that does not have a name (i.e., it is denoted by a blank node)
  • TermType asAnonymousVariable(): returns a representation of the element as a blank node (assuming it is a specific, but non-named variable).

@RubenVerborgh, @mielvds OK?

@mielvds
Copy link
Contributor

mielvds commented Dec 29, 2015

Ok. I just wonder if we can just call isUnnamedVariable() isBlank() or isAnonymous() instead, since that's what we are talking about. I agree it makes sense because of the variable context, but on the other hand, a new 'name' might be confusing.

At least it could be isAnonymousVariable() to be consistent with asAnonymousVariable()?

@RubenVerborgh
Copy link
Member

Can't the parser just autoconvert non-named variables to named variables?
After all, ?subject= is a shortcut for ?subject=?x.

@hartig
Copy link
Contributor Author

hartig commented Dec 30, 2015

@RubenVerborgh I think that automatically converting unspecified variables into named variables is not a good idea because it may cause problems if the named variable that you would assign to an omitted parameter happens to be the value of another parameter in the request. For instance, if we interpret subject= as subject=?x, we have a problem if the HTTP request contains object=?x. Similarly, such an automatic naming of unspecified variables may cause problems for something like the bindings-restricted TPF interface that Carlos and I are working on.

@RubenVerborgh
Copy link
Member

Would this also be a problem if the assignment was done intelligently? Like: always generate a variable name that is not already used?

@hartig
Copy link
Contributor Author

hartig commented Dec 30, 2015

If it was only for pure TPFs, such an intelligent assignment would work (but it might have a slight impact on the server performance because it is extra work). However, I do not have an idea of how to do it in a generic way; that is, in a way that would be guaranteed to be correct even when used by external extensions for TP-based interfaces that assume additional parameters with variables (such as our bindings-restricted TPF interface).

@hartig
Copy link
Contributor Author

hartig commented Jan 4, 2016

The TriplePatternElement interface has been extended as discussed above. However, the actual bug that this ticket is about still exists.

@mielvds mielvds added the bug label Dec 22, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants