--inline-blank-nodes on Turtle-Turtle normalization fails on blank node that only bears a rdf:type #52

Closed
ajnelson-nist opened this issue May 18, 2023 · 5 comments

Comments

@ajnelson-nist

I've encountered another issue with the --inline-blank-nodes flag, which I don't wholly suspect is related to #49, but I could see something about some shared code being an influence.

My Java runtime is version 18, and I've freshly produced this issue on v1.14.2.

$ openssl dgst -sha3-256 rdf-toolkit-1.14.2.jar 
SHA3-256(rdf-toolkit-1.14.2.jar)= 2d0efd578994243d43e629629b3bf44da4350268aee8d3c1bae2784ca243a924

I have this input data:

@prefix ex: <http://example.org/ontology/> .
@prefix kb: <http://example.org/kb/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

kb:thing-1
	a owl:Thing ;
	ex:property-1 [
		a owl:Thing ;
	] ;
	.

On running this command ...

java -jar rdf-toolkit-1.14.2.jar --source-format turtle --target-format turtle --source test-input.ttl --target test-output.ttl

... I get output I roughly expect.

@prefix ex: <http://example.org/ontology/> .
@prefix kb: <http://example.org/kb/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

kb:thing-1
	a owl:Thing ;
	ex:property-1 _:blank1 ;
	.

_:blank1
	a owl:Thing ;
	.

However, on running the prior command with --inline-blank-nodes added ...

java -jar rdf-toolkit-1.14.2.jar --inline-blank-nodes --source-format turtle --target-format turtle --source test-input.ttl --target test-output.ttl

... I get a stack trace:

17:17:07.807 ERROR o.e.rdf_toolkit.RdfFormatter - RdfFormatter: stopped by unexpected exception: 
17:17:07.809 ERROR o.e.rdf_toolkit.RdfFormatter - RDFHandlerException: unable to generate/write RDF output
17:17:07.809 ERROR o.e.rdf_toolkit.RdfFormatter - org.eclipse.rdf4j.rio.RDFHandlerException: unable to generate/write RDF output
	at org.edmcouncil.rdf_toolkit.writer.SortedTurtleWriter.endRDF(SortedTurtleWriter.java:179)
	at org.eclipse.rdf4j.rio.Rio.write(Rio.java:582)
	at org.edmcouncil.rdf_toolkit.runner.RdfToolkitRunner.runOnFile(RdfToolkitRunner.java:218)
	at org.edmcouncil.rdf_toolkit.runner.RdfToolkitRunner.run(RdfToolkitRunner.java:104)
	at org.edmcouncil.rdf_toolkit.RdfFormatter.run(RdfFormatter.java:64)
	at org.edmcouncil.rdf_toolkit.RdfFormatter.main(RdfFormatter.java:47)
Caused by: org.eclipse.rdf4j.rio.RDFHandlerException: unable to generate/write RDF output
	at org.edmcouncil.rdf_toolkit.writer.SortedRdfWriter.endRDF(SortedRdfWriter.java:534)
	at org.edmcouncil.rdf_toolkit.writer.SortedTurtleWriter.endRDF(SortedTurtleWriter.java:177)
	... 5 more
Caused by: java.lang.NullPointerException: Cannot invoke "org.edmcouncil.rdf_toolkit.model.SortedTurtleObjectList.iterator()" because "firstValues" is null
	at org.edmcouncil.rdf_toolkit.comparator.ComparisonUtils.isCollection(ComparisonUtils.java:138)
	at org.edmcouncil.rdf_toolkit.writer.SortedTurtleWriter.writeObject(SortedTurtleWriter.java:410)
	at org.edmcouncil.rdf_toolkit.writer.SortedTurtleWriter.writeObject(SortedTurtleWriter.java:397)
	at org.edmcouncil.rdf_toolkit.writer.SortedTurtleWriter.writePredicateAndObjectValues(SortedTurtleWriter.java:335)
	at org.edmcouncil.rdf_toolkit.writer.SortedTurtleWriter.writeSubjectTriples(SortedTurtleWriter.java:294)
	at org.edmcouncil.rdf_toolkit.writer.SortedRdfWriter.endRDF(SortedRdfWriter.java:508)
	... 6 more
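
The innermost cause in the trace is a null `firstValues` inside `ComparisonUtils.isCollection`. The following is a minimal Python sketch of the kind of guard that avoids this class of error. It is not rdf-toolkit's code: the `predicate_objects` mapping and the `is_collection` function are hypothetical stand-ins, and the assumption that the check looks up `rdf:first` values for the node is only inferred from the variable name in the trace.

```python
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_FIRST = RDF_NS + "first"
RDF_TYPE = RDF_NS + "type"


def is_collection(node, predicate_objects):
    """Return True if `node` looks like an RDF collection head.

    `predicate_objects` is a hypothetical index of the form
    {subject: {predicate: [objects]}}.
    """
    # The lookup yields None when the node has no rdf:first values,
    # for example when its only triple is an rdf:type statement.
    first_values = predicate_objects.get(node, {}).get(RDF_FIRST)
    if first_values is None:
        # Guard: a node without rdf:first is not a collection.
        return False
    return len(first_values) > 0


# A blank node that only bears an rdf:type, as in the failing input.
index = {"_:blank1": {RDF_TYPE: ["http://www.w3.org/2002/07/owl#Thing"]}}
print(is_collection("_:blank1", index))
```

Without the `None` check, iterating over `first_values` would fail in the same way the Java code fails when it calls `iterator()` on a null reference.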

Strangely, none of these similar test inputs trigger an error:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

[]
	a owl:Thing ;
	.

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

[ a owl:Thing ; ] .

@prefix ex: <http://example.org/ontology/> .
@prefix kb: <http://example.org/kb/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

kb:thing-1
	a owl:Thing ;
	ex:property-1 [
		rdfs:label ""@en ;
	] ;
	.

@prefix ex: <http://example.org/ontology/> .
@prefix kb: <http://example.org/kb/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

kb:thing-1
	a owl:Thing ;
	ex:property-1 [] ;
	.

This next sample did fail with the same stack trace:

@prefix ex: <http://example.org/ontology/> .
@prefix kb: <http://example.org/kb/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

kb:thing-1
	a owl:Thing ;
	ex:property-1 _:blank1 ;
	.

kb:thing-2
	a owl:Thing ;
	ex:property-1 _:blank1 ;
	.

_:blank1
	a owl:Thing ;
	.

In summary, the stack trace seems to appear when:

  • --inline-blank-nodes is passed,
  • there is a blank node that is in the object position of one or more triples,
  • the only triple with that blank node as subject has predicate rdf:type.
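
The data-side conditions in this list can be sketched as a small check over a set of triples. This is an illustrative Python sketch only, not part of rdf-toolkit: triples are plain tuples, blank nodes are assumed to be strings starting with `_:`, and the function name `find_trigger_nodes` is hypothetical. The `--inline-blank-nodes` condition is a command-line matter and is not modeled here.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_THING = "http://www.w3.org/2002/07/owl#Thing"
EX_P1 = "http://example.org/ontology/property-1"
KB_T1 = "http://example.org/kb/thing-1"


def is_blank(term):
    # Assumption for this sketch: blank nodes are strings starting with "_:".
    return isinstance(term, str) and term.startswith("_:")


def find_trigger_nodes(triples):
    """Return blank nodes that appear as an object and whose only
    triple as subject has predicate rdf:type."""
    as_object = {o for (_, _, o) in triples if is_blank(o)}
    predicates_by_subject = {}
    for s, p, _ in triples:
        if is_blank(s):
            predicates_by_subject.setdefault(s, []).append(p)
    return sorted(
        node for node in as_object
        if predicates_by_subject.get(node) == [RDF_TYPE]
    )


# Mirrors the first failing input: the blank node only bears an rdf:type.
failing = [
    (KB_T1, RDF_TYPE, OWL_THING),
    (KB_T1, EX_P1, "_:blank1"),
    ("_:blank1", RDF_TYPE, OWL_THING),
]

# Mirrors the passing input where the blank node only bears an rdfs:label.
passing = [
    (KB_T1, RDF_TYPE, OWL_THING),
    (KB_T1, EX_P1, "_:blank1"),
    ("_:blank1", RDFS_LABEL, ""),
]

print(find_trigger_nodes(failing))
print(find_trigger_nodes(passing))
```

The check flags `_:blank1` in the failing sample and nothing in the passing one, which matches the behavior reported above.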

Trying --target-format rdf-xml had no effect; I still got the stack trace under the same conditions as with --target-format turtle.

Impact: This bug has led to some code redesigns, because I was running rdf-toolkit to normalize inferencing results that could only infer anonymous nodes having a type.

ajnelson-nist added a commit to casework/CASE-Implementation-PROV-O that referenced this issue May 18, 2023
…nodes

This patch uses the inherence UUID functions from `case-utils` PR 112 to
replace the blank nodes generated with SPARQL Construct queries.  As
side effects of this migration, some bugs were fixed with generating
some associations, and inherence modeling assumptions are now specified
in code comments.

This patch also adds `prov:Start` and `prov:End` nodes to reify
`prov:Activity` (and `case-investigation:InvestigativeAction`) time
boundaries.  This will be a significant assistance in OWL-Time-based
visualization under development for `case-prov` PR 54.  Creating the
`prov:Start` and `prov:End` nodes as IRI-identified is also necessary
because of a bug observed in `rdf-toolkit`; see their Issue 52.

Since `case_prov_rdf` will now be able to generate non-blank nodes, it
has picked up two behaviors used in other projects importing
`case-utils`:

* The `--use-deterministic-uuids` flag has been added.
* The `CASE_DEMO_NONRANDOM_UUID_BASE` environment variable can now be
  used to make non-inherent deterministic UUIDs.

A follow-on patch will regenerate Make-managed files.

References:
* #54
* casework/CASE-Utilities-Python#112
* edmcouncil/rdf-toolkit#52

Signed-off-by: Alex Nelson <[email protected]>
@mereolog
Contributor

@ajnelson-nist could you try to run it using 4293ce8, i.e., the latest commit from master?

@ajnelson-nist
Author

@mereolog I'm happy to report all tests I listed in this issue passed with the --inline-blank-nodes flag.

This test dumped a strange message without the --inline-blank-nodes flag, though:

@prefix ex: <http://example.org/ontology/> .
@prefix kb: <http://example.org/kb/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

kb:thing-1
	a owl:Thing ;
	ex:property-1 [] ;
	.

Shell transcript:

$ make check
java -jar ../target/rdf-toolkit-1.14.2.jar \
	  --source passed-4.ttl \
	  --source-format turtle \
	  --target _normalized-passed-4.ttl \
	  --target-format turtle
**** blank node not a subject: node1h0pvf100x1
mv _normalized-passed-4.ttl normalized-passed-4.ttl

If it helps, here's the Makefile:

#!/usr/bin/make -f

# Portions of this file contributed by NIST are governed by the following
# statement:
#
# This software was developed at the National Institute of Standards
# and Technology by employees of the Federal Government in the course
# of their official duties. Pursuant to title 17 Section 105 of the
# United States Code this software is not subject to copyright
# protection and is in the public domain. NIST assumes no
# responsibility whatsoever for its use by other parties, and makes
# no guarantees, expressed or implied, about its quality,
# reliability, or any other characteristic.
#
# We would appreciate acknowledgement if the software is used.

SHELL := /bin/bash

RDF_TOOLKIT_JAR := ../target/rdf-toolkit-1.14.2.jar

all: check

check: \
  normalized-failed-1.ttl \
  normalized-failed-2.ttl \
  normalized-passed-1.ttl \
  normalized-passed-2.ttl \
  normalized-passed-3.ttl \
  normalized-passed-4.ttl

normalized-%.ttl: \
  %.ttl
	java -jar $(RDF_TOOLKIT_JAR) \
	  --source $< \
	  --source-format turtle \
	  --target _$@ \
	  --target-format turtle
	mv _$@ $@

@mereolog
Contributor

As you may expect, this happens when a bnode is not a subject in any triple. The serialiser does not like such cases because, as I tried to explain in #49 (comment), it cannot sort such nodes (and consequently the triples in which they occur).

The comment in the code says "last resort - this should never happen", which is rather dogmatic, but it is just a warning.
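
The situation behind the warning can be sketched as follows. This is an illustrative Python sketch, not the serialiser's code: triples are plain tuples, blank nodes are assumed to be strings starting with `_:`, and the function name is hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_THING = "http://www.w3.org/2002/07/owl#Thing"
EX_P1 = "http://example.org/ontology/property-1"
KB_T1 = "http://example.org/kb/thing-1"


def blank_nodes_without_subject_triples(triples):
    """Return blank nodes used as an object that are never a subject."""
    subjects = {s for (s, _, _) in triples}
    return sorted({
        o for (_, _, o) in triples
        # Assumption: blank nodes are strings starting with "_:".
        if isinstance(o, str) and o.startswith("_:") and o not in subjects
    })


# Mirrors the input with `ex:property-1 [] ;`: the blank node has no triples
# of its own.
triples = [
    (KB_T1, RDF_TYPE, OWL_THING),
    (KB_T1, EX_P1, "_:b0"),
]

print(blank_nodes_without_subject_triples(triples))
```

A node reported by this check has no outgoing triples to sort on, which is the case the warning message describes.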

@mereolog closed this as not planned on Jul 27, 2023
@dbpierson

dbpierson commented Jul 27, 2023 via email

@ElisaKendall
Contributor

@dbpierson Too funny - more likely a typo than intentional based on what I know of the author. Some of our other participants are non-native speakers of English, and I would not be surprised to see more of this sort of thing in their responses, though.

Nice to see you are still "lurking". Hope you are enjoying your retirement!
