
Use a build tool #959

Open
8 tasks
fkleedorfer opened this issue Aug 20, 2024 · 21 comments

Comments

@fkleedorfer
Collaborator

fkleedorfer commented Aug 20, 2024

Use a build tool?

Problem: All Issues brought up so far require or aim at some kind of build automation. There currently is none.

Why is that a problem: Anything that needs to be done manually will cause errors, bottlenecks and dependency on individuals.

Cause: Most programming languages/frameworks come with a variety of build tools, and most projects use one. However, this is an ontology project, inherently independent from programming languages, and therefore, it is not obvious what should be used. That is probably the reason why none is in use.

Fix: Choose one build tool that the community can live with and refactor the project so it uses that tool. Bonus: GitHub Actions workflows become easier to write and maintain because they might only need to run some build targets.

So, question: What would be your criteria for choosing a build tool, and which one, if any, should it be?

Originally posted by @fkleedorfer in #942 (comment)

Edit: collecting requirements/ideas/aspects from the comments here (and my own)

This issue is not about adding new functionality, just about automating what is currently done manually or semi-automatically:

  • format source files consistently
  • run shacl checks
  • generate quantitykind/unit associations
  • make release zip
  • build in github action
  • trigger action for pr and new commits in main (pr merge/rebase)
  • github release action
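
To illustrate what running the checklist above as ordered build targets could look like, independent of whichever tool is eventually chosen, here is a minimal sketch. The step names mirror the list, but the step bodies are placeholders and `run_build` is a hypothetical helper, not part of the project.

```python
from typing import Callable, List, Tuple

# A build step is a name plus a callable that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_build(steps: List[Step]) -> List[str]:
    """Run steps in order, stop at the first failure, return the names that succeeded."""
    completed: List[str] = []
    for name, action in steps:
        if not action():
            print(f"FAILED: {name}")
            break
        completed.append(name)
    return completed


# Placeholder steps (assumptions): each lambda stands in for real work
# such as calling a formatter or a SHACL validator.
steps: List[Step] = [
    ("format", lambda: True),
    ("shacl-check", lambda: True),
    ("generate-associations", lambda: True),
    ("release-zip", lambda: True),
]

print(run_build(steps))
```

The fail-fast behaviour is the point here: a CI job that only calls one such entry point stays small, and contributors can run the same entry point locally.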

Incomplete list of future functionality to be implemented in the build

@fkleedorfer
Collaborator Author

I think this is the first thing we need if we are to get some automation going. I'll make a draft PR soonish.

@VladimirAlexiev

VladimirAlexiev commented Aug 28, 2024

hi @fkleedorfer ! Good idea, but could you elaborate a bit on what you want to automate?
Let's gather a list of requirements here (cc @steveraysteveray @ralphtq).
Florian, can you undertake to collect requirements and put them in the issue description, or if you prefer in a separate file (guess that's what the PR you mentioned will be about?)

@fkleedorfer
Collaborator Author

Would like the work to be done in reasonably small chunks (because I don't have enormous amounts of time for it), so I'd like to first not add new functionality, just automate what exists.

We are looking at a lot of things that can be added once the build automation is in place.

The first problem is choosing the build system itself. I did not get a lot of input on the question in the discussion; however, the current favorite is Maven. That's what my PR will be about. At the moment I am looking at how to do TTL formatting in that setting (probably Jena prettyprint, but we'll see; there is also https://github.com/atextor/turtle-formatter ). Weirdly, there is no Maven integration for either. (Side glance at Spotless.)

@VladimirAlexiev

@fkleedorfer But is there a problem with the turtle formatting of QUDT? I think it comes from TQ, and I think it's just fine?

@fkleedorfer
Collaborator Author

fkleedorfer commented Aug 28, 2024

@fkleedorfer But is there a problem with the turtle formatting of QUDT?

(Accidentally deleted my post so I rewrite it here)
Yes: contributors cannot reproduce it. When you contribute triples, you'll add them wherever, and at some point Steve pulls the code, reformats it and pushes it. That's not a great workflow.

If formatting were part of the build, our lives would be easier.

That is not to say that TQ formatting is bad. If we can use it in a build then maybe we should.

@steveraysteveray
Collaborator

I think the serialization we use in TopBraid is fairly common - alphabetical by grouped subject - isn't it? I assume that same serialization is available via the TQ API if we use that for inferencing and validation in the build, although I haven't checked. I'm not sure what the PySHACL library does, but my understanding is that it is slower and not complete.
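
As an illustration of what "alphabetical by grouped subject" means for reproducibility, here is a minimal sketch. It is not TopBraid's algorithm or the TQ API; it assumes triples are given as tuples of terms already written in Turtle syntax, and the example terms are purely illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # terms already written in Turtle syntax


def serialize_sorted(triples: Iterable[Triple]) -> str:
    """Group triples by subject and emit subjects, predicates and objects in sorted order."""
    grouped: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        grouped[s][p].add(o)

    blocks = []
    for s in sorted(grouped):
        lines = []
        for p in sorted(grouped[s]):
            objects = ", ".join(sorted(grouped[s][p]))
            lines.append(f"    {p} {objects}")
        blocks.append(s + "\n" + " ;\n".join(lines) + " .")
    return "\n\n".join(blocks) + "\n"


# Illustrative input (assumed names): the same triples in two different orders.
a = [
    ("unit:M", "rdfs:label", '"Metre"@en'),
    ("unit:M", "a", "qudt:Unit"),
    ("unit:KiloGM", "a", "qudt:Unit"),
]
b = list(reversed(a))

# The output does not depend on the order in which triples were added.
print(serialize_sorted(a) == serialize_sorted(b))
```

Because the output depends only on the set of triples, a contributor can add triples anywhere in a file and a formatting step brings them back to the same layout. Blank nodes are not handled in this sketch.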

@dr-shorthair
Contributor

OWL-API is also common.

@ashleysommer @nicholascar can you comment on completeness of pySHACL?

@dr-shorthair
Contributor

dr-shorthair commented Aug 28, 2024

Else go for RDF Canonicalization https://www.w3.org/TR/rdf-canon/
JS Implementation here: https://github.com/digitalbazaar/rdf-canonize
RDFlib here?: https://github.com/eyusupov/rdflib-canon

(is this in the TQ Suite?)

@fkleedorfer
Collaborator Author

Canonicalization is relevant for consistent ordering of blank nodes across multiple serializations. That's the one thing most formatters will fail to do.
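
To make the blank node problem concrete, here is a simplified sketch. It is not an implementation of the W3C RDF Canonicalization algorithm; it borrows only the idea of hashing each blank node's own triples with blank node labels masked, and it gives up when two blank nodes cannot be told apart that way. All `ex:` terms and function names are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def is_bnode(term: str) -> bool:
    return term.startswith("_:")


def first_degree_hash(node: str, triples: List[Triple]) -> str:
    """Hash the triples that mention `node`, with all blank node labels masked."""
    def mask(term: str) -> str:
        if not is_bnode(term):
            return term
        return "_:a" if term == node else "_:z"

    lines = sorted(
        f"{mask(s)} {p} {mask(o)} ."
        for s, p, o in triples
        if node in (s, o)
    )
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def canonical_triples(triples: List[Triple]) -> List[Triple]:
    """Relabel blank nodes in hash order and return the triples sorted."""
    bnodes = {t for s, _, o in triples for t in (s, o) if is_bnode(t)}
    hashes: Dict[str, str] = {b: first_degree_hash(b, triples) for b in bnodes}
    if len(set(hashes.values())) != len(hashes):
        # Simplification: a full canonicalization algorithm handles this case.
        raise ValueError("blank nodes are not distinguishable by this simplified hash")
    ordered = sorted(bnodes, key=lambda b: hashes[b])
    mapping = {b: f"_:c14n{i}" for i, b in enumerate(ordered)}
    return sorted((mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in triples)


# The same graph written twice with different blank node labels.
g1 = [("ex:s", "ex:p", "_:b0"), ("_:b0", "ex:value", '"1"'),
      ("ex:s", "ex:p", "_:b1"), ("_:b1", "ex:value", '"2"')]
g2 = [("ex:s", "ex:p", "_:x"), ("_:x", "ex:value", '"2"'),
      ("ex:s", "ex:p", "_:y"), ("_:y", "ex:value", '"1"')]

print(sorted(g1) == sorted(g2))                        # plain sorting: labels differ
print(canonical_triples(g1) == canonical_triples(g2))  # after relabeling
```

Plain sorting cannot make the two serializations equal because the arbitrary labels take part in the comparison; relabeling by content removes that dependency.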

@VladimirAlexiev

Don't most contributors submit relatively small PRs, typically new units, where they can follow the existing formatting even by hand?

In addition to the question of formatting, let's collect other needs for a build workflow, like checking data consistency using SPARQL. See my two bullets above.

@fkleedorfer
Collaborator Author

Like checking data consistency using SPARQL

Would you be ok wrapping the SPARQL queries in a SHACL shape or would you prefer another way, such as a folder with files containing sparql queries, and some convention for how their results should be interpreted?
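
For readers unfamiliar with the first option, here is a minimal sketch of wrapping a SPARQL SELECT query in a SHACL shape via `sh:sparql`. The helper `wrap_query_as_shape`, the `example.org` IRIs and the sample query are assumptions for illustration; only the `sh:` terms come from the SHACL vocabulary. The query uses full IRIs so that no prefix handling is needed.

```python
SH_PREFIX = "@prefix sh: <http://www.w3.org/ns/shacl#> .\n\n"


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a short Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def wrap_query_as_shape(shape_iri: str, target_class_iri: str,
                        message: str, select_query: str) -> str:
    """Return Turtle for a node shape whose SPARQL constraint is `select_query`.

    The query is expected to project $this; each result row is a violation.
    """
    if '"""' in select_query or "\\" in select_query:
        # Kept simple on purpose: no escaping inside the long literal.
        raise ValueError("query must not contain triple quotes or backslashes")
    return (
        SH_PREFIX
        + f"<{shape_iri}>\n"
        + "    a sh:NodeShape ;\n"
        + f"    sh:targetClass <{target_class_iri}> ;\n"
        + "    sh:sparql [\n"
        + "        a sh:SPARQLConstraint ;\n"
        + f'        sh:message "{escape_literal(message)}" ;\n'
        + f'        sh:select """{select_query}""" ;\n'
        + "    ] .\n"
    )


# Hypothetical consistency check: every instance must have an rdfs:label.
query = (
    "SELECT $this WHERE { FILTER NOT EXISTS { "
    "$this <http://www.w3.org/2000/01/rdf-schema#label> ?label } }"
)
shape_ttl = wrap_query_as_shape(
    "http://example.org/shapes/HasLabel",
    "http://example.org/Unit",
    "Instance has no rdfs:label",
    query,
)
print(shape_ttl)
```

With this convention the interpretation of results is fixed by SHACL itself (every row returned is a violation), whereas a folder of plain query files would need its own convention, for example "a check passes if the query returns no rows".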

@steveraysteveray
Collaborator

I vote for a SHACL shape, since we already do other validations that way (not yet part of the build).

@VladimirAlexiev

VladimirAlexiev commented Sep 9, 2024

@steveraysteveray and @fkleedorfer

SHACL vs SPARQL:

the serialization we use in TopBraid is fairly common - alphabetical by grouped subject

I like it. If classes and props follow naming conventions, then that sorts them in the proper order.
I'd just move individuals last: but most ontologies have terms or individuals, not both, so that's ok.

But I see Florian contributing to https://github.com/atextor/turtle-formatter:
Can you share impressions and should we use it instead of TQ TB?

@fkleedorfer
Collaborator Author

fkleedorfer commented Sep 9, 2024

But I see Florian contributing to https://github.com/atextor/turtle-formatter: Can you share impressions and should we use it instead of TQ TB?

My point would be that formatting should be accessible to any developer who wants to contribute. I don't think that will be the case with TopBraid. I was hoping to be able to do it with jena, but it's not so simple. turtle-formatter is a decent solution for us (if it works, which is what I'm working on).

As there is more to formatting your codebase than just formatting one file, I've prepared a contribution to spotless - a spotless RDF plugin, if you like, that will use whatever we manage on the file-formatting side (turtle-formatter for TTL, jena for everything else, or just not support anything else), to format the whole codebase. The spotless RDF plugin is more or less done, except for tests, and we'll need a published turtle-formatter jar with our changes.

EDIT: My impression of turtle-formatter is that its default output is ok, it is highly configurable, and the codebase is small and I'm confident we can contribute any formatting options that we need, for example, individuals last.

@VladimirAlexiev

  • Also, the developer of turtle-formatter @atextor is actively engaged and responsive: a big plus
  • I think I'll use turtle-formatter for some large-scale electrical ontologies (CIM/CGMES)

@dr-shorthair
Contributor

@nicholascar is this the formatter you use?
(I think you had a standard Turtle formatter to help with diffs.)

@VladimirAlexiev

  • "add more sorting options in longturtle serializer" (RDFLib/rdflib#2880) is a request to add pretty-printing features to Python's rdflib
  • A relevant thread "Diff'ing RDF files" appeared on the [email protected] and [email protected] mailing lists in Sep 2024.
  • Elisa Kendall (one of the main FIBO ontologists):
    There is an open-source tool available from the EDM Council for converting between RDF/XML, Turtle, and JSON-LD and for consistent serialization of any of these representations of RDF and OWL. The GitHub site for it is https://github.com/edmcouncil/rdf-toolkit. It is actively maintained, freely available, and addresses a number of issues mentioned on the thread, among other things. It also allows users to turn any of its features on/off as desired. It runs on the command line, or can be invoked automatically through GitHub commit hooks, for example.
    For collaborative work across development teams for large ontology projects, consistent serialization for comparison purposes was one of our first and relatively important issues. It enables visual comparison in GitHub (and likely other source code management systems), so that anyone reviewing the changes can see exactly what changed, down to the single character level.
    We also have a pipeline that looks for a myriad of issues in ontologies, performs regression testing using examples and reference data, and includes an html-based publication process that itself has a comparison feature, enabling comparison of any pull request or prior release with another version or with the latest version. The code for this is also open source, available from the EDM Council GitHub repository, though support is required for hosting and customization.

@fkleedorfer
Collaborator Author

RDF Toolkit seems like a good tool, but it does not have the stable inline blank nodes feature I just put into turtle-formatter: edmcouncil/rdf-toolkit#49. The good thing is that now I know how to do it ;-) - but I don't know if I want to put in the time again.

However, I like their approach to formatting (a git hook with the binary; all you need to do is install Java and set JAVA_HOME), and I do like the end result of their pipeline: https://spec.edmcouncil.org/fibo/ontology/ @ralphtq @steveraysteveray @jhodgesatmb might want to see this as another possible direction to take the whole build/publication process
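
For illustration, the git hook approach could look roughly like the following sketch of a pre-commit hook. The formatter command line is a placeholder (assumption), not the actual invocation of RDF Toolkit or turtle-formatter; the `git diff --cached` call is standard git.

```python
import subprocess
import sys
from typing import List

RDF_SUFFIXES = (".ttl",)

# Placeholder command (assumption): replace with the real formatter invocation.
FORMATTER_CMD = ["java", "-jar", "formatter.jar"]


def rdf_files(paths: List[str]) -> List[str]:
    """Keep only the paths that look like RDF source files."""
    return [p for p in paths if p.endswith(RDF_SUFFIXES)]


def staged_files() -> List[str]:
    """List files that are added, copied or modified in the index."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def main() -> int:
    """Format each staged RDF file and re-stage it; non-zero aborts the commit."""
    for path in rdf_files(staged_files()):
        result = subprocess.run(FORMATTER_CMD + [path])
        if result.returncode != 0:
            print(f"formatting failed for {path}", file=sys.stderr)
            return 1
        subprocess.run(["git", "add", path], check=True)
    return 0

# In the hook file itself, the last line would be: sys.exit(main())
```

A hook like this only needs Java and the formatter binary on the contributor's machine, which matches the low setup cost described above. Note that re-staging the whole file would also stage any unrelated unstaged changes in it.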

Not convinced at this point that all of this warrants a switch, but it's certainly worth thinking about.

@nicholascar

nicholascar commented Sep 24, 2024 via email

@fkleedorfer
Collaborator Author

Well, FWIW, the shacl-maven-plugin was just released, which supports validation and inferencing.

@nicholascar thanks for the pointer to those standardization efforts. Very much looking forward to the results. Hopefully, not too far off SHACL-AF Rules.

@fkleedorfer
Collaborator Author

PR #975 addresses this issue
