Implement RDF profile for ISO terms #2
Comments
Ping @lanemk . Thanks! |
Ron, Here are the latest files. Let me know any issues you have. |
Ron, I labeled the 46.jsonld file incorrectly. It should be 47.jsonld. Maybe you filled in that gap already. I'm attaching 47.jsonld. |
Thank you @lanemk ! I'm working on them now... |
@ronaldtse I don't know if you can use these. Maybe. |
Thanks @lanemk , I can certainly use them, but will need to generalize this list into a template because the term index has to be dynamically generated... |
@ronaldtse Here are updated files. I fixed things, hopefully. Simplified rdf-profile.ttl and made 47.ttl a little more generic. |
@ronaldtse Here are updated files. |
@ronaldtse Here are updated files. The geolexica-pages template has more fields, maybe too many. Welcoming questions here. |
@ronaldtse I mistakenly sent you an empty geolexica-pages file. Here's the data-filled version. |
@lanemk I am a bit confused: Now we have In :EST-Estonian
rdf:type geolexica:Language_Code ;
skos:altLabel "eesti, eesti keel"@en ;
skos:notation "EST" ;
skos:prefLabel "Estonian"@en ;
... The In RDF, do we use title case like What are things like :Japanese_Node
rdf:type :Language_Node ;
dcterms:modified "2017-10-19"^^xsd:anyURI ;
dcterms:title "ISO/TC211関連JIS用語集" ;
schema:mainEntityOfPage "https://www.geolexica.org/registers/#language-jpn"^^xsd:anyURI ;
rdfs:label "Japanese Language Node" ;
. This for example is the "Japanese term registry". |
Also, |
@lanemk a few more questions:
Why not:
?
There could be many examples and notes. Can we have:
There could be multiple "reviews" leading to multiple "review notes". How do we handle them?
Or do we only need them if the term contains those languages (and that the countries utilize the term)? |
@ronaldtse I have responses for you, in bold text.
Now we have language-codes.ttl and country_codes_ap.ttl. Do we need to serve those files? Or can we point elsewhere for those codes?
The language codes are ISO 639-2 codes, and country codes are ISO 3166 codes. Alas, I struggled with the translations, codes, and the registry. I tried to line everything up, so codes could be used across the Concept pages and the Registers. I believe now this is a muddied effort, and I should, or will, work to simplify matters. I believe I’m confusing Country code (in the data definition) with Language code (which should only be a reference, for instance, to language-tag skos:prefLabel, skos:definition, or perhaps Term_Abbreviation). I tried to find standards-based, API-accessible country codes: The same with language codes: This data can be accessed through an API, or CSV files can be downloaded and transformed into RDF (TTL, JSON-LD) for use in Geolexica. Do you have a preference for how to proceed?
@lanemk a few more questions:
We used to use rdf:type of skos:Concept but now schema:ItemPage. What is the difference?
“schema:ItemPage” is a predefined class (a new class in geolexica-ap), used to represent the Concept template page itself. The ItemPage is a compendium of all relevant info about one concept (term), a container of sorts. ItemPage is a “concept”, but not really for the current SKOS framework. This leads to this…. All glossary terms (concepts) are now serialized as “skos:Concept(s)” in the accompanying geolexica-terms.ttl file. Hopefully these can be referenced in the API to serve “rdfs:label” or “skos:prefLabel” values on a given template page.
geolexica-ap:authoritative_source geolexica-ap:ISO_19132_2007 is used to indicate the source. However, we clearly cannot manually enumerate all the sources. Do we need a separate TTL file that lists out all the sources? Or can these sources remain as strings for now?
Authoritative source as a string is fine (as in the data definition anyway). To address the number of sources, I was trying to preserve that URL, or any URL, for accessing the standards documentation. A separate RDF file of these sources may be appropriate. Just import the geolexica-ap to align with the ontology. I can work on that.
What do these do?
dcterms:identifier "geolexica-ap:empty_field" is a predefined property and dummy value for geolexica-ap property “termID”. I’m merely reusing “dcterms:identifier” from the Dublin Core namespace. While I favor reusing standard metadata, the local “geolexica-ap:termID” can be used just as well. To note, “dcterms:identifier” is also used for “termID” in geolexica-terms.ttl. An owl:sameAs property can be applied to termID/identifier to match them in RDF space.
geolexica-ap:conceptURI is the “property” seeking values from the class geolexica-ap:Concept_URI. Concept_URI is intended to link to the Concept page by URL. This is essentially self-referential within the template, and is inconsequential. It can be removed.
geolexica-ap:date_accepted "2019-11-28"^^xsd:date ; Why not:
I think your code is fine. I entered the actual dates as dummy values, placeholders.
geolexica-ap:example_n geolexica-ap:empty_field ; There could be many examples and notes. Can we have:
I can rename these properties as “example” and “note”. There can be as many as desired, much like skos:definition(s), without language tags. A language tag (i.e., geolexica-ap:note "A vehicle can travel on ground."@en) can be applied to examples and notes as needed. “geolexica-ap:empty_field” is just a placeholder (a dummy value).
geolexica-ap:review_date "2019-11-28"^^xsd:date ; There could be multiple "reviews" leading to multiple "review notes". How do we handle them?
I could create “review notes” enumerations within its respective class (i.e., geolexica-ap:review_decision_notes). Or it might be specified as a repeatable string value.
What is this?
Term Synonyms are specified in the data definition and geolexica-ap:term_synonym is the property to deliver values of this class. Once in a while they are showing up as strings in the dataset.
Why do we put the identifier as empty?
"geolexica-ap:empty_field" is a placeholder, used when I had no values to draw from. “geolexica-ap:termID” may win out over “dcterms:identifier”. They are both unique identifiers. “termID” matches the local ontology.
Do we really need these? Or do we only need them if the term contains those languages (and that the countries utilize the term)?
I need to simplify the language and country nodes and codes. I’m taking a chainsaw to a much more delicate problem. Let me think about it and present you with a solution. |
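A minimal sketch of the "repeatable example/note" idea from the reply above, assuming the properties are renamed to geolexica-ap:example and geolexica-ap:note as proposed. The helper names `turtle_literal` and `repeated_property` are hypothetical and not part of any Geolexica code; the sketch only shows how several values, with or without a language tag, can hang off one predicate as a Turtle object list.

```python
def turtle_literal(text, lang=None):
    """Quote a string as a Turtle literal, optionally with a language tag."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def repeated_property(prop, values):
    """Render one predicate with a comma-separated object list.

    `values` is a list of (text, lang) pairs; lang may be None.
    Returns an empty string when there is nothing to emit, so no
    placeholder such as geolexica-ap:empty_field is needed.
    """
    if not values:
        return ""
    objects = ", ".join(turtle_literal(text, lang) for text, lang in values)
    return f"{prop} {objects} ;"


# Hypothetical data: one tagged note and one untagged note.
notes = [("A vehicle can travel on ground.", "en"), ("Second note", None)]
print(repeated_property("geolexica-ap:note", notes))
```

The same function could serve review notes if they end up as a repeatable string value rather than an enumeration.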
@ronaldtse I did a redux on language and country codes. I included only those present in the data definition. This can be extended, if needed.
In the language code file, you will find reference to skos data, e.g., :CHI-Chinese
So, when this is referenced in the geolexica-pages file: e.g., "geolexica-ap:language_code https://www.geolexica.org/api/language-codes#CHI-Chinese ;", the heart of the matter (properties/values) is in "language-codes", whether it's TTL or JSON-LD. I also included a direct string label in the geolexica-pages file, e.g., "geolexica-ap:langCode "CHI"@en ;", so they are there if you find them convenient.
You will find the geolexica-pages file stripped of many dummy values now in favor of placeholder data. I tried to align datatypes where dynamic code, presumably, can fill in values appropriately. Let me know how this is working out. I hope it meets your needs.
I'm curious how you're accessing all "skos:definition" and "skos:prefLabel" values in different languages for the geolexica-pages file. Do these values need to be in an RDF dataset? I guess I'm a little confused about your method. Any insight is welcome.
The geolexica-terms file now includes a view of all glossary terms as skos Concept(s), and geolexica-ap:GlossaryTerm(s). This includes terms' "skos:prefLabel" in English, and "geolexica-ap:termID", and the superfluous "dcterms:identifier", which you can ignore.
The geolexica-ap file is now expanded to match (or, map) fields from the data definition to RDF/SKOS classes/concepts. You will find classes with or without instance data, depending on enumerations, or any other datatyped value. It will depend on the class, and related properties that reference values.
I regret I have a mix of camel case and "_" separation for naming my properties and classes. I wanted to make them readable, but I also want to be consistent. I suppose it's a matter of preference. I can rename everything to camel case if you'd like. But they should work just fine; there is no strict rule in RDF/SKOS. Just match "resources" as named to dynamic values.
I think that is most everything. I look forward to your feedback. |
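A rough sketch of how a page generator might fill in the two language-code lines described above, assuming the base IRI from the comment (https://www.geolexica.org/api/language-codes#) and the CODE-Label fragment pattern seen in :CHI-Chinese. The function name `language_code_lines` is hypothetical. One detail is an addition rather than something from the thread: Turtle requires a full IRI to be wrapped in angle brackets, so the sketch emits it that way.

```python
from urllib.parse import quote

# Base IRI taken from the comment above; treat it as an assumption
# about where the language-codes file is served.
LANGUAGE_CODES_BASE = "https://www.geolexica.org/api/language-codes#"


def language_code_lines(code, label):
    """Return the IRI reference line and the plain string-label line."""
    # Percent-encode the fragment so labels with spaces stay valid IRIs.
    fragment = quote(f"{code}-{label}", safe="-_.~")
    return [
        f"geolexica-ap:language_code <{LANGUAGE_CODES_BASE}{fragment}> ;",
        f'geolexica-ap:langCode "{code}"@en ;',
    ]


for line in language_code_lines("CHI", "Chinese"):
    print(line)
```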
@ronaldtse Ron, here are concept terms in English. Is this a dataset more like you're looking for? I can do this for the remaining languages. |
@ronaldtse Hi Ron, I hope this finds you well. |
@ronaldtse Ron, I offer some updates...mostly cleaning up and simplifying the RDF profile, and the concept page template. Let me know if you have questions.
|
Thanks @lanemk ! Sorry for the less than rapid responses, but I will try to get these implemented before the new year 😉 The way I’m doing it isn’t quite working since it’s more of a hack, but should be able to transition to a proper approach using a Ruby library as a Jekyll plugin. |
Thanks @ronaldtse, absolutely no worries. I only hope I can give you something you can work with. I've been studying up on Ruby in the meantime. So, I'll be curious on the approach, and maybe I can chip in, who knows. ~cheers! |
@ronaldtse Hi Ron...I hope all's well. Not sure if you're familiar with Ruby-RDF. These are, in their words, "Public domain libraries for RDF & SPARQL in the Ruby programming language." In other words, readers & writers of many types. I think you may find them useful. -- cheers! |
OK @ronaldtse. |
@ronaldtse I'm trying to understand what your request is about, but without much success so far. |
@skalee can you Skype me? |
@skalee is going to take care of this. Any updates so far? |
@ronaldtse @skalee No updates. Let me know if there are questions about the SKOS/RDF. |
@lanemk I have two questions.
whereas
Note the
|
@lanemk @ronaldtse Got another couple of questions:
|
@lanemk could you help answer questions 1 and 2?
Thanks! |
@lanemk @ronaldtse ping, clarification needed regarding above questions. |
Ping @lanemk , thanks! |
@skalee regarding 2-char vs 3-char language codes (2.), the answer is this: https://listserv.loc.gov/cgi-bin/wa?A2=ind1407&L=BIBFRAME&P=853
Since RDF refers to BCP47 in its "language tag", we should use a 2-char ISO 639-1 code if it exists, otherwise a 3-char ISO 639-2 code. |
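The rule above can be sketched as a small lookup, assuming the site data carries 3-char ISO 639-2 codes such as EST or CHI. The mapping table is a hand-picked illustrative subset, not the full ISO 639 registry, and the names `ISO_639_2_TO_1` and `language_tag` are hypothetical.

```python
# Illustrative subset only: ISO 639-2 (3-char) -> ISO 639-1 (2-char).
# A real implementation would load the complete ISO 639 tables.
ISO_639_2_TO_1 = {
    "eng": "en",  # English
    "est": "et",  # Estonian
    "jpn": "ja",  # Japanese
    "chi": "zh",  # Chinese (bibliographic code)
    "zho": "zh",  # Chinese (terminologic code)
}


def language_tag(iso_639_2_code):
    """Prefer the 2-char code when one exists, else keep the 3-char code."""
    code = iso_639_2_code.strip().lower()
    return ISO_639_2_TO_1.get(code, code)


def tagged_literal(text, iso_639_2_code):
    """Format a Turtle literal with the chosen language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{language_tag(iso_639_2_code)}'


print(tagged_literal("Estonian", "ENG"))
print(language_tag("fil"))  # Filipino has no 2-char code, so it stays 3-char
```

Codes missing from the table fall through unchanged, which matches the "otherwise a 3-char code" half of the rule but also means the table has to be complete before this is relied on.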
@ronaldtse, I still need clarification in one aspect. Which one of these URIs is correct: The
whereas
|
@lanemk any recommendation regarding the question above? If we can choose any, let's go with the first one without the
References: |
@ronaldtse Here's a draft of the Geolexica ontology (one ontology, 3 file formats) which addresses most if not all issues to model the concepts. I will have some clarifying questions later, but I invite you to take a look. Next I'll work out a draft of an individual concept (i.e., </concepts/2>) in SKOS-RDF, JSON-LD, and an abbreviated full data set to see what they should look like with the above ontology. From there, SPARQL queries ought to retrieve the data. I look forward to your questions/comments. |
@ronaldtse Here's a draft of an individual concept (</concepts/6/>) in RDF/XML, JSON-LD, and TTL, and an abbreviated full data set with only 5 concepts for now. I also fine-tuned the ontology and it is attached as well, to support the other files. Next, I'll cook up some SPARQL queries to slice and dice the data set. Please bear in mind I focused solely on the MLGT Glossary, since it is large and varied. I'm reasonably confident it solves all the problems. I'll know better after the SPARQL phase. --Mike geolexica-ontology-DRAFT-20222001.zip |
This is to implement proper RDF (TTL, JSON-LD) for ISO terms in Geolexica.