[Request for discussion] What software project metadata OMB should ask agencies for #117
Comments
I'd also suggest the project metadata include a brief, one- to two-sentence description. Given that the open source community loves creative names, a metadata entry of "GMan" does not aid discoverability of the project nearly as much as "A Ruby gem to detect government domains" does. |
We meant to include that field, actually -- not sure how it got dropped. I went and edited our above comment to include:
|
Also see the discussions in #132. Also note that 'URL' is prone to link rot and is ambiguous. At the very least, I could see:
(insert disclaimer here about these being personal comments, and not those of the agency I work for, blah blah blah). |
I don't think it has seen broad adoption beyond its use by European governments for this purpose, but I should note ADMS.SW as a precedent. ADMS.SW builds on related schemas including DOAP, SPDX, ISO 19770-2, ADMS, and the Trove software map. cc: @makxdekkers |
Per related comments #116 and #40: In Issue #40 @dsmorgan77 states:
Source code is information structured in a way that computers can interpret; I agree that source code, as structured information, is data. Given this, I suggest ProjectOpenSource start with a minimal but extendable, semi-optional metadata schema based on W3C's Data Catalog Vocabulary (DCAT). The most recent attempt to codify metadata for software was the European Commission's ADMS v2.0, which was drastically updated to remap to DCAT; ProjectOpenData's data.json v1.1 was likewise based on DCAT. Basing the TBD code.json schema on DCAT would ensure a controlled vocabulary and high levels of interoperability, and would promote semantic web / linked data efforts. Other metadata standards/specs/schemas reviewed and referenced included:
Beyond the general suggestion to use the DCAT controlled vocabulary, I suggest the specific use of three fields:
ADMS v2.0 mapped to DCAT: ADMS v2.0 Overview |
See also Mozilla Science's CodeMeta: |
I'm Jamie Jones, a Solutions Engineer at GitHub supporting the Federal Government. I'm a big fan of this thing called GitHub, and a big fan of sharing code and making it open source. The opinions within are my own, but the ideas often come from many conversations I've had around the community (OSS, government, and commercial). Before coming to GitHub, I wrote software for the government, and would have loved not to make YAMA (Yet Another Map App) but instead to reuse some better code.
Of course, the most important question for an inventory mechanism to answer is: does it help agencies with similar needs find each other's projects and collaborate? But close behind it, you need to ensure the mechanism can grow as your requirements evolve and is maintainable going forward. So it's a balance between being complete and not being too much of a burden. Not all problems are technical.
When we look at metadata done by other Open Source projects, a simple example that comes to mind is Netflix's
Other formats
It acknowledges that there are already a very large number of metadata standards for describing data & software in the academic community and it's an attempt to make all of them inter-operable.
Maybe not a format or file at all...
Why use a metadata file when you can get much of this same information from a project's pre-existing files and the source control host itself?
In general, there's little to no widespread adoption of a software metadata file within repos outside of aforementioned package files.
Some examples of how the GitHub API can provide more details for you:
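As a rough sketch of this idea (not the examples from the original comment), the Python below pulls inventory-style fields out of a repository object as returned by the GitHub REST API's `GET /repos/{owner}/{repo}` endpoint. The record layout and the helper names `repo_to_record` and `fetch_repo` are assumptions made for illustration, and the sample values are invented; the keys read from the response (`full_name`, `description`, `html_url`, `language`, `license`, `pushed_at`) are fields that endpoint returns.

```python
import json
import urllib.request


def repo_to_record(repo):
    """Map a GitHub repository JSON object to a small inventory record.

    The output field names are hypothetical, chosen only for this sketch.
    """
    # "license" is null in the API response when GitHub detects no license.
    license_info = repo.get("license") or {}
    return {
        "name": repo.get("full_name"),
        "description": repo.get("description"),
        "url": repo.get("html_url"),
        "language": repo.get("language"),
        "license": license_info.get("spdx_id"),
        "last_pushed": repo.get("pushed_at"),
    }


def fetch_repo(owner, name):
    """Fetch one public repository (unauthenticated, so heavily rate-limited)."""
    url = f"https://api.github.com/repos/{owner}/{name}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Offline sample shaped like an API response; every value here is made up.
sample = {
    "full_name": "example-agency/example-project",
    "description": "An example project description",
    "html_url": "https://github.com/example-agency/example-project",
    "language": "Ruby",
    "license": {"key": "mit", "spdx_id": "MIT"},
    "pushed_at": "2016-04-01T12:00:00Z",
}

record = repo_to_record(sample)
print(json.dumps(record, indent=2))
```

To run it against a live repository, replace `sample` with `fetch_repo("some-org", "some-repo")`. Listing a whole organization would use `GET /orgs/{org}/repos` in the same way, with pagination added.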
|
Related to #116. While I'm still for an established metadata standard as a single .yml file, I agree it should first be made practical, and I agree with @jbjonesjr's KISS approach as a first cut for the proudly public repos/orgs... though it wouldn't work for self-hosted or externally managed source code repositories (Bitbucket or GitLab, for example). I put in some work a few months ago to compare some of these KISS-based inventories already out there for US Federal OSS:
Govcode - Git Repo
|
+1 to the KISS and start with a simple registry approach discussed here from @jbjonesjr and @JJediny. There's a ton of stuff we can get in terms of inventory information through simple deliberate registration in some fashion of the official repos for Federal agencies. Connecting the dots to the budget line item Programs that will put things in context for OMB below the agency/bureau/office/etc level will be a little challenging but could be an interesting linked data problem hooking together GitHub and other repo accounts and identifiers. USGS (where I'm from) has a couple of orgs here, USGS and USGS-R, that contain our officially sanctioned repos. A few smart people working together to bring those most visible projects together into a registry/index of some kind would go a long way to baselining where we are and to get the ball rolling. It would be a heck of a lot better than the usual gov fare of data calls, forms, and (gasp!) spreadsheets. You might also check out some recent work from @yolandagil and others on something called OntoSoft under the NSF EarthCube project that has been working on software registries and documentation methods for scientific software, specifically. They've worked up some interesting tools for introspecting a repository and providing a report on its viability for reuse. |
The largest web search and discovery engines (Google, Yahoo, Microsoft, Yandex) have collaborated on creating a Linked Data web schema such that all search engines can index and semantically search all structured data on the web using a single common schema. This is at https://schema.org Recommendation: adopt the W3C Linked Data standard (JSON-LD) for the code.gov software catalog using the schema.org metadata model for Software Application: https://schema.org/SoftwareApplication Note that the U.S. Department of Veterans Affairs, U.S. National Library of Congress, and many other federal agencies and knowledge organizations have already adopted the W3C Linked Data standard and JSON-LD as their metadata standard, making them fully W3C compliant for web search engines. See, for example the VA's VISTA Data Project, which is all JSON-LD based: |
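To make the recommendation concrete, here is a minimal sketch of what a JSON-LD catalog entry using the schema.org `SoftwareApplication` type might look like, built in Python. The helper `software_jsonld` and the placeholder URL are assumptions for illustration only; the project name and description are borrowed from the GMan example earlier in this thread, and `name`, `description`, `url`, and `license` are properties `SoftwareApplication` inherits from schema.org's `CreativeWork` and `Thing`.

```python
import json


def software_jsonld(name, description, url, license_url=None):
    """Build a schema.org SoftwareApplication entry as a JSON-LD dict.

    Hypothetical helper: a real catalog would likely carry more properties.
    """
    entry = {
        "@context": "https://schema.org",
        "@type": "SoftwareApplication",
        "name": name,
        "description": description,
        "url": url,
    }
    if license_url is not None:
        # Only emit a license when one is actually known.
        entry["license"] = license_url
    return entry


entry = software_jsonld(
    name="GMan",
    description="A Ruby gem to detect government domains",
    url="https://example.gov/code/gman",  # placeholder URL, not the real project page
)

print(json.dumps(entry, indent=2))
```

Because the entry is plain JSON with `@context` and `@type` keys, it can be stored in a catalog file or embedded in a web page without changing how the rest of the record is maintained.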
Closing - please move further discussion to GSA/code-gov-web#30 |
(I’m Eric, an engineer at 18F, an office in the U.S. General Services Administration (GSA) that provides in-house digital services consulting for the federal government. I’m commenting on behalf of 18F; we’re an open source team and happy to share our thoughts and experiences. This comment represents only the views of 18F, not necessarily those of the GSA or its Chief Information Officer.)
The Implementation section of the policy asks agencies to inventory their open- and closed-source software projects, so that OMB and the public can increase the discoverability of agency software. This seems very similar to what agencies do with their datasets, in support of M-13-13 and Project Open Data.
These fields will be maintained by CIO shops, likely mostly manually. Our premise is that the fewer the fields, and the easier it is for staff to maintain the data by hand, the more complete and timely the data will be.
We wanted to throw out a first pass at the fields we think are the highest priority, and we're trying to stay on the minimal side. We'd love to hear others' thoughts about what fields should be in here.
At the overall listing level:
And at the per-project level:
This is just a starting point, and there are surely things we haven't thought of, or have discounted. However, we are firmly of the mind that fewer fields will produce more complete results.