
add datetime as version scheme #263

Open
tschmidtb51 opened this issue Nov 9, 2023 · 13 comments

Comments

@tschmidtb51
Contributor

I suggest adding a version scheme datetime, defined as follows:
To compare two versions a and b:

Parse both as datetimes and compare the parsed values (taking their timezones into account).

Use case: Especially in cloud environments, the version is often given by a timestamp.
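A minimal Python (3.7+) sketch of the comparison rule above, assuming both inputs carry an explicit offset. It is not the univers implementation, and compare_datetime_versions is a hypothetical name. Timezone-aware datetime values already compare as absolute instants, so the rule reduces to parsing with the offset preserved. Note that fromisoformat() is more lenient than RFC 3339, so this sketch does not enforce the strict syntax.

```python
from datetime import datetime


def _parse(s: str) -> datetime:
    # Python < 3.11 fromisoformat() rejects a trailing "Z", so rewrite it
    # as the equivalent explicit UTC offset first.
    if s[-1] in "Zz":
        s = s[:-1] + "+00:00"
    dt = datetime.fromisoformat(s)
    if dt.tzinfo is None:
        raise ValueError(f"timezone offset required: {s!r}")
    return dt


def compare_datetime_versions(a: str, b: str) -> int:
    """Return -1, 0 or 1 if instant a is before, equal to or after instant b."""
    da, db = _parse(a), _parse(b)
    # Timezone-aware datetimes compare as absolute instants, so offsets count.
    return (da > db) - (da < db)


# 10:00 at +02:00 is 08:00 UTC, i.e. earlier than 09:00 UTC,
# even though its local clock reading is later.
print(compare_datetime_versions("2024-11-14T10:00:00+02:00", "2024-11-14T09:00:00Z"))  # -1
```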

Flagging @pombredanne for attention

@prabhu

prabhu commented Mar 21, 2024

Software should never use datetimes as versions, since different machines can have different clocks and timezone settings (skew). What is wrong with semver?

@tschmidtb51
Contributor Author

What is wrong with semver?

In general, nothing. (If you are really looking for a reason: IMHO, it is hard to determine automatically, i.e. to compute, whether a change should lead to an incremented major, minor or patch version.) However, not every component uses SemVer.

There are use cases where this is needed; think about cloud environments, where you might not know exactly which version was running. In such cases, a timestamp helps to communicate, e.g., that the service was vulnerable from 2024-10-01T10:00:00Z to 2024-11-14T00:30:00-02:15. As timezones are mandatory (given that this versioning scheme will strictly follow the ABNF from section 5.6 of RFC 3339), I don't see an issue with timezones.

@immqu: Please ensure and document that only timestamps conforming to the strict interpretation mentioned above are accepted.

@matt-phylum
Contributor

For regular software, usually there is a version number, even if the version number is derived from the date or an incrementing number. To specify that a cloud service was vulnerable from 10:00 one day to 00:30 another day doesn't seem useful. You can't change what time it is so nobody can affect whether the version range is currently satisfied or not.

It may seem tempting to just assign a datetime version number to a version of the software, but the exact time assigned, to subsecond or even second precision, is not generally meaningful, since building and packaging the software will usually take at least several seconds. If the software is being distributed for installation in other clouds, it would probably be better to use the semver-compatible 2024.11.14 or the extended-semver (as NuGet uses) 2024.11.14.0 instead of 2024-11-14T00:30:00-02:15. This skips the semantic part of semantic versioning, removing both its benefits and the problem of having to think about it.

If this is really useful, it might be a good idea to allow only UTC timestamps. RFC3339 timestamps can be sorted lexicographically, but they then sort by local time, and if you're talking about the version of software deployed to an unversioned cloud service, the timezone offset doesn't matter. If timezone offset support is required, implementations need to be able to parse RFC3339 and perform basic calendar operations (including things like leap years).

However, if the timezone were constrained to UTC, having a date time scheme would only impart human meaning to a version that could just as well be an opaque, lexicographically sorted string to whatever software is comparing the versions.
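The sorting pitfall described above can be reproduced in a few lines of Python. The timestamps are illustrative and to_instant is a hypothetical helper. Plain string sorting orders by local clock reading, while sorting by the parsed instant respects the offsets; with UTC-only strings both orders agree.

```python
from datetime import datetime

stamps = [
    "2024-11-14T01:00:00+02:00",  # = 2024-11-13T23:00:00Z
    "2024-11-14T00:30:00Z",       # = 2024-11-14T00:30:00Z
]


def to_instant(s: str) -> datetime:
    # Rewrite "Z" for Python versions whose fromisoformat() predates 3.11.
    return datetime.fromisoformat(s[:-1] + "+00:00" if s.endswith("Z") else s)


lexicographic = sorted(stamps)                # sorts by local clock reading
chronological = sorted(stamps, key=to_instant)

print(lexicographic)   # ['2024-11-14T00:30:00Z', '2024-11-14T01:00:00+02:00']
print(chronological)   # ['2024-11-14T01:00:00+02:00', '2024-11-14T00:30:00Z']

# With UTC only, string order equals chronological order, provided every
# string uses "Z" and the same fractional-second precision ("00.5Z" < "00Z"
# as strings, although it is the later instant).
utc_only = ["2024-11-14T00:30:00Z", "2024-11-13T23:00:00Z"]
print(sorted(utc_only) == sorted(utc_only, key=to_instant))  # True
```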

@fvsamson

fvsamson commented Nov 14, 2024

IMO, this discussion has so far missed some crucial points:

  • Quite a few software developers do not use SemVer or CalVer consistently or strictly (i.e. as specified).
    While this is unfortunate, it is simply how software is versioned, i.e. nothing this specification can change. IMO, it shall be able to describe any software properly; hence, limiting it to versioning schemes adhering to SemVer or CalVer is simply not feasible.

  • Some software developers do not even use a versioning scheme with monotonically increasing fields, e.g. by including Git hashes in the release field: nobody and nothing (e.g. libversion) can decide whether version 1.2.3-<hash_a> is newer or older than version 1.2.3-<hash_b>. Such versioning schemes may even be compliant with SemVer or CalVer, as this example shows. But I have also seen developers carelessly including strings in versioning fields, most often but not always in the release field, e.g. 1.2.3-4 and 1.2.3-beta5: regular sorting would determine the latter to be the newer version, but usually the opposite is meant (i.e. 1.2.3-4 being a release version, not an alpha, beta or pre-release version), and ignoring all non-digit characters would also result in incorrect ordering.

  • In the context of CI/CD pipelines, new releases can be generated and deployed without altering the <version>-<release> string at all. Imagine a Docker image being assembled automatically, pulling in software components from various sources: as long as the image configuration / definition (i.e. which components are pulled in) and the CI/CD configuration stay the same, the image's version will be the same, even though some software components further down the software stack may have been updated. Mind that this can happen in a layered manner, which is opaque to the Docker image's maintainer and to the CI/CD workflow assembling it (e.g. software components with statically linked libraries).

    To provide a more generalised perspective on this point: if one does not fully control the transformation from the original source code of every piece of software integrated into an image to the final image, it is impossible to determine properly whether the version of a component has been increased (component update), decreased (component downgrade) or stayed the same.
    In short: as soon as you deal with binaries (e.g. when assembling an image, or an image proper), one can only determine by hashing whether a component was altered, but not whether it has been updated or downgraded; on the other hand, the hash value of a component usually changes with a simple recompilation (no, reproducible builds are still far from universally used, let alone mandatory), even though the component's version is still the same (if done properly, this should be reflected in the release field, which may contain a Git hash as noted earlier and is thus useless for ordering).

In all these cases, comparing versions to determine whether a version is newer, older or unchanged, and consequently defining version ranges based solely on <version>-<release> strings, becomes futile.
Hence, allowing timestamps to denote a specific version is inevitable, AFAICS.


Discussing the specific format of such timestamps is a bit academic from my perspective. I assume using RFC 3339 for timestamps is settled, but restricting them to UTC or allowing timezones is primarily a matter of taste; both are unambiguous and convertible into each other.

I suggest allowing any timestamp adhering to RFC 3339, section 5.6, because RFC 3339 strongly suggests using it "in new protocols on the Internet", and it is what many other file format specifications (e.g. iCal) use. While restricting RFC 3339 timestamps to UTC allows a simple string comparison to determine ordering, converting timestamps that fully support RFC 3339 to and from UTC in order to compare them is almost trivial and well documented, requiring very few lines of code even when implemented as a shell script, and many libraries exist to perform these conversions.
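To illustrate how little code the conversion takes, here is a Python sketch that rewrites any offset into a canonical UTC string; once all versions are in this form, plain string comparison orders them correctly. The function name to_canonical_utc and the choice to drop fractional seconds (in line with the time-secfrac suggestion in this comment) are assumptions, and the parser used is more lenient than strict RFC 3339.

```python
from datetime import datetime, timezone


def to_canonical_utc(s: str) -> str:
    """Rewrite an offset-bearing timestamp as YYYY-MM-DDTHH:MM:SSZ in UTC."""
    if s[-1] in "Zz":
        s = s[:-1] + "+00:00"  # fromisoformat() before 3.11 rejects "Z"
    dt = datetime.fromisoformat(s)
    if dt.tzinfo is None:
        raise ValueError(f"timezone offset required: {s!r}")
    # astimezone() does the offset arithmetic, including roll-over across
    # midnight, month ends and leap days. The format string has no %f,
    # so fractional seconds, if any, are dropped.
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_canonical_utc("2024-11-14T00:30:00-02:15"))  # 2024-11-14T02:45:00Z
print(to_canonical_utc("2024-03-01T01:00:00+02:00"))  # 2024-02-29T23:00:00Z
```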

P.S.: What does make a lot of sense to me is to specify that the optional time-secfrac field must not be used. One might consider requiring the time-second field to be set to 00, but that would assume that two releases are never created within the same minute; IMO that is not 100% guaranteed, whereas creating two releases less than a second apart seems extremely unlikely to me.

@matt-phylum
Contributor

  • IMO Purl shall be able to properly describe any software

This is about vers, not PURL. PURL has no restrictions on the format of the version and derives no meaning from the version.

  • Regular sorting would result in the latter being determined as the newer version, but usually it is meant the other way around (i.e. 1.2.3-4 being a release version, not a beta).

This is true for semver, not "regular sorting." In at least Maven and lexicographical order, 1.2.3-4 comes after 1.2.3.

The correct, or most correct, way to do it for semver would be 1.2.3+4 which is neither greater nor less than 1.2.3: https://semver.org/#spec-item-10
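For reference, the SemVer 2.0.0 precedence rules (spec items 9 to 11) can be sketched in Python as follows. semver_compare is a hypothetical name, and the sketch assumes well-formed SemVer input (it performs no validation and assumes no leading zeros). It shows both points above: build metadata is ignored, and a numeric pre-release identifier such as 4 ranks below an alphanumeric one such as beta5.

```python
def _split(v: str):
    v = v.split("+", 1)[0]            # build metadata is ignored (spec item 10)
    core, _, pre = v.partition("-")   # only the first "-" starts the pre-release
    major, minor, patch = (int(x) for x in core.split("."))
    return (major, minor, patch), (pre.split(".") if pre else [])


def semver_compare(a: str, b: str) -> int:
    """Return -1, 0 or 1 according to SemVer 2.0.0 precedence."""
    (core_a, pre_a), (core_b, pre_b) = _split(a), _split(b)
    if core_a != core_b:
        return (core_a > core_b) - (core_a < core_b)
    # A version without pre-release identifiers ranks above one with them.
    if not pre_a or not pre_b:
        return (not pre_a) - (not pre_b)
    for x, y in zip(pre_a, pre_b):
        if x == y:
            continue
        x_num, y_num = x.isdigit(), y.isdigit()
        if x_num and y_num:
            return (int(x) > int(y)) - (int(x) < int(y))
        if x_num != y_num:
            # Numeric identifiers have lower precedence than alphanumeric ones.
            return -1 if x_num else 1
        return (x > y) - (x < y)       # ASCII comparison
    # All shared identifiers equal: the longer list ranks higher.
    return (len(pre_a) > len(pre_b)) - (len(pre_a) < len(pre_b))


print(semver_compare("1.2.3+4", "1.2.3"))        # 0
print(semver_compare("1.2.3-4", "1.2.3-beta5"))  # -1
print(sorted(["1.2.3-4", "1.2.3"]))              # ['1.2.3', '1.2.3-4'] (plain string order)
```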

  • In the context of CI/CD pipelines, new releases can be generated and deployed without altering the <version>-<release> string at all.

I don't think you can use vers to specify that in a list of versions 1.0 1.1 1.2 2.0 you want version 2024-11-14. That would require vers implementations to know more about the available options than their versions, and possibly information that is unknowable (eg the creation date of files that have been deleted or replaced).

CI/CD pipeline output usually cannot be sorted by date alone without a separate version. Because of development branches or release maintenance branches, it's common that, if you put all the builds in order by date only, you'll see changes appear and disappear multiple times, sometimes for months. You can avoid this ordering problem by having an assigned version along with the date version, but I don't think that's something for vers, and it does not solve the problem of not knowing the dates.

You should be able to write in an sbom or something that you used the version 1.2 from 2024-11-14 or the version 1.2 with a particular hash if there could be meaningful differences between different 1.2 versions, but that doesn't make the date a meaningful version number.

@tschmidtb51
Contributor Author

I don't think you can use vers to specify that in a list of versions 1.0 1.1 1.2 2.0 you want version 2024-11-14. That would require vers implementations to know more about the available options than their versions, and possibly information that is unknowable (eg the creation date of files that have been deleted or replaced).

Agreed, but IMHO this misses the point: the introduction of the version scheme datetime does not try to cross version scheme boundaries. Its introduction should help to describe the world as it already is. And sometimes a time frame is all that makes sense to describe it.

Let me try to explain that with a different example:
vers is independent of the type of product it is used for, so one could also use it to describe hardware. If a chip manufacturer realizes that a certain production fault affects a certain production run of their chip A, they would be able to use the datetime version scheme to communicate that.

To specify that a cloud service was vulnerable from 10:00 one day to 00:30 another day doesn't seem useful. You can't change what time it is so nobody can affect whether the version range is currently satisfied or not.

IMHO, this is extremely useful, especially for incident response. First, you can use it to narrow down your event logs to the time in question for review. Moreover, you can use such a range to determine whether an exploitation attempt might have been successful, i.e. whether it falls into the vulnerable time frame. And the best thing is that this can be automated if vers is used, as it sets the rules for the comparison.
There could also be a compliance case here.
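As a sketch of this incident-response use case, the vulnerable window from the example earlier in the thread can be checked against log timestamps. The inclusive lower bound and exclusive upper bound, the helper names and the log entries are all assumptions for illustration; nothing here is defined by vers.

```python
from datetime import datetime


def parse(s: str) -> datetime:
    # Rewrite "Z" for Python versions whose fromisoformat() predates 3.11.
    return datetime.fromisoformat(s[:-1] + "+00:00" if s[-1] in "Zz" else s)


# Window from the thread's example; bound semantics are an assumption.
VULNERABLE_FROM = parse("2024-10-01T10:00:00Z")
VULNERABLE_UNTIL = parse("2024-11-14T00:30:00-02:15")  # = 2024-11-14T02:45:00Z


def in_vulnerable_window(event_time: str) -> bool:
    # Aware datetimes compare as instants, so events logged with any
    # offset are placed correctly relative to the window.
    return VULNERABLE_FROM <= parse(event_time) < VULNERABLE_UNTIL


# Hypothetical log timestamps.
log = [
    "2024-09-30T23:59:59Z",
    "2024-11-14T02:00:00Z",
    "2024-11-14T03:00:00+01:00",  # = 02:00:00Z
    "2024-11-14T03:00:00Z",
]
print([t for t in log if in_vulnerable_window(t)])
# ['2024-11-14T02:00:00Z', '2024-11-14T03:00:00+01:00']
```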

@fvsamson

fvsamson commented Nov 14, 2024

This is about vers, not PURL. PURL has no restrictions on the format of the version and derives no meaning from the version.

O.K., imprecise wording on my part, because this discussion is happening under package-url/purl-spec. I omitted "PURL", which does not alter the content of my statements.

  • [But I have also seen developers carelessly including strings in versioning fields, most often but not always in the release field, e.g. 1.2.3-4 and 1.2.3-beta5:] Regular sorting would result in the latter being determined as the newer version, but usually it is meant the other way around (i.e. 1.2.3-4 being a release version, not a beta).

This is true for semver, not "regular sorting." In at least Maven and lexicographical order, 1.2.3-4 comes after 1.2.3.

"Regular sorting" is lexicographical for me, because that is what the POSIX command sort does by default. Please, let us argue about content not wording.

Still, my statement that 1.2.3-beta5 will be regarded as newer than 1.2.3-4 holds true, IMO.

The correct, or most correct, way to do it for semver would be 1.2.3+4 which is neither greater nor less than 1.2.3: https://semver.org/#spec-item-10

I like SemVer, I use SemVer a lot, but SemVer is irrelevant for this discussion as long as it is not mandatory for all software developers.
I.e. SemVer can be nothing more than an example of a versioning scheme in this context, because an arbitrary number of other versioning schemes exist, with some of the "home-brew" ones being unsuitable for their task but nevertheless having to be accounted for.
As I privately do some RPM packaging, I am sometimes confronted with developers even altering their versioning scheme, e.g. going from a three-field scheme akin to SemVer to a four-field scheme and back after some releases: this sure drives tooling nuts (and me, too).

  • In the context of CI/CD pipelines, new releases can be generated and deployed without altering the <version>-<release> string at all.

I don't think you can use vers to specify that in a list of versions 1.0 1.1 1.2 2.0 you want version 2024-11-14. That would require vers implementations to know more about the available options than their versions, and possibly information that is unknowable (eg the creation date of files that have been deleted or replaced).

Exactly: in many cases, the version string is insufficient to make an informed decision.
This is the principal point of my original message.

You should be able to write in an sbom or something that you used the version 1.2 from 2024-11-14 or the version 1.2 with a particular hash if there could be meaningful differences between different 1.2 versions, but that doesn't make the date a meaningful version number.

Correct, except that a hash value is not helpful, because it does not allow determining any ordering.
But in many cases a datetime string is necessary in addition to a version string to allow for determining the correct order of versions.

P.S.: I am not sure whether @tschmidtb51's initial suggestion really was to use datetime alone for expressing any version, but I see it can be read that way. At first sight that does not make much sense to me, but this discussion has led me to believe that a datetime should be associated with a version string and be used as a stronger indicator for ordering than the version string itself, at least when the version strings used to define a range bear any ambiguity.
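A hypothetical sketch of this proposal: each release carries a free-form version string plus an RFC 3339 release timestamp, and ordering uses the timestamp first, falling back to the version string only on exact ties. The Release record, its field names and the data are illustrative assumptions, not part of vers.

```python
from datetime import datetime
from typing import NamedTuple


class Release(NamedTuple):
    version: str   # free-form version string, possibly ambiguous
    released: str  # RFC 3339 timestamp with offset (hypothetical extra field)


def instant(s: str) -> datetime:
    # Rewrite "Z" for Python versions whose fromisoformat() predates 3.11.
    return datetime.fromisoformat(s[:-1] + "+00:00" if s[-1] in "Zz" else s)


def ordering_key(r: Release):
    # The timestamp decides; the version string only breaks exact ties.
    return (instant(r.released), r.version)


# Illustrative data only.
releases = [
    Release("1.2.3-4", "2024-11-05T14:00:00+01:00"),  # rebuild, same version string
    Release("1.2.3-beta5", "2024-10-02T09:00:00Z"),
    Release("1.2.3-4", "2024-10-20T08:00:00Z"),
]

for r in sorted(releases, key=ordering_key):
    print(r.version, r.released)
```

Unlike plain string sorting, this places 1.2.3-beta5 before 1.2.3-4 and keeps the two 1.2.3-4 builds apart.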

Additionally, as @tschmidtb51 pointed out, in some cases a datetime may be all you have (e.g. ICs usually have their production week and year printed on them, in addition to the chip identifier, e.g. SN7400); changes to a chip's internals may or may not be reflected in its identifier ("it is functionally equivalent, altering the identifier would only confuse customers": famous last words™ before a functional bug is discovered in the redesigned chip), and flaws introduced in a production run can only be recognised by their datetime. Hence, some cases exist in which a datetime constitutes an appropriate (replacement for a) version string, because no other information is available.

Furthermore, @tschmidtb51 may push the boundaries of use-cases for this specification, but I understand well that in the realm of incident handling (not the world I dwell in) it is necessary to express "cloud service X was vulnerable from datetime to datetime".

@tschmidtb51
Contributor Author

P.S.: I am not sure if @tschmidtb51 initial suggestion really was to solely use datetime for expressing any version, but I see it can be read that way.

To be precise: no. datetime is not the "one size fits all" solution, and neither is SemVer nor CalVer. It is just another versioning scheme that has its use cases (as all of them do). The addition should allow the creator of a vers to use the versioning scheme that fits best.

@immqu

immqu commented Nov 18, 2024

@tschmidtb51 I am working on the implementation for univers: https://github.com/immqu/univers/tree/datetime
The RFC3339 format is a subset of the ISO8601 format. The latter allows, e.g., specifying a timestamp like 2024-11-18 (without a time of day), while the former does not. Thus, I am now using a regex to validate the RFC3339 format and then passing the string to an existing parser for the ISO format.
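A standard-library-only sketch of the validate-then-parse approach: the regex mirrors the date-time ABNF of RFC 3339 section 5.6, and instead of handing off to dateutil, the captured fields are passed straight to datetime(), which performs the remaining range checks. parse_rfc3339 and its allow_secfrac switch (reflecting the suggestion in this thread to forbid time-secfrac) are hypothetical. Leap seconds (":60") are rejected because Python's datetime cannot represent them.

```python
import re
from datetime import datetime, timedelta, timezone

# RFC 3339 section 5.6 "date-time". [0-9] instead of \d so that non-ASCII
# digits are rejected; "T" and "Z" may also be lower case per the RFC.
_DATE_TIME = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})[Tt]"
    r"([0-9]{2}):([0-9]{2}):([0-9]{2})(?:\.([0-9]+))?"
    r"(?:([Zz])|([+-])([0-9]{2}):([0-9]{2}))"
)


def parse_rfc3339(s: str, allow_secfrac: bool = True) -> datetime:
    m = _DATE_TIME.fullmatch(s)
    if m is None:
        raise ValueError(f"not an RFC 3339 date-time: {s!r}")
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    frac, zulu, sign, off_h, off_m = m.groups()[6:]
    if frac is not None and not allow_secfrac:
        raise ValueError(f"time-secfrac not allowed: {s!r}")
    if zulu:
        tz = timezone.utc
    else:
        if int(off_h) > 23 or int(off_m) > 59:
            raise ValueError(f"offset out of range: {s!r}")
        delta = timedelta(hours=int(off_h), minutes=int(off_m))
        tz = timezone(-delta if sign == "-" else delta)
    # Pad or truncate the fraction to microseconds.
    micro = int((frac or "0")[:6].ljust(6, "0"))
    # datetime() checks month, day-of-month (incl. leap years), hour, minute
    # and second ranges, raising ValueError on failure.
    return datetime(year, month, day, hour, minute, second, micro, tzinfo=tz)


print(parse_rfc3339("2024-11-14T00:30:00-02:15").astimezone(timezone.utc).isoformat())
# 2024-11-14T02:45:00+00:00
```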

@fvsamson

fvsamson commented Nov 26, 2024

@tschmidtb51 I am working on the implementation for univers: https://github.com/immqu/univers/tree/datetime-scheme

@immqu, that is nice, thank you. (BTW, I rectified the link to that specific branch.)

Thus, I am using a regex now to validate the RFC3339 format, and then proceed to use an existing parser for the ISO format.

May I ask what the ultimate goal is? I.e., must a datetime version strictly adhere to RFC3339, section 5.6, or is the idea to allow for any of the many ISO8601 formats? I strongly prefer confining datetime versions to RFC3339, section 5.6, because …

  • they are much easier to compare: fully supporting ISO8601 is hell, whereas handling the time-zone information discussed earlier is trivial in comparison (it still requires simple integer math, which is beyond the scope of a regex).
  • comparing them likely does not require an external library.
  • there are cases, such as the aforementioned "<year>/<week> is all the info you have", in which some unambiguous mapping has to be employed anyway. Suggestion: specify using the start of the corresponding day, week, month, year, etc.
  • ISO standards are not publicly available (at least not officially: one must pay to obtain them), in contrast to IETF RFCs.
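A minimal sketch of the "start of the period" mapping suggested above, assuming the year/week code follows ISO 8601 week numbering (manufacturers' date codes may use other conventions) and that the period starts at 00:00 UTC. The function names are hypothetical; date.fromisocalendar requires Python 3.8+.

```python
from datetime import date


def week_code_to_version(year: int, week: int) -> str:
    """Start of ISO week (Monday 00:00 UTC); raises ValueError for invalid weeks."""
    start = date.fromisocalendar(year, week, 1)
    return f"{start.isoformat()}T00:00:00Z"


def month_to_version(year: int, month: int) -> str:
    """Start of the month at 00:00 UTC."""
    return f"{date(year, month, 1).isoformat()}T00:00:00Z"


def year_to_version(year: int) -> str:
    """Start of the year at 00:00 UTC."""
    return f"{date(year, 1, 1).isoformat()}T00:00:00Z"


print(week_code_to_version(2024, 46))  # 2024-11-11T00:00:00Z
print(week_code_to_version(2025, 1))   # 2024-12-30T00:00:00Z (ISO week 1 of 2025 starts in 2024)
print(month_to_version(2024, 11))      # 2024-11-01T00:00:00Z
```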

P.S.:
Something that still has to be fleshed out is my suggestion to recommend supplying a datetime version in addition to version information based on other version schemes. I provided some examples of why I believe this to be necessary in my original message here and in my reply to @matt-phylum.

If it is deemed helpful to separate this sub-topic from the principal topic of this issue (i.e. specifying a datetime versioning scheme), I will open a new issue if asked to.

@fvsamson

fvsamson commented Nov 26, 2024

BTW, @matt-phylum, @immqu, @prabhu, @pombredanne et al., in hindsight I feel that I should have provided a little context for why I suddenly appeared here writing lengthy comments: @tschmidtb51 notified me, "This might be something of interest for you.", so I reviewed this discussion (and lastly also PR #139 / issue #328) and got a bit carried away commenting spontaneously, because this is actually quite interesting for me in the context of BSI TR-03183-2 "SBOM".

Thank you all for your excellent work! My aim is not to criticise, rant or lecture, but to help make this valuable work even better.

@immqu

immqu commented Nov 28, 2024

May I ask what the ultimate goal is? I.e., must a datetime version strictly adhere to RFC3339, section 5.6, or is the idea to allow for any of the many ISO8601 formats? I strongly prefer confining datetime versions to RFC3339, section 5.6, because …

I didn't find useful libraries for working with RFC3339 timestamps in Python. Thus, my idea was to first verify that the provided string does in fact conform to RFC3339 and, if so, proceed with an established parser for the ISO standard (dateutil). Currently, I am not sure how much effort it would be to re-implement a parser and comparison logic from scratch...
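To gauge the from-scratch effort: once a string has been validated (e.g. with a regex), comparing RFC 3339 timestamps needs only integer arithmetic. The sketch below converts a timestamp without time-secfrac into UTC seconds since the epoch using Howard Hinnant's days_from_civil algorithm for the proleptic Gregorian calendar. The function names are hypothetical, and the input is assumed to be already validated against the strict YYYY-MM-DDTHH:MM:SS form followed by "Z" or "+HH:MM"/"-HH:MM".

```python
def days_from_civil(y: int, m: int, d: int) -> int:
    """Days since 1970-01-01 for a proleptic Gregorian date (H. Hinnant)."""
    if m <= 2:
        y -= 1                      # treat Jan/Feb as months 10/11 of the prior year
    era = y // 400                  # floor division also handles negative years
    yoe = y - era * 400                                          # [0, 399]
    doy = (153 * (m - 3 if m > 2 else m + 9) + 2) // 5 + d - 1   # [0, 365]
    doe = yoe * 365 + yoe // 4 - yoe // 100 + doy                # [0, 146096]
    return era * 146097 + doe - 719468


def rfc3339_to_epoch_seconds(s: str) -> int:
    """UTC seconds since the epoch for a pre-validated timestamp without secfrac."""
    y, mo, d = int(s[0:4]), int(s[5:7]), int(s[8:10])
    h, mi, sec = int(s[11:13]), int(s[14:16]), int(s[17:19])
    offset = 0                      # "Z" / "z" means UTC
    if s[19] in "+-":
        offset = int(s[20:22]) * 3600 + int(s[23:25]) * 60
        if s[19] == "-":
            offset = -offset
    # A leap second (":60") is counted as the first second of the next minute.
    local = days_from_civil(y, mo, d) * 86400 + h * 3600 + mi * 60 + sec
    return local - offset           # local time = UTC + offset


# 01:00 at +02:00 is 23:00 UTC on the previous day, i.e. before 00:30 UTC.
print(rfc3339_to_epoch_seconds("2024-11-14T01:00:00+02:00")
      < rfc3339_to_epoch_seconds("2024-11-14T00:30:00Z"))  # True
```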

@sjn

sjn commented Dec 6, 2024

FYI: https://calver.org/

7 participants