Decoupling interface versioning and Tables.jl API versioning? #133

tkf opened this issue Feb 5, 2020 · 5 comments

@tkf
Contributor

tkf commented Feb 5, 2020

It might be nice to have a way to decouple the Tables.jl implementation version from the interface version. Packages providing generic table processing functions may want to require that certain table interfaces are available, and/or that certain utility functions ("secondary" APIs) provided by Tables.jl are available. Being able to declare these different notions of compatibility separately makes sense. I think this would make growing the Tables.jl interface easier and safer.

I think one way to do this might be to prepare an empty package (say) TablesAPI and a single function Tables.test(mytable) (ref #131 (comment)). Alternatively, the package TablesAPI may contain empty/trivial function definitions, like DataAPI. Packages with custom table types (provider packages) can then declare their interface compatibility by

name = "ProviderPackage"

[compat]
Tables = "1.0.2"
TablesAPI = "< 1.2"

while invoking something like

Tables.test(ProviderPackage, create_table_example())

from their test suite. This way, interface version compatibility would be enforced by the test suite.

(Detail: since the dependencies of ProviderPackage may upper-bound the version of TablesAPI, we need to make sure that the TablesAPI used in the test is the latest version that the Project.toml of ProviderPackage claims to support. This is why the module ProviderPackage is passed as an argument to Tables.test. This function can then parse the Project.toml files of ProviderPackage and TablesAPI and check the versions.)
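A minimal sketch of the Project.toml lookup described above, assuming Julia 1.6 or later (where TOML is a standard library). The helper name declared_compat is hypothetical and not part of Tables.jl; it only reads the declared [compat] string and does not interpret it.

```julia
using TOML  # standard library since Julia 1.6

# Hypothetical helper (not a Tables.jl function): return the [compat] entry
# for `dep` declared in the Project.toml at `path`, or `nothing` if absent.
function declared_compat(path::AbstractString, dep::AbstractString)
    project = TOML.parsefile(path)
    compat = get(project, "compat", Dict{String,Any}())
    return get(compat, dep, nothing)
end

# Self-contained demonstration using a temporary Project.toml that mirrors
# the ProviderPackage example from this thread.
dir = mktempdir()
path = joinpath(dir, "Project.toml")
write(path, """
name = "ProviderPackage"

[compat]
Tables = "1.0.2"
TablesAPI = "< 1.2"
""")

tablesapi_compat = declared_compat(path, "TablesAPI")
missing_compat = declared_compat(path, "NotADependency")
```

For an installed package, the path could be obtained with joinpath(pkgdir(ProviderPackage), "Project.toml"). Comparing the declared bound against the TablesAPI version actually loaded in the test environment is left out of this sketch.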

On the other hand, packages that consume table interfaces declare the usual lower bound for TablesAPI:

name = "ConsumerPackage"

[compat]
Tables = "1.1.3"
TablesAPI = "1.1"

This way, users can safely assume that all packages within an environment satisfy given Tables interface.

To clarify the difference:

  • TablesAPI.jl patch version is never touched.
  • TablesAPI.jl minor version is incremented when a non-breaking mandatory Tables interface is added.
  • TablesAPI.jl major version is incremented when there is a breaking change in Tables interface.
  • Tables.jl patch version is incremented when a bug in Tables.jl API/utility functions is fixed.
  • Tables.jl minor version is incremented when a non-breaking improvement in Tables API/utility functions is added. This does not necessarily mean that there is a new Tables interface; e.g., adding Tables.rows support for more types in Base.
  • Tables.jl major version is incremented when there is a breaking change.

Of course, this scheme could be too strict. You may be using the part of ConsumerPackage that requires only TablesAPI v1.0. In that case, it'd be annoying if YetAnotherProviderPackage declares TablesAPI = "< 1.1". However, my guess is that this is not a problem because I'd imagine TablesAPI is updated very slowly (maybe less than once a year).

@quinnj
Member

quinnj commented Feb 6, 2020

Hmmmm........at first glance, I'm worried about the complexity here. For example, most provider packages are also consumer packages, i.e. they provide a table source as well as a sink function to take any tables source and "consume" the table into its own format, so it's not entirely clear that the strategy would still work for them.

I think I'm also failing to see the motivation here, or what problem this solves? It seems like regular sem-versioning does the job; if we make breaking or mandatory new API changes, then we bump the major version, and everything continues to work until packages upgrade and support the new stuff (like we're currently planning for the 1.0 release).

@tkf
Contributor Author

tkf commented Feb 6, 2020

The motivation is to automate interface compatibility testing. This, in turn, requires a way to find out which interface version is supported by a given type. For example, let's say you'd want to add a sink-oriented API at v1.1:

Tables.into(DestType::Type, src) => DestType

You can then create an automated test like this:

module Tables
...
function test(m::Module, table)
    ...
    if api_version_of(m) >= v"1.1"
        @testset "into" begin
            DestType = typeof(table)
            for src in sample_tables_for_test()
                dest = into(DestType, src)
                @test dest isa DestType
                @test collect(rows(dest)) == collect(rows(src))
            end
        end
    end
    ...
end
...
end

However, unless Tables.jl versioning is decoupled from the "interface versioning" (i.e., if the check if api_version_of(m) >= v"1.1" is absent), this will break the provider packages even if they are semver compatible with Tables.jl.

@quinnj
Member

quinnj commented Feb 7, 2020

Here's how I would imagine a setup just relying on semver working:

  • Tables.jl: 1.0
  • TablesTests.jl: new package for testing Tables.jl implementations: 1.0
  • CSV.jl 1.0: declares Tables = "~1.0" and TablesTests = "~1.0", calls TablesTests.test(CSV.File, CSV.write) (it provides a source and sink function)

Then hypothetical Tables.into is introduced, so we have:

  • Tables.jl: 1.1, bumped minor version for new functionality, but non-breaking
  • TablesTests.jl: also bumps to 1.1, and adds a new test for Tables.into for implementation testing
  • CSV.jl: still has Tables = "~1.0" and TablesTests = "~1.0", which means it still requires 1.0 explicitly and won't auto-upgrade to 1.1, no compat issues

CSV.jl reviews the new API and implements Tables.into:

  • Tables.jl: still 1.1
  • TablesTests.jl: still 1.1
  • CSV.jl: makes a new release, bumping compat to Tables = "~1.1" and TablesTests = "~1.1"

Is there a bad case in these scenarios? Does it cause problems at all? In my mind, TablesTests.jl has always been a separate package that would keep up/compat with Tables.jl.

@quinnj
Member

quinnj commented Feb 7, 2020

An alternative strategy that wouldn't require packages with Tables.jl dependencies to have compat like "~1.0" would be to only introduce new interface API changes in major release versions, which seems wise. So CSV.jl could declare Tables = "1", and things would always work for any 1.1, 1.2, 1.3 Tables releases. Then, if Tables.jl introduces Tables.into, it would bump its version to 2.0, and packages would automatically be capped at [1.0, 2.0) until they explicitly upgraded.

This is a very good discussion to have now, though, as we prepare for a 1.0 release, so that our policy/strategy going forward is very clear. I'll make an action item from this issue to do a writeup in the docs about the Tables.jl release policy/process.

@tkf
Contributor Author

tkf commented Feb 8, 2020

I think the most important aspect of SemVer is backward compatibility. I think using ~1.0 (i.e., compatibility with [1.0.0, 1.1.0) but not with [1.1.0, 2.0.0)) defeats this purpose. This would mean you can't install (say) DataFrames.jl with Tables = "~1.0" and CSVFile.jl with Tables = "1.1" (even without ~ here) in the same environment. So I'd suggest not recommending ~ for packages downstream of Tables.jl (unless they depend on Tables.jl internals).
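To make the conflict concrete, here is a small sketch that models the compat entries as the half-open ranges quoted above. The VRange type and overlaps function are hypothetical illustrations using only Base; they are not how Pkg resolves versions internally.

```julia
# Half-open version range [lo, hi), matching the notation used in this thread.
struct VRange
    lo::VersionNumber
    hi::VersionNumber
end

# Two half-open ranges share at least one version iff each one starts
# before the other one ends.
overlaps(a::VRange, b::VRange) = a.lo < b.hi && b.lo < a.hi

tilde_1_0 = VRange(v"1.0.0", v"1.1.0")  # Tables = "~1.0"
caret_1_1 = VRange(v"1.1.0", v"2.0.0")  # Tables = "1.1"
caret_1_0 = VRange(v"1.0.0", v"2.0.0")  # Tables = "1.0"

tilde_conflicts = !overlaps(tilde_1_0, caret_1_1)  # no common version
caret_coexists = overlaps(caret_1_0, caret_1_1)    # e.g. 1.1.0 satisfies both
```

Under this model, a package pinned with ~1.0 cannot share an environment with one that needs 1.1, while the plain (caret) specifier leaves room for both.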

(And I think this is why we need a bit of a trick here. Provider packages have to talk about (a lack of) forward compatibility.)

> An alternative strategy that wouldn't require packages with Tables.jl dependencies to have compat like "~1.0", would be to only introduce new interface API changes in major release versions, which seems wise.

I agree this is a good strategy that requires less hassle, provided that the interfaces are going to change very slowly. Having said that, I think the disadvantages of the major version approach compared to TablesAPI.jl are:

  • Now the consumer packages are responsible for figuring out the interface usage. They have to put Tables = "1, 2" if they don't use the hypothetical Tables.into interface that is added in Tables.jl 2.0.
  • It is not possible to break the "secondary" API (i.e., the functions defined on top of the "basic" Tables.jl interface) without sending a possible false alarm to provider packages.

My assumption is that adding new interfaces would be more frequent than breaking existing APIs, and that consumer packages are the majority. That's why I thought it makes sense to decouple the "basic" interface versioning from Tables.jl (major) versioning. This way, consumer packages do not receive false alarms via major bumps of Tables.jl that might merely indicate the inclusion of a new "basic" interface (which they couldn't possibly have been using).

> In my mind, TablesTests.jl has always been a separate package that would keep up/compat with Tables.jl.

I agree this is a good approach. I wanted to avoid introducing more concepts in the OP so I was using Tables.test as an entry point.
