[FR] External Data Support #3191

Open
1 task done
blink1073 opened this issue Mar 24, 2022 · 7 comments

Comments

@blink1073
Contributor

What's the problem this feature will solve?

Project Jupyter uses data_files for discovery of runtime extensions. This feature was deprecated without a viable alternative.

Describe the solution you'd like

The flit project recently added support for external_data, which is a simplified and constrained version of data_files. We could support an equivalent feature in setuptools, with caveats about when it is appropriate to use it.

Alternative Solutions

Project Jupyter and other current consumers of data_files could instead switch to using a flit backend, which we have been actively exploring.

Additional context

Project Jupyter uses data_files to allow libraries to provide shared settings and static asset files that are consumed by Jupyter at runtime. We have explored using entry_points for this purpose, but it did not scale well.

Code of Conduct

  • I agree to follow the PSF Code of Conduct
@blink1073 added the enhancement and Needs Triage labels Mar 24, 2022
@abravalheri
Contributor

Hi @blink1073 thank you very much for putting up this issue.
I think this request is very reasonable. Are there any other requirements for the Jupyter use case?
If I understood correctly, the idea is to pick up a directory and then replicate the file tree under it in the appropriate installation scheme, right?
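
To make that reading concrete, here is a minimal sketch that walks a source directory and computes where each file would land under the `data` path of the active install scheme. The function name `plan_data_install` is hypothetical and none of this is setuptools API; only `sysconfig.get_path("data")` comes from the standard library.

```python
import sysconfig
from pathlib import Path


def plan_data_install(source_dir, data_root=None):
    """Map each file under source_dir to its destination under data_root.

    data_root defaults to the "data" path of the current install scheme.
    Returns a list of (source, destination) Path pairs; nothing is copied.
    """
    source = Path(source_dir)
    root = Path(data_root if data_root is not None else sysconfig.get_path("data"))
    pairs = []
    for path in sorted(source.rglob("*")):
        if path.is_file():
            # Keep the path relative to source_dir so the tree is replicated as-is.
            pairs.append((path, root / path.relative_to(source)))
    return pairs
```

Because the destination is always built by joining a relative path onto one root, a scheme like this cannot target absolute locations, which is the main constraint being discussed.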

Regarding the "deprecated" status of data_files, my personal opinion is that "it's complicated"...

The behaviour of installing files to arbitrary locations, as it worked in the distutils and easy_install eras, is definitely deprecated. But I don't see the feature itself disappearing in the future (users just have to understand its limitations and know exactly when to use it).

Other maintainers might disagree, but I do appreciate that there are genuine use cases for it (like jupyter).

@abravalheri added the help wanted and proposal labels and removed the Needs Triage label Mar 24, 2022
@blink1073
Contributor Author

Jupyter looks in <sys_prefix>/share/jupyter and <sys_prefix>/etc/jupyter for its files. As long as we can target those two directories we are fine.
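
As a rough illustration, the two locations can be derived from `sys.prefix` as below. The helper `jupyter_prefix_dirs` is a hypothetical name used only for this sketch; it covers just the two prefix-relative directories mentioned here and is not Jupyter's actual path lookup.

```python
import sys
from pathlib import Path


def jupyter_prefix_dirs(prefix=None):
    """Return the two prefix-relative directories named in this comment."""
    base = Path(prefix if prefix is not None else sys.prefix)
    return {
        "data": base / "share" / "jupyter",    # <sys_prefix>/share/jupyter
        "config": base / "etc" / "jupyter",    # <sys_prefix>/etc/jupyter
    }
```

Calling `jupyter_prefix_dirs()` with no argument resolves both paths against the running interpreter's environment.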

I think the current semantics of data_files are what is complicated: it can target absolute paths, and it is very tricky to use for nested files. We use a helper function to make it glob-friendly.

I think with a new API we can offer a simpler, safer alternative and eventually remove data_files support altogether.
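
For readers unfamiliar with the nesting problem, a glob-friendly helper of the kind mentioned above could look roughly like this. It is a sketch written for this thread, not the helper Jupyter actually uses: the name `expand_data_files` and the `(target_dir, source_dir, pattern)` spec format are assumptions. The output has the `(directory, [files])` shape that the `data_files` keyword expects.

```python
import glob
import os
from collections import defaultdict


def expand_data_files(specs, root="."):
    """Expand (target_dir, source_dir, pattern) specs into data_files tuples.

    For every file under source_dir matching pattern (recursive "**" is
    allowed), place it under target_dir while preserving the sub-directory
    it was found in. Returns a sorted list of (directory, [files]).
    """
    grouped = defaultdict(list)
    for target_dir, source_dir, pattern in specs:
        base = os.path.join(root, source_dir)
        for path in sorted(glob.glob(os.path.join(base, pattern), recursive=True)):
            if not os.path.isfile(path):
                continue  # glob also returns directories; only files are installed
            rel_dir = os.path.dirname(os.path.relpath(path, base))
            dest = os.path.join(target_dir, rel_dir) if rel_dir else target_dir
            # Source paths are recorded relative to the project root.
            grouped[dest].append(os.path.relpath(path, root))
    return sorted(grouped.items())
```

The result could then be passed as `setup(data_files=expand_data_files([...]))`. One entry per destination directory is needed because `data_files` does not recurse on its own, which is the awkwardness described above.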

@blink1073
Contributor Author

cc @minrk @jasongrout @bollwyvl

@blink1073
Contributor Author

blink1073 commented Mar 24, 2022

If there's buy-in I can take a crack at implementation.

@abravalheri
Contributor

I am sold on that.

It would be nice to hear the opinion of @jaraco.


For more information, there is an in-depth discussion on Discourse about why Jupyter needs data dirs; this is also relevant.

If I understood correctly, Jupyter needs to be able to share/read/use files from extensions. These files need to be found on disk in a reliable way, even if the package providing them has not been imported yet. The performance of searching for entry points on disk also seems to be an issue for Jupyter.
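
A minimal sketch of what "found on disk without importing" can mean in practice: read every JSON file in one known directory. The function name `discover_extension_configs` and the one-JSON-file-per-extension layout are assumptions made for illustration, not Jupyter's real file format.

```python
import json
from pathlib import Path


def discover_extension_configs(config_dir):
    """Load every *.json file in config_dir, keyed by file stem.

    Only the file system is touched: no package is imported and no
    entry-point metadata is scanned.
    """
    found = {}
    for path in sorted(Path(config_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            found[path.stem] = json.load(handle)
    return found
```

The cost of this lookup depends only on the number of files in that one directory, whereas entry-point discovery has to inspect the metadata of every installed distribution.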

@ofek
Contributor

ofek commented May 6, 2022

@blink1073 Hey! I saw this linked on your slides from the packaging summit.

Hatchling supports a shared-data option for wheels. Would that satisfy Jupyter's use case?

@blink1073
Contributor Author

Hi @ofek, thanks for pointing that out! The consensus at the summit was that I should draft a PEP to standardize the handling of the data and headers directories for build backends. I will use flit and hatch as existing implementation examples.

As for Jupyter, we could have used flit already (in fact I had a WIP), but many of the consumers of jupyter-packaging are using setuptools-specific mechanisms and extensions.
