Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't emit warning for explicitly included data files #4789

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

cdce8p
Copy link
Contributor

@cdce8p cdce8p commented Jan 4, 2025

Summary of changes

From what I can tell based on the discussion in #3340, it would make sense to hide the Package would be ignored warning for explicitly including data files (via tool.setuptools.package-data or similar).

@cdce8p cdce8p force-pushed the data-files-warning branch from 00f7524 to 4e8cadd Compare January 4, 2025 12:51
@abravalheri
Copy link
Contributor

abravalheri commented Jan 8, 2025

I have a different understanding of the previous discussion: if the package is not included, then the data files contained by it cannot be included (adding the package itself is a pre-requisite for adding its data files)...

So the warning is a call to action for the users to add the package name in the package list (this would be along the lines of "explicit is better than implicit" and raising awareness of the users that for Python purposes the directories are packages during runtime and can even be imported).

@cdce8p
Copy link
Contributor Author

cdce8p commented Jan 9, 2025

I have a different understanding of the previous discussion: if the package is not included, then the data files contained by it cannot be included (adding the package itself is a pre-requisite for adding its data files)...

My specific example which motivated the PR is Mypy. It uses packages.find.include with a wildcard ["mypy*", ...] and additionally package-data. The paths used package-data are subfolders as it would clutter the main folder too much otherwise. The warning in this case is pointless IMO.

https://github.com/python/mypy/blob/v1.14.1/pyproject.toml#L79-L92

So the warning is a call to action for the users to add the package name in the package list (this would be along the lines of "explicit is better than implicit" and raising awareness of the users that for Python purposes the directories are packages during runtime and can even be imported).

My goal here would be to think of a better user story, ie. what can you do if you see the warning, are aware of it but still want to go ahead with it? Namespace packages aren't really an option. Some alternative ideas

  • If explicit is better them implicit, would it make sense to add a new option, maybe tool.setuptools.data-folder where users can explicitly name folder (globs) which should opt-out of this check?
  • As the warning is quite annoying, I even thought about overwriting build_py. _filter_build_files for mypy just to get rid of it. Maybe we could instead refactor analyze_manifest so it wouldn't be necessary to overwrite a private method.
    check = _IncludePackageDataAbuse()
    for path in self._filter_build_files(files, egg_info_dir):

@abravalheri
Copy link
Contributor

Thanks for the context @cdce8p that is really helpful.

I guess the point that the warning is complaining is that you want to include data files for the following packages:

mypy.typeshed
mypy.typeshed.stdlib
mypy.typeshed.stdlib._typeshed
mypy.typeshed.stdlib.asyncio
mypy.typeshed.stdlib.collections
mypy.typeshed.stdlib.concurrent
mypy.typeshed.stdlib.concurrent.futures
mypy.typeshed.stdlib.ctypes
mypy.typeshed.stdlib.ctypes.macholib
mypy.typeshed.stdlib.curses
mypy.typeshed.stdlib.dbm
mypy.typeshed.stdlib.distutils
mypy.typeshed.stdlib.distutils.command
mypy.typeshed.stdlib.email
mypy.typeshed.stdlib.email.mime
mypy.typeshed.stdlib.encodings
mypy.typeshed.stdlib.ensurepip
mypy.typeshed.stdlib.html
mypy.typeshed.stdlib.http
mypy.typeshed.stdlib.importlib
mypy.typeshed.stdlib.importlib.metadata
mypy.typeshed.stdlib.importlib.resources
mypy.typeshed.stdlib.json
mypy.typeshed.stdlib.lib2to3
mypy.typeshed.stdlib.lib2to3.fixes
mypy.typeshed.stdlib.lib2to3.pgen2
mypy.typeshed.stdlib.logging
mypy.typeshed.stdlib.msilib
mypy.typeshed.stdlib.multiprocessing
mypy.typeshed.stdlib.multiprocessing.dummy
mypy.typeshed.stdlib.os
mypy.typeshed.stdlib.pydoc_data
mypy.typeshed.stdlib.pyexpat
mypy.typeshed.stdlib.sqlite3
mypy.typeshed.stdlib.sys
mypy.typeshed.stdlib.tkinter
mypy.typeshed.stdlib.unittest
mypy.typeshed.stdlib.urllib
mypy.typeshed.stdlib.venv
mypy.typeshed.stdlib.wsgiref
mypy.typeshed.stdlib.xml
mypy.typeshed.stdlib.xml.dom
mypy.typeshed.stdlib.xml.etree
mypy.typeshed.stdlib.xml.parsers
mypy.typeshed.stdlib.xml.parsers.expat
mypy.typeshed.stdlib.xml.sax
mypy.typeshed.stdlib.xmlrpc
mypy.typeshed.stdlib.zipfile
mypy.typeshed.stdlib.zipfile._path
mypy.typeshed.stdlib.zoneinfo
mypy.typeshed.stubs
mypy.typeshed.stubs.mypy-extensions
mypy.xml

But the configuration that you currently have is telling setuptools to not include these packages in the final wheel (they are absent in the packages configuration). So the configuration is ambiguous to setuptools.

Would something similar to the following make sense (untested, please double check):

[tool.setuptools.packages.find]
include = ["mypy*", "mypyc*", "*__mypyc*"]
exclude = [
  "mypyc.external*",
  "mypyc.lib-rt*",
  "mypyc.test-*",   # Maybe this one should be `mypyc.test*`, but apparently the current configuration is distributing `mypyc.test`
  "mypyc.doc*"
]
# implicit `namespaces = true`

In the end of the day namespaces = false works like a exclude = [...] that automatically excludes all the packages without __init__.py. However, it seems that you want to include some of the packages that don't have __init__.py but not all of them. So the first solution that comes to my mind here is to explicitly pass an exclude list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants