.gitignore consistency with Git #526
I believe most of these are implemented, assuming you pass in paths relative to the directory the .gitignore file was found in, and add a trailing slash to the path if you're checking for a directory. I just followed the syntax described in the URL you mention.
Do you have specific cases that work differently with the Dulwich parser vs C git?
The documentation suggests that you can't redo an ignore that was previously excluded.
Segev Finer wrote on 4 July 2017 18:55:38 GMT+01:00:
Getting gitignore behavior just right seems to be quite hard. (I think even libgit2 still has some issues with it 😛) These are some things that I encountered which are different in the current implementation in master:
[ ] A leading slash should anchor the pattern to the directory that the .gitignore file is in.
[ ] A pattern with a trailing slash should match only directories. That slash should not be considered when deciding whether the pattern is matched against the filename or against the whole path. Note that if a parent directory is ignored, all files in it are ignored too.
[ ] If a pattern matches a parent directory of a file, that file should also be ignored (unless a later pattern re-includes all of its parents).
[ ] I think that if you ignore a file, re-include it (a pattern with a '!') and then ignore it again, it should be ignored. (The code currently does `return`.)
[ ] The behavior with regard to ignoring directories is quite finicky. Whether a directory is ignored or not affects whether a pattern that could affect that directory takes effect.
Reference: https://git-scm.com/docs/gitignore
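The anchoring and trailing-slash items above can be expressed as a single-pattern matcher. This is a minimal Python sketch, not Dulwich's implementation: it assumes repo-relative POSIX paths passed without a trailing slash (directories are flagged with `is_dir` instead), a .gitignore at the repository root, and only the `*` and `?` wildcards. Negation, `**`, and parent-directory exclusion are left out, and `pattern_matches` is a hypothetical name.

```python
import re


def pattern_matches(pattern, path, is_dir):
    """Does one gitignore pattern (no '!', no '**') match a repo-relative path?"""
    # A trailing slash means "directories only"; it does not count as anchoring.
    dir_only = pattern.endswith("/")
    pattern = pattern.rstrip("/")
    if dir_only and not is_dir:
        return False
    # A slash at the start or in the middle anchors the pattern to the
    # directory of the .gitignore file (the repository root in this sketch).
    anchored = "/" in pattern
    pattern = pattern.lstrip("/")
    # '*' and '?' never match '/'; everything else is literal.
    body = "".join(
        "[^/]*" if ch == "*" else "[^/]" if ch == "?" else re.escape(ch)
        for ch in pattern)
    # Unanchored patterns may match the last component(s) at any depth.
    regex = ("^" if anchored else "(?:^|/)") + body + "$"
    return re.search(regex, path) is not None


print(pattern_matches("/a/", "a", is_dir=True))          # True
print(pattern_matches("/a/", "b/a", is_dir=True))        # False: anchored to the root
print(pattern_matches("bar", "foo/bar", is_dir=True))    # True: unanchored, any depth
print(pattern_matches("bar/", "foo/bar", is_dir=False))  # False: directories only
```

Keeping the "is this a directory" flag separate from the path string avoids having a trailing slash in the query path change which part of the pattern matches.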
|
Note that I edited the issue.
I have probably used this module incorrectly. I will try later to use it with os.walk to properly skip ignored subdirectories, and add a trailing slash to directory paths. If it seems to work correctly, I will mark those items as such.
I'm still unsure about the negation thing. The documentation mentions that negation affects *previous* patterns, so it stands to reason that a further ignore pattern might still override the negation yet again. I probably need to check with standard Git to see what the real behavior is.
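The gitignore documentation states that within one level of precedence, the last matching pattern decides the outcome, which supports the reading that a later ignore pattern overrides an earlier negation. A tiny sketch of that rule using the standard library's `fnmatch.fnmatchcase` on a bare file name; it deliberately ignores anchoring, directories, and the fact that `fnmatch`'s `*` also matches `/`. `ignored_by_last_match` is a hypothetical name, and a check against C Git is still the real test.

```python
from fnmatch import fnmatchcase


def ignored_by_last_match(name, patterns):
    """Evaluate patterns in order; the last one that matches decides."""
    ignored = False
    for pattern in patterns:
        negate = pattern.startswith("!")
        glob = pattern[1:] if negate else pattern
        if fnmatchcase(name, glob):
            ignored = not negate
    return ignored


print(ignored_by_last_match("keep.log", ["*.log", "!keep.log"]))              # False
print(ignored_by_last_match("keep.log", ["*.log", "!keep.log", "keep.log"]))  # True
```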
(Replying to Jelmer Vernooij's email of 4 July 2017, 23:11.)
|
A pattern without a trailing slash should match both files and directories:

```sh
git init temp && cd temp
mkdir -p foo/bar
cat > .gitignore <<EOF
bar
EOF
git check-ignore -v foo/bar
```

Output:

Dulwich:

```python
>>> from dulwich import ignore
>>> with open('.gitignore', 'r') as f:
...     a = ignore.IgnoreFilter(ignore.read_ignore_patterns(f))
...
>>> a.is_ignored('foo/bar/')
# None
```
|
There probably needs to be a way to properly handle .gitignore files in subdirectories. Their patterns should only apply below them, and path patterns from parent directories should be anchored to the directory where they are defined. It's unclear whether, and how, this can be achieved with the current implementation. |
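One way to scope a nested .gitignore is to make each candidate path relative to the directory that holds the file before matching, and to skip the file entirely for paths outside that directory. A minimal sketch assuming repo-relative POSIX paths and, for brevity, only literal (wildcard-free) patterns without a trailing slash; `relative_to` and `nested_pattern_matches` are hypothetical helpers, not Dulwich API.

```python
def relative_to(path, ignore_dir):
    """Return path relative to ignore_dir, or None if path is not below it."""
    if ignore_dir in ("", "."):
        return path
    prefix = ignore_dir.rstrip("/") + "/"
    if path.startswith(prefix):
        return path[len(prefix):]
    return None


def nested_pattern_matches(ignore_dir, pattern, path):
    """Check one literal pattern from <ignore_dir>/.gitignore against path."""
    rel = relative_to(path, ignore_dir)
    if rel is None:
        return False  # patterns only apply below their own directory
    body = pattern.strip("/")
    if "/" in pattern.rstrip("/"):
        # Anchored: compare against the path relative to ignore_dir.
        return rel == body
    # Unanchored: compare against the last path component at any depth.
    return rel.split("/")[-1] == body


print(nested_pattern_matches("docs", "/build", "docs/build"))      # True
print(nested_pattern_matches("docs", "/build", "build"))           # False
print(nested_pattern_matches("docs", "/build", "docs/api/build"))  # False
print(nested_pattern_matches("docs", "build", "docs/api/build"))   # True
```

Rebasing the path once per ignore file keeps the matching logic itself unaware of where the .gitignore lives.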
I tried running the following script against PR #530 and discovered that Git ignores case when matching gitignore patterns on Windows and other operating systems with case-insensitive file systems. Everything else seems to be ignored correctly, even stuff that libgit2 seems to trip on 😃.

We can probably try this script on other repositories. Once there is support for .gitignore files in subdirectories, running a similar script that supports them too on more complicated repositories should be a good way to validate that we got the implementation right. The script is a bit slow since it calls

```python
from __future__ import print_function
import sys
import os
import argparse
import subprocess
from dulwich import ignore


def to_git_path(repo, p, dir_=False):
    # Repo-relative path with forward slashes; directories get a trailing '/'.
    git_path = os.path.relpath(p, repo).replace('\\', '/')
    if dir_:
        git_path += '/'
    return git_path


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("repo")
    args = parser.parse_args()
    # Only the top-level .gitignore is read.
    with open(os.path.join(args.repo, '.gitignore'), 'r') as f:
        ignore_filter = ignore.IgnoreFilter(ignore.read_ignore_patterns(f))
    ignore_filter.append_pattern('.git')
    for root, dirs, files in os.walk(args.repo):
        for name in files:
            git_path = to_git_path(args.repo, os.path.join(root, name))
            dulwich_ignored = bool(ignore_filter.is_ignored(git_path))
            # git check-ignore exits with status 0 when the path is ignored.
            git_ignored = subprocess.call(
                ["git", "-C", args.repo, "check-ignore", "--no-index", git_path],
                stdout=subprocess.PIPE) == 0
            if dulwich_ignored ^ git_ignored:
                print("Diff:", git_path, dulwich_ignored, git_ignored)
        # Don't descend into directories that Dulwich considers ignored.
        dirs[:] = [i for i in dirs if not ignore_filter.is_ignored(
            to_git_path(args.repo, os.path.join(root, i), True))]


if __name__ == "__main__":
    sys.exit(main())
```
|
Another test that might trip Dulwich (when implemented):

```sh
git init temp && cd temp
mkdir a
touch a/b
cat > .gitignore <<EOF
/a/
EOF
cat > a/.gitignore <<EOF
!b
EOF
git check-ignore -v a/b
```

Output:
|
I believe this last case should now be handled correctly, too. |
```python
def _posix_path(p):
    """Convert a path to POSIX style."""
    return p.replace('\\', '/')
```

And use it in all |
I've added support for core.ignorecase. |
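As a rough illustration of what `core.ignorecase` changes: when it is true, patterns are matched without regard to case. A sketch using the standard library's `fnmatch.translate` and `re.IGNORECASE`; reading the config value is left out, and `fnmatch`'s `*` also matches `/` (gitignore's does not), so this is not a complete gitignore matcher. `glob_matcher` is a hypothetical name.

```python
import fnmatch
import re


def glob_matcher(pattern, ignorecase):
    """Compile a glob, folding case when core.ignorecase is true."""
    flags = re.IGNORECASE if ignorecase else 0
    return re.compile(fnmatch.translate(pattern), flags)


print(bool(glob_matcher("*.TXT", ignorecase=True).match("readme.txt")))   # True
print(bool(glob_matcher("*.TXT", ignorecase=False).match("readme.txt")))  # False
```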
I've also fixed the issue with directory paths in IgnoreFilterManager. Please let me know if you're aware of any more pending issues. |
These two lines, dulwich/ignore.py:298-299, might strip a trailing slash from the passed path when it's absolute. |
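Presumably the slash is lost during path normalization: `os.path.relpath`, for example, drops a trailing separator, so a directory query such as `/repo/a/b/` becomes `a/b`. A sketch of a conversion that keeps the slash, using `posixpath` so the example behaves the same on every platform; `to_repo_path` is a hypothetical helper, not what Dulwich does.

```python
import posixpath


def to_repo_path(path, repo_root):
    """Make an absolute POSIX path repo-relative, keeping a trailing slash."""
    rel = posixpath.relpath(path, repo_root)  # relpath drops the trailing '/'
    if path.endswith("/") and not rel.endswith("/"):
        rel += "/"
    return rel


print(to_repo_path("/repo/a/b/", "/repo"))  # a/b/
print(to_repo_path("/repo/a/b", "/repo"))   # a/b
```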
Here is a version of the script from before using

```python
from __future__ import print_function
import sys
import os
import argparse
import subprocess
from dulwich.repo import Repo
from dulwich import ignore


def to_git_path(repo, p, dir_=False):
    # Repo-relative path with forward slashes; directories get a trailing '/'.
    git_path = os.path.relpath(p, repo).replace('\\', '/')
    if dir_:
        git_path += '/'
    return git_path


def diff(repo, ignore_filter, git_path):
    # Report paths on which Dulwich and C Git disagree.
    dulwich_ignored = bool(ignore_filter.is_ignored(git_path))
    git_ignored = subprocess.call(
        ["git", "-C", repo, "check-ignore", "--no-index", git_path],
        stdout=subprocess.PIPE) == 0
    if dulwich_ignored ^ git_ignored:
        print("Diff:", git_path, dulwich_ignored, git_ignored)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("repo")
    args = parser.parse_args()
    with Repo(args.repo) as repo:
        ignore_filter = ignore.IgnoreFilterManager.from_repo(repo)
        for root, dirs, files in os.walk(args.repo):
            for name in files:
                git_path = to_git_path(args.repo, os.path.join(root, name))
                diff(args.repo, ignore_filter, git_path)
            for name in dirs:
                git_path = to_git_path(args.repo, os.path.join(root, name), True)
                diff(args.repo, ignore_filter, git_path)


if __name__ == "__main__":
    sys.exit(main())
```

Using it on CPython's repository, I caught this one:

```sh
git init temp && cd temp
mkdir -p a/b/c
cat > .gitignore <<EOF
a/b/*
EOF
git check-ignore -v a/b/
```

Output:

Dulwich:

```python
>>> from dulwich.repo import Repo
>>> from dulwich import ignore
>>> a=Repo('.')
>>> b=ignore.IgnoreFilterManager.from_repo(a)
>>> b.is_ignored('a/b/')
# None
```

EDIT: I think this specific case is not an issue, though. There is this specific example in the gitignore man page:

```
$ cat .gitignore
# exclude everything except directory foo/bar
/*
!/foo
/foo/*
!/foo/bar
```

Which suggests that

```sh
git check-ignore -v a/b
```

Does show that the directory isn't ignored. I guess I can modify the script to not pass the trailing slash to |
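A sketch of why `/foo/*` leaves the directory `foo` itself alone: if `*` is translated so that it never crosses a `/`, the pattern can only match entries directly inside `foo`. The last line shows a plausible pitfall, assuming a matcher sees directory paths with their trailing slash: `foo/` then matches, because `*` may expand to the empty string. `anchored_regex` is a hypothetical helper that handles only `*`.

```python
import re


def anchored_regex(pattern):
    """Translate an anchored pattern where '*' never crosses a '/'."""
    parts = pattern.lstrip("/").split("*")
    body = "[^/]*".join(re.escape(part) for part in parts)
    return re.compile("^" + body + "$")


rx = anchored_regex("/foo/*")
print(bool(rx.match("foo")))          # False: the directory itself
print(bool(rx.match("foo/bar")))      # True: an entry directly inside foo
print(bool(rx.match("foo/bar/baz")))  # False: '*' does not cross '/'
print(bool(rx.match("foo/")))         # True: a trailing slash lets '*' match ""
```

Stripping the trailing slash before matching and carrying "is a directory" as a separate flag sidesteps that last case.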
@jelmer I edited #526 (comment) a bit. Hopefully you noticed 😛. Just be sure not to accidentally regress this case from the gitignore man page when you change

```
$ cat .gitignore
# exclude everything except directory foo/bar
/*
!/foo
/foo/*
!/foo/bar
```
|
I've added a specific testcase for the example from the manpage you mentioned. It seems to pass :) |
I've "fixed" the absolute path issue by requiring that callers pass in relative paths. |
@segevfiner are you aware of any more issues? If not, I'll mark this closed and will do a release in a couple of days :) |
I have run my script (with

```sh
git init temp && cd temp
mkdir spam
touch spam/ham
cat > .gitignore <<EOF
/spam/*
EOF
git check-ignore -v spam spam/ham
```

Output:

Dulwich:

```python
>>> from dulwich.repo import Repo
>>> from dulwich import ignore
>>> a=Repo('.')
>>> b=ignore.IgnoreFilterManager.from_repo(a)
>>> b.is_ignored('spam/')
True
>>> b.is_ignored('spam/ham.txt')
True
```

It feels like it would be easy to regress something else by accident while fixing this, so be careful. It's confusing, and I'm not sure myself exactly how Git handles these kinds of patterns so that they don't match the directory itself. The code for this is in (I have also seen projects do |
```
morgaine:/tmp/foo% cat .gitignore
morgaine:/tmp/foo% dulwich check-ignore b
```

So git is behaving differently depending on whether a slash is passed in, like Dulwich. |
That's probably just

Here is an example that shows where there is a difference due to this:

```sh
git init temp && cd temp
mkdir a
touch a/b.txt a/c.dat
cat > .gitignore <<EOF
a/*
!a/*.txt
EOF
git check-ignore -v a a/b.txt a/c.dat
```

Output:

(The directory "a" is not ignored, "a/b.txt" is not ignored, and "a/c.dat" is ignored.)

Dulwich:

```python
>>> from dulwich import ignore
>>> from dulwich.repo import Repo
>>> a=Repo('.')
>>> b=ignore.IgnoreFilterManager.from_repo(a)
>>> b.is_ignored('a/')
True
>>> b.is_ignored('a/b.txt')
True
>>> b.is_ignored('a/c.dat')
True
```
|
Ahh, thanks - that makes sense. I'll do some more digging and see if I can resolve this specific case. |
We should be able to prevent regressions, so long as we add test cases for all the corner cases we identify. I've made a change to fix the last issue you mentioned. |
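Following that idea, the corner cases from this thread can be kept as a data table and replayed against a matcher. Below is a self-contained Python sketch, not Dulwich's code or test suite: a simplified matcher (assumed semantics: `*` and `?` only, a single top-level .gitignore, the last matching pattern wins, and excluded parent directories stay excluded) plus the table. The nested `a/.gitignore` case is approximated by appending an anchored `!a/b` to the same pattern list, and the expected values follow the Git behavior described in this thread and in the gitignore documentation.

```python
import re


def _compile(pattern):
    negate = pattern.startswith("!")
    pattern = pattern[1:] if negate else pattern
    dir_only = pattern.endswith("/")   # trailing slash: directories only
    pattern = pattern.rstrip("/")
    anchored = "/" in pattern          # leading or middle slash anchors
    body = "".join(
        "[^/]*" if ch == "*" else "[^/]" if ch == "?" else re.escape(ch)
        for ch in pattern.lstrip("/"))
    regex = re.compile(("^" if anchored else "(?:^|/)") + body + "$")
    return negate, dir_only, regex


def _last_match(rules, path, is_dir):
    ignored = False
    for negate, dir_only, regex in rules:
        if (is_dir or not dir_only) and regex.search(path):
            ignored = not negate  # the last matching pattern decides
    return ignored


def is_ignored(patterns, path, is_dir=False):
    rules = [_compile(p) for p in patterns]
    parts = path.split("/")
    # A path below an excluded directory stays excluded.
    for i in range(1, len(parts)):
        if _last_match(rules, "/".join(parts[:i]), True):
            return True
    return _last_match(rules, path, is_dir)


MAN_PAGE = ["/*", "!/foo", "/foo/*", "!/foo/bar"]

# (patterns, path, is_dir, expected)
CASES = [
    (["bar"], "foo/bar", True, True),
    (["/a/", "!a/b"], "a/b", False, True),  # approximates a/.gitignore: !b
    (["a/b/*"], "a/b", True, False),
    (["/spam/*"], "spam", True, False),
    (["/spam/*"], "spam/ham", False, True),
    (["a/*", "!a/*.txt"], "a", True, False),
    (["a/*", "!a/*.txt"], "a/b.txt", False, False),
    (["a/*", "!a/*.txt"], "a/c.dat", False, True),
    (MAN_PAGE, "foo/bar", True, False),
    (MAN_PAGE, "foo/baz", True, True),
]

for patterns, path, is_dir, expected in CASES:
    assert is_ignored(patterns, path, is_dir) == expected, (patterns, path)
print("all", len(CASES), "cases pass")
```

New corner cases then only need a new row, and the same table could be run against Dulwich and against `git check-ignore` to catch disagreements.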
I didn't find anything else using the script. 😄 (Barring the fact that |
Fixed the exit code in master. I'll close this bug report, we can always fix other cases later if we notice them. |
@jelmer Actually, I was speaking about the exit code of the standard Git. The code you added to Dulwich seems to do what you'd expect, returning 0 only when a passed path is really ignored. 😃
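For comparison scripts such as the ones posted earlier in this thread, it helps to treat Git's exit status as three-valued rather than "zero or not". Per the git-check-ignore documentation, 0 means at least one of the given paths is ignored, 1 means none are, and 128 means a fatal error. A sketch; `interpret_check_ignore_status` and `git_says_ignored` are hypothetical names, and the second function needs `git` on `PATH`.

```python
import subprocess


def interpret_check_ignore_status(returncode):
    """Map a `git check-ignore` exit status: 0 = ignored, 1 = not ignored."""
    if returncode == 0:
        return True
    if returncode == 1:
        return False
    # 128 (or anything else) signals a failure, not "not ignored".
    raise RuntimeError("git check-ignore failed with status %d" % returncode)


def git_says_ignored(repo, path):
    """Ask C Git whether a single path is ignored."""
    result = subprocess.run(
        ["git", "-C", repo, "check-ignore", "-q", "--no-index", path],
        stdout=subprocess.DEVNULL,
    )
    return interpret_check_ignore_status(result.returncode)
```

Raising on a fatal error keeps a broken invocation from being silently counted as "not ignored" in a diff against Dulwich.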
Getting gitignore behavior just right seems to be quite hard. (I think even libgit2 still has some issues with it 😛) These are some things that I encountered which are different in the current implementation in master:
`return`). PR #529: A pattern with only a trailing slash should be treated as a glob.
Git enables case insensitivity based on the `core.ignorecase` config.
`IgnoreFilterManager.is_ignored`.
Reference: https://git-scm.com/docs/gitignore