UTF-32 missing encoding for ansible.git @ salsa #1798
Comments
I can reproduce this on Arch Linux with

The error occurs in gitoxide/gix-filter/src/pipeline/convert.rs, lines 15 to 16 in 97f85f7.

The problem is that

gitoxide/gix-filter/src/pipeline/util.rs, lines 46 to 50 in 97f85f7.

By design,

//! The UTF-32 family of Unicode encoding schemes is not supported
//! by this crate. The Encoding Standard doesn't define any UTF-32
//! family encodings, since they aren't necessary for consuming Web
//! content.

I don't know what the best fix would be.
Thanks for reporting, and thanks for doing a first analysis, @EliahKagan! My intuition here would be to see what Git does if it encounters an unknown encoding. My guess is that instead of failing everything, it continues to do as much as possible and reports the failed files at the end, similar to how case-folded files are reported. Doing so would make

Edit: Maybe on top of that, it would and should be possible to specifically handle UTF32, there is a
In my experiment (#1798 (comment)),
That sounds like it could be a good fix. Are there any disadvantages to having it as a dependency?
Right! My thought was to find out what Git does if it encounters an unknown encoding. Does it just fail, or continue with as much as it can handle?
Probably not, though I would gate additional encodings and their dependencies behind feature flags. Otherwise, the number of additional dependencies needed to fully support everything that Git can handle is unknown and possibly large.
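
To give a rough idea of what handling UTF-32 specifically could involve, here is a minimal sketch that decodes UTF-32 bytes into a Rust `String` using only the standard library. It is not gitoxide's code and not the approach of any particular crate: the names `ByteOrder` and `decode_utf32` are made up for illustration, and treating input without a byte-order mark as big-endian is an assumption taken from the Unicode definition of the UTF-32 encoding scheme, not from what Git does.

```rust
/// Byte order of a UTF-32 byte stream (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ByteOrder {
    Big,
    Little,
}

/// Decode UTF-32 bytes into a `String` (hypothetical function, std only).
///
/// A leading byte-order mark selects the byte order and is dropped.
/// Assumption: input without a byte-order mark is treated as big-endian.
fn decode_utf32(input: &[u8]) -> Result<String, String> {
    let (order, body) = match input {
        [0x00, 0x00, 0xFE, 0xFF, rest @ ..] => (ByteOrder::Big, rest),
        [0xFF, 0xFE, 0x00, 0x00, rest @ ..] => (ByteOrder::Little, rest),
        _ => (ByteOrder::Big, input),
    };
    if body.len() % 4 != 0 {
        return Err(format!("length {} is not a multiple of 4", body.len()));
    }
    let mut out = String::with_capacity(body.len() / 4);
    for chunk in body.chunks_exact(4) {
        let bytes = [chunk[0], chunk[1], chunk[2], chunk[3]];
        let value = match order {
            ByteOrder::Big => u32::from_be_bytes(bytes),
            ByteOrder::Little => u32::from_le_bytes(bytes),
        };
        // `char::from_u32` rejects surrogates and values above U+10FFFF.
        let ch = char::from_u32(value)
            .ok_or_else(|| format!("invalid code point U+{:X}", value))?;
        out.push(ch);
    }
    Ok(out)
}
```

Something along these lines could sit behind a feature flag as suggested above. Encoding in the other direction, which a checkout would need when writing working tree files, is not shown.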
This is not decisive--maybe In contrast, This is to say that a repository that triggers this bug can be produced as follows:

git init has-utf32-encoding
cd has-utf32-encoding
echo 'a text working-tree-encoding=UTF-32' >.gitattributes
touch a
git add .
git commit -m 'Initial commit'
cd ..
git clone has-utf32-encoding hue # Verify that git silently works.
rm -rf hue
gix clone has-utf32-encoding hue

The last command produces:

Likewise, with an encoding pretty much guaranteed not to be recognized:

git init has-unrecognized-encoding
cd has-unrecognized-encoding
echo 'a text working-tree-encoding=wait-a-minute-this-is-not-a-real-encoding' >.gitattributes
touch a
git add .
git commit -m 'Initial commit'
cd ..
git clone has-unrecognized-encoding hue # Verify that git silently works, even with this.
rm -rf hue
gix clone has-unrecognized-encoding hue

The last command produces:

The effects in both

So if one does not want to make a repository, one can use the repository presented in the description. But to avoid waiting for it to clone (since the fetch part has no errors, just the checkout), or to check with an encoding that truly does not exist, those test repositories can be used.

(I may at some point move my test repositories on GitHub into an organization that exists to hold them and distinguish them from others. But links with them under

Edit: In the https://salsa.debian.org/python-team/packages/ansible.git repository, it looks like the

*.text text eol=LF
*.txt text eol=CRLF
*.txt16 text working-tree-encoding=UTF-16 eol=CRLF
*.txt32 text working-tree-encoding=UTF-32 eol=CRLF

(It also seems to have a blank line at the end, but I do not think that is relevant to this issue.) There are two files in that directory that match
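
As an illustration of how the attribute lines quoted above tie a path pattern to an encoding name, here is a deliberately simplified sketch that pulls the `working-tree-encoding` value out of a single line. The function name `working_tree_encoding` is hypothetical and this is not how gitoxide or Git parse `.gitattributes`: quoting, macros, negated or unset attributes, and precedence between repeated attributes are all ignored.

```rust
/// Returns `(pattern, encoding)` if the line assigns `working-tree-encoding`.
///
/// Hypothetical, simplified helper: fields are split on whitespace, and the
/// first `working-tree-encoding=` assignment on the line is taken.
fn working_tree_encoding(line: &str) -> Option<(&str, &str)> {
    let line = line.trim();
    // Skip blank lines and comments.
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    let mut fields = line.split_whitespace();
    // The first field is the path pattern; the rest are attributes.
    let pattern = fields.next()?;
    fields
        .find_map(|attr| attr.strip_prefix("working-tree-encoding="))
        .map(|encoding| (pattern, encoding))
}
```

For the last line quoted above, this yields the pattern `*.txt32` together with the encoding name `UTF-32`, which is the name that then has to be resolved to an actual encoding during checkout.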
Thanks a lot for researching this!
To me this looks like Git is silently ignoring unknown encodings, and I think that
Would it be a bug not to fail to check out files for which an unrecognized encoding is specified, even if no line-ending normalization or other transformation would require knowledge of the encoding? Or should that be allowed and other commands (

Another thing I am unclear on is how stringently encodings need to be respected as applied to

I am not familiar enough with the semantics of encodings in Git and gitoxide to know if these questions point to any actual difficulties (other than difficulties from my own limited knowledge of this topic).
I would also think that the filter should be failing as usual, but that certain failure types should be ignored where needed. So the checkout would probably ignore the failure, check out what's there, and report it separately, and probably, like Git, check out what's stored in Git instead.
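
The idea of failing in the filter but tolerating specific failure types in the checkout could be sketched roughly as follows. All names here (`FilterError`, `CheckoutReport`, `checkout`) are hypothetical and do not correspond to gitoxide's actual types; the `convert` closure stands in for the real filter pipeline, and actually writing the stored bytes to disk is left out.

```rust
/// Hypothetical error type for the conversion filter.
#[derive(Debug, PartialEq)]
enum FilterError {
    /// The `working-tree-encoding` name could not be resolved.
    UnknownEncoding { name: String },
    /// Any other failure, which still aborts the checkout.
    Other(String),
}

/// Hypothetical summary of a checkout that tolerates some filter failures.
#[derive(Debug, Default)]
struct CheckoutReport {
    /// Paths that were converted and written normally.
    written: Vec<String>,
    /// `(path, encoding name)` pairs that would be written as stored in Git,
    /// because the encoding was not recognized.
    unconverted: Vec<(String, String)>,
}

/// Run `convert` for every path, collecting unknown-encoding failures
/// instead of stopping at the first one.
fn checkout<F>(paths: &[&str], mut convert: F) -> Result<CheckoutReport, FilterError>
where
    F: FnMut(&str) -> Result<(), FilterError>,
{
    let mut report = CheckoutReport::default();
    for &path in paths {
        match convert(path) {
            Ok(()) => report.written.push(path.to_string()),
            // Tolerated failure: remember it and keep going.
            Err(FilterError::UnknownEncoding { name }) => {
                report.unconverted.push((path.to_string(), name));
            }
            // Everything else is still fatal.
            Err(other) => return Err(other),
        }
    }
    Ok(report)
}
```

The caller can then print `unconverted` at the end of the clone, much like the earlier suggestion of reporting failed files after the fact, while the filter itself keeps returning an error for unknown encodings.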
That's an interesting thought, I never thought about it! Right now the top-level file would control the subordinate file encoding. A
Current behavior 😯
Hi.
gix clone on https://salsa.debian.org/python-team/packages/ansible.git fails with the following error:
Expected behavior 🤔
It should succeed in cloning.
Git behavior
git clone succeeds.
Steps to reproduce 🕹
gix clone https://salsa.debian.org/python-team/packages/ansible.git
Failure with error message as described and error level 1 in the shell.