-
-
Notifications
You must be signed in to change notification settings - Fork 328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite parsing logic of gix-url #990
Conversation
Makes sure to always include the expected and (if available) actual URL in the panic message. This makes reasoning of failed baseline test much easier.
This change does not affect the number of passing tests, because all tests that were affected by this wrong assertion now fail for some other reason.
Doubles the amount of baseline tests that pass :)
This change is intentionally kept very simple and does not try to immediately fix all now failing tests. When comparing the test output of this implementation with the previous there are a few things worth mentioning: - all previously passing baseline tests still do so - the new implementation correctly classifies paths starting with a `/` or `./` but containing a `:` as file paths - the new implementation correctly classifies paths containing just letters before the first `:` as scp paths - the new implementation fails all tests that check the returned errors for invalid paths - the new implementation seems to fail for some windows paths The next step is to evaluate the now failing tests, and either remove/rewrite them if they are based on flawed assumptions (looking at you 'ssh_alias_needs_username_to_not_be_considered_a_filepath') or fix the new implementation to make them pass again. We can then work on making the baseline tests pass. The change also keeps the previous parsing logic around and just removes it from the module graph, to allow for local comparisons of the two implementations during development.
0be1843
to
43bd55e
Compare
The `MissingResourceLocation` error variant was previously returned if the input URL had either scheme ssh (no matter if URL or SCP format) or git and the URL contained no or an empty path section. In the future we will instead return the `MissingRepositoryPath` error in such a case. The only benefit of such a seperation is that it allows the caller to infer the kind of URL we tried to parse. A future commit will add a new field to the `MissingRepositoryPath` error variant which will clearly communicate this information and can therefore be used instead for this purpose.
The `NotALocalFile` error variant was previously returned if the input url contained a colon but the slice after that character contained no slash character (`/` or `\`). This behavior is wrong. SCP like URLs may not need a slash character to specify the repositories location. Similarly, a local directory that contains a colon in its name is a valid repository path as well. The new implementation correctly parses such URLs.
All error variants now own a copy of the input which is used in their display implementation. I noticed that URL parsing errors are almost never acted upon but only wrapped and bubbled up to the user. It therefore makes sense to ensure the message for this error is informative enough by default. If an error can be thrown for different types of URLs it now also includes a kind field. With this field the caller can determine what kind of URL we tried to parse. Additionally, this information is used for the error message too. For testing the assert_matches crate was added which implements the unstable feature with the same name in stable rust. It makes testing the invalid variants much more convenient. BREAKING because public fields of some error variants are now no longer available
f0a0d2d
to
38db0c1
Compare
Is there anything I can do to help with this PR? |
I am sorry for the long silence, please know that I am still interested in this. And now that I have finally finished the Netflix series I have been spending my free time on for the past few weeks, I am confident that I will be finishing this PR soon :) |
The behavior of the nextest command stays the same by default but allows the user to specify the flags.
8dcfd16
to
673946a
Compare
673946a
to
3ab7719
Compare
It will probably take me another few days to finish Windows, especially because using a VM and everything slows down my workflow. But we will see how much needs fixing |
If the work with windows is too aggravating, I can also do it for you - with my setup it's still quite terrible, but for |
No worries, I think my workflow will be quite workable - I just need to set it up first, which I am doing right now. If I have any problems I will let you know! |
I think the previous behavior was actually the correct one for Windows systems, but because my main dev machine is Linux, it is easier to make sure everything works correctly on those platforms first. I will fix Windows next.
3ab7719
to
6c46ad6
Compare
Do you want me to just rebase or to reserve the history/commit dates and merge main into this branch. I am happy to do either. |
You can do what's best for you, or what you see fit, I don't have a preference. |
For more context see d6f90be. We however no longer implement the backslash to forward slash conversion. There is no good reason for why Git does that conversion and we do not need it to stay compatible (Git can handle both versions just fine). Additionally, the logic for the conversion was very involved and to implement it proper we would need to detect the shell we are executing in (Git performs the conversion only in bash, not in cmd/ps).
Git parses that as UNC url, but we do not yet implement UNC paths on Windows.
19771d7
to
b638b06
Compare
Everything done now! I can write the changelog entry if you want - I probably have a much easier time summarizing the changes having worked on this for so long :) You would need to tell me how to do that correctly with cargo-smart-release though |
Great, I'd appreciate it! I think the subject should be The body can contain whatever markdown you want for further explanation. This would have to go to at least one commit to be picked up by Please feel free to rewrite commits as you see fit. |
I only focused on public API changes when marking my commits for inclusion in the changelog and explicitly left out any behavior changes that only affect what URLs we parse and how we parse them. My intend was to just write something along the lines of the following at the top of the changelog.
Idk about the wording just did something quick. Or do you think we need to be more precise? |
b638b06
to
49b278a
Compare
Since the merge of
In theory, one would mark each feature or change as they happen to the respective crate by using conventional commit messages, exactly like you did. Adding a top-level paragraph in In the meantime, I will start reviewing and will probably push a few changes to this branch as well, so let's not force push anymore :). |
Oh, I realized that now after the reset, the changelogs are outdated. Then I think I have just discovered my preference for, in this case, rebasing on top of Thanks again 🙏 |
49b278a
to
a31af62
Compare
Force pushed the rebase. No more forces from now on I agree :) |
Hehe, thanks! I think now you'd be able to make the CHANGELOG.md adjustments I mentioned earlier (just to have your name associated with the commit) and then I'd consider this ready-for-review if you agree :). |
Just to make sure, the top level changelog is for the binaries and the one in the |
Yes, that's correct. In theory, you could also drop the paragraph into |
We run them to assure there are no regressions, while acknowledging that not all succeed right now. The generic test was removed in favor of assuming our code will not actually have a non-unix&windows codepath, making it differeint on unix and windows and nothing else.
Also, add iformation about serialization precision - usually there may be some difference when serializing a parsed URL back into a string, but ideally it leads to the same outcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for tackling this!
I am still in the process of reviewing everything, but wanted to share that I really like the new parser implementation. It's much cleaner for sure and great basis to get to a perfectly passing baseline test.
I even learned some tricks like the hidden and hard-to-discover testing
module (which I hope one day we can get rid of), and the parse
module trickery to be able to quickly swap out implementations.
My goal today is to complete the remaining tasks (beware: I may force push during the time) and then merge the PR. And while writing this, I hope you are still motivated drive the implementation to the level of perfection it deserves :).
Tasks
- run baseline tests, put them to work
- baseline tests have been changed so that they are actually running and can work with what's there. Gradual improvements can be done that way, while protecting from regressions.
- re-add all deleted tests in such a way that it's clear whether they should pass, or that they don't pass for good reason.
- All manually added tests where probably added with a specific purpose in mind and that's value to try not to loose.
- assure there is support for non-UTF-8 file-paths, assure there is a test for that.
- Add fuzzing (similar to pathspec) - will be picked up by google oss fuzz as well
66b4cd7
to
ed1a669
Compare
Instead of removing failing tests, make clear why their expectation changed now.
ed1a669
to
4184a5e
Compare
Thanks so much, great work! It was quite nice to see the fuzzer not find anything even after 24m tested inputs, and I don't know if that would have worked with the old implementation ;). I have a feeling that once the remaining TODOs are resolved, most if not all of the baseline tests would work as well. In any case, I will be looking forward to seeing gradual improvements if you are still interested :). |
Thanks again for your patience with this PR, you can definitely expext more in the next weeks :) |
The goal of this PR is to make the implementation of the url parser simpler. I consider this PR done when the test suite (without the baseline tests) succeeds again. Follow up PRs can gradually improve the results of the baseline tests until we can turn it on by default.
The old implementation is not yet removed, because I use it as a reference and that works way better if the file still exists in my local repository. It can be removed before merging this PR.
The first two commits of this PR I wanted to merge together with #965 but forgot to push them :)