Add opaque hosts #185
@achristensen07 what do you think of this approach?
Seems good. Make sure to add a test verifying something like "new URL('path', 'asdf://host')" serializes to 'asdf://host/path' adding the slash. I broke something by forgetting that in my initial implementation. I don't add a slash for query-only or fragment-only relative URLs, but that might be worth more discussion.
@achristensen07 running https://quuz.org/url/liveview.html#notspecial://test?x in Safari TP does yield a slash for me.
Yes, but "new URL('?query', 'asdf://host')" does not add a slash right now in STP. Maybe this is inconsistent. I think this is a good approach, but we might need another issue for these query-only and fragment-only URLs with non-special schemes.
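The cases discussed above can be tried out with a few lines of JavaScript. This is only a minimal sketch, assuming a runtime that exposes a WHATWG `URL` global (a browser or Node.js); the helper name `resolve` is hypothetical and not part of this PR. The query-only and fragment-only cases are printed rather than asserted, since their serialization is exactly what is still open here.

```javascript
// Hypothetical helper: resolve `relative` against `base` and return the serialization.
function resolve(relative, base) {
  return new URL(relative, base).href;
}

// Path-relative reference against a non-special base that has no path:
// the serializer has to supply the "/" between host and path.
console.log(resolve('path', 'asdf://host'));

// Query-only and fragment-only references: whether a slash shows up is the open question.
console.log(resolve('?query', 'asdf://host'));
console.log(resolve('#fragment', 'asdf://host'));
```

Running the same three calls in each engine makes differences like the STP one mentioned above easy to spot.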
for further processing.

<p class="note no-backref">A <a>non-special-URL host</a> is emphatically not a <a for=/>host</a> and
only used by <a lt="is special">non-special URLs</a>.
is only used by...?
"X is A and B" seems fine?
When you add "not"s it's confusing. "X is not A and B" could be either "X is not A and X is not B" or "X is not A and X is B".
<a for=url>path</a> (including empty strings), separated from each other by
"<code>/</code>", to <var>output</var>.
<li>
<p>Otherwise, then:
Remove ", then" here
<li><p>Return the result of <a lt="host parser">host parsing</a> <var>input</var>.
</ol>

<li><p>Return <var>input</var>.
I either need to percent-encode input here or in the state machine.
The pathname getter should probably also be adjusted in a way similar to the serializer.
URL Standard change: whatwg/url#185.
Yeah, I think that would be quite odd, so I required a slash in the tests (and standard) for now. I guess I should also test
<a lt="host serializer">serialized</a>, to <var>output</var>.

<li><p>Otherwise, append <var>url</var>'s <a for=url>host</a>, to <var>output</var>.
s/, to output/ to output/
I think that's currently somewhat consistently "wrong"...
This doesn't seem to work either, testing in whatwg-url. In particular
Yeah, you are right. Even with EOF you get a path that is non-empty. Another problem here is not forbidding certain code points in the host setter. I guess I should rethink this. The other question of course is whether all this additional complexity is worth it, versus treating non-special schemes as just having a path and not being able to be base URLs for each other. (Which is what most browsers implement at the moment.)
One thought I had is that it's perhaps worth splitting the non-special-URL host change and the change to not serialize a path if none was in the input. But it might be easy to do the no path in input change. It only really matters if the URL is not special and you hit EOF during the host/hostname state, right? It should be easy enough to account for that.
I'm going to pursue making this work, since it seems worthwhile to support relative URLs for non-special schemes, even if it comes at a cost somewhat greater than initially anticipated. I don't think this is the reason to move to the Chrome/Firefox model of effectively setting the cannot-be-base-URL flag for all non-special URLs. I landed a commit that addresses the issue @domenic found, but I also need to look through the tests again since I think I missed updating a couple tests (in particular setter tests).
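To make the "no path in the input" case concrete, the snippet below compares a special and a non-special URL that both end right after the host. It is a minimal sketch, assuming a WHATWG `URL` global whose parser supports hosts for non-special schemes; the helper name `describe` is hypothetical.

```javascript
// Hypothetical helper: report how a URL that ends right after its host is parsed.
function describe(input) {
  const url = new URL(input);
  return { href: url.href, host: url.host, pathname: url.pathname };
}

// Special scheme: the path is never empty, so a "/" is always serialized.
console.log(describe('https://host'));

// Non-special scheme: EOF is hit in the host state, which is the case
// where the question of serializing an absent path arises.
console.log(describe('asdf://host'));
```

Comparing the `pathname` values of the two results shows whether an implementation invents a path for the non-special URL.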
I also updated the tests. I didn't add any new host/hostname setter tests however.
I forgot to address "Another problem here is not forbidding certain code points in the host setter" so this is still not ready.
The other thing I noticed while testing is that currently we allow userinfo and a port to exist while host is empty. That shouldn't be allowed (and isn't in Safari).
Also, come to think of it, it's basically the same code points that are restricted. I just removed the code points that are not problematic for non-special URLs, such as [ and ] (restricted for IPv6).
We could keep the list of restricted code points the same though if you think that is better. I don't care strongly.
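The two ingredients discussed in this thread, rejecting restricted code points and percent-encoding the rest of the input, could be combined roughly as in the sketch below. Everything in it is an assumption for illustration: the function name `parseNonSpecialHost` is hypothetical, and the restricted set is borrowed from the forbidden host code points of the present-day URL Standard, whereas the list in this PR was still under discussion and, per the comment above, omitted `[` and `]`.

```javascript
// Assumption: illustrative restricted set, taken from the forbidden host code
// points of the present-day URL Standard; this PR's own list differs.
const RESTRICTED = new Set([
  '\u0000', '\t', '\n', '\r', ' ', '#', '/', ':', '<', '>',
  '?', '@', '[', '\\', ']', '^', '|',
]);

// Hypothetical helper: returns the percent-encoded host string, or null on failure.
function parseNonSpecialHost(input) {
  const encoder = new TextEncoder();
  let output = '';
  for (const codePoint of input) {
    if (RESTRICTED.has(codePoint)) return null; // restricted code point: failure
    const value = codePoint.codePointAt(0);
    // Percent-encode C0 controls and everything above U+007E as UTF-8 bytes.
    if (value <= 0x1f || value > 0x7e) {
      for (const byte of encoder.encode(codePoint)) {
        output += '%' + byte.toString(16).toUpperCase().padStart(2, '0');
      }
    } else {
      output += codePoint; // "%" and other printable ASCII pass through unchanged
    }
  }
  return output;
}

console.log(parseNonSpecialHost('h%41st'));
console.log(parseNonSpecialHost('a b'));
```

Keeping the restricted set in one constant makes it cheap to swap in either variant of the list when comparing their effect on the setter tests.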
"https://" is now invalid in WebKit. I don't think it's worth allowing it in the spec. It's a relatively small compatibility issue.
@domenic this PR and the corresponding tests at web-platform-tests/wpt#4406 can be reviewed again. I have verified it with whatwg-url, but haven't updated my patch for it. Since it's a rather trivial change it seems good to use that as a test to ensure you get the same results.
idempotent:
https://@/example.org/ -> https:///example.org/ -> https://example.org/ -->

<li><p>Decrease <var>pointer</var> by the number of code points in <var>buffer</var> plus
These could be separate substeps
I left this as-is since the parser generally groups steps that can reasonably be grouped.
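The idempotency concern in the hunk above (`https://@/example.org/` turning into a different URL on each reparse) can be checked mechanically with a parse, serialize, reparse round trip. This is a minimal sketch, assuming a WHATWG `URL` global; the helper name `roundTripsStably` is hypothetical.

```javascript
// Hypothetical helper: true if serializing and reparsing yields the same string,
// false if the serialization drifts, null if the input does not parse at all.
function roundTripsStably(input) {
  let first;
  try {
    first = new URL(input).href;
  } catch (error) {
    return null; // parse failure: nothing to round-trip
  }
  const second = new URL(first).href;
  return first === second;
}

console.log(roundTripsStably('https://@/example.org/'));
console.log(roundTripsStably('asdf://host/path'));
```

A result of `false` for any input would point at exactly the kind of non-idempotent serialization the comment in the hunk warns about.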
- Update to spec
  - Add opaque hosts
  - File state did not correctly deal with lack of base URL
  - Cleanup API for file and non-special URLs
  - Allow % and IPv6 addresses in non-special URL hosts
  - Use specific names for percent-encode sets
  - Add empty host concept for file and non-special URLs
  - Clarify IPv6 serializer
- Fix existing mistakes
  - Add missing ':' to forbidden host code point list.
  - Correct IPv4 parser empty label behavior
- Maintain type equivalence in URLContext with spec
  - scheme, username, and password should always be strings
  - host, port, query, and fragment may be strings or null
- Align scheme state more closely with the spec
  - Make sure the `special` variable is always synced with URL_FLAG_SPECIAL.

PR-URL: nodejs#12523
Fixes: nodejs#10608
Fixes: nodejs#10634
Refs: whatwg/url#185
Refs: whatwg/url#225
Refs: whatwg/url#224
Refs: whatwg/url#218
Refs: whatwg/url#243
Refs: whatwg/url#260
Refs: whatwg/url#268
Reviewed-By: James M Snell
Reviewed-By: Daijiro Wachi
Reviewed-By: Joyee Cheung
For URLs without a special scheme we cannot use the host parser
directly due to compatibility issues and we need to serialize slightly
differently too.
Fixes #148.