[RFC] Add RFC 3986 and WHATWG compliant URL parsing support #14461

Open
kocsismate wants to merge 11 commits into master

Conversation

kocsismate
Member

@kocsismate kocsismate commented Jun 3, 2024

2nd take after the failed experiment with #11315

RFC: https://wiki.php.net/rfc/url_parsing_api

@@ -3715,7 +3715,6 @@ function uniqid(string $prefix = "", bool $more_entropy = false): string {}

/**
* @return int|string|array<string, int|string>|null|false
* @compile-time-eval
Member Author

Removed for benchmarking purposes

ext/url/php_url.c (outdated, resolved)

static void cleanup_parser(void)
{
if (++URL_G(urls) % 500 == 0) {
Member Author

Approach is copy-pasted from lexbor/lexbor#206

@kocsismate kocsismate force-pushed the ext-url2 branch 5 times, most recently from 823e4c6 to e65deb7 Compare June 6, 2024 08:23
@kocsismate kocsismate changed the title from "Add ext/url based on Lexbor" to "[RFC] Add ext/url based on Lexbor" Jun 12, 2024
ext/url/php_url.c (outdated, resolved)
ext/url/config.m4 (outdated, resolved)
@nielsdos
Member

nielsdos commented Oct 7, 2024

I need to pull in a new version of Lexbor as well in 8.4 for the GB18030 changes so I'll handle the encoding changes.

@kocsismate
Member Author

@nielsdos Ah thanks for the link! I totally missed the readme :/ TBH I silenced the compiler errors with -Wno-int-conversion.

I was about to report your findings back to my Lexbor issue, but I've just seen that you beat me to it :)

@nielsdos
Member

nielsdos commented Oct 7, 2024

> TBH I silenced the compiler errors with -Wno-int-conversion.

I'll make a PR to update Lexbor now, then once that's merged you can rebase on master and get rid of the warning silencer.

@kocsismate kocsismate force-pushed the ext-url2 branch 3 times, most recently from acb1a98 to 5a25f1a Compare October 12, 2024 23:18
@kocsismate kocsismate force-pushed the ext-url2 branch 4 times, most recently from 13e8566 to 8e07430 Compare October 26, 2024 15:35
@kocsismate kocsismate force-pushed the ext-url2 branch 3 times, most recently from d894dad to 8e21e67 Compare November 18, 2024 21:18
RETURN_THROWS();
}

zend_result result = this_internal_uri->handler->normalize_uri(this_uri);
Member Author

@nielsdos I would like to ask your opinion about a problem I recently realized.

Context: I added a normalize_uri handler that normalizes the URI. It is used in multiple places (e.g. the normalize(), toNormalizedString(), and equalsTo() methods). However, this could cause unexpected behavior when a built-in URI class is extended in userland: if a child class overrides the normalize() method, its author would legitimately assume that the override also affects the other relevant methods (toNormalizedString(), equalsTo()). That is not the case, because those methods call the internal handler directly, and PHP programmers have no control over it. I can see two solutions so far:

  • make the built-in URI implementations final: there's no more problem with method overrides, but using composition instead of inheritance requires more work (implementing all the methods of the UriInterface), you can't just substitute an internal URI class with your own implementation etc.
  • the normalize() method itself should be called instead of the normalize_uri internal handler: this solves the issue, but it results in quite some performance overhead

Do you have any preference/insights about this question?

Member

@nielsdos nielsdos Nov 19, 2024

@kocsismate Interesting.

Here's an idea:
We can know upfront if the "normalize" method is overridden when we create the internal uri object. We can do this by checking whether the entry for "normalize" in the function table has type ZEND_USER_FUNCTION instead of ZEND_INTERNAL_FUNCTION. In that case, we know we should call the user handler, otherwise we should call the internal handler.

Create a helper function invoke_normalize_uri (name is just illustrative) that you use to invoke either the internal normalize handler or the user function. Then you can use invoke_normalize_uri everywhere without worrying. To make the invocation fast you can have a zend_function* field in the internal object that is NULL when the user did not override it, and points to the user function if the user did override it. Then you can use the Zend APIs to call that user function if necessary.
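
A minimal, self-contained sketch of this dispatch in plain C, not the actual php-src code. The types uri_t, method_entry and fn_type, the helpers uri_init and uri_equals, and both normalizers are hypothetical stand-ins: FN_USER / FN_INTERNAL play the role of ZEND_USER_FUNCTION / ZEND_INTERNAL_FUNCTION, and a plain function pointer plays the role of the cached zend_function* field. Only the name invoke_normalize_uri comes from the comment above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the function type stored in the function table. */
typedef enum { FN_INTERNAL, FN_USER } fn_type;

typedef struct uri uri_t;
typedef int (*normalize_fn)(uri_t *uri);

/* Hypothetical stand-in for the "normalize" entry of a class's function table. */
typedef struct {
    fn_type type;
    normalize_fn fn;
} method_entry;

struct uri {
    char text[64];
    normalize_fn user_normalize; /* NULL unless the child class overrides normalize() */
};

/* Stand-in for the internal handler: lowercases the whole string. */
int internal_normalize(uri_t *uri)
{
    for (char *p = uri->text; *p != '\0'; p++) {
        *p = (char) tolower((unsigned char) *p);
    }
    return 0;
}

/* Stand-in for a userland override: also strips one trailing slash. */
int user_normalize_example(uri_t *uri)
{
    internal_normalize(uri);
    size_t len = strlen(uri->text);
    if (len > 0 && uri->text[len - 1] == '/') {
        uri->text[len - 1] = '\0';
    }
    return 0;
}

/* Decide once, at object creation, whether normalize() is overridden. */
void uri_init(uri_t *uri, const char *text, const method_entry *normalize_entry)
{
    snprintf(uri->text, sizeof uri->text, "%s", text);
    uri->user_normalize = normalize_entry->type == FN_USER ? normalize_entry->fn : NULL;
}

/* Single entry point used by every method that needs normalization. */
int invoke_normalize_uri(uri_t *uri)
{
    assert(uri != NULL);
    if (uri->user_normalize != NULL) {
        return uri->user_normalize(uri); /* user override */
    }
    return internal_normalize(uri);      /* fast internal path */
}

/* Compares normalized copies, so an override also affects equality. */
int uri_equals(const uri_t *a, const uri_t *b)
{
    uri_t ca = *a, cb = *b;
    invoke_normalize_uri(&ca);
    invoke_normalize_uri(&cb);
    return strcmp(ca.text, cb.text) == 0;
}
```

Because uri_equals goes through invoke_normalize_uri rather than calling the internal normalizer directly, an overridden normalize() changes the comparison result as well, and the non-overridden case costs only a NULL check.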

Member Author

@nielsdos Thanks for the brilliant idea! :) That's really impressive. I'll have to think about the cases in which a similar issue can happen, but if they're not too frequent, your solution will mitigate it perfectly. I wouldn't like it, though, if e.g. all the property reader functions had to use this workaround, but hopefully that won't be the case.

@kocsismate kocsismate force-pushed the ext-url2 branch 3 times, most recently from 18e80e0 to d0b9a3d Compare January 20, 2025 07:17
3 participants