Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHP 8.0 | Proposal for handling "namespaced names as single token" #3041

Closed
jrfnl opened this issue Aug 4, 2020 · 8 comments · Fixed by #3063
Closed

PHP 8.0 | Proposal for handling "namespaced names as single token" #3041

jrfnl opened this issue Aug 4, 2020 · 8 comments · Fixed by #3063
Milestone

Comments

@jrfnl
Copy link
Contributor

jrfnl commented Aug 4, 2020

What changes in PHP 8

From the RFC:

PHP distinguishes four kinds of namespaced names:

  • Unqualified names like Foo, which coincide with identifiers.
  • Qualified names like Foo\Bar.
  • Fully qualified names like \Foo.
  • Namespace-relative names like namespace\Foo.

Each of these kinds will be represented by a distinct token:

Foo;
// Before: T_STRING
// After:  T_STRING
// Rule:   {LABEL}

Foo\Bar;
// Before: T_STRING T_NS_SEPARATOR T_STRING
// After:  T_NAME_QUALIFIED
// Rule:   {LABEL}("\\"{LABEL})+
 
\Foo;
// Before: T_NS_SEPARATOR T_STRING
// After:  T_NAME_FULLY_QUALIFIED
// Rule:   ("\\"{LABEL})+
 
namespace\Foo;
// Before: T_NAMESPACE T_NS_SEPARATOR T_STRING
// After:  T_NAME_RELATIVE
// Rule:   "namespace"("\\"{LABEL})+

Individual namespace segments may contain reserved keywords:

// This is interpreted as T_LIST (i.e., as a reserved keyword):
List
// All of the following are interpreted as legal namespaced names:
\List
FooBar\List
namespace\List

Whitespace (or comments) is not permitted between namespace separators.

Ref: https://wiki.php.net/rfc/namespaced_names_as_token

TL;DR

So, in short, PHP 8.0 introduces 3 new tokens for identifiers, on top of the existing T_STRING:

  • T_NAME_FULLY_QUALIFIED
  • T_NAME_RELATIVE
  • T_NAME_QUALIFIED

Perspective on this as a sniff author

In the long run, I think these new tokens will make writing sniffs analyzing function calls, class instantiation and the likes much easier, especially when namespace/use statement context needs to be taken into account when matching identifiers.

However, this is also a big breaking change which would require a significant change in how any such sniff is analyzing code.

A significant part of the current test failures against the nightly build for PHPCS itself can be traced back to this particular change. Same for several external standards which I examined.

Proposal for how to handle this change in PHPCS

I'd like to propose the following and would like to invite other sniff authors to give their opinion on this as well:

PHPCS 3.x

For PHPCS 3.x, basically "undo" the change, disregarding any (no longer reserved) keywords.

In practice, this would mean splitting the above tokens on the namespace separator \ and tokenizing the name parts as T_STRING (aside from a potential namespace keyword which would need to be tokenized as T_NAMESPACE).

PHPCS 4.x

For PHPCS 4.x, backfill the PHP 8.0 tokenization for the supported PHP versions < 8.0.

Reasoning

  • It moves the breaking change part of this PHP 8.0 change to a major version in PHPCS, which is expected to have breaking changes anyhow.
  • It will allow existing sniffs looking for namespace names, use statements, function calls etc to continue to function as before while on PHPCS 3.x.
  • It moves the decision on the timing of when to rewrite/adjust these sniffs back to the maintainers of standards as it would coincide with when they will start supporting PHPCS 4.x.
  • It moves the decision on whether to concurrently support both types of tokenization to standard maintainers, as it now becomes a decision on whether to concurrently support PHPCS 3.x as well as 4.x in the standard at the same time or to drop support for PHPCS 3.x, or potentially, to have separate branches/versions which work with the different PHPCS versions.

Opinions ?

/cc @gsherwood @photodude @umherirrender @TomasVotruba @sirbrillig @michalbundyra @djoos @louisl @kukulich @carusogabriel @klausi @dereuromark @michalbundyra @jmarcil @weierophinney (and please feel free to ping anyone I missed)

@photodude
Copy link
Contributor

@jrfnl I've kinda dropped out of development activities for the time being. I'm not sure I have much to contribute at the moment. I think the proposal looks good.

@jrfnl
Copy link
Contributor Author

jrfnl commented Aug 31, 2020

PR #3063 addresses this for PHPCS 3.x.

I have also prepared the necessary changes for PHPCS 4.x, but PR #3066 should be reviewed & merged + merged into the PHPCS 4.x branch before I can pull the PHPCS 4.x branch for this change without it causing conflicts/leaving stray tokens behind.

@gsherwood gsherwood added this to the 3.5.7 milestone Aug 31, 2020
jrfnl added a commit to PHPCSStandards/PHPCSUtils that referenced this issue Sep 13, 2020
In PHP 8.0 identifier name tokenization will change as outline in the [accepted RFC "Treat namespaced names as single token"(https://wiki.php.net/rfc/namespaced_names_as_token).

When the PHP 8.0 identifier name tokenization is used, the target token to find for some tests will need to be a different token - for instance: `T_STRING` vs `T_FULLY_QUALIFIED_NAME` -.
Along the same lines, the expected token positions in the return value of various functions will also be different when the PHP < 8.0 tokenization is used as certain tokens will be "squashed" into one token.

This adds a test helper method to allow tests to "know" whether or not to expect the PHP 8.0 identifier name tokenization, so the test setup/expectations can be adjusted based on the expected tokenization.

The method is based on the _current reality_.
At this time the PHP 8 tokenization should be expected on all PHPCS versions when run on PHP 8.

Refs:
* https://wiki.php.net/rfc/namespaced_names_as_token
* [Proposal for handling this PHP 8 change in PHPCS](squizlabs/PHP_CodeSniffer#3041)
* [Open PR for the PHPCS 3.x branch to "undo" the PHP 8 tokenization](squizlabs/PHP_CodeSniffer#3063)
jrfnl added a commit to PHPCSStandards/PHPCSUtils that referenced this issue Sep 13, 2020
In PHP 8.0 identifier name tokenization will change as outline in the [accepted RFC "Treat namespaced names as single token"](https://wiki.php.net/rfc/namespaced_names_as_token).

When the PHP 8.0 identifier name tokenization is used, the target token to find for some tests will need to be a different token - for instance: `T_STRING` vs `T_FULLY_QUALIFIED_NAME` -.
Along the same lines, the expected token positions in the return value of various functions will also be different when the PHP < 8.0 tokenization is used as certain tokens will be "squashed" into one token.

This adds a test helper method to allow tests to "know" whether or not to expect the PHP 8.0 identifier name tokenization, so the test setup/expectations can be adjusted based on the expected tokenization.

The method is based on the _current reality_.
At this time the PHP 8 tokenization should be expected on all PHPCS versions when run on PHP 8.

Refs:
* https://wiki.php.net/rfc/namespaced_names_as_token
* [Proposal for handling this PHP 8 change in PHPCS](squizlabs/PHP_CodeSniffer#3041)
* [Open PR for the PHPCS 3.x branch to "undo" the PHP 8 tokenization](squizlabs/PHP_CodeSniffer#3063)
gsherwood added a commit that referenced this issue Sep 20, 2020
@jrfnl
Copy link
Contributor Author

jrfnl commented Sep 20, 2020

@gsherwood Should this ticket be re-opened for the PHPCS 4.x part ? Only the 3.x part has been fixed so far.

Once the union types PR (#3032) is merged in PHPCS 3.6.0 and cherrypicked to 4.x, I will pull the PR for this change to the 4.x branch. (would conflict otherwise)

@gsherwood
Copy link
Member

I'll move it to the 4.0 project.

@gsherwood gsherwood reopened this Sep 20, 2020
@gsherwood gsherwood removed this from the 3.5.7 milestone Sep 20, 2020
@gsherwood gsherwood modified the milestones: 3.5.7, 4.0.0 Sep 20, 2020
@gsherwood gsherwood removed this from the 3.5.7 milestone Sep 20, 2020
@gsherwood
Copy link
Member

After some mucking around, I would really need this to be 2 tickets to organise things properly. So have just moved it to 4.0 instead of keeping it in both.

jrfnl added a commit to PHPCSStandards/PHPCSUtils that referenced this issue Sep 20, 2020
Upstream PR 3063 which was merged in PHPCS 3.5.7 effectively "undoes" the PHP 8.0 tokenization for identifier names for PHPCS 3.x, while it also includes backfilling the PHP 8.0 token constants.

In PHPCS 4.x the PHP 8.0 tokenization will be backfilled instead.

This commit annotates this change in the `Collections::nameTokens()` method and updates the unit tests for various token collections to handle this correctly as well.

Refs:
* squizlabs/PHP_CodeSniffer#3041
* squizlabs/PHP_CodeSniffer#3063
jrfnl added a commit to PHPCSStandards/PHPCSUtils that referenced this issue Sep 21, 2020
Upstream PR 3063 which was merged in PHPCS 3.5.7 effectively "undoes" the PHP 8.0 tokenization for identifier names for PHPCS 3.x, while it also includes backfilling the PHP 8.0 token constants.

In PHPCS 4.x the PHP 8.0 tokenization will be backfilled instead, but that PR has not yet been pulled.

This commit annotates the current change in the `Collections::nameTokens()` method and updates the unit tests for various token collections to handle this correctly as well.

Refs:
* squizlabs/PHP_CodeSniffer#3041
* squizlabs/PHP_CodeSniffer#3063
@jrfnl
Copy link
Contributor Author

jrfnl commented Nov 3, 2020

PR #3155 has now been pulled to address the changes as proposed for PHPCS 4.x.

@gsherwood gsherwood added this to the 4.0.0 milestone Nov 9, 2020
@gsherwood
Copy link
Member

PR has been merged but leaving In Progress until changelog is written

@gsherwood
Copy link
Member

Forgot to add the changelog entry for this, but have done so now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants