Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow umlaut domains for website addresses #952

Merged
merged 3 commits into from
Aug 4, 2023

Conversation

schneidr
Copy link
Contributor

@schneidr schneidr commented Apr 18, 2023

Checklist

  • All new and existing tests are passing
  • (If adding features:) I have added tests to cover my changes
  • I have added an entry to CHANGES.rst because this is a user-facing change or an important bugfix
  • I have written proper commit message(s)

What changes does this Pull Request introduce?

Changed website validation to allow domain names containing umlauts

Why is this necessary?

Resolves issue #951

@schneidr schneidr changed the title Umlaut domains Allow umlaut domains for website addresses Apr 18, 2023
@ix5 ix5 added server (Python) server code bug needs-decision Architectural/Behavioral decision by maintainers needed labels Apr 18, 2023
@ix5 ix5 added this to the 0.14 milestone Apr 18, 2023
@ix5
Copy link
Member

ix5 commented Apr 19, 2023

Thank you for this PR. In my eyes, we should avoid introducing new dependencies.

The relevant part of the validators package is domain.py.
The package is MIT licensed and we can copy over that function into isso rather than having to load a whole package for a simple regex check.

Copy link
Member

@jelmer jelmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add a test for the use case you're trying to address

return __url_re.match(text) is not None
text = normalize(text)
# urlparse does not like port numbers in URLs
text = re.sub(r':\d+', '', text)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems odd to me - urlparse handles port numbers in URLs fine, so there must be something else going on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, the reason for removing the port was not urlparse, it is validators.domain() which does not accept domain:port. I guess I could clean this up by using hostname instead of netloc, but if the complete URL is supposed to be validated these lines are most probably not staying anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did add my test case in test_comments.py.

@jelmer
Copy link
Member

jelmer commented Apr 19, 2023

Do we actually need to check the domain name at all?

@ix5
Copy link
Member

ix5 commented Apr 21, 2023

Do we actually need to check the domain name at all?

Because IIRC the website is inserted as a link (if given), we should make sure it is valid.

If we skipped the validity check, I'm not sure that the markup escaping would catch e.g. someone entering malicious Javascript foo</a><span onclick='evil.function()'></span>.

@jelmer
Copy link
Member

jelmer commented Apr 21, 2023

Do we actually need to check the domain name at all?

Because IIRC the website is inserted as a link (if given), we should make sure it is valid.

If we skipped the validity check, I'm not sure that the markup escaping would catch e.g. someone entering malicious Javascript foo</a><span onclick='evil.function()'></span>.

That seems like an argument for just fixing the markup escaping to me...

@schneidr
Copy link
Contributor Author

Do we actually need to check the domain name at all?

Because IIRC the website is inserted as a link (if given), we should make sure it is valid.

If we skipped the validity check, I'm not sure that the markup escaping would catch e.g. someone entering malicious Javascript foo</a><span onclick='evil.function()'></span>.

So, the goal is to actually check the complete entered URL, not only if the domain is valid, as I assumed?

@schneidr schneidr requested a review from jelmer April 23, 2023 07:06
@jelmer
Copy link
Member

jelmer commented Aug 4, 2023

Sorry for the delay; let's just merge this, since it's clearly an improvement over the current situation. Ideally, we'd be moving away from checking for valid domains to checking for malicious things though.

@jelmer jelmer merged commit 73d9886 into isso-comments:master Aug 4, 2023
@schneidr schneidr deleted the umlaut-domains branch August 4, 2023 13:08
@ix5 ix5 modified the milestones: 0.14, 0.13.1 May 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug needs-decision Architectural/Behavioral decision by maintainers needed server (Python) server code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants