Fix user creation for users with accented characters in their usernames #259

Open
gsidebo opened this issue Oct 15, 2021 · 4 comments · May be fixed by #627

Comments

@gsidebo
Contributor

gsidebo commented Oct 15, 2021

Steps to Reproduce

  1. Create a user with some vowels in the username
  2. Create another user with the same username, but change some/all of the characters to their accented equivalents (e.g.: online-user and önlínê-user)

Expected Behavior

The users exist in both mitxonline and Open edX

Actual Behavior

For reasons that are not yet explained, Open edX treats those two usernames as identical, so only the more recent user will have a User record on the Open edX side. One or both users will then experience some of the auth/enrollment issues that we saw before we patched up username creation and validation.

Related Issues

#202
#244

Implementation

  1. Add a new field to the mitxonline User model (normalized_username or something like that)
    • Should be indexed and unique.
    • Should have a data migration that fills in this field for all existing usernames. The value should be the current username with accent characters 'normalized' to unaccented versions.
  2. When validating a new user, the username should be 'normalized' to an unaccented version and checked against this new field. If the same normalized username already exists, the user should see the same validation message that they would see if another user existed with the exact same username value
    • Example: user with username üser1 already exists, then someone else tries to create a user with username user1. That should trigger a validation error, and they should see the message Username 'user1' already exists or whatever the validation message is currently for an exact-match username collision. It's not necessary to tell them about the accented character constraints – we should be fine with just telling them to try a different username
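A minimal sketch of the normalization and the collision check described above, using only the Python standard library. The names `normalize_username` and `is_username_taken` are hypothetical and not taken from the mitxonline codebase, and the set stands in for a lookup against the proposed normalized_username column. Stripping combining marks after NFKD decomposition is one possible definition of "normalized"; it is an assumption here and may not match the database's own comparison rules exactly.

```python
import unicodedata


def normalize_username(username: str) -> str:
    """Return username with accents removed (hypothetical helper)."""
    # NFKD decomposition splits "ö" into "o" plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", username)
    # Drop the combining marks and keep the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def is_username_taken(candidate: str, existing_normalized: set) -> bool:
    """Check a new username against already-normalized existing usernames.

    existing_normalized is a stand-in for a query on the new unique,
    indexed field; a real implementation would query the database.
    """
    return normalize_username(candidate) in existing_normalized


# Stand-in for the values the data migration would have written.
existing = {normalize_username("üser1")}

if is_username_taken("user1", existing):
    # Same message as an exact-match collision; the wording is illustrative.
    print("Username 'user1' already exists")
```

The same helper could be reused by the data migration to fill in the field for existing users, so that validation and stored values agree.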
@gsidebo gsidebo added the bug Something isn't working label Oct 15, 2021
@gsidebo gsidebo changed the title Normalize accented characters when validating username uniqueness Fix user creation for users with accented characters in their usernames Oct 15, 2021
@asadiqbal08
Contributor

asadiqbal08 commented Oct 18, 2021

@gsidebo So do you think we should restrict user creation for users with accented characters in their usernames here, i.e. restrict which characters can be used in a username?

@asadiqbal08 asadiqbal08 self-assigned this Oct 18, 2021
@pdpinch
Member

pdpinch commented Oct 18, 2021

from Slack

The default collation used for edx-platform is utf8_general_ci (I think). Older MySQL versions don't offer a collation that is accent-sensitive but case-insensitive (version 8 does). So the default _ci collation is both case-insensitive and accent-insensitive.

@gsidebo
Contributor Author

gsidebo commented Oct 19, 2021

@asadiqbal08 The problem is that we already have users with accented characters in their usernames. Even if we restrict it on our end from now on, a new unaccented username might still collide with an existing accented username. The "simplest" fix for this would be to restrict the usernames on our end as you suggested, then right after that merges, mass-update all existing usernames in edX via a script to remove accented characters. This might require us to notify users. We'd have to ask leadership and customer service about what our policy should be. Another option would be to check that the MySQL issue described above is the actual source of the problem, then look into the possibility of reconfiguring the database.
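Before choosing between those options, it may help to know how many existing usernames already collide. The following is a rough sketch, assuming the usernames can be exported as a plain list of strings; `strip_accents` and `find_accent_collisions` are hypothetical names, and accent stripping via NFKD is an assumed approximation of how the database compares values, not a confirmed match for its collation.

```python
import unicodedata
from collections import defaultdict


def strip_accents(text: str) -> str:
    """Remove combining accent marks after NFKD decomposition."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def find_accent_collisions(usernames):
    """Group usernames that become identical once accents are stripped.

    Returns a dict mapping each unaccented form to the list of original
    usernames sharing it, keeping only groups with more than one member.
    """
    groups = defaultdict(list)
    for name in usernames:
        groups[strip_accents(name)].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


# Illustrative input only; real data would come from the User table.
sample = ["online-user", "önlínê-user", "someone-else"]
for unaccented, names in find_accent_collisions(sample).items():
    print(unaccented, "<-", names)
```

Any group this reports would need a manual decision before a mass rename, since stripping accents from one member would produce an exact duplicate of another.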

@briangrossman

Note that when we were looking into some other issues with usernames, we discovered a few users who have a username conflict due to the fact that there are some usernames that differ only by accents. They can be found in the Accent Issues section of this document

5 participants