Fix regex matching being used in PartOfSpeech representation model #2138

Merged 2 commits on Sep 17, 2024

Contributor

@woranov woranov commented Sep 5, 2024

What does this PR do?

By default, pandas' str.contains interprets the argument as a regex pattern. This causes the PartOfSpeech model to error out if the topics contain e.g. mismatched parentheses. This PR fixes that issue.
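For illustration, here is a minimal sketch of the failure mode and two common ways to avoid it. The series contents and variable names are hypothetical, and the snippet is not the PartOfSpeech code or the exact change made in this PR; it only shows how `str.contains` behaves with and without regex interpretation.

```python
import re

import pandas as pd

# Hypothetical data: one candidate topic contains an unbalanced parenthesis.
documents = pd.Series(["topic (one", "plain topic", "c++ model"], dtype=object)
candidate = "topic (one"

# Default behaviour (regex=True): the candidate is compiled as a pattern,
# and the unbalanced "(" makes compilation fail.
try:
    documents.str.contains(candidate)
    regex_failed = False
except re.error:
    regex_failed = True

# Option 1: regex=False treats the candidate as a literal substring.
literal_mask = documents.str.contains(candidate, regex=False)

# Option 2: keep regex matching but escape the special characters first.
escaped_mask = documents.str.contains(re.escape(candidate))
```

Both options match only the first entry here. `regex=False` is usually the simpler choice when the candidates are plain strings and no pattern features are needed.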

Fixes #2153.

Before submitting

  • This PR fixes a typo or improves the docs (if yes, ignore all other checks!).
  • Did you read the contributor guideline?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes (if applicable)?
  • Did you write any new necessary tests?

@MaartenGr
Owner

Thank you for the PR. I didn't see an open issue attached, which is generally the procedure when opening a PR. Could you add one?

With respect to the suggested change, do you have an example of what it fixes? I'm not sure I understand in which specific scenarios the issue would happen.

@woranov
Contributor Author

woranov commented Sep 16, 2024

Apologies! Indeed, some preconditions have to be met for the error to arise.

Opened an issue with a reproduction example in #2153.

@MaartenGr
Owner

Thanks for the update! This seems good to me, let's merge 😄

@MaartenGr MaartenGr merged commit eba1d34 into MaartenGr:master Sep 17, 2024
@woranov woranov deleted the fix-pos-str-contains branch September 19, 2024 09:11
Development

Successfully merging this pull request may close these issues.

PartOfSpeech errors when candidate topics contain special characters