UnicodeDecodeError with 1.3.1 and python2.7 #816
Comments
Does this still exist in 1.3.3? Do you have a small test case that can reproduce this? |
I've experienced the same issue. Here's a minimal test case that triggers the problem for me.
So the problem is caused by mixing unicode and 8-bit str output in a test that fails. StringIO code contains this comment:
Versions in use: nose==1.3.4 and django-nose==1.2 |
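The failure mode described above can be illustrated with a small emulation. This is only a sketch that runs on Python 3, where `str` stands in for Python 2 `unicode` and `bytes` for Python 2 8-bit `str`; the helper name `py2_style_join` is hypothetical and is not part of nose or the standard library. It assumes that Python 2 decodes 8-bit chunks with the ASCII codec whenever a unicode object takes part in a join.

```python
def py2_style_join(chunks):
    """Emulate how Python 2 joins a mix of unicode and 8-bit str chunks.

    Assumption: as soon as one chunk is text, every 8-bit chunk is
    implicitly decoded with the ASCII codec, as Python 2 does.
    """
    if not any(isinstance(c, str) for c in chunks):
        # Only 8-bit strings: no coercion happens, bytes are concatenated.
        return b"".join(chunks)
    return "".join(
        c if isinstance(c, str) else c.decode("ascii") for c in chunks
    )


# ASCII-only 8-bit data mixes with text without trouble.
print(py2_style_join(["caf\u00e9 ", b"ok"]))

# A non-ASCII 8-bit chunk next to any text chunk fails at join time.
try:
    py2_style_join(["text ", "caf\u00e9".encode("utf-8")])
except UnicodeDecodeError as exc:
    print("join failed:", exc.reason)
```

The error only appears when the chunks are finally combined, which matches the report that the failure shows up when the captured output of a failing test is collected.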
What do you propose the solution to be? What should be captured? How should it be coerced? Yes, mixing non-unicode and unicode is a problem, but what do you think the correct behavior should be and why? |
BTW, thanks for the test case! |
I guess that a test case (especially a failing one) might write anything to stdout. Such output often comes from inside third-party code, so nose should be able to accept any mix of output. I'm not very knowledgeable on this issue, but here is one (maybe totally unworkable) idea:
The io library was added in Python 2.6; if older versions need to be supported, the current StringIO solution can be preserved for them. |
None of what you proposed works with Python 2.5 or 2.4. And I'm not real interested in maintaining yet another place where everything differs. Is there a solution that works across the board? |
Problem could be solved by implementing a subclass of StringIO that overrides just getvalue(). Pseudocode something like:
This is not as nice and general a solution as using io.TextIOBase would be, but it should be workable in all Python versions. Note: I've only checked the Python 2.7 StringIO code; if the internal implementation of StringIO is very different in earlier Python versions, this will fail. I also don't know about Python 3; possibly this bug does not even occur there. |
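The idea of a tolerant getvalue() could be sketched roughly as follows. This is not the code from the thread and not nose's implementation: it is a standalone class rather than a StringIO subclass so that it runs on Python 3, and the class name `TolerantCapture` and the default UTF-8 encoding are assumptions made for illustration.

```python
class TolerantCapture(object):
    """Hypothetical capture buffer; not nose's actual implementation."""

    def __init__(self, encoding="utf-8"):
        self.encoding = encoding  # assumed encoding for 8-bit chunks
        self.buflist = []

    def write(self, chunk):
        # Store the chunk untouched; coercion is deferred to getvalue().
        self.buflist.append(chunk)

    def getvalue(self):
        parts = []
        for chunk in self.buflist:
            if isinstance(chunk, bytes):
                # Undecodable bytes become U+FFFD instead of raising.
                chunk = chunk.decode(self.encoding, "replace")
            parts.append(chunk)
        return "".join(parts)


buf = TolerantCapture()
buf.write("caf\u00e9 ")
buf.write(b"\xff raw")
print(repr(buf.getvalue()))
```

The trade-off is the one raised in the next comment: bytes that do not match the assumed encoding are replaced, so the captured output can be lossy.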
Doesn't this approach corrupt the output? |
Yes, it does. If you have a list of buffers in who-knows-what encodings and you combine them into one string, that's what happens (but output containing some question marks is preferable to tests not running at all). But there may be another option. The problem is caused by forcing a list of strings into the same encoding, so don't do that: skip calling getvalue() altogether and implement another method that walks the internal StringIO buffer list and prints each buffer in turn. There is no need to force any encoding on them. Even then, depending on the value of sys.stdout.encoding, the output will sometimes be corrupted anyway, but I don't see a way around that. |
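A rough sketch of that second option, replaying each buffer in turn instead of joining them, might look like this. The function name `replay_chunks`, its parameters, and the choice of a binary output stream are assumptions for illustration; the thread does not specify them.

```python
import io


def replay_chunks(chunks, stream, encoding="utf-8"):
    """Write captured chunks to a binary stream, one at a time.

    8-bit chunks are passed through untouched; text chunks are encoded
    individually, so no common encoding is forced on the whole list.
    """
    for chunk in chunks:
        if isinstance(chunk, str):
            chunk = chunk.encode(encoding, "replace")
        stream.write(chunk)


out = io.BytesIO()
replay_chunks(["caf\u00e9 ", b"\xe9 latin-1 bytes"], out)
print(out.getvalue())
```

Whether the result displays correctly still depends on what the terminal expects, which is the limitation the comment above already concedes.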
On Python 2, `sys.stdout` and `print` can normally handle any combination of `str` and `unicode` objects. However, `StringIO.StringIO` can only safely handle one or the other. If the program writes both a non-ASCII `unicode` string, and a non-ASCII `str` string, then the `getvalue()` method will fail with `UnicodeDecodeError` [1]. In nose, that causes the script to suddenly abort, with the cryptic `UnicodeDecodeError`. This fix catches `UnicodeError` when trying to get the captured output, and will replace the captured output with a warning message. Fixes nose-devs#816 [1] <https://github.com/python/cpython/blob/2.7/Lib/StringIO.py#L258>
On Python 2, `sys.stdout` and `print` can normally handle any combination of `str` and `unicode` objects. However, `StringIO.StringIO` can only safely handle one or the other. If the program writes both a `unicode` string, and a non-ASCII `str` string, then the `getvalue()` method will fail with `UnicodeDecodeError` [1]. In nose, that causes the script to suddenly abort, with the cryptic `UnicodeDecodeError`. This fix catches `UnicodeError` when trying to get the captured output, and will replace the captured output with a warning message. Fixes nose-devs#816 [1] <https://github.com/python/cpython/blob/2.7/Lib/StringIO.py#L258>
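The approach in the commit message, catching `UnicodeError` and substituting a warning, can be sketched as below. This is not the actual patch: the helper name `safe_captured_output` and the warning text are invented for this example, and the failing buffer is simulated so the snippet runs on Python 3.

```python
def safe_captured_output(getvalue):
    """Return captured output, or a warning if it cannot be assembled.

    `getvalue` is any zero-argument callable, such as a buffer's
    getvalue method. The warning text is made up for this example.
    """
    try:
        return getvalue()
    except UnicodeError:
        # UnicodeDecodeError and UnicodeEncodeError are both subclasses.
        return "WARNING: captured output could not be decoded; output omitted"


def broken_getvalue():
    # Stand-in for a buffer whose chunks cannot be joined.
    return b"\xe9".decode("ascii")


print(safe_captured_output(broken_getvalue))
print(safe_captured_output(lambda: "plain output"))
```

The captured text is lost in the failing case, but the test run continues and the real test failure is still reported.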
When using nosetests on failing tests with output which contains non-ascii characters, I get the following error: