Use UTF-8 encoding for log file. #10072

methane · 2021-06-16T03:23:38Z

uranusjr · 2021-06-16T03:34:19Z

Is there a way to only do this to new log files? If there is an existing file that was not written in UTF-8, I think pip will crash with this patch, since it won’t be able to open the existing file for appending.

DiddiLeija · 2021-06-16T14:03:33Z

Is there a way to only do this to new log files? If there is an existing file that was not written in UTF-8, I think pip will crash with this patch, since it won’t be able to open the existing file for appending.

Agreed. There are many ways to make this crash. But if you write only new files in UTF-8, with an option to skip that and use another encoding, that's fine with me.

Example: in logging.py, a try/except that attempts UTF-8 first but can fall back to another encoding (with a warning to tell the user, if you want) could help.
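A minimal sketch of the fallback described above, not pip's actual code. The helper name `choose_log_encoding` is hypothetical, and falling back to the locale's preferred encoding is an assumption about what "another encoding" would mean here.

```python
import locale
import os
import warnings


def choose_log_encoding(path: str) -> str:
    """Return "utf-8" unless an existing file at `path` is not valid UTF-8.

    Hypothetical helper for illustration only.
    """
    # New, empty, or non-regular files (such as /dev/null) have nothing to
    # conflict with, so UTF-8 is always safe for them.
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return "utf-8"
    try:
        # Decode the whole file once; invalid bytes raise UnicodeDecodeError.
        with open(path, encoding="utf-8") as f:
            for _ in f:
                pass
    except UnicodeDecodeError:
        # Assumption: fall back to the locale encoding and tell the user.
        fallback = locale.getpreferredencoding(False)
        warnings.warn(
            f"{path} is not valid UTF-8; appending with {fallback} instead"
        )
        return fallback
    return "utf-8"
```

The result could then be passed as the `encoding` argument when the log file is opened for appending. Note that this reads the entire existing file on every run, which is part of the cost of checking.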

methane · 2021-06-17T01:58:03Z

Is there a way to only do this to new log files?

It is very difficult: we would need to detect the encoding of the existing file.
But note that the default log file is /dev/null, so appending to an existing log file is very rare.

If there is an existing file that was not written in UTF-8, I think pip will crash with this patch, since it won’t be able to open the existing file for appending.

pip won't crash.

>>> with open("x.txt", "w", encoding="cp932") as f:
...     f.write("こんにちは\n")
...
6
>>> with open("x.txt", "a", encoding="utf-8") as f:
...     f.write("こんにちは\n")
...
6
>>> with open("x.txt", "rb") as f:
...     print(f.read())
...
b'\x82\xb1\x82\xf1\x82\xc9\x82\xbf\x82\xcd\n\xe3\x81\x93\xe3\x82\x93\xe3\x81\xab\xe3\x81\xa1\xe3\x81\xaf\n'
>>>

uranusjr · 2021-06-17T12:38:17Z

pip won't crash.

Interesting! I did not know this. So encoding errors only happen lazily when something actually needs to be encoded/decoded (i.e. on read()). TIL.
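To illustrate the lazy behaviour, here is a small sketch that extends the session above: appending succeeds, and the mismatch only surfaces when the mixed file is read back as UTF-8. The temporary path and the `decode_failed` flag are illustrative names, not anything from pip.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "x.txt")

# Write the first line in cp932, as in the session above.
with open(path, "w", encoding="cp932") as f:
    f.write("こんにちは\n")

# Opening for append does not decode the existing bytes, so this succeeds.
with open(path, "a", encoding="utf-8") as f:
    f.write("こんにちは\n")

# Decoding happens on read, and that is where the mismatch is detected.
try:
    with open(path, encoding="utf-8") as f:
        f.read()
except UnicodeDecodeError:
    decode_failed = True
else:
    decode_failed = False

print(decode_failed)
```

The file ends up with mixed encodings, so a reader has to tolerate that, for example by opening it with `errors="replace"`.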

DiddiLeija · 2021-06-17T13:05:44Z

Interesting! I did not know this.

Me neither. But now that you've explained it, I really like the idea.

pradyunsg · 2021-07-04T09:06:29Z

Thanks @methane! ^>^

methane added 2 commits June 16, 2021 11:17

Use UTF-8 for log file

9e220c6

Add NEWS fragment

f6a63eb

uranusjr approved these changes Jun 17, 2021

DiddiLeija approved these changes Jun 17, 2021

pradyunsg merged commit 156f71b into pypa:main Jul 4, 2021

github-actions bot locked as resolved and limited conversation to collaborators Sep 28, 2021
