UnicodeDecodeError exception occurs when .env file contains Non-ASCII characters on Windows #300

Closed
BackMountainDevil opened this issue Jan 25, 2021 · 10 comments · Fixed by #306

Comments

@BackMountainDevil

Similar to issue #175: a UnicodeDecodeError occurs when a .env file contains non-ASCII characters on Windows.
That issue was reported as solved by PR 161 on 30 Apr 2019, but the error still happens in version 0.15.0, which was released on 29 Oct 2020.
I ran into a similar situation with decoding on Windows before.

Environment info

OS and Python

  • Windows 10 amd64
  • Python 3.9.0

pip package info

Package       Version
------------- -------
click         7.1.2
Flask         1.1.2
itsdangerous  1.1.0
Jinja2        2.11.2
MarkupSafe    1.1.1
pip           20.3.3
python-dotenv 0.15.0
setuptools    51.3.3
watchdog      1.0.2
Werkzeug      1.0.1
wheel         0.36.2

When the error occurs

Code

See the repo.

Key point

The error occurs when I add a short Chinese comment to the .flaskenv file. If I delete all Chinese comments, it works fine.

Bad result

Add a comment to the end of .flaskenv, like this:

# -*- encoding: utf-8 -*-
# Public env Variables about flask

# FLASK_APP = app.py
# FLASK_RUN_HOST = 127.0.0.1
FLASK_RUN_PORT = 80

# dev mode 
FLASK_ENV = development
# 今天也是个小可爱耶

Then run Flask:

PS D:\Documents\CAU\Lion\repositiries\Python\fb> flask run
Traceback (most recent call last):
  File "d:\program files\python\python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "d:\program files\python\python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "c:\users\kearney\.virtualenvs\fb-xzc3iotr\lib\site-packages\flask\cli.py", line 967, in main
    cli.main(args=sys.argv[1:], prog_name="python -m flask" if as_module else None)
  File "c:\users\kearney\.virtualenvs\fb-xzc3iotr\lib\site-packages\flask\cli.py", line 575, in main
    load_dotenv()
  File "c:\users\kearney\.virtualenvs\fb-xzc3iotr\lib\site-packages\flask\cli.py", line 649, in load_dotenv
    dotenv.load_dotenv(path)
  File "c:\users\kearney\.virtualenvs\fb-xzc3iotr\lib\site-packages\dotenv\main.py", line 319, in load_dotenv
    return DotEnv(f, verbose=verbose, interpolate=interpolate, **kwargs).set_as_environment_variables(override=override)
  File "c:\users\kearney\.virtualenvs\fb-xzc3iotr\lib\site-packages\dotenv\main.py", line 106, in set_as_environment_variables
    for k, v in self.dict().items():
  File "c:\users\kearney\.virtualenvs\fb-xzc3iotr\lib\site-packages\dotenv\main.py", line 87, in dict
    values = resolve_nested_variables(self.parse())
  File "c:\users\kearney\.virtualenvs\fb-xzc3iotr\lib\site-packages\dotenv\main.py", line 239, in resolve_nested_variables
    for (k, v) in values:
  File "c:\users\kearney\.virtualenvs\fb-xzc3iotr\lib\site-packages\dotenv\main.py", line 97, in parse
    for mapping in with_warn_for_invalid_lines(parse_stream(stream)):
  File "c:\users\kearney\.virtualenvs\fb-xzc3iotr\lib\site-packages\dotenv\main.py", line 48, in with_warn_for_invalid_lines
    for mapping in mappings:
  File "c:\users\kearney\.virtualenvs\fb-xzc3iotr\lib\site-packages\dotenv\parser.py", line 229, in parse_stream
    reader = Reader(stream)
  File "c:\users\kearney\.virtualenvs\fb-xzc3iotr\lib\site-packages\dotenv\parser.py", line 107, in __init__
    self.string = stream.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0xaa in position 194: illegal multibyte sequence
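
The failure can be isolated from Flask and python-dotenv. The following is a minimal sketch, assuming the file was saved as UTF-8 and that the Windows default codec is GBK, as the traceback suggests. `ENV_TEXT` and `try_decode` are names made up for this illustration; they are not part of either library.

```python
# Two lines modeled on the .flaskenv shown above.
ENV_TEXT = "FLASK_ENV = development\n# 今天也是个小可爱耶\n"


def try_decode(data: bytes, encoding: str):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# Bytes as an editor would write them when saving the file as UTF-8.
raw = ENV_TEXT.encode("utf-8")

# Decoding with the encoding that was used for writing round-trips cleanly.
utf8_text, utf8_err = try_decode(raw, "utf-8")
print("utf-8 ok:", utf8_text == ENV_TEXT)

# Decoding the same bytes as GBK fails, which is what a plain read()
# does when the system default encoding is GBK.
gbk_text, gbk_err = try_decode(raw, "gbk")
print("gbk error:", type(gbk_err).__name__)
```

The same bytes are fine or broken depending only on the codec chosen at read time, which is why the error disappears when the non-ASCII comment is removed.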

Good result

PS D:\Documents\CAU\Lion\repositiries\Python\fb> flask run
 * Environment: development
 * Debug mode: on
 * Restarting with windowsapi reloader
 * Debugger is active!
 * Debugger PIN: 139-670-554
 * Running on http://127.0.0.1:80/ (Press CTRL+C to quit)
@greyli
Contributor

greyli commented Jan 25, 2021

@theskumar Maybe we could just set the default encoding to UTF-8 (I would like to make a PR for this)?

@BackMountainDevil
Author

BackMountainDevil commented Jan 25, 2021 via email

@jobveldhuis

I think it would be best to try to detect the encoding with a library built for fast encoding detection and, if that fails, to default to UTF-8.

@BackMountainDevil
Author

Exactly, that is what the author did in the similar situation: set UTF-8 as the default and add a function to detect the encoding.

@Mystic-Mirage

This happens to me when I'm using set_key and the existing .env file already contains UTF-8-encoded data.

dotenv_values accepts an encoding argument, so set_key should too.
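
As a rough illustration of that request, here is a stdlib-only sketch of a key update that reads and writes the file with an explicit encoding. `set_key_with_encoding` is a hypothetical helper written for this example. It is not python-dotenv's `set_key`, and it ignores quoting, `export` prefixes, and other syntax that the real parser handles.

```python
import os
import tempfile


def set_key_with_encoding(path, key, value, encoding="utf-8"):
    """Hypothetical stand-in for an encoding-aware set_key (not library code)."""
    # Read with an explicit encoding instead of the system default.
    with open(path, encoding=encoding) as f:
        lines = f.read().splitlines()

    new_line = f"{key}={value}"
    replaced = False
    for i, line in enumerate(lines):
        is_comment = line.lstrip().startswith("#")
        name = line.split("=", 1)[0].strip()
        if not is_comment and "=" in line and name == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)

    # Write back with the same encoding so existing comments survive.
    with open(path, "w", encoding=encoding) as f:
        f.write("\n".join(lines) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as f:
        f.write("# привет\nPORT = 80\n")

    set_key_with_encoding(env_path, "PORT", "8080")

    with open(env_path, encoding="utf-8") as f:
        updated = f.read()
```

The point of the sketch is only that the read and the write share one explicit `encoding`, so non-ASCII data already in the file is neither rejected nor rewritten in a different codec.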

@BackMountainDevil
Author

By the way, I switched to Arch Linux, where the Chinese comment works fine. Bye-bye, Windows.

@andodet

andodet commented Mar 9, 2021

I'm having the same problem on a Windows 10 machine (due to a Cyrillic character in a comment). The approach suggested by @BackMountainDevil (using chardet) works like a charm. I can submit a PR if this hasn't been resolved yet.

@bbc2
Collaborator

bbc2 commented Mar 10, 2021

Hi everyone and thanks for the feedback. This looks like a bug in Flask (for not using encoding="utf-8") but I agree that python-dotenv should be improved to avoid this kind of bug in all applications.

I think that changing the default encoding would be a good idea. It would be a breaking change but the current behavior (defaulting to system's encoding) doesn't make sense to me, so we could probably do it. Furthermore, UTF-8 is the default encoding for Python source code, so using that encoding for .env files by default would certainly be expected.

About that change: it would be breaking Flask since Flask uses load_dotenv without the encoding parameter. I'll check with them to see if they'd like to use the encoding parameter and ask if it would make sense to them for Python-dotenv to use UTF-8 by default.

Detecting the encoding automatically would probably be too brittle (you can't detect all encodings with 100% probability) and bring more surprises to users so I'd rather avoid doing that. Encoding detection can be very useful when you don't know the encoding of documents but I think that you should know the encoding of your project files.
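
To make the brittleness concrete, the sketch below shows one byte string that decodes without error under several encodings, each yielding different text. It is only an illustration of the ambiguity, not a detection algorithm; `decodable_as` is a name invented for this example.

```python
def decodable_as(data: bytes, encodings):
    """Return {encoding: decoded text} for each encoding that decodes cleanly."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            pass  # This codec rejects the bytes, so it is not a candidate.
    return results


# A short comment with Cyrillic text, saved as UTF-8.
raw = "# привет".encode("utf-8")

candidates = decodable_as(raw, ["utf-8", "cp1251", "latin-1"])
for name, text in candidates.items():
    # ascii() keeps the output printable on any console encoding.
    print(name, ascii(text))
```

All three codecs accept the bytes, so "decodes without error" cannot be used as proof of the right encoding; a detector has to fall back on statistics, which can guess wrong on short files.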

@andodet

andodet commented Mar 10, 2021

@bbc2 Thanks a lot for looking into this. Maybe this is a stupid question (it's early morning after all), but wouldn't it be enough to change the default encoding of the DotEnv class in order to avoid breaking changes? Again, I am probably missing something here.

@bbc2
Collaborator

bbc2 commented Mar 10, 2021

I think I see what you mean. The change wouldn't break the code directly (you would still call functions like load_dotenv the same way, without any argument error). It would break for users who have a non-UTF-8 environment and non-UTF-8 characters in their .env file. I think it could even break silently in some cases, for instance if a non-UTF-8 encoding of a string is understood by the UTF-8 decoder as a different string, but that should be super rare.
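
A contrived example of that silent case, assuming a file that was written as Latin-1 and is then read as UTF-8 (the value `NAME=Ã©` is made up for the illustration):

```python
# Text as the user intended it, stored on disk as Latin-1.
original = "NAME=Ã©"
raw = original.encode("latin-1")

# The bytes happen to form valid UTF-8, so decoding raises no error,
# but the result is a different string.
misread = raw.decode("utf-8")

# ascii() keeps the output printable on any console encoding.
print(ascii(original))
print(ascii(misread))
```

No exception is raised here, so the value would simply change from one string to another without any warning.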

It may well affect a very small proportion of users, but we have a lot of them so I'm trying to be careful.

Perhaps the main drawback of such a change, apart from it being a breaking change, is that it would make python-dotenv diverge from Python's behavior with regard to file encoding: https://docs.python.org/3/library/os.html#file-names-command-line-arguments-and-environment-variables. In Python, open(filename) uses the system's default file encoding by default (i.e. with encoding=None).
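
For reference, a small sketch of the difference between relying on that default and naming the encoding explicitly. `read_text` is a helper invented for this example, and the printed default depends on the machine it runs on.

```python
import locale
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a file with an explicit encoding instead of the system default."""
    with open(path, encoding=encoding) as f:
        return f.read()


# The encoding that open() falls back to when encoding=None.
# On the reporter's machine this would presumably be a GBK code page.
print(locale.getpreferredencoding(False))

content = "FLASK_RUN_PORT = 80\n# 今天也是个小可爱耶\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, ".flaskenv")
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)

    # An explicit encoding gives the same result on every platform.
    round_trip = read_text(path)
```

Passing `encoding` on both the write and the read removes the dependency on the system setting, which is the trade-off under discussion: predictable behavior versus matching what plain `open()` does.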
