File encoding specified in "files.encoding" setting is not honored #414

Closed
strrchr opened this issue Dec 19, 2016 · 14 comments
Labels
Feature Request, fixed, Language Service, world ready

Comments

@strrchr

strrchr commented Dec 19, 2016

In my workspace's settings.json:

    {
        "files.encoding": "gbk"
    }
There is still an encoding problem in auto-completion.

@sean-mcmanus
Contributor

I'm not able to repro the problem. From your screenshot, it looks like you may be using another C++ extension for auto-completion, because our extension doesn't provide text like that. Can you provide a sample that repros this? I created a file with GBK encoding and didn't see a problem.

@strrchr
Author

strrchr commented Dec 20, 2016

  • VSCode Version: Code 1.8.1 (ee428b0eead68bf0fb99ab5fdc4439be227b6281, 2016-12-19T14:49:23.350Z)
  • OS Version: Windows_NT ia32 6.1.7601
  • Extensions:

      Extension             Author            Version
      xml                   DotJoshJohnson    1.6.0
      auto-close-tag        formulahendry     0.3.6
      auto-rename-tag       formulahendry     0.0.8
      green-theme           vscode            0.1.0
      beautify              HookyQR           0.6.2
      theme-material-theme  jprestidge        1.0.1
      cpptools              ms-vscode         0.9.3
      vscode-icons          robertohuertasm   4.2.0


The GBK-encoded file should contain Chinese characters, such as: 中文字符中文字符中文字符

@sean-mcmanus
Contributor

Okay, I am able to repro the bug now. Thanks. I don't know how you got the text write(sprint ("%s..."... to appear, but I was able to get a repro via int func2(string s = "中文字符中文字符中文字符") {...

@sean-mcmanus sean-mcmanus added the fixed label Mar 22, 2017
@sean-mcmanus
Contributor

This bug was fixed a while ago.

@sean-mcmanus sean-mcmanus removed the fixed label Mar 22, 2017
@sean-mcmanus
Contributor

Oops, it's not fixed. Looks like we don't handle GBK encoding correctly.

@sean-mcmanus sean-mcmanus reopened this Mar 22, 2017
@sean-mcmanus
Contributor

sean-mcmanus commented Mar 22, 2017

It looks like "most" functionality is broken when it interacts with GBK-encoded Chinese characters, which take 2 bytes each. Our code assumes UTF-8 and never checks the encoding.
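To illustrate why UTF-8-only code breaks on GBK input, here is a minimal Python sketch. It is illustrative only: Python's built-in gbk codec stands in for whatever conversion the extension actually performs, and the sample strings come from the repro in this thread. It shows that GBK bytes for Chinese text are not valid UTF-8, and that the two encodings give the same characters different byte widths, so a byte offset computed under one encoding is wrong under the other.

```python
# Minimal sketch: GBK-encoded bytes are not valid UTF-8, and the two
# encodings use different byte widths for the same Chinese characters.
text = "中文字符"

gbk_bytes = text.encode("gbk")     # 2 bytes per character in GBK
utf8_bytes = text.encode("utf-8")  # 3 bytes per character in UTF-8
print(len(text), len(gbk_bytes), len(utf8_bytes))  # 4 8 12

# What a reader that assumes UTF-8 would do with the GBK bytes.
try:
    gbk_bytes.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False
print("GBK bytes valid as UTF-8:", decoded_as_utf8)  # False

# The byte offset of an identifier after a Chinese comment depends on the
# encoding, so offsets taken from one view are wrong in the other.
source = "// 中文\nint func2();"
offset_gbk = source.encode("gbk").index(b"func2")
offset_utf8 = source.encode("utf-8").index(b"func2")
print(offset_gbk, offset_utf8)  # 12 14
```

Each Chinese character before the identifier adds one byte of drift (2 bytes in GBK versus 3 in UTF-8), which is consistent with hover text and completions landing on the wrong span.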

@bobbrow bobbrow added the world ready label Apr 24, 2019
@bobbrow bobbrow changed the title from "Auto completetion should follow files.encoding in workspace settings" to "Support file encodings other than UTF-8" Jul 31, 2019
@sean-mcmanus sean-mcmanus added this to the On Deck milestone Oct 1, 2019
@sean-mcmanus
Contributor

I hit this accidentally after opening a header file that had somehow been saved as "UTF-8 with BOM"; IntelliSense was broken because all the offsets were off by one.
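The off-by-one symptom is easy to reproduce in isolation. This Python sketch is illustrative only and does not reflect how the extension reads files. When bytes that start with a UTF-8 BOM are decoded as plain UTF-8, the BOM survives as a U+FEFF character, so every character offset on the first line moves by one. Python's utf-8-sig codec strips it. In byte terms the BOM is 3 bytes, so a byte-based offset would be off by three instead.

```python
import codecs

# A header whose first line is preceded by a UTF-8 BOM (EF BB BF).
raw = codecs.BOM_UTF8 + b"int x;"

with_bom = raw.decode("utf-8")         # keeps U+FEFF as the first character
without_bom = raw.decode("utf-8-sig")  # removes the BOM if present

print(repr(with_bom[0]))                            # '\ufeff'
print(with_bom.index("x"), without_bom.index("x"))  # 5 4
```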

@Colengms
Copy link
Contributor

Moving this into the 1.0 milestone, as support for GB18030 encoding is a global compliance requirement.

@Colengms Colengms modified the milestones: On Deck, 1.0 Mar 18, 2020
@hushunding

I also have the same problem: when hovering the mouse over a function, I sometimes don't get the correct tooltip.

  1. All files use GBK encoding.

  2. When the function/variable/macro is not defined in the same file, the comment shown is wrong.

  3. But it is correct if it is defined in the same file.

  4. It seems the language server cannot get the correct encoding of files that are not opened.

@bobbrow bobbrow modified the milestones: 0.28.0, 0.29.0 Apr 27, 2020
@Colengms
Contributor

Colengms commented Jun 4, 2020

We should be handling UTF8, UTF16LE and UTF16BE properly, as these are detectable from header bytes (a byte order mark) at the start of the file. There appear to be issues with file encodings that aren't detectable this way: GBK, GB2312 and GB18030 just look like UTF8 to us. Files opened within VS Code are provided to us in UTF8, with VS Code having already done the proper conversion from the encodings it supports, if configured to do so. It uses 2 settings:

    "files.encoding": "gbk",

or:

    "files.autoGuessEncoding": true

We're not currently considering these settings when opening files directly from disk, such as when scanning header files for doc comments. We should use the encoding specified by files.encoding, if present, whenever we encounter a file without header bytes.
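The proposed fallback can be sketched in a few lines of Python under stated assumptions. to_utf8 is a hypothetical helper, not part of the extension. Its files_encoding argument stands in for the "files.encoding" value and must be a codec name Python recognizes (for example "gbk" or "gb18030"). Only the UTF-8 and UTF-16 byte order marks named above are treated as header bytes, and files.autoGuessEncoding is not modeled. The output is UTF-8, matching what VS Code already hands over for files it has open.

```python
import codecs
from typing import Optional

# Header bytes (BOMs) for the encodings that are detectable on their own.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def to_utf8(raw: bytes, files_encoding: Optional[str] = None) -> bytes:
    """Hypothetical helper: convert bytes read from disk into UTF-8."""
    # 1. A BOM identifies the encoding; strip it so offsets are not shifted.
    for bom, encoding in BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(encoding).encode("utf-8")
    # 2. No header bytes: honor the configured files.encoding, if any.
    if files_encoding:
        return raw.decode(files_encoding).encode("utf-8")
    # 3. Otherwise keep the existing assumption that the file is UTF-8.
    return raw


if __name__ == "__main__":
    header = "// 中文注释\nint func2();"
    on_disk = header.encode("gbk")
    print(to_utf8(on_disk, "gbk").decode("utf-8") == header)  # True
```

Checking the BOM before consulting the setting keeps files that identify their own encoding correct even when files.encoding names something else.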

It looks like there are related issues in VS that would need to be addressed as well, since we share IntelliSense code with VS.

@Colengms Colengms changed the title from "Support file encodings other than UTF-8" to 'File encoding specified in "files.encoding" setting is not honored' Jun 4, 2020
@bobbrow bobbrow modified the milestones: 0.29.0, 0.30.0 Jun 4, 2020
@sean-mcmanus sean-mcmanus added the fixed label Aug 31, 2020
@sean-mcmanus
Contributor

Fixed with https://github.com/microsoft/vscode-cpptools/releases/tag/0.30.0-insiders4 . Let us know if you see any remaining bugs with non-UTF-8 encodings.

@Colengms
Copy link
Contributor

Colengms commented Sep 2, 2020

Note that the C/C++ Extension will now use the file encoding specified in files.encoding, if there is no (known/supported) BOM detected in the file, and it's not UTF-16 (BE or LE). However, we are not currently replicating the behavior of VS Code's files.autoGuessEncoding, which uses the jschardet library to detect the encoding of a file.

@sean-mcmanus
Contributor

@Colengms Should we open an issue to track the potential files.autoGuessEncoding?

@Colengms
Contributor

Colengms commented Sep 3, 2020

@sean-mcmanus That would be: #4753

@github-actions github-actions bot locked and limited conversation to collaborators Oct 30, 2020

5 participants