[pydocstyle] docstrings encoding #3436
I am currently working on PR #3408. In this PR, I aim to make ...
It turns out that to properly implement some ...
For example, we get no warning from:

# pydocstyle 1.py --select D205
def no_problem():
    """Hello World.
    \nNo Problem.
    """
pub fn unescaped_docstring_char(chars: &mut Chars<'_>) -> Option<UnescapedDocStringChar> {
    let c = chars.next()?;
    // ...
    let res = match c {
        '\\' => {
            // must have at least one character after it
            // otherwise, it will be rejected by the parser
            let res = match chars.next().unwrap() {
                '\n' => None,
                '\\' => Some('\\'),
                // ...
                'u' => {
                    // problems!!!
                }
                c => Some(c),
            };
            res
        }
        _ => Some(c),
    };
    // ...
}

As you can see, there is a problem with the ...
In Python, this character can be a surrogate because "strings are immutable sequences of Unicode code points". However, in Rust, a char is a 'Unicode scalar value', which is any 'Unicode code point' other than a surrogate code point, which means that we have no way to store a surrogate in Rust by using ...

This leaves us with the question of how to properly unescape such characters, store them in a string, and mimic the behaviour of:

# pydocstyle test.py --ignore D100
def f():
    """Hello World\uDE01."""

I currently have several solutions in mind:
Any thoughts on this?
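To make the surrogate point above concrete, here is a small sketch of how Python treats the `\uDE01` docstring from the example. It only uses built-in `str` methods. The `code_points` list is a generic illustration of holding such text outside a UTF-8 string; it is not one of the solutions proposed in this thread.

```python
# The evaluated docstring from the example: it contains a lone low surrogate.
docstring = "Hello World\uDE01."

# Python stores the surrogate as an ordinary one-code-point element of the str.
last = docstring[-2]
is_surrogate = 0xD800 <= ord(last) <= 0xDFFF

# Strict UTF-8 (the encoding a Rust String must hold) rejects lone surrogates.
try:
    docstring.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False

# Illustration only: the same text as plain integers, which can hold any
# code point, surrogate or not.
code_points = [ord(ch) for ch in docstring]

print(is_surrogate, encodable, hex(code_points[-2]))
```

The `str` holds the value without complaint, but it cannot be written out as strict UTF-8, which mirrors the restriction on the Rust side.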
Replies: 2 comments 4 replies
@charliermarsh A quick follow-up on this question: if we rely on the evaluated string from the parser, do you have any idea how we could implement autofix? I don't think we can get information about the position of the characters in the evaluated string.
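For what it's worth, position information can be kept if the unescaping is done by hand rather than taken from the parser. The sketch below is hypothetical and not how the parser or the PR works: `unescape_with_offsets` is an invented helper that handles only a tiny subset of escapes (`\\`, `\n`, backslash-newline continuations, and four-digit `\u`), and it records, for every evaluated character, the index in the raw text where that character started.

```python
from __future__ import annotations


def unescape_with_offsets(raw: str) -> tuple[str, list[int]]:
    """Unescape a small subset of escapes and remember raw start offsets."""
    out: list[str] = []
    offsets: list[int] = []
    i = 0
    while i < len(raw):
        start = i
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            if nxt == "\n":
                # Line continuation: consumes two raw characters, emits nothing.
                i += 2
                continue
            if nxt == "n":
                out.append("\n")
                i += 2
            elif nxt == "\\":
                out.append("\\")
                i += 2
            elif nxt == "u" and i + 6 <= len(raw):
                # Raises ValueError on malformed hex digits; fine for a sketch.
                out.append(chr(int(raw[i + 2:i + 6], 16)))
                i += 6
            else:
                # Unhandled escape: keep the backslash as a literal character.
                out.append("\\")
                i += 1
        else:
            out.append(ch)
            i += 1
        offsets.append(start)
    return "".join(out), offsets


raw = r"Hello\nWorld\uDE01."
evaluated, offsets = unescape_with_offsets(raw)
newline_at = offsets[evaluated.index("\n")]  # raw index of the `\n` escape
```

With such a table, a fix computed on the evaluated text could be translated back into an edit on the raw source, at the cost of reimplementing the escape rules.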
If I recall correctly... we used to use the evaluated string body (i.e., the `s` in `Expr::Constant { kind: Constant::Str(s), .. }`), which would probably give you the behavior that you're seeing in `pydocstyle`, since that gets evaluated by the parser, and so (e.g.) continuations wouldn't be included as part of `s`. But I thought this led to other `pydocstyle` deviations, and so we moved to using the raw string. I can't exactly remember the details unfortunately. We could look through the changelog...

It might be the case that `pydocstyle` uses slightly different representations for different rules. E.g., if you look at the source, they sometimes do `lines = ast.literal_eval(docstring).strip().split('\n')`
…
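As a rough illustration of the difference between the two representations, the sketch below applies the `ast.literal_eval(...).strip().split('\n')` expression quoted above to a docstring that contains a line continuation. It assumes `docstring` holds the literal exactly as written in the source file, quotes included; the sample text is made up.

```python
import ast

# A docstring literal as it appears in the source file, quotes included.
# The raw-string prefix keeps the backslash + newline pair untouched.
docstring = r'''"""Summary \
continued."""'''

# Raw representation: the continuation still splits the text into two lines.
raw_lines = docstring.split("\n")

# Evaluated representation: the parser removes the backslash-newline pair.
lines = ast.literal_eval(docstring).strip().split("\n")

print(raw_lines)  # ['"""Summary \\', 'continued."""']
print(lines)      # ['Summary continued.']
```

A rule that counts lines would therefore see two lines in the raw text but one in the evaluated value.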