-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add some additional ruff checks, ignoring existing violations #56
Conversation
/build-ci |
pyproject.toml
Outdated
line-length = 119 | ||
|
||
[tool.ruff.lint] | ||
ignore = ["C901", "E741", "E501", "D100", "RUF005", "RUF010"] | ||
select = ["C", "E", "F", "I", "W", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aren't you adding this line 2x, eg "C" appears twice. I prefer the second instance which is on its own line and has a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup! whoops, good catch :D. Fixed!
if Path(tar_file).is_file(): | ||
with tarfile.open(tar_file) as tar: | ||
extract_path = f"{str(complete_download_dir)}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
be careful! This might break things if the dir is a Path
. I think it might repr as a Path('/some/path')
rather than '/some/path'
without the str
. Maybe a compromise would be to create new variables that are explicitly converted to str
earlier on? Also maybe you should change the type on these variables to Union[str, Path]
so that the checker knows that it's not pointless to wrap them in str
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In [1]: from pathlib import Path
In [2]: path_ex = Path("/some/path")
In [3]: path_ex_str = str(path_ex)
In [4]: print(f"{path_ex}")
/some/path
In [5]: print(f"{path_ex_str}")
/some/path
Actually it seems okay with f strings. I forget where I remember seeing str vs Path issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f-string use str
, not repr
, so this should be fine. But yeah, this was me simplifying things to avoid RUF010
>>> test = Path('/my/test/path/')
>>> print(f"{test}")
/my/test/path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd actually like to make a case for having RUF010 ignored.
From the link:
a = "some string"
f"{a!r}"
I think this is weird and significantly less readable as compared to:
a = "some string"
f"{repr(a)}"
With the former, I have to know about new, special syntax. This f-string syntax is quite rare -- I have been reading and writing a lot of Python for the last 6+ years and I have never seen this before. I suspect that most other folks have also never seen this. It would, at the very least then, require someone to stop and go and try and look up what !r
means.
Compare this with repr
or str
. Instead of having to learn new syntax, we instead can combine our pre-existing knowledge about these built-in functions with what we already know about f-strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's also the explicit nature perspective.
We are not relying on any implicit behavior here -- there's no need to know some very particular details about how f-string interpolation actually works. We have simplicity: we turn something into a str
, then substitute it into another str
.
Taking a look at the link, the justification for this lint rule is that str
is unnecessary:
Note that, in many cases, calling str() within an f-string is unnecessary and can be removed entirely, as the value will be converted to a string automatically,
However, they immediately point out that in fact that "str
is unnecessary" is a rule. There's an exception that they point out:
the notable exception being for classes that implement a custom format method.
My worry now is this -- can we guarantee that this conversion is perfect and bug free?
How is ruff
going to know, with 0 chance of getting it wrong, that, for some non-str
value x
, f"{str(x)}"
is equivalent to f"{x}"
?
I do not think this is possible. So in my mind, this automatic conversion runs the risk of introducing a bug. This risk isn't justified by the benefits of automatically converting code to use this style.
@@ -36,29 +37,30 @@ class Label2IDTokenizer(TokenizerSpec): | |||
|
|||
""" | |||
|
|||
def __init__( | |||
def __init__( # noqa: D107 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a # TODO add docstring and remove #noqa: D107
to a few of these? I had to google what D107 was, and we should really remove these and add documentation. Alternatively, create a ticket to remove all D017 noqa and add docstrings to all functions that are missing them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can definitely create a github issue to track. I could do a find/replace to add in that additional TODO; but these noqa
directives were added auto-magically by ruff
return list(text) | ||
|
||
def tokens_to_text(self, tokens): | ||
def tokens_to_text(self, tokens): # noqa: D102 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also add tickets for D102 fixing/removal.
@@ -22,7 +22,7 @@ | |||
PrecisionTypes = Literal["fp16", "bf16", "fp32", "bf16-mixed", "fp32-mixed", "16-mixed", "fp16-mixed", 16, 32] | |||
|
|||
|
|||
def get_autocast_dtype(precision: PrecisionTypes) -> torch.dtype: | |||
def get_autocast_dtype(precision: PrecisionTypes) -> torch.dtype: # noqa: D103 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for D103
@@ -23,7 +23,7 @@ | |||
class WandbLoggerOptions(TypedDict): | |||
"""Note: `name` controls the exp name is handled by the NeMoLogger so it is ommitted here. | |||
`directory` is also omitted since it is set by the NeMoLogger. | |||
""" | |||
""" # noqa: D205 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and D205. Basically my ask is that we should have "actually do the fix" tickets/issues for most of these at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My ask is you make github issues to fix each style of noqa (not each instance to be clear), with a brief explanation of what that noqa is in the task name. For example for D107 the task name should be like "Add docstrings to all __init__ with missing docstrings"
and in the description it should say "look for all instances of noqa D107 in code, remove, and add docstrings."
Same for the other high level noqas you added.
Will do! I'll wait to make sure this merges before creating them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
line-length = 119 | ||
|
||
[tool.ruff.lint] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/build-ci |
@@ -222,10 +222,10 @@ def download_artifacts( | |||
ngc_dirname = complete_download_dir / ngc_dirname | |||
|
|||
# TODO: this assumes that it's a model for now. | |||
command = f"mkdir -p {str(complete_download_dir)} && {ngc_call_command} registry model download-version {artifact_source_path} --dest {str(complete_download_dir)} && mv {str(ngc_dirname)}/* {str(complete_download_dir)}/ && rm -d {str(ngc_dirname)}" | |||
command = f"mkdir -p {complete_download_dir} && {ngc_call_command} registry model download-version {artifact_source_path} --dest {complete_download_dir} && mv {ngc_dirname}/* {complete_download_dir}/ && rm -d {ngc_dirname}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, this is kinda bad. I assume that there's a pylint rule that is removing the explicit str
conversion here? F string interpolation does call __str__
on all values, which is what str
does. But I don't understand why one would want to make something that's explicit implicit? 🤔
I suppose we're getting away from the Zen of Python now!
https://peps.python.org/pep-0020/
|
||
root_directory: Optional[str] = RemoteResource.get_env_tmpdir() | ||
root_directory: Optional[str] = RemoteResource.get_env_tmpdir() # noqa: RUF009 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow! RUF009
is actually excellent here -- this is an example of something that is actually a bug that should be fixed! Each time we construct a new ResourcePreprocessor
-- or any child class of it -- we are going to be re-using the same single, globally defined value returned by this function.
Instead of adding this ignore comment, can you instead change it to use a field
w/ this function as the factory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
input_ids: Tensor, | ||
position_ids: Tensor, | ||
tokentype_ids: int | None = None, | ||
attention_mask: Tensor | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Auto-fixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, actually I fixed these manually. There was a complaint about not implicitly setting optional args through int = None
if Path(tar_file).is_file(): | ||
with tarfile.open(tar_file) as tar: | ||
extract_path = f"{str(complete_download_dir)}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd actually like to make a case for having RUF010 ignored.
From the link:
a = "some string"
f"{a!r}"
I think this is weird and significantly less readable as compared to:
a = "some string"
f"{repr(a)}"
With the former, I have to know about new, special syntax. This f-string syntax is quite rare -- I have been reading and writing a lot of Python for the last 6+ years and I have never seen this before. I suspect that most other folks have also never seen this. It would, at the very least then, require someone to stop and go and try and look up what !r
means.
Compare this with repr
or str
. Instead of having to learn new syntax, we instead can combine our pre-existing knowledge about these built-in functions with what we already know about f-strings.
if Path(tar_file).is_file(): | ||
with tarfile.open(tar_file) as tar: | ||
extract_path = f"{str(complete_download_dir)}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's also the explicit nature perspective.
We are not relying on any implicit behavior here -- there's no need to know some very particular details about how f-string interpolation actually works. We have simplicity: we turn something into a str
, then substitute it into another str
.
Taking a look at the link, the justification for this lint rule is that str
is unnecessary:
Note that, in many cases, calling str() within an f-string is unnecessary and can be removed entirely, as the value will be converted to a string automatically,
However, they immediately point out that in fact that "str
is unnecessary" is a rule. There's an exception that they point out:
the notable exception being for classes that implement a custom format method.
My worry now is this -- can we guarantee that this conversion is perfect and bug free?
How is ruff
going to know, with 0 chance of getting it wrong, that, for some non-str
value x
, f"{str(x)}"
is equivalent to f"{x}"
?
I do not think this is possible. So in my mind, this automatic conversion runs the risk of introducing a bug. This risk isn't justified by the benefits of automatically converting code to use this style.
Apologies for missing the comments before it auto-merged -- but we discussed this via slack and it should be fine: |
No worries! Thanks for the discussion. My bad for not reading thoroughly enough what |
As we move towards automatically generated API documentation, it would be helpful if we had pydocstyle linting as part of our ruff checks. This PR refactors the pyproject.toml file slightly, and runs
--add-noqa
to explicitly ignore exisiting violations.https://docs.astral.sh/ruff/tutorial/#adding-rules
We can come back and fix these as we go