Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add some additional ruff checks, ignoring existing violations #56

Merged
merged 5 commits into from
Jul 31, 2024

Conversation

pstjohn
Copy link
Collaborator

@pstjohn pstjohn commented Jul 31, 2024

As we move towards automatically generated API documentation, it would be helpful if we had pydocstyle linting as part of our ruff checks. This PR refactors the pyproject.toml file slightly, and runs --add-noqa to explicitly ignore exisiting violations. https://docs.astral.sh/ruff/tutorial/#adding-rules

We can come back and fix these as we go

@pstjohn pstjohn requested review from jomitchellnv and jstjohn July 31, 2024 19:54
@pstjohn
Copy link
Collaborator Author

pstjohn commented Jul 31, 2024

/build-ci

@pstjohn pstjohn self-assigned this Jul 31, 2024
pyproject.toml Outdated
line-length = 119

[tool.ruff.lint]
ignore = ["C901", "E741", "E501", "D100", "RUF005", "RUF010"]
select = ["C", "E", "F", "I", "W",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't you adding this line 2x, eg "C" appears twice. I prefer the second instance which is on its own line and has a comment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup! whoops, good catch :D. Fixed!

if Path(tar_file).is_file():
with tarfile.open(tar_file) as tar:
extract_path = f"{str(complete_download_dir)}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

be careful! This might break things if the dir is a Path. I think it might repr as a Path('/some/path') rather than '/some/path' without the str. Maybe a compromise would be to create new variables that are explicitly converted to str earlier on? Also maybe you should change the type on these variables to Union[str, Path] so that the checker knows that it's not pointless to wrap them in str?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In [1]: from pathlib import Path

In [2]: path_ex = Path("/some/path")

In [3]: path_ex_str = str(path_ex)

In [4]: print(f"{path_ex}")
/some/path

In [5]: print(f"{path_ex_str}")
/some/path

Actually it seems okay with f strings. I forget where I remember seeing str vs Path issues.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f-string use str, not repr, so this should be fine. But yeah, this was me simplifying things to avoid RUF010

>>> test = Path('/my/test/path/')
>>> print(f"{test}")
/my/test/path

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually like to make a case for having RUF010 ignored.

From the link:

a = "some string"
f"{a!r}"

I think this is weird and significantly less readable as compared to:

a = "some string"
f"{repr(a)}"

With the former, I have to know about new, special syntax. This f-string syntax is quite rare -- I have been reading and writing a lot of Python for the last 6+ years and I have never seen this before. I suspect that most other folks have also never seen this. It would, at the very least then, require someone to stop and go and try and look up what !r means.

Compare this with repr or str. Instead of having to learn new syntax, we instead can combine our pre-existing knowledge about these built-in functions with what we already know about f-strings.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also the explicit nature perspective.

We are not relying on any implicit behavior here -- there's no need to know some very particular details about how f-string interpolation actually works. We have simplicity: we turn something into a str, then substitute it into another str.

Taking a look at the link, the justification for this lint rule is that str is unnecessary:

Note that, in many cases, calling str() within an f-string is unnecessary and can be removed entirely, as the value will be converted to a string automatically,

However, they immediately point out that in fact that "str is unnecessary" is a rule. There's an exception that they point out:

the notable exception being for classes that implement a custom format method.

My worry now is this -- can we guarantee that this conversion is perfect and bug free?

How is ruff going to know, with 0 chance of getting it wrong, that, for some non-str value x, f"{str(x)}" is equivalent to f"{x}"?

I do not think this is possible. So in my mind, this automatic conversion runs the risk of introducing a bug. This risk isn't justified by the benefits of automatically converting code to use this style.

@@ -36,29 +37,30 @@ class Label2IDTokenizer(TokenizerSpec):

"""

def __init__(
def __init__( # noqa: D107
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a # TODO add docstring and remove #noqa: D107 to a few of these? I had to google what D107 was, and we should really remove these and add documentation. Alternatively, create a ticket to remove all D017 noqa and add docstrings to all functions that are missing them?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can definitely create a github issue to track. I could do a find/replace to add in that additional TODO; but these noqa directives were added auto-magically by ruff

return list(text)

def tokens_to_text(self, tokens):
def tokens_to_text(self, tokens): # noqa: D102
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add tickets for D102 fixing/removal.

@@ -22,7 +22,7 @@
PrecisionTypes = Literal["fp16", "bf16", "fp32", "bf16-mixed", "fp32-mixed", "16-mixed", "fp16-mixed", 16, 32]


def get_autocast_dtype(precision: PrecisionTypes) -> torch.dtype:
def get_autocast_dtype(precision: PrecisionTypes) -> torch.dtype: # noqa: D103
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for D103

@@ -23,7 +23,7 @@
class WandbLoggerOptions(TypedDict):
"""Note: `name` controls the exp name is handled by the NeMoLogger so it is ommitted here.
`directory` is also omitted since it is set by the NeMoLogger.
"""
""" # noqa: D205
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and D205. Basically my ask is that we should have "actually do the fix" tickets/issues for most of these at least.

Copy link
Collaborator

@jstjohn jstjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My ask is you make github issues to fix each style of noqa (not each instance to be clear), with a brief explanation of what that noqa is in the task name. For example for D107 the task name should be like "Add docstrings to all __init__ with missing docstrings" and in the description it should say "look for all instances of noqa D107 in code, remove, and add docstrings." Same for the other high level noqas you added.

@pstjohn
Copy link
Collaborator Author

pstjohn commented Jul 31, 2024

Will do! I'll wait to make sure this merges before creating them

Copy link
Collaborator

@jomitchellnv jomitchellnv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

line-length = 119

[tool.ruff.lint]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pstjohn pstjohn enabled auto-merge (squash) July 31, 2024 20:38
@malcolmgreaves malcolmgreaves self-requested a review July 31, 2024 20:48
@pstjohn
Copy link
Collaborator Author

pstjohn commented Jul 31, 2024

/build-ci

@@ -222,10 +222,10 @@ def download_artifacts(
ngc_dirname = complete_download_dir / ngc_dirname

# TODO: this assumes that it's a model for now.
command = f"mkdir -p {str(complete_download_dir)} && {ngc_call_command} registry model download-version {artifact_source_path} --dest {str(complete_download_dir)} && mv {str(ngc_dirname)}/* {str(complete_download_dir)}/ && rm -d {str(ngc_dirname)}"
command = f"mkdir -p {complete_download_dir} && {ngc_call_command} registry model download-version {artifact_source_path} --dest {complete_download_dir} && mv {ngc_dirname}/* {complete_download_dir}/ && rm -d {ngc_dirname}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, this is kinda bad. I assume that there's a pylint rule that is removing the explicit str conversion here? F string interpolation does call __str__ on all values, which is what str does. But I don't understand why one would want to make something that's explicit implicit? 🤔

I suppose we're getting away from the Zen of Python now!
https://peps.python.org/pep-0020/


root_directory: Optional[str] = RemoteResource.get_env_tmpdir()
root_directory: Optional[str] = RemoteResource.get_env_tmpdir() # noqa: RUF009
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow! RUF009 is actually excellent here -- this is an example of something that is actually a bug that should be fixed! Each time we construct a new ResourcePreprocessor -- or any child class of it -- we are going to be re-using the same single, globally defined value returned by this function.

Instead of adding this ignore comment, can you instead change it to use a field w/ this function as the factory?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue tracked in #57, PR to fix in #58

input_ids: Tensor,
position_ids: Tensor,
tokentype_ids: int | None = None,
attention_mask: Tensor | None = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-fixed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, actually I fixed these manually. There was a complaint about not implicitly setting optional args through int = None

if Path(tar_file).is_file():
with tarfile.open(tar_file) as tar:
extract_path = f"{str(complete_download_dir)}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually like to make a case for having RUF010 ignored.

From the link:

a = "some string"
f"{a!r}"

I think this is weird and significantly less readable as compared to:

a = "some string"
f"{repr(a)}"

With the former, I have to know about new, special syntax. This f-string syntax is quite rare -- I have been reading and writing a lot of Python for the last 6+ years and I have never seen this before. I suspect that most other folks have also never seen this. It would, at the very least then, require someone to stop and go and try and look up what !r means.

Compare this with repr or str. Instead of having to learn new syntax, we instead can combine our pre-existing knowledge about these built-in functions with what we already know about f-strings.

if Path(tar_file).is_file():
with tarfile.open(tar_file) as tar:
extract_path = f"{str(complete_download_dir)}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also the explicit nature perspective.

We are not relying on any implicit behavior here -- there's no need to know some very particular details about how f-string interpolation actually works. We have simplicity: we turn something into a str, then substitute it into another str.

Taking a look at the link, the justification for this lint rule is that str is unnecessary:

Note that, in many cases, calling str() within an f-string is unnecessary and can be removed entirely, as the value will be converted to a string automatically,

However, they immediately point out that in fact that "str is unnecessary" is a rule. There's an exception that they point out:

the notable exception being for classes that implement a custom format method.

My worry now is this -- can we guarantee that this conversion is perfect and bug free?

How is ruff going to know, with 0 chance of getting it wrong, that, for some non-str value x, f"{str(x)}" is equivalent to f"{x}"?

I do not think this is possible. So in my mind, this automatic conversion runs the risk of introducing a bug. This risk isn't justified by the benefits of automatically converting code to use this style.

@pstjohn pstjohn merged commit 467d7f1 into v2-main Jul 31, 2024
2 checks passed
@pstjohn
Copy link
Collaborator Author

pstjohn commented Jul 31, 2024

Apologies for missing the comments before it auto-merged -- but we discussed this via slack and it should be fine: RUF010, the rule we were discussing, is actually ignored in the pyproject.toml. I think the confusion here was that before it ignored it, it auto-fixed those lines; but in reverting them I took out those redundant str() casts.

@malcolmgreaves
Copy link
Collaborator

No worries! Thanks for the discussion. My bad for not reading thoroughly enough what RUF010 does 😅

pstjohn added a commit that referenced this pull request Jul 31, 2024
@pstjohn pstjohn deleted the pstjohn/v2-main/ruff-docstring branch September 16, 2024 12:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants