Add some additional ruff checks, ignoring existing violations #56

pstjohn · 2024-07-31T19:54:01Z

As we move towards automatically generated API documentation, it would be helpful to have pydocstyle linting as part of our ruff checks. This PR refactors the pyproject.toml file slightly and runs ruff with --add-noqa to explicitly ignore existing violations. https://docs.astral.sh/ruff/tutorial/#adding-rules

We can come back and fix these as we go.

pstjohn · 2024-07-31T19:54:21Z

/build-ci

jstjohn · 2024-07-31T20:08:14Z

pyproject.toml

 line-length = 119

+[tool.ruff.lint]
+ignore = ["C901", "E741", "E501", "D100", "RUF005", "RUF010"]
+select = ["C", "E", "F", "I", "W",


Aren't you adding this line twice? E.g. "C" appears twice. I prefer the second instance, which is on its own line and has a comment.

yup! whoops, good catch :D. Fixed!

jstjohn · 2024-07-31T20:10:29Z

scripts/download_artifacts.py

                if Path(tar_file).is_file():
                    with tarfile.open(tar_file) as tar:
-                        extract_path = f"{str(complete_download_dir)}"


be careful! This might break things if the dir is a Path. I think it might repr as a Path('/some/path') rather than '/some/path' without the str. Maybe a compromise would be to create new variables that are explicitly converted to str earlier on? Also maybe you should change the type on these variables to Union[str, Path] so that the checker knows that it's not pointless to wrap them in str?

In [1]: from pathlib import Path
In [2]: path_ex = Path("/some/path")
In [3]: path_ex_str = str(path_ex)
In [4]: print(f"{path_ex}")
/some/path
In [5]: print(f"{path_ex_str}")
/some/path

Actually it seems okay with f strings. I forget where I remember seeing str vs Path issues.

f-strings use str, not repr, so this should be fine. But yeah, this was me simplifying things to avoid RUF010.

>>> test = Path('/my/test/path/')
>>> print(f"{test}")
/my/test/path
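For reference, a minimal sketch of where the str vs. repr distinction does and does not show up for paths. It uses PurePosixPath (my choice, not something from this PR) so the output is the same on every platform. The repr only appears when it is requested with !r or when the path sits inside a container such as a list, which may be where the earlier str vs. Path issues came from.

```python
from pathlib import PurePosixPath

# PurePosixPath keeps the output identical on every platform.
path_ex = PurePosixPath("/some/path")

plain = f"{path_ex}"          # formatting falls back to str() for paths
explicit = f"{str(path_ex)}"  # same text, with a redundant call
as_repr = f"{path_ex!r}"      # the repr only appears when requested
in_list = f"{[path_ex]}"      # containers render their elements with repr()

print(plain)
print(as_repr)
print(in_list)
```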

jstjohn · 2024-07-31T20:15:18Z

sub-packages/bionemo-contrib/src/bionemo/contrib/tokenizer/label2id_tokenizer.py

@@ -36,29 +37,30 @@ class Label2IDTokenizer(TokenizerSpec):

    """

-    def __init__(
+    def __init__(  # noqa: D107


Maybe add a # TODO add docstring and remove #noqa: D107 to a few of these? I had to google what D107 was, and we should really remove these and add documentation. Alternatively, create a ticket to remove all D107 noqa and add docstrings to all functions that are missing them?

I can definitely create a github issue to track. I could do a find/replace to add in that additional TODO; but these noqa directives were added auto-magically by ruff

jstjohn · 2024-07-31T20:16:07Z

sub-packages/bionemo-contrib/src/bionemo/contrib/tokenizer/label2id_tokenizer.py

        return list(text)

-    def tokens_to_text(self, tokens):
+    def tokens_to_text(self, tokens):  # noqa: D102


Also add tickets for D102 fixing/removal.

jstjohn · 2024-07-31T20:16:21Z

sub-packages/bionemo-contrib/src/bionemo/contrib/utils/dtypes.py

@@ -22,7 +22,7 @@
 PrecisionTypes = Literal["fp16", "bf16", "fp32", "bf16-mixed", "fp32-mixed", "16-mixed", "fp16-mixed", 16, 32]


-def get_autocast_dtype(precision: PrecisionTypes) -> torch.dtype:
+def get_autocast_dtype(precision: PrecisionTypes) -> torch.dtype:  # noqa: D103


Same for D103

jstjohn · 2024-07-31T20:16:59Z

sub-packages/bionemo-contrib/src/bionemo/contrib/utils/logger_utils.py

@@ -23,7 +23,7 @@
 class WandbLoggerOptions(TypedDict):
    """Note: `name` controls the exp name is handled by the NeMoLogger so it is ommitted here.
    `directory` is also omitted since it is set by the NeMoLogger.
-    """
+    """  # noqa: D205


and D205. Basically my ask is that we should have "actually do the fix" tickets/issues for most of these at least.

jstjohn

My ask is you make github issues to fix each style of noqa (not each instance to be clear), with a brief explanation of what that noqa is in the task name. For example for D107 the task name should be like "Add docstrings to all __init__ with missing docstrings" and in the description it should say "look for all instances of noqa D107 in code, remove, and add docstrings." Same for the other high level noqas you added.
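As a quick reference for those follow-up issues, here is a rough sketch of what satisfying each of the pydocstyle codes mentioned above looks like. The class and function are hypothetical, simplified stand-ins and are not taken from the repository.

```python
class Label2ID:
    """Map labels to integer IDs (hypothetical, simplified class)."""

    def __init__(self, labels):
        """Build the label-to-ID vocabulary (D107: docstring on __init__)."""
        self.vocab = {label: i for i, label in enumerate(labels)}

    def tokens_to_ids(self, tokens):
        """Convert tokens to IDs (D102: docstring on a public method).

        The blank line above separates the summary line from this
        description, which is what D205 asks for.
        """
        return [self.vocab[t] for t in tokens]


def vocab_size(tokenizer):
    """Return the vocabulary size (D103: docstring on a public function)."""
    return len(tokenizer.vocab)


tokenizer = Label2ID(["A", "C", "G", "T"])
print(tokenizer.tokens_to_ids(["G", "A"]))
print(vocab_size(tokenizer))
```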

pstjohn · 2024-07-31T20:21:17Z

Will do! I'll wait to make sure this merges before creating them

jomitchellnv

Looks good to me.

jomitchellnv · 2024-07-31T20:38:16Z

pyproject.toml

 line-length = 119

+[tool.ruff.lint]


pstjohn · 2024-07-31T21:02:22Z

/build-ci

malcolmgreaves · 2024-07-31T20:52:07Z

scripts/download_artifacts.py

@@ -222,10 +222,10 @@ def download_artifacts(
            ngc_dirname = complete_download_dir / ngc_dirname

            # TODO: this assumes that it's a model for now.
-            command = f"mkdir -p {str(complete_download_dir)} && {ngc_call_command} registry model download-version {artifact_source_path} --dest {str(complete_download_dir)} && mv {str(ngc_dirname)}/* {str(complete_download_dir)}/ && rm -d {str(ngc_dirname)}"
+            command = f"mkdir -p {complete_download_dir} && {ngc_call_command} registry model download-version {artifact_source_path} --dest {complete_download_dir} && mv {ngc_dirname}/* {complete_download_dir}/ && rm -d {ngc_dirname}"


Wait, this is kinda bad. I assume that there's a pylint rule that is removing the explicit str conversion here? F string interpolation does call __str__ on all values, which is what str does. But I don't understand why one would want to make something that's explicit implicit? 🤔

I suppose we're getting away from the Zen of Python now!
https://peps.python.org/pep-0020/

malcolmgreaves · 2024-07-31T20:54:29Z

sub-packages/bionemo-contrib/src/bionemo/contrib/data/preprocess.py


-    root_directory: Optional[str] = RemoteResource.get_env_tmpdir()
+    root_directory: Optional[str] = RemoteResource.get_env_tmpdir()  # noqa: RUF009


Oh wow! RUF009 is actually excellent here -- this is an example of something that is actually a bug that should be fixed! Each time we construct a new ResourcePreprocessor -- or any child class of it -- we are going to be re-using the same single, globally defined value returned by this function.

Instead of adding this ignore comment, can you instead change it to use a field w/ this function as the factory?

Issue tracked in #57, PR to fix in #58
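A minimal sketch of the difference, assuming ResourcePreprocessor is a dataclass (RUF009 targets dataclass defaults). The get_env_tmpdir function here is a hypothetical stand-in for RemoteResource.get_env_tmpdir(), and the class names are made up for illustration; the actual fix is whatever lands in #58.

```python
import tempfile
from dataclasses import dataclass, field
from typing import Optional

calls = []  # records each call so the difference is observable


def get_env_tmpdir() -> str:
    """Hypothetical stand-in for RemoteResource.get_env_tmpdir()."""
    calls.append(1)
    return tempfile.gettempdir()


@dataclass
class EagerPreprocessor:
    # Evaluated once, when the class body runs (the pattern RUF009 flags).
    root_directory: Optional[str] = get_env_tmpdir()


@dataclass
class LazyPreprocessor:
    # Evaluated on each instantiation that omits the argument.
    root_directory: Optional[str] = field(default_factory=get_env_tmpdir)


calls_after_definition = len(calls)

EagerPreprocessor()
EagerPreprocessor()
calls_after_eager = len(calls)

LazyPreprocessor()
LazyPreprocessor()
calls_after_lazy = len(calls)

print(calls_after_definition, calls_after_eager, calls_after_lazy)
```

With the factory, a change to the environment between two instantiations is picked up by the second instance; with the eager default it is not.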

malcolmgreaves · 2024-07-31T20:56:03Z

sub-packages/bionemo-contrib/src/bionemo/contrib/model/esm2/embedding.py

+        input_ids: Tensor,
+        position_ids: Tensor,
+        tokentype_ids: int | None = None,
+        attention_mask: Tensor | None = None,


Auto-fixed?

No, actually I fixed these manually. There was a complaint about not implicitly setting optional args through int = None
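For context, a small sketch of the implicit-Optional pattern being described. The function names are hypothetical, and Optional[int] is used only so the snippet runs on older interpreters; on Python 3.10+ it is equivalent to the int | None spelling in the diff.

```python
from typing import Optional, get_args


def implicit(tokentype_ids: int = None):
    # The annotation says int, yet the default is None: this is the
    # pattern the linter complained about.
    return tokentype_ids


def explicit(tokentype_ids: Optional[int] = None):
    # None is now part of the declared type.
    return tokentype_ids


implicit_hint = implicit.__annotations__["tokentype_ids"]
explicit_hint = explicit.__annotations__["tokentype_ids"]

print(implicit_hint)
print(explicit_hint)
```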

malcolmgreaves · 2024-07-31T21:13:11Z

scripts/download_artifacts.py

                if Path(tar_file).is_file():
                    with tarfile.open(tar_file) as tar:
-                        extract_path = f"{str(complete_download_dir)}"


I'd actually like to make a case for having RUF010 ignored.

From the link:

a = "some string"
f"{a!r}"

I think this is weird and significantly less readable as compared to:

a = "some string"
f"{repr(a)}"

With the former, I have to know about new, special syntax. This f-string syntax is quite rare -- I have been reading and writing a lot of Python for the last 6+ years and I have never seen this before. I suspect that most other folks have also never seen this. It would, at the very least then, require someone to stop and go and try and look up what !r means.

Compare this with repr or str. Instead of having to learn new syntax, we instead can combine our pre-existing knowledge about these built-in functions with what we already know about f-strings.
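For readers who have not met the conversion flags before, a short sketch of how they line up with the built-ins: !r corresponds to repr(), !s to str(), and !a to ascii(). The variable names are illustrative only.

```python
a = "some string"
b = "caf\u00e9"  # non-ASCII text, to make the !a flag visible

with_flag = f"{a!r}"      # conversion flag
with_call = f"{repr(a)}"  # explicit built-in call

print(with_flag)
print(with_flag == with_call)
print(f"{a!s}" == f"{str(a)}")
print(f"{b!a}" == f"{ascii(b)}")
```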

malcolmgreaves · 2024-07-31T21:17:09Z

scripts/download_artifacts.py

                if Path(tar_file).is_file():
                    with tarfile.open(tar_file) as tar:
-                        extract_path = f"{str(complete_download_dir)}"


There's also the explicit nature perspective.

We are not relying on any implicit behavior here -- there's no need to know some very particular details about how f-string interpolation actually works. We have simplicity: we turn something into a str, then substitute it into another str.

Taking a look at the link, the justification for this lint rule is that str is unnecessary:

Note that, in many cases, calling str() within an f-string is unnecessary and can be removed entirely, as the value will be converted to a string automatically,

However, they immediately point out that "str is unnecessary" is not a universal rule. There's an exception that they point out:

the notable exception being for classes that implement a custom __format__ method.

My worry now is this -- can we guarantee that this conversion is perfect and bug free?

How is ruff going to know, with 0 chance of getting it wrong, that, for some non-str value x, f"{str(x)}" is equivalent to f"{x}"?

I do not think this is possible. So in my mind, this automatic conversion runs the risk of introducing a bug. This risk isn't justified by the benefits of automatically converting code to use this style.
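To make the exception concrete, here is a sketch with a hypothetical Money class whose __format__ differs from its __str__. Dropping the str() call changes the output, while the !s conversion flag (which, as far as I understand, is the form RUF010 suggests) keeps it.

```python
class Money:
    """Hypothetical class whose __format__ differs from __str__."""

    def __init__(self, amount: float) -> None:
        self.amount = amount

    def __str__(self) -> str:
        return f"{self.amount} USD"

    def __format__(self, spec: str) -> str:
        # Falls back to two decimal places when no format spec is given.
        return "$" + format(self.amount, spec or ".2f")


x = Money(3.5)

with_str = f"{str(x)}"      # str() first, then the plain str is formatted
without_str = f"{x}"        # goes through Money.__format__ instead
with_conversion = f"{x!s}"  # !s applies str() before formatting

print(with_str)
print(without_str)
print(with_conversion)
```

So f"{str(x)}" and f"{x}" are only interchangeable when the type does not override __format__, which a linter cannot determine in general without knowing the runtime type of x.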

pstjohn · 2024-07-31T21:30:11Z

Apologies for missing the comments before it auto-merged -- but we discussed this via slack and it should be fine: RUF010, the rule we were discussing, is actually ignored in the pyproject.toml. I think the confusion here was that before it ignored it, it auto-fixed those lines; but in reverting them I took out those redundant str() casts.

malcolmgreaves · 2024-07-31T21:34:28Z

No worries! Thanks for the discussion. My bad for not reading thoroughly enough what RUF010 does 😅

pstjohn added 4 commits July 31, 2024 12:34

change ruff in pyproject.toml

e7a3866

autofix where possible and add noqa

74829c5

add RUF rule and fix/ignore where violating

8ca1f33

some minor cleanup

50c6aa6

pstjohn requested review from jomitchellnv and jstjohn July 31, 2024 19:54

pstjohn self-assigned this Jul 31, 2024

jstjohn reviewed Jul 31, 2024

fixing pyproject.toml

cd2edad

jstjohn reviewed Jul 31, 2024

jstjohn approved these changes Jul 31, 2024

jomitchellnv approved these changes Jul 31, 2024

pyproject.toml

line-length = 119

[tool.ruff.lint]

jomitchellnv Jul 31, 2024

LGTM

pstjohn enabled auto-merge (squash) July 31, 2024 20:38

malcolmgreaves self-requested a review July 31, 2024 20:48

malcolmgreaves reviewed Jul 31, 2024

pstjohn merged commit 467d7f1 into v2-main Jul 31, 2024
2 checks passed

pstjohn mentioned this pull request Jul 31, 2024

Fix RUF009 violation in data/preprocess.py #57

Open

pstjohn added a commit that referenced this pull request Jul 31, 2024

Revert "Add some additional ruff checks, ignoring existing violations (…

dba200a

…#56)" This reverts commit 467d7f1.

pstjohn mentioned this pull request Jul 31, 2024

Revert "Add some additional ruff checks, ignoring existing violations" #60

Closed

pstjohn deleted the pstjohn/v2-main/ruff-docstring branch September 16, 2024 12:55
