Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding a pre-commit hook to update get_info #55

Merged
merged 39 commits into from
Aug 20, 2024

Conversation

Schefflera-Arboricola
Copy link
Member

@Schefflera-Arboricola Schefflera-Arboricola commented Mar 21, 2024

What does this PR currently do?

Adds a pre-commit hook that updates the dictionary returned by the get_info function.

What this PR initially did?

Solves the issue of nx-parallel's info not appearing in the "Additional backend implementation" box(ref. ss):
Screenshot 2024-03-22 at 1 15 56 AM

WIP : running pre-commit updates the get_info but it also fails every time

@dschult
Copy link
Member

dschult commented Mar 22, 2024

Yay for getting the doc_strings to show in the NX documentation!

Does it fail because it changes the files? If so, after running it once the files should be changed -- so git add and then try to commit again should work. I guess that is an extra step -- but does it work?

@Schefflera-Arboricola
Copy link
Member Author

Schefflera-Arboricola commented Mar 22, 2024

the hook updates the __init__.py every time the pre-commit is run but we get into the loop of “git add .” and “pre-commit”, and the ruff-format and update function info hooks fail alternatively, as shown below:

(base) aditi@Aditis-MacBook-Air nx-parallel % pre-commit
Update function info.....................................................Passed
blacken-docs.............................................................Passed
prettier.................................................................Passed
ruff.....................................................................Passed
ruff-format..............................................................Failed
- hook id: ruff-format
- files were modified by this hook
1 file reformatted
(base) aditi@Aditis-MacBook-Air nx-parallel % git add . 
(base) aditi@Aditis-MacBook-Air nx-parallel % pre-commit
Update function info.....................................................Failed
- hook id: update-get_info
- files were modified by this hook
blacken-docs.........................................(no files to check)Skipped
prettier.................................................................Passed
ruff.................................................(no files to check)Skipped
ruff-format..........................................(no files to check)Skipped
(base) aditi@Aditis-MacBook-Air nx-parallel % 

And running black on init.py just after the update in the same hook, didn't seem to resolve this. A solution to this might be to have a bash script that only updates the get_info when there’s any actual difference.

@dschult
Copy link
Member

dschult commented Mar 26, 2024

Yes, you have a loop:

  1. update-get_info writes code that is not formatted nicely.
  2. ruff formats that code (though it still is not formatted with triple quotes and line-breaks to make it more readable)

Each makes the file init.py look different. So one of the two will not be satisfied.

Looks like you could make update-get_info create a well formatted file.
Or you could tell ruff to ignore that file (but then the file isn't human readable).

A kind of a cheat is to have update-get_info run ruff against the __init__.py file after writing to it. I have push that up to this branch. I don't know a good way to use python to write python code that is formatted well. We would need a way to run ruff from within python on a string instead of running it on an existing file.

@Schefflera-Arboricola
Copy link
Member Author

Schefflera-Arboricola commented Mar 29, 2024

I added a script that would update the __init__.py only if there are any updates. However, there is a problem that repo: local in the pre-commit hook will refer to the main branch of the main nx-parallel repository when the lint workflow is being executed in gh workflows and not the PR's branch of the forked repo. So, if someone makes some changes to the docs and then runs their pre-commit locally and sees it's passing for all hooks, and then they will submit a PR and their style/format test would fail here on GitHub(under the PR) because pre-commit is not running on their PR's branch(it's running on nx-parallel's main). One way to resolve this might be to clone/download the forked repo in lint.yml and then run pre-commit on that. But, to clone the contributor's fork every time lint workflow tests are run doesn't seem very efficient to me.

Also, I think if we would want to merge a non-upto-date branch(that doesn't even update any docs) in the future then we might face unnecessary merge conflicts in the __init__.py file. So, I think I need to think about this a bit more, like, if we should even put update-get-info in pre-commit. For now, I've created a separate PR that adds the script to update the get_info and for other changes. (@dschult )And the ideal flow for merging/reviewing them would be:

PR #56 ------(rerun the update script in PR #57 )------> PR #57 ------> PR #55

@dschult
Copy link
Member

dschult commented May 9, 2024

I've been thinking more about how to make this simple -- and yet still work.
After some playing around I was able to get your script to create python code that ruff accepts without changes. The approach created indented dict structures in text form. So rather than convert the whole func dict in one str function, we can construct a string inside get_funcs_info with human-readable indentation and use that to create the get_info function.

That would short-cut the cycle of update_get_info -- ruff -- update_get_info...

Your comment about having lots of merge conflicts is certainly true. Though there might not be as many as you think because any new functions or doc_string changes would occur in a different place in the file. So only if the changes happened to be adjacent would there be a merge conflict.

I'm think I will go ahead and reopen this PR and try the changes as follows:
Replace return funcs with

    indent = "\n" + " " * 12
    out = "{"
    for func, finfo in funcs.items():
        out += indent + f'"{func}": {{' + indent + f'    "url": "{finfo["url"]}",'
        out += indent + f'    "additional_docs": "{finfo["additional_docs"]}",'
        out += (
            indent
            + f'    "additional_parameters": """{finfo["additional_parameters"]}""",'
        )
        out += indent + "},"
    out += "\n        },\n    }\n"
    return out

And then at the end of the file, take out the close bracket, and no str needed:

    f.write(string + get_funcs_info())

@dschult dschult reopened this May 9, 2024
@dschult dschult force-pushed the precommit_get_info branch from d083d92 to 14243d1 Compare May 9, 2024 02:38
@Schefflera-Arboricola Schefflera-Arboricola marked this pull request as ready for review August 6, 2024 12:52
@dschult
Copy link
Member

dschult commented Aug 8, 2024

I'm looking to review this PR about using pre-commit to update get_info.
The PR diff looks very simple. There doesn't seem to be any change in the script file.
The only substantial change I can see is that the get_funcs_info now sorts by the function name.

Can you describe what this PR does and how it helps update the pre-commit while avoiding the circular conflicts we were having before?

@Schefflera-Arboricola
Copy link
Member Author

Recently, when I was revisiting this PR, I observed that the style test was failing on Github(but passing locally) because, for some reason, the ordering of the functions in the get_info's returned dictionary was not the same for my local machine and for the GitHub CI here. I thought it was because of some version inconsistency, or the ordering of the pre-commit hooks, or any recent updates in pre-commit, ruff, black or python. But none of these were the case, as indicated by the commits above. However, I ended up adding the versions explicitly for all the libraries. 

To figure this out, I dug a layer deeper and tried to see if there was any inconsistency in the update_get_info.py script. And I quickly realised that the ordering of the functions was differing because the order in which my machine and the order in which GitHub CI were extracting the files and functions from the nx_parallel module were different. So, I first tried sorting the files extracted and then the functions in those files, but still it was failing because some files have the same names, like connectivity.py, which exists in both the approximation and the connectivity modules. So, I decided to sort the returned function dictionary instead.

Please LMK if you have any further questions. Thank you :)

Comment on lines +62 to +64
expr.value
for expr in node.value.elts
if isinstance(expr, ast.Constant)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initially made this change in PR #68 (ref. https://github.com/networkx/nx-parallel/pull/68/files#r1689581947)

Added this here also because it seems more relevant to this PR as well.

@Schefflera-Arboricola
Copy link
Member Author

@dschult Please LMK if it would be possible to get this PR also reviewed and merged before GSoC ends(on August 26th) based on your schedule, so that I can accordingly mention it in my GSoC final report. I'm open to scheduling a meeting to discuss this PR if that would help you review this better and faster.

Thank you :)

@dschult
Copy link
Member

dschult commented Aug 20, 2024

Yes -- let's get this PR finished up and merged this week.

It looks like it works now (with the function names being sorted). And it appears to work on CI too. Is there more to do, or is it ready to merge as far as you are concerned?

Some questions:
Did you consider using Python's inspect features rather than walking through the source tree? It has functions like inspect.getdoc and tools like for mod in inspect.getmembers(nx_parallel, inspect.ismodule):. I'm not sure of the tradeoffs between walking the source tree and walking the imported module. Just wondering if you had tried it and decided the source tree is better.

:)

@Schefflera-Arboricola
Copy link
Member Author

It looks like it works now (with the function names being sorted). And it appears to work on CI too. Is there more to do, or is it ready to merge as far as you are concerned?

Looks ready to me :)

Some questions: Did you consider using Python's inspect features rather than walking through the source tree? It has functions like inspect.getdoc and tools like for mod in inspect.getmembers(nx_parallel, inspect.ismodule):. I'm not sure of the tradeoffs between walking the source tree and walking the imported module. Just wondering if you had tried it and decided the source tree is better.

I don't think I tried using inspect or anything else when I added the update_get_info.py script in (I think) PR #53 . So, I'm also not sure which one is a better choice but I don't think this is a blocker here and we should probably handle this in a separate PR. But, it's good to have this noted somewhere. LMK what you think.

Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks ready to me! Thanks @Schefflera-Arboricola

We probably don't want the version numbers of pre-commit and ruff to be pinned, right?
And do we want to use python3 instead of python in script.sh?
You know the answers better than I do I think for those questions.

@Schefflera-Arboricola
Copy link
Member Author

We probably don't want the version numbers of pre-commit and ruff to be pinned, right?
And do we want to use python3 instead of python in script.sh?

All of this was made explicit in the process of trying to figure out what exactly was causing the inconsistency in the order of the functions in get_info. And, I decided to keep them like this because explicit is better than implicit 😅

@dschult dschult merged commit c3d8b44 into networkx:main Aug 20, 2024
11 checks passed
@jarrodmillman jarrodmillman added this to the 0.3 milestone Aug 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: Enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Possible approaches for automating the updation of the get_info function
3 participants