Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

darker can bisect context lines without a reason #204

Open
rogalski opened this issue Sep 23, 2021 · 6 comments
Open

darker can bisect context lines without a reason #204

rogalski opened this issue Sep 23, 2021 · 6 comments
Assignees
Labels
performance Speed or memory usage improvement
Milestone

Comments

@rogalski
Copy link
Contributor

This bug is based on my experiments with darker on closed-source repo.
I am not at liberty to disclose any code. We'll have to work on recreating failure criteria on our own.

Steps to reproduce

  1. Create a file with mixed tabs-and-spaces (it have to trigger parsing file in 2.7 mode)
  2. Reformat whole file to build chunks list
  3. Observe that reformatted file is not AST equivalent to input file.

If initial reformat do not produce equivalent AST, all of bisection steps will fail anyway.

@akaihola
Copy link
Owner

Thanks @rogalski!

I created mixed.py (had to rename to .txt to attach here) which contains mixed tabs and spaces.

If I run Black on it, I get this:

$ black --diff mixed.py   
error: cannot format mixed.py:
INTERNAL ERROR:
Black produced code that is not equivalent to the source on pass 1.
Please report a bug on https://github.com/psf/black/issues.
This diff might be helpful: /tmp/blk_e5me14ox.log
Oh no! 💥 💔 💥
1 file would fail to reformat.

Here's blk_e5me14ox.log from Black.

So I presume this is what happens when Darker processes this file as well. The call to black.format_str() succeeds, but the AST verification fails. In this situation, Darker has no choice but to assume that this isn't due to Black but because Darker's own diff-matching algorithm failed to work correctly due to complex and ambiguous edits in the file. It then proceeds to expanding the edited chunks in order to apply fewest possible extra reformattings which preserve the AST.

I've seen AST verification failures happen on real-life files which don't contain mixed tabs and spaces, and the chunk expansion algorithm is a valid work-around for those situations.

Could we short-circuit that mechanism here? How do we detect that it's no use even trying to find a minimal set of extra reformattings?

@akaihola akaihola added duplicate This issue or pull request already exists performance Speed or memory usage improvement question Further information is requested and removed duplicate This issue or pull request already exists labels Sep 24, 2021
@rogalski
Copy link
Contributor Author

rogalski commented Sep 24, 2021

My intuitive argumentation was following:

  • darker is just diff-based black
  • if black crashes on input file and bails out, darker can (and should) bail out as well.

This of course is design decision more than actual bug, yet it vastly simplifies implementation and corner case handling while keeping things relatively straightforward for customer.

@akaihola
Copy link
Owner

Darker directly calls black.format_str() which doesn't do AST verification and thus succeeds in our example case. Darker then calls darker.verification.verify_ast_unchanged(), and acts based on the result of that.

What we could add is an extra step:
If AST verification fails when applying reformatting strictly to modified lines only, Darker could try to verify the result with all lines reformatted. If that fails, Darker would bail out early and display an error. If it passes, Darker would continue with bisection as before. Does this match what you have in mind?

I drafted a diagram showing the added new behavior:

@rogalski
Copy link
Contributor Author

Sounds good to me 👍

@akaihola akaihola added this to the 1.3.2 milestone Oct 5, 2021
@akaihola akaihola modified the milestones: 1.3.2, 1.4.0 Oct 28, 2021
@akaihola akaihola self-assigned this Oct 31, 2021
@akaihola akaihola removed the question Further information is requested label Oct 31, 2021
@akaihola akaihola modified the milestones: 1.4.0, 1.4.1, 1.4.2 Feb 8, 2022
@akaihola akaihola modified the milestones: 1.4.2, 1.5.0 Feb 18, 2022
@akaihola
Copy link
Owner

Moving this to milestone 1.5.0, there are a plenty of bugfixes and documentation improvements coming up for a 1.4.2 bugfix release already.

@akaihola
Copy link
Owner

@rogalski & @overratedpro, is there any chance you could review if I start to implement this for release 1.5.0?

@akaihola akaihola modified the milestones: 1.5.0, 1.5.1 Apr 5, 2022
@akaihola akaihola modified the milestones: 1.5.1, 1.6.0 Sep 13, 2022
@akaihola akaihola modified the milestones: 1.6.0, 1.6.1 Dec 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Speed or memory usage improvement
Projects
Development

No branches or pull requests

2 participants