Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge decompyle3's 3.7 and 3.8 goodness into uncompyle6 #307

Open
rocky opened this issue Feb 22, 2020 · 13 comments
Open

Merge decompyle3's 3.7 and 3.8 goodness into uncompyle6 #307

rocky opened this issue Feb 22, 2020 · 13 comments
Assignees
Labels
Volunteer wanted Volunteer wanted to fix if a bug or to implement if a new feature.

Comments

@rocky
Copy link
Owner

rocky commented Feb 22, 2020

At this point probably 3.7 and 3.8 decompilation with its refactored parsing and reduction-rule checking is better than the code in uncompyle6.

Although there still is much to do in decompyle3, there is benefit in backporting it here.

Any volunteers?

@rocky rocky added the Volunteer wanted Volunteer wanted to fix if a bug or to implement if a new feature. label Feb 22, 2020
@x0ret
Copy link
Collaborator

x0ret commented Mar 20, 2020

Hi,
After a nearly long time i'm glad i can focus on this project again.
Anyway, i managed to backport decompyle3 parsing and reduction-rule checking here. However it seems to me i should review all learned lessons from past : ).

But there seems to be an issue and need to be verified. I tried to run make check-3.7, however some 3.0 tests seem to be fail. Any thing changed? I tried master branch and pip version FYI.

bytecode_3.0/00_import.pyc --
Instruction context:se_lambda.pyc -- decompiled 5 files: 5 okay, 0 failed

  19      30  LOAD_CONST               0
             33  STORE_FAST               'more'
             36  POP_EXCEPT
             37  JUMP_BACK             3  'to 3'
->         40_0  COME_FROM            23  '23'
             40  POP_TOP
             41  END_FINALLY
             42  JUMP_BACK             3  'to 3'

Thank you.

@x0ret x0ret self-assigned this Mar 20, 2020
@rocky
Copy link
Owner Author

rocky commented Mar 20, 2020

Great to see you back!

A lot has changed. It is hard to even know where to begin.

In decompyle3 I have been exclusively working in the "and-or-refactor" branch. which I am getting ready to merge back into master.

With that,

  • lambda expressions are now its own subgrammar of the full python grammar
  • directories for each Python version have been added - p37, p38 - for python version 3.7 and 3.8.
  • The reduction checks have been greatly expanded But see https://github.com/rocky/python-uncompyle6/wiki/The-Control-Flow-Mess
  • grammar patterns for looping constructs are distinct from non-looping constructs
  • as the title of that branch suggests and/or and more generally boolean condition treatment has been rethought
  • A lot of my testing focuses not just on parsing without errors but running the test suite for the Python version to find semantic errors. It is a lot tougher to do and a lot more stringent

As a result of this, I think now have a very good handle on fixing parsing bugs and being able to have a grammar that is more comprehensible, and understand when something needs to be fixed as a reduction rule or grammar rule or something else.

And all of this I think will be a prelude to yet another fork to really try to reduce the reduction rule complexity by adding more higher-level constructs that can be obtained with basic blocks and dominator, reverse dominator information.

It has and continues to be a long tough slog, but at the end I think I have a much better handle on decompilation for higher-level VMs. If I could get paid for this or there is a market for it, I might do this for some other pair of high-level language and assembly.

@rocky
Copy link
Owner Author

rocky commented Mar 20, 2020

Oh, and as for that specific error. Travis CI is not showing it and in my own locally-checked out master I am not seeing that error.

The filename on that is weird but assuming this is supposed to be bytecode_3.0/02_try_except_except.pyc

Here is the tree for that:

    whileTruestmt (4): ('%|while True:\n%+%c%-\n\n', 1)
         0. L.  15         0  SETUP_LOOP           45  'to 45'
         1. l_stmts
            try_except (5): ('%|try:\n%+%c%-%c\n\n', 1, 3)
                 0. L.  16         3  SETUP_EXCEPT         16  'to 16'
                 1. suite_stmts_opt
                    suite_stmts
                        assign (2): ('%|%c = %p\n', -1, (0, 200))
                             0. expr
                                L.  17         6  LOAD_CONST               1
                             1. store
                                               9  STORE_FAST               'more'
                 2.               12  POP_BLOCK        
                 3. except_handler (4)
                     0. jmp_abs
                                      13  JUMP_BACK             3  'to 3'
                     1.             16_0  COME_FROM_EXCEPT      3  '3'
                     2. except_stmts
                        except_stmt (2)
                             0. except_cond1 (7): ('%|except %c:\n', 1)
                                 0. L.  18        16  DUP_TOP          
                                 1. expr
                                                  17  LOAD_GLOBAL              KeyboardInterrupt
                                 2.               20  COMPARE_OP               exception-match
                                 3. jmp_false (3)
                                     0.               23  JUMP_IF_FALSE        40  'to 40'
                                     1. _come_froms
                                     2.               26  POP_TOP          
                                 4.               27  POP_TOP          
                                 5.               28  POP_TOP          
                                 6.               29  POP_TOP          
                             1. except_suite (3): ('%+%c%-%C', 0, (1, 9223372036854775807, ''))
                                 0. c_stmts
                                    assign (2): ('%|%c = %p\n', -1, (0, 200))
                                         0. expr
                                            L.  19        30  LOAD_CONST               0
                                         1. store
                                                          33  STORE_FAST               'more'
                                 1.               36  POP_EXCEPT       
                                 2. jump_except (3)
                                     0. _jump
                                                      37  JUMP_BACK             3  'to 3'
                                     1.             40_0  COME_FROM            23  '23'
                                     2.               40  POP_TOP          
                     3.               41  END_FINALLY      
                 4. opt_come_from_except
                    _come_froms
         2. jb_or_c
                          42  JUMP_BACK             3  'to 3'
         3.             45_0  COME_FROM_LOOP        0  '0'

@x0ret
Copy link
Collaborator

x0ret commented Mar 20, 2020

Oh, and as for that specific error. Travis CI is not showing it and in my own locally-checked out master I am not seeing that error.

Okay, let me try again. I'll let you know about the result.

In decompyle3 I have been exclusively working in the "and-or-refactor" branch. which I am getting ready to merge back into master.

So i think i should wait for it to be merged. Correct?

Anyway as of these progresses, what is your priorities in tasks, issues and parts? I think i can adapt myself to the new changes soon. (Maybe this is not a good place to discuss)

directories for each Python version have been added - p37, p38 - for python version 3.7 and 3.8.

And about merging changes, do you have plan to separate each version into directories like decompyle3?

Seems to be a lot of questions. Sorry about that!

@rocky
Copy link
Owner Author

rocky commented Mar 20, 2020

So i think i should wait for it to be merged. Correct?

Yes. I don't think it will take too long. In the meantime you can get up to speed on what's up in that branch.

Since this is a largish change, the way I'd handle it is to go over the commit history and pick out specific parts to do.

For example withsmt has been renamed to with to match Python's AST name.
There have been specific bug fixes that aren't related strictly to the grammar change that branch that could be applied to uncompyle. One example I recall is that as a result of adding in the docstring into the Parse Tree the order for where to look for an annotation changed by one.

That sort of weird thing was caught via running one of the stdlib tests for 3.7.

@rocky
Copy link
Owner Author

rocky commented Mar 25, 2020

@x0ret

and/or-refactor merging in decompiling may take a while. It looks like more rot was discovered and so yet another detour to fix that up is going on. However...

I've just squashed some of the commits in that branch and gone over things that should be ported here that are not part over the overall big refactor:

It looks like commit sha's have changed. I was able to match things up by commit message though.

https://github.com/rocky/python-decompile3/b592ebadce93ced49c72feaf37db8440531e9034
Fix bug in finding annotation in fn with docstring
Done in af8add9

https://github.com/rocky/python-decompile3/d6f4017587388952d1460e5eedfdfeab24c2918e
Better Async detection
Done in e2d349f But note more is needed for nested async for's. See
commented-out section in test/simple_source/bug37/02_async_for_generator.py. This works in decompile3.

https://github.com/rocky/python-decompile3/6097fc320a50a889b9cbdf6376a8673c28ed1257
https://github.com/rocky/python-decompile3/a91f17797cc126b37613758c57bc817a67b73e77
With grammar simplication and withstmt -> with (to match AST With)

https://github.com/rocky/python-decompile3/2fd65521feb8d9dc0d629da5501370875385b1f9
Set "await expr" precedence

https://github.com/rocky/python-decompile3/4d335816b366373b45d0968e70c16343c00914de
Handle remnants of unreachable with

https://github.com/rocky/python-decompile3/4c83809a8d6e9f17fd433c4287310455d5702b34
Handle nested async for

listcomp -> list_comp (to match AST ListComp)

Remove unneeded Makefiles in light of new "remake -c" option which does the
same thing and is more reliable.

@x0ret
Copy link
Collaborator

x0ret commented Mar 25, 2020

Thanks, you made my work easy!

Do you mind using separate commit for each?

I'll give it a try and let you know about the result.

@rocky
Copy link
Owner Author

rocky commented Mar 25, 2020

Thanks, you made my work easy!

You may want to pass judgement until afterwards. I wouldn't say "easy", but hopefully "easier".

Do you mind using separate commit for each?

Please do. And just work in the way that makes the most sense and is the most comfortable for you.

I'll give it a try and let you know about the result.

Many thanks!

About the diffs:

Changes strictly to 3.7-excludes.sh or 3.8-excludes, especially if it is only comments take with a grain of salt. There were regressions as well as progressions. Best is to just run those particular tests to see what is reported. I am sure you know this but as a reminder:

$ cd test/stdlib
$ ./runtest.sh test_[whatever].sh

What is helpful in the comment in the 3.x-exclude is a rough categorization:

  • The test doesn't work independent of uncompyle6; subcategories: it might fail or it might take a long time to run.
  • The test takes a long time just to decompile. This probably won't happen in uncompyle6, but it happens too often in decompyle3
  • There is a parse error in decompilation
  • The test takes too long to run the decompiled program. (Probably a control-flow bug)
  • The test errors - for example a variable is read that isn't set first, (often a sign of a control-flow-bug)
  • The test fails a test assertion.

Also I try to keep track of regressions by placing a newly-excluded test at the top. If this test worked sometime in the past that's great because I can compare the decompilled source which is of course less different than the original source and a faulty decompiled source. So regressions I generally put a the top of the exclude list since those are easiest to fix. And in the case of decompiyle3 once I get all the regressions fixed, I'll merge into master.

@x0ret
Copy link
Collaborator

x0ret commented Apr 2, 2020

I pushed some commits to 3.8-goodness branch and remove corresponding fixed tests from excludes. However please review those if you have time.

Also i saw your changes in commits for list_comp case but you didn't remove that from the list. Any reason?

Changes in that branch:

  • backporting with stmt parser.
  • backporting stmt reduction and a fix in docstring tranformation.

I'll continue working on this and then send a PR.

Thanks.

@rocky
Copy link
Owner Author

rocky commented Apr 2, 2020

I pushed some commits to 3.8-goodness branch and remove corresponding fixed tests from excludes. However please review those if you have time.

Yes, i noticed. Looks fine as far as I could tell form first glance. Nothing fancy. Thanks for doing this.

Also i saw your changes in commits for list_comp case but you didn't remove that from the list. Any reason?

I don't think this is complete. If you grep for n_listcomp you'll see that in there and I think there's some grammar rules with listcomp as well. I got to the stage where I would have to reconcile the two functions n_listcomp() and n_list_comp() and didn't have the energy to do. If you want to pick that up and run with it, great! If not, it can wait.

@x0ret
Copy link
Collaborator

x0ret commented Apr 4, 2020

I don't think this is complete if you grep for n_listcomp you'll see that in there and I think there's some grammar rules with listcomp as well. I got to the stage where I would have to reconcile the two functions n_listcomp() and n_list_comp() and didn't have the energy to do. If you want to pick that up and run with it, great! If not, it can wait.

I spent some time to figure out the listcomp case and i think i'm ready to change all the cases. However i ran into several issues and it seems parser need to be adjusted to cover cases such this:

async def x():
    q = [i for x in p async for i in f]
    return 1

In the meantime i reviewed pep530 and we should support more cases. 3.7 , 3.8 and decompile3 also is affected.

I'm going to continue this work and fix all the issues.

Anyway about these lines do you remember the test case for this? I assumed listcomp_async is here from previous attempts but the list_iter_index = 5 still is not happen to test cases i checked so far.

Thanks

@rocky
Copy link
Owner Author

rocky commented Apr 4, 2020

Thanks for undertaking to clean up listcomp and add pep530. Reminds me of the time you did the same for the function signatures which was a big mess that I doubt I would have gotten around to and fixed properly.

As for list_iter_index in n_listcomp I think what happened is that a redid the grammar to remove the massive duplication and to be able to cover more cases. At this point the assigment has no effect, and the code in comprehension_walk_newer() now has to picks out the right node since the node it needs is down in a child. In the situation where we are dealing with async lists, that parameter is no longer used, although since it is a parameter of the function, something has to go there. Perhaps set to None? or leave whatever value that might happen without the test.

This does bring up the broader point that syncronizing grammar rules with indexes in the semantics directory and the reducecheck directory is becoming a pain and it would be nice to have and use a more automated system to keep these in sync. Or at least more automated checks.

@rocky
Copy link
Owner Author

rocky commented Apr 4, 2020

Also, I see that in decompile3 I have two n_list_comp() routines, so I'll be deleting the first (and unused) one which seems to have code more specific to earlier Pythons.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Volunteer wanted Volunteer wanted to fix if a bug or to implement if a new feature.
Projects
None yet
Development

No branches or pull requests

2 participants