Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimal operator priority pretty-printing in translators #69

Open
GreyCat opened this issue Dec 25, 2016 · 11 comments
Open

Optimal operator priority pretty-printing in translators #69

GreyCat opened this issue Dec 25, 2016 · 11 comments
Assignees
Milestone

Comments

@GreyCat
Copy link
Member

GreyCat commented Dec 25, 2016

Right now, generation of target code by translators yield more or less spontaneous parenthesis generation. In some cases, it proves to be unoptimal (i.e. extra parenthesis are added when there's no need for them) and sometimes it yields wrong results, i.e.:

A few examples that should probably provide more optimal translation:

  • 1 + 2
    • now yields (1 + 2)
    • should be 1 + 2
  • 1 + 2 + 5
    • now yields ((1 + 2) + 5)
    • should be 1 + 2 + 5
  • (1 + 2) / (7 * 8)
    • now yields ((1 + 2) / (7 * 8))
    • should be (1 + 2) / (7 * 8)

I propose to:

  1. Define formally the order of operators used in our expression language
  2. Fix translators in some way that would produce optimal translation for every target language; this is intrivial, as every language has somewhat different idea of operator priorities and availability.

Hopefully, it would also provide an ultimate answer to all these bugs above as well.

@GreyCat GreyCat self-assigned this Dec 25, 2016
@GreyCat
Copy link
Member Author

GreyCat commented Dec 30, 2016

It turns out that things are not as easy as they look like at the first glance. I've tried a naïve approach, introducing a "priority" integer that gets compared to that's inside. Literals get priority of 0, exponentiation gets 20, multiplication gets 40, addition gets 50, etc. Various translators for various target languages may output different values, as some languages have slightly different ideas on operator priorities. A very simple example ~5**2 gets evaluated differently:

  • in Ruby: (~5)**2 = 36
  • in Python: ~(5**2) = -25

However, things get ugly fast, as there's also operator's associativity. For example:

  • 1 - 2 - 3 gets parsed as ((1 - 2) - 3)
  • (1 - 2) - 3 gets parsed as ((1 - 2) - 3)
  • 1 - (2 - 3) gets parsed as (1 - (2 - 3))

Current priority-only based algorithm would translate all these examples to 1 - 2 - 3, as they're all of equal priority, and thus no parenthesis is needed. This is true for associative operations (i.e. (1 + 2) + 3 = 1 + (2 + 3), so it's somewhat safe to translate that to 1 + 2 + 3), but not true for operators like -, /, etc.

Moreover, I'm not totally sure that it's a good idea to omit parenthesis in 1 + (2 + 3) translation as well. It may be correct from arithmetic POV, but I'm not sure if it's a good idea to take so much liberty with original expression author's intent.

@koczkatamas
Copy link
Member

I would not omit parentheses as they can make the generated code more understandable when the author uses them to separate logically different parts.

@GreyCat
Copy link
Member Author

GreyCat commented Dec 30, 2016

I've started a comparison table on priorities in different languages:

https://docs.google.com/spreadsheets/d/13Cs3SidXHJVDCqydHCBwnk3uQ2FjGmHsjZa6SHkrE34/edit?usp=sharing

@GreyCat
Copy link
Member Author

GreyCat commented Dec 30, 2016

I would not omit parentheses as they can make the generated code more understandable when the author uses them to separate logically different parts.

Unfortunately, it's not completely possible. From the AST point of view:

  • (1 + 2) + 3
  • 1 + 2 + 3
  • ((((1 + 2)) + 3))
  • etc

are translated to exactly the same

BinOp(
  BinOp(
    IntNum(1),
    Add,
    IntNum(2)
  ),
  Add,
  IntNum(3)
)

After we've got AST, it's not possible to distinguish them.

@koczkatamas
Copy link
Member

koczkatamas commented Dec 30, 2016

They could be

  • (1 + 2) + 3
BinOp(
  Paren(
    BinOp(
      IntNum(1),
      Add,
      IntNum(2)
    )
  ),
  Add,
  IntNum(3)
)
  • 1 + 2 + 3
BinOp(
  BinOp(
    IntNum(1),
    Add,
    IntNum(2)
  ),
  Add,
  IntNum(3)
)
  • ((((1 + 2)) + 3))
Paren(
  Paren(
    BinOp(
      Paren(
        Paren(
          BinOp(
            IntNum(1),
            Add,
            IntNum(2)
          )
        )
      )
      Add,
      IntNum(3)
    )
  )
)

But I don't know whether this is the proper way to do it, or is it worth it at first place...

@GreyCat
Copy link
Member Author

GreyCat commented Dec 30, 2016

Well, it's generally not how AST parsing works. More or less, the general idea is to eliminate parenthesis and all these operator application priority steps, and that's actually what we need, as we can't rely on particular operator priorities (and parenthesis application) of a single language, as they can be different for different targets...

@koczkatamas
Copy link
Member

Yes, but we can interpret parentheses in ksy expressions as "force parenthesis here" operators. I cannot think of any case where these "forced" parentheses would change the meaning of the expression in any of the target languages.

Of course sometimes we need to put additional parentheses into the target language which won't be in the AST where the target language's operator precedence does not match with the .ksy expression precedence.

So the AST would still describe the same expression, but would add support for additional used-only-for-clarity-parentheses.

Or this could be even an optional compiler option: keep original parentheses from ksy or not.

GreyCat added a commit to kaitai-io/kaitai_struct_compiler that referenced this issue Jan 2, 2017
…ions (as discussed in kaitai-io/kaitai_struct#69):

* `translate()` family recursive functions now pass along not a single string, but string + priority integer
* There is a `translate(v, outerPriority)` wrapper that analyzes inner priority vs outer priority and adds parenthesis
* Fixed many `translate(...)` invocations to adhere to new pattern
* Ideally, `translate()` should be always called aware of the context the expression will be used as
* Introduced `CTernaryOperator` that carries the same logic of C-like ternary operator common to many languages
* Modified some TranslatorSpec tests

Work in progress, still has problems with associative operators.
@GreyCat
Copy link
Member Author

GreyCat commented Jan 2, 2017

I've pushed what I have now into distinct branch optimal_expressions, so anyone can track my progress.

@GreyCat
Copy link
Member Author

GreyCat commented Mar 25, 2024

kaitai-io/kaitai_struct_compiler#277 was merged, so I would continue using this issue to track remaining problems, this is roughly the plan:

  • comparison operators and boolean operators
  • ternary operator
  • integer operations with extreme integer constants
  • unary operators

I've started playing with comparisons, and immediately figured out that it's not so simple even with very basic set of comparison operators. Some languages have all of comparison operators on same level of precedence (e.g. Python), some languages have == and != as higher precedence (e.g. C++, Java, and many others).

A simple test: 1 < 2 == 3 < 4.

  • C++ (gcc11): 1
  • C++ (clang14): 1
  • Java: true
  • JavaScript: true
  • PHP: bool(true)
  • Python: False
  • Ruby: true

An also interesting bit is that modern gcc issues a warning on this:

expr_run.cpp: In function 'int main()':
expr_run.cpp:3:25: warning: suggest parentheses around comparison in operand of '==' [-Wparentheses]
    3 |         std::cout << (1 < 2 == 3 < 4) << std::endl;
      |                       ~~^~~

So, it looks like at least for Python (and likely for C++ too to avoid the warning) we need to modify generation to parenthesize. (1 < 2) == (3 < 4) yields correct "True" result in Python.

@GreyCat
Copy link
Member Author

GreyCat commented Mar 25, 2024

A few other data points:

  • Go blows up building 1 < 2 == 3 < 4 with the following error, parenthesized version works as expected (returning true):

    ./expr_run.go:4:23: invalid operation: 1 < 2 == 3 (mismatched types untyped bool and untyped int)
    
  • Lua blows up on 1 < 2 == 3 < 4 with the following error, parenthesized version works as expected (returning true):

    (command line):1: attempt to compare boolean with number
    
  • Nim blows up on 1 < 2 == 3 < 4 with the following error, parenthesized version works as expected (returning true):

    /work/expr_run.nim(1, 12) Error: type mismatch: got <bool, int literal(3)>
    but expected one of:
    proc `==`(x, y: bool): bool
      first type mismatch at position: 2
      required type for y: bool
      but expression '3' is of type: int literal(3)
    
  • Perl works the same on unparenthesized and parenthesized versions (as many other C-based languages), returning !!1 (its equivalent of true).

So, looks like we'll have to build that distinction in the custom per-language precedence tables.

@GreyCat
Copy link
Member Author

GreyCat commented Mar 25, 2024

While experimenting with expression, I've whipped up a tool all-expression that allows me to quickly test some ideas in many languages we support:

$  ./all-expression -h
Usage: ./all-expression [OPTIONS] EXPRESSION"

Interpretes EXPRESSION in various programming languages using Docker images
and prints the result.

Options:
  --parallel, -p  Run all targets in parallel
  --help, -h      Show this help message

$ ./all-expression '1 + 2'
* C++ (gcc11): 3
* C++ (clang11): 3
* Go: 3
* Java: 3
* JavaScript: 3
* Lua: 3
* Nim: 3
* Perl: $VAR1 = 3;
* PHP: int(3)
* Python: 3
* Ruby: 3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants