Optimal operator priority pretty-printing in translators #69

GreyCat · 2016-12-25T03:04:40Z

Right now, target code generation in translators yields more or less spontaneous parenthesis placement. In some cases it is suboptimal (i.e. extra parentheses are added when there's no need for them), and sometimes it yields wrong results.

A few examples that should probably provide more optimal translation:

1 + 2
- now yields (1 + 2)
- should be 1 + 2
1 + 2 + 5
- now yields ((1 + 2) + 5)
- should be 1 + 2 + 5
(1 + 2) / (7 * 8)
- now yields ((1 + 2) / (7 * 8))
- should be (1 + 2) / (7 * 8)

I propose to:

Define formally the order of operators used in our expression language
Fix translators so that they produce optimal translation for every target language; this is non-trivial, as every language has a somewhat different idea of operator priorities and availability.

Hopefully, it would also provide an ultimate answer to all these bugs above as well.

GreyCat · 2016-12-30T09:20:31Z

It turns out that things are not as easy as they look at first glance. I've tried a naïve approach, introducing a "priority" integer: the priority of the enclosing operation gets compared to the priority of what's inside it. Literals get priority 0, exponentiation gets 20, multiplication gets 40, addition gets 50, etc. Translators for different target languages may output different values, as some languages have slightly different ideas about operator priorities. A very simple example, ~5**2, gets evaluated differently:

in Ruby: (~5)**2 = 36
in Python: ~(5**2) = -26
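A minimal sketch of this priority-only approach, written in Python purely for illustration (the compiler's actual code is not shown in this thread). The class names mirror the AST shown later in this issue; `PRIORITY`, `TOP_LEVEL`, `translate_inner` and the levels assigned to `/` and `-` are assumptions of the sketch.

```python
from dataclasses import dataclass

# Priority values as quoted above (lower binds tighter); putting "/" and "-"
# on the same levels as "*" and "+" is an assumption of this sketch.
PRIORITY = {"*": 40, "/": 40, "+": 50, "-": 50}
TOP_LEVEL = 100  # assumed priority of the outermost context


@dataclass
class IntNum:
    value: int


@dataclass
class BinOp:
    left: object
    op: str
    right: object


def translate_inner(e):
    """Return (text, priority) for an expression, without outer parentheses."""
    if isinstance(e, IntNum):
        return str(e.value), 0  # literals get priority 0
    prio = PRIORITY[e.op]
    text = f"{translate(e.left, prio)} {e.op} {translate(e.right, prio)}"
    return text, prio


def translate(e, outer_priority=TOP_LEVEL):
    """Add parentheses only when the inner operation binds looser than the outer one."""
    text, inner_priority = translate_inner(e)
    return f"({text})" if inner_priority > outer_priority else text


print(translate(BinOp(IntNum(1), "+", IntNum(2))))                         # 1 + 2
print(translate(BinOp(BinOp(IntNum(1), "+", IntNum(2)), "+", IntNum(5))))  # 1 + 2 + 5
print(translate(BinOp(BinOp(IntNum(1), "+", IntNum(2)), "*", IntNum(3))))  # (1 + 2) * 3
```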

However, things get ugly fast, as there's also operator associativity. For example:

1 - 2 - 3 gets parsed as ((1 - 2) - 3)
(1 - 2) - 3 gets parsed as ((1 - 2) - 3)
1 - (2 - 3) gets parsed as (1 - (2 - 3))

The current priority-only algorithm would translate all these examples to 1 - 2 - 3, as the operators are all of equal priority, and thus no parentheses are considered necessary. This is fine for associative operations (i.e. (1 + 2) + 3 = 1 + (2 + 3), so it's somewhat safe to translate that to 1 + 2 + 3), but wrong for operators like -, /, etc.

Moreover, I'm not totally sure that it's a good idea to omit parentheses in the translation of 1 + (2 + 3) as well. It may be correct from an arithmetic POV, but I'm not sure if it's a good idea to take so much liberty with the original expression author's intent.
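One common way to handle this, sketched here in Python under the assumption that every binary operator involved is left-associative, is to treat the right operand more strictly: it gets parentheses when its priority is equal to the outer one, not only when it is looser. The `strict` parameter and the priority table are hypothetical names for this sketch, not something taken from the compiler.

```python
from dataclasses import dataclass

# Hypothetical priority table (lower binds tighter), same levels as above.
PRIORITY = {"*": 40, "/": 40, "+": 50, "-": 50}


@dataclass
class IntNum:
    value: int


@dataclass
class BinOp:
    left: object
    op: str
    right: object


def translate(e, outer_priority=100, strict=False):
    """Render e, parenthesizing it if it binds looser than its context.

    With strict=True, equal priority is parenthesized too.
    """
    if isinstance(e, IntNum):
        return str(e.value)
    prio = PRIORITY[e.op]
    left = translate(e.left, prio)                 # equal priority on the left is safe
    right = translate(e.right, prio, strict=True)  # equal priority on the right is not
    text = f"{left} {e.op} {right}"
    needed = prio > outer_priority or (strict and prio == outer_priority)
    return f"({text})" if needed else text


one, two, three = IntNum(1), IntNum(2), IntNum(3)
print(translate(BinOp(BinOp(one, "-", two), "-", three)))  # 1 - 2 - 3
print(translate(BinOp(one, "-", BinOp(two, "-", three))))  # 1 - (2 - 3)
print(translate(BinOp(one, "+", BinOp(two, "+", three))))  # 1 + (2 + 3)
```

Note that this variant deliberately keeps the parentheses in 1 + (2 + 3): it never relies on an operator being associative, which sidesteps the question of how much liberty to take with the original expression.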

koczkatamas · 2016-12-30T13:30:01Z

I would not omit parentheses as they can make the generated code more understandable when the author uses them to separate logically different parts.

GreyCat · 2016-12-30T17:29:57Z

I've started a comparison table on priorities in different languages:

https://docs.google.com/spreadsheets/d/13Cs3SidXHJVDCqydHCBwnk3uQ2FjGmHsjZa6SHkrE34/edit?usp=sharing

GreyCat · 2016-12-30T17:34:00Z

> I would not omit parentheses as they can make the generated code more understandable when the author uses them to separate logically different parts.

Unfortunately, it's not completely possible. From the AST point of view:

(1 + 2) + 3
1 + 2 + 3
((((1 + 2)) + 3))
etc

are all translated to exactly the same AST:

BinOp(
  BinOp(
    IntNum(1),
    Add,
    IntNum(2)
  ),
  Add,
  IntNum(3)
)

Once we have the AST, it's no longer possible to distinguish them.
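The same effect can be observed with Python's standard `ast` module. It is used here only as an analogy: this is Python's own parser, not the KS expression parser, but it discards redundant parentheses in the same way.

```python
import ast

sources = ["(1 + 2) + 3", "1 + 2 + 3", "((((1 + 2)) + 3))"]

# ast.dump() omits line/column attributes by default, so it only shows structure.
dumps = [ast.dump(ast.parse(src, mode="eval")) for src in sources]

# All three spellings produce an identical tree.
print(len(set(dumps)) == 1)  # True

# Parentheses that change the grouping do show up as a different tree.
other = ast.dump(ast.parse("1 + (2 + 3)", mode="eval"))
print(other == dumps[0])  # False
```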

koczkatamas · 2016-12-30T17:57:08Z

They could be

(1 + 2) + 3

BinOp(
  Paren(
    BinOp(
      IntNum(1),
      Add,
      IntNum(2)
    )
  ),
  Add,
  IntNum(3)
)

1 + 2 + 3

BinOp(
  BinOp(
    IntNum(1),
    Add,
    IntNum(2)
  ),
  Add,
  IntNum(3)
)

((((1 + 2)) + 3))

Paren(
  Paren(
    BinOp(
      Paren(
        Paren(
          BinOp(
            IntNum(1),
            Add,
            IntNum(2)
          )
        )
      ),
      Add,
      IntNum(3)
    )
  )
)

But I don't know whether this is the proper way to do it, or whether it is worth it in the first place...

GreyCat · 2016-12-30T19:06:08Z

Well, it's generally not how AST parsing works. More or less, the general idea is to eliminate parentheses and all these operator application priority steps, and that's actually what we need, as we can't rely on the particular operator priorities (and parentheses application) of a single language, as they can be different for different targets...

koczkatamas · 2016-12-30T19:30:28Z

Yes, but we can interpret parentheses in ksy expressions as "force parentheses here" operators. I cannot think of any case where these "forced" parentheses would change the meaning of the expression in any of the target languages.

Of course, sometimes we need to put additional parentheses into the target language which won't be in the AST, where the target language's operator precedence does not match the .ksy expression precedence.

So the AST would still describe the same expression, but would add support for additional used-only-for-clarity-parentheses.

Or this could even be a compiler option: keep the original parentheses from the ksy or not.
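A rough Python sketch of what this could look like, assuming a `Paren` node as in the trees above and a hypothetical `keep_parens` switch standing in for the compiler option. The priority table and the `strict` handling of right operands are also assumptions of this sketch, and all operators are assumed left-associative.

```python
from dataclasses import dataclass

# Hypothetical priority table (lower binds tighter).
PRIORITY = {"*": 40, "/": 40, "+": 50, "-": 50}


@dataclass
class IntNum:
    value: int


@dataclass
class BinOp:
    left: object
    op: str
    right: object


@dataclass
class Paren:
    inner: object


def translate(e, outer_priority=100, strict=False, keep_parens=True):
    if isinstance(e, Paren):
        inner = e.inner
        while isinstance(inner, Paren):  # collapse ((x)) into a single pair
            inner = inner.inner
        if keep_parens:
            return f"({translate(inner, keep_parens=True)})"
        # Option off: ignore the author's parentheses, keep only required ones.
        return translate(inner, outer_priority, strict, keep_parens=False)
    if isinstance(e, IntNum):
        return str(e.value)
    prio = PRIORITY[e.op]
    left = translate(e.left, prio, False, keep_parens)
    right = translate(e.right, prio, True, keep_parens)
    text = f"{left} {e.op} {right}"
    needed = prio > outer_priority or (strict and prio == outer_priority)
    return f"({text})" if needed else text


expr = BinOp(Paren(BinOp(IntNum(1), "+", IntNum(2))), "+", IntNum(3))
print(translate(expr))                     # (1 + 2) + 3
print(translate(expr, keep_parens=False))  # 1 + 2 + 3
```

Parentheses required by precedence or associativity are still emitted in both modes; the switch only controls the clarity-only ones.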

…ions (as discussed in kaitai-io/kaitai_struct#69):

* `translate()` family recursive functions now pass along not a single string, but string + priority integer
* There is a `translate(v, outerPriority)` wrapper that analyzes inner priority vs outer priority and adds parenthesis
* Fixed many `translate(...)` invocations to adhere to new pattern
* Ideally, `translate()` should be always called aware of the context the expression will be used as
* Introduced `CTernaryOperator` that carries the same logic of C-like ternary operator common to many languages
* Modified some TranslatorSpec tests

Work in progress, still has problems with associative operators.

GreyCat · 2017-01-02T06:26:06Z

I've pushed what I have now into a separate branch, optimal_expressions, so anyone can track my progress.

GreyCat · 2024-03-25T12:29:16Z

kaitai-io/kaitai_struct_compiler#277 was merged, so I will continue using this issue to track the remaining problems. This is roughly the plan:

comparison operators and boolean operators
ternary operator
integer operations with extreme integer constants
unary operators

I've started playing with comparisons, and immediately figured out that it's not so simple even with a very basic set of comparison operators. Some languages have all comparison operators on the same precedence level (e.g. Python), while others put == and != on a separate level that binds less tightly than <, <=, > and >= (e.g. C++, Java, and many others).

A simple test: 1 < 2 == 3 < 4.

C++ (gcc11): 1
C++ (clang14): 1
Java: true
JavaScript: true
PHP: bool(true)
Python: False
Ruby: true

Another interesting bit is that modern gcc issues a warning on this:

expr_run.cpp: In function 'int main()':
expr_run.cpp:3:25: warning: suggest parentheses around comparison in operand of '==' [-Wparentheses]
    3 |         std::cout << (1 < 2 == 3 < 4) << std::endl;
      |                       ~~^~~

So, it looks like at least for Python (and likely for C++ too, to avoid the warning) we need to modify generation to add parentheses: (1 < 2) == (3 < 4) yields the correct "True" result in Python.
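The Python result comes from comparison chaining: a chain like `a < b == c < d` is evaluated as `a < b and b == c and c < d`. A short check of that reading (the variable names are just for this example):

```python
chained = 1 < 2 == 3 < 4                     # what the unparenthesized output means in Python
expanded = (1 < 2) and (2 == 3) and (3 < 4)  # the equivalent chained form
parenthesized = (1 < 2) == (3 < 4)           # what the expression was meant to compute

print(chained, expanded, parenthesized)  # False False True
```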

GreyCat · 2024-03-25T14:49:43Z

A few other data points:

Go blows up building 1 < 2 == 3 < 4 with the following error, parenthesized version works as expected (returning true):
```
./expr_run.go:4:23: invalid operation: 1 < 2 == 3 (mismatched types untyped bool and untyped int)
```
Lua blows up on 1 < 2 == 3 < 4 with the following error, parenthesized version works as expected (returning true):
```
(command line):1: attempt to compare boolean with number
```

Nim blows up on 1 < 2 == 3 < 4 with the following error, parenthesized version works as expected (returning true):

/work/expr_run.nim(1, 12) Error: type mismatch: got <bool, int literal(3)>
but expected one of:
proc `==`(x, y: bool): bool
  first type mismatch at position: 2
  required type for y: bool
  but expression '3' is of type: int literal(3)

Perl works the same on unparenthesized and parenthesized versions (as many other C-based languages), returning !!1 (its equivalent of true).

So, looks like we'll have to build that distinction in the custom per-language precedence tables.
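A hedged Python sketch of what such per-language tables might look like for the comparison operators alone. The table names, the numeric levels and the helper functions are all hypothetical; the grouping of languages follows the results observed above (C-like targets keep relational operators on a tighter level than equality, while Python, Go, Lua and Nim need the operands parenthesized).

```python
# Hypothetical priority levels: lower binds tighter.
C_LIKE = {"<": 60, "<=": 60, ">": 60, ">=": 60, "==": 70, "!=": 70}
SAME_LEVEL = {op: 60 for op in ("<", "<=", ">", ">=", "==", "!=")}

TABLES = {
    "cpp": C_LIKE, "java": C_LIKE, "javascript": C_LIKE,
    "perl": C_LIKE, "php": C_LIKE, "ruby": C_LIKE,
    "python": SAME_LEVEL, "go": SAME_LEVEL, "lua": SAME_LEVEL, "nim": SAME_LEVEL,
}


def wrap(lang, inner_op, outer_op, text):
    """Parenthesize a comparison operand unless it binds strictly tighter."""
    table = TABLES[lang]
    return f"({text})" if table[inner_op] >= table[outer_op] else text


def eq_of_lt(lang, a, b, c, d):
    """Render the test expression 'a < b == c < d' for one target language."""
    left = wrap(lang, "<", "==", f"{a} < {b}")
    right = wrap(lang, "<", "==", f"{c} < {d}")
    return f"{left} == {right}"


print(eq_of_lt("java", 1, 2, 3, 4))    # 1 < 2 == 3 < 4
print(eq_of_lt("python", 1, 2, 3, 4))  # (1 < 2) == (3 < 4)
```

This sketch does not model the gcc `-Wparentheses` warning; covering it would mean treating C++ like the same-level group even though the unparenthesized form evaluates correctly there.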

GreyCat · 2024-03-25T20:30:45Z

While experimenting with expressions, I've whipped up a tool, all-expression, that allows me to quickly test some ideas in many of the languages we support:

$  ./all-expression -h
Usage: ./all-expression [OPTIONS] EXPRESSION"

Interpretes EXPRESSION in various programming languages using Docker images
and prints the result.

Options:
  --parallel, -p  Run all targets in parallel
  --help, -h      Show this help message

$ ./all-expression '1 + 2'
* C++ (gcc11): 3
* C++ (clang11): 3
* Go: 3
* Java: 3
* JavaScript: 3
* Lua: 3
* Nim: 3
* Perl: $VAR1 = 3;
* PHP: int(3)
* Python: 3
* Ruby: 3

GreyCat self-assigned this Dec 25, 2016

GreyCat added the enhancement label Jan 2, 2017

This was referenced Jan 4, 2017

#39: Invalid "switch"-statement generated for Java. kaitai-io/kaitai_struct_compiler#51

Merged

Fix ternary operator translation (add parenthesis around) kaitai-io/kaitai_struct_compiler#53

Merged

This was referenced Aug 23, 2017

Missing parenthesis around ternary operation in Javascript #1168

Closed

Parentheses are lost from expressions #1169

Closed

GreyCat mentioned this issue Jan 16, 2018

Offering assisstance #308

Closed

generalmimon mentioned this issue Jan 30, 2020

Test calling methods on parentheses with binary operators kaitai-io/kaitai_struct_tests#74

Merged

generalmimon mentioned this issue Nov 19, 2020

Create new AST node Group in the expression language, that represents value in parenthesis kaitai-io/kaitai_struct_compiler#214

Open

generalmimon added this to the Low priority milestone Jun 15, 2022

This was referenced Mar 8, 2024

Fix TranslatorSpec tests kaitai-io/kaitai_struct_compiler#273

Merged

KS expression language: optimize parenthesis generation kaitai-io/kaitai_struct_compiler#277

Merged

GreyCat mentioned this issue Mar 26, 2024

Implement optimal parenthesis for ternary operator + add tests kaitai-io/kaitai_struct_compiler#289

Open

generalmimon mentioned this issue Feb 9, 2025

Fix operator precedence handling across target languages kaitai-io/kaitai_struct_compiler#315

Open
