Skip to content
This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

Bug with simplifying abstract binary expressions #2327

Closed
trueadm opened this issue Jul 26, 2018 · 12 comments
Closed

Bug with simplifying abstract binary expressions #2327

trueadm opened this issue Jul 26, 2018 · 12 comments

Comments

@trueadm
Copy link
Contributor

trueadm commented Jul 26, 2018

Take the given example:

function fn2(cond, obj) {
  if (cond) {
    return obj.x;
  }
  return false;
}

function fn(cond, obj1, obj2) {
  var res1 = fn2(cond, obj1);
  var res2 = fn2(cond, obj2);

  if (res1 > 0) {
    return res1.toString();
  }
  if (res2 > 0) {
    return res1.toString();
  }
  return null;
}

var cond = global.__abstract ? __abstract("boolean", "(false)") : false
var obj1 = global.__abstract
  ? __abstract({ x: global.__abstract ? __abstract("boolean", "obj1.x") : false }, "({ x: false })")
  : { x: false };

var obj2 = global.__abstract 
  ? __abstract({ x: global.__abstract ? __abstract("boolean", "obj2.x") : false }, "({ x: false  })")
  : { x: false };

var result = fn(cond, obj1, obj2);

In the output you can see that we're using _$0 even though it may not be used:

  var _2 = false;
  _$2.obj1 = {
    x: false
  };
  _$2.obj2 = {
    x: false
  };

  if (_2) {
    var _$0 = obj1.x;
  }

  if (_2) {
    var _$1 = obj2.x;
  }

  var _9 = _$0.toString();

  _$2.result = (_2 ? _$0 > 0 : false) ? _9 : (_2 ? _$1 > 0 : false) ? _9 : null;

If I change the condition from if (res1 > 0) { to if (res1) {, the issue goes away. If I disable the simplifier entirely, this also fixes the problem.

@NTillmann
Copy link
Contributor

NTillmann commented Jul 26, 2018

I think there are two separate issues here:

  1. _"In the output you can see that we're using $0 even though it may not be used". This is because of a mismatch in expectations on what evaluation non-temporal abstract value computations can do. Things would work if all such computations are total (don't throw exceptions) and side-effect free. So whoever emits .toString() should do it in a way that doesn't crash on undefined. It doesn't have to produce a meaningful value, but it cannot crash. (Alternatively, the serializer needs to ensure that all code it emits effectively implements a lazy value evaluation model.)

  2. "If I change the condition from if (res1 > 0) { to if (res1) {, the issue goes away.". Indeed, when I do that, I get the following as the last line:

  _$2.result = (_0 ? _$0 : false) ? "true" : (_0 ? _$1 > 0 : false) ? "false" : null;

Where does the "true" come from? I don't see how the original program allows for that. But that's just a separate bug.

@trueadm
Copy link
Contributor Author

trueadm commented Jul 26, 2018

@NTillmann thanks for taking the time to explain.

So I guess all these other cases are likely to fail to?

| "(A).length" | "(A).toString()" | "(A).slice(B,C)" | "(A).split(B,C)" | "global.JSON.stringify(A)" | "global.JSON.parse(A)" | "JSON.stringify(...)" | "JSON.parse(...)"

Given that the values in their args might be undefined/null, it seems they can also throw at runtime too. Note that none of these template get used in optimized functions or React components, which is why I'm concerned as they can't get used as abstract function calls will always result in temporals in pure scope, so we've likely had correctness issues for some time now but been unaware because it was being covered up by pure scope.

@hermanventer
Copy link
Contributor

The fact that you need the composition of two conditionals to reproduce this makes me think that this is related to the problem with partial joins.

@NTillmann
Copy link
Contributor

@trueadm Yes, I think so.

@NTillmann
Copy link
Contributor

Some more thoughts regarding my point 1. above:

What causes the issue is an interaction of various features:

  • While most abstract values require the evaluation of all of their arguments, conditional (?: and logical (short-circuit || and &&) abstract values don't. In fact, at best, evaluating their inactive arguments is at best doing some unnecessary computation. But at worst, it might cause exceptions and side effects (since the conditions under which they don't might not hold).
  • The "inlining" feature that moves such computations back into a conditionally evaluated scope can effectively hide such issues. This feature was originally really just meant to reduce code size by avoiding giving names to single-use expression, but if we make it required, then I think it should in fact fix the first issue. However...
  • Simplification rules (e.g. those that are meant to distribute) and CSE may cause multiple references to such values to arise nested within abstract values. Then the inlining feature cannot kick in anymore, at least in its current implementation, the computations get hoisted, and executed before the relevant conditions get checked. This is a problem.

There could be different solutions:

  • Making inlining required, and making the serializer emit different code patterns that make lazy the evaluation of non-inlined values that appear in conditional contexts. So the serializer would have to be aware of yet another kind of ordering constraints, but at least these constraints would only arise from conditional and logical operations. Or,
  • Ensure that simplification rules don't introduce multiple references, and kill CSE. (I don't think this is desirable.) Or,
  • Make sure that all the evaluation of all abstract values is total (not throwing) and side-effect tree for all possible values / under all conditions. To achieve that, we'd probably make some operations temporal, especially those where side effects are a potential issue; exceptions could probably be largely handled by some additional checks in the code generated by build nodes; which we could then eliminate if the path condition implies them; this could be queried from within the build-node; we already do similar things to elide implied checks for conditional abstract values that are only used under certain conditions. Then we could just keep doing the unnecessary computations eagerly (when it saves code size). Initially, I thought this would be a reasonable option, but I am not sure anymore.

@trueadm
Copy link
Contributor Author

trueadm commented Jul 27, 2018

I dug into this more today too. We use a temporal abstract values for toString, split, toLocaleString, slice, length and a bunch of others. They all have side-effects and not safe to be a-temporal without changing how the hashing works (like Herman did for ToObject templates), I managed to make repros for most of them locally. For example, here is split: https://prepack.io/repl.html#G4QwTgBCELwQLgCwJYGcB0B9TIBGr4wQBjeCAfgmzwKNIAoAiW5AOwHNGAaCR+gcgBmAe2H8AlI3EQAXLxHDGAbgBQoSLlhR0qAA4AbZPHrjV6iMS0gdBoydUrkgiPU0x3F6QG8VEC8NZUYX0AU3R9YXYmAGVEYQBXfQATCFYQ4BDIRBBdXRDWKRUAXyA

In my opinion, they should be temporal abstract values or have unique hashes (which means that CSE won't work anyway), with the exception of maybe Number.prototype.toString, String.prototype.toString and Boolean.prototype.toString - for these we can use the shorthand + "" syntax rather than the prototype method call, which makes it side-effect free. We just need to be careful with integral values because + might bring about unwarranted bugs here.

Furthmore, on our internal bundle the source of the issues, and the reason why #2321 can't land in its current state is not because of CSE only. It's because the logic in the simplication phase, where we check if two abstract values are equal via equals and then simplify conditions. This makes sense, but clearly if you are doing this to two side-effectful operations like split, then this is going to break as per my example in this reply.

@hermanventer
Copy link
Contributor

Looking at this again, I think the key issue is that a (still to be computed) value x that depends on a (still to be computed) value y should always ensure that y is computed in a path that dominates the location of x.

Kind of compilers 101. Bugs will happen. Let's fix this.

@NTillmann
Copy link
Contributor

Looking at this just from the serializer's point of view, here's simple example that shows the issue:

(function () {
  let c = __abstract("boolean", "(c)");
  let d = __abstract("boolean", "(d)");
  let o = {};
  global.result = c ? o : d ? o : null;
})();

Prepacking this yields

(function () {
  var _$0 = this;
  var _1 = {};
  _$0.result = c ? _1 : d ? _1 : null;
}).call(this);

Nothing wrong with that, but note that the object behind _1 is created even if c and d are false. Not wrong, but inefficient. From the serializer's point of view, it's the same issue as hoisting value computations out of conditional contexts.

@NTillmann
Copy link
Contributor

Here's another fun example. The simplifier enables interpreting .toString in a path sensitive way, but this is fact is lost when reaching the visitor, and CSE identifies the two occurrences of n.toString, resulting in a single value with two references, which the serializer hoists and makes unconditional.

This is wrong.

(function () {
  let n = __abstractOrNull("number", "(n)");
  let m = __abstractOrNull("number", "(m)");
  let c = __abstractOrNull("boolean", "(c)");
  let d = __abstractOrNull("boolean", "(d)");
  let s;
  if (c) {
    if (d) { if (n != undefined) s = "A" + n.toString(); }
    else { if (n != undefined) s = "B" + n.toString(); }
  }
  global.s = s;
})();

Talking with @hermanventer, the way out is to make all a-temporal abstract values into temporal abstract values when they get simplified in a path-sensitive manner.

@NTillmann NTillmann assigned hermanventer and unassigned NTillmann Jul 31, 2018
@sb98052 sb98052 self-assigned this Aug 6, 2018
@sb98052
Copy link

sb98052 commented Aug 7, 2018

A few things are happening here. First, the code with which these problems were reported is old. @trueadm's type-based refinement changes, i.e. 1986d76 make the problem go away, for .toString(). This is because with this change, when the type of x is known, x.toString() turns into '' + x. Before the change, a runtime call was being generated. I suspect that the root of the problem was the a-temporal nature of this runtime call.

Second, abstract-concrete unions get simplified based on the path condition. This simplification is correct only if subsequent operations on the union yield a-temporal results. The interesting case here is when path conditions simplify a temporal value into being a-temporal..toString() is an example of this. x.toString() as a runtime call is temporal, but simplified into x + '' is a-temporal and safe.

There seem to be three approaches to dealing with this problem in general:

  1. Ensure that all such runtime calls that are temporal.
  2. Derive AbstractValues when simplifying them in a path-specific way
  3. Make AbstractValues that may be simplified in a path-specific way temporal to begin with (e.g. the AbstractValue in abstract-concrete unions)

If we do (2) or (3) the .tostring() example above would stop working, even though the path implies that expressions with x.toString() can be lifted.

@NTillmann
Copy link
Contributor

Ensure that all runtime calls are temporal.

Including implicit runtime calls, e.g. can "a" + x call x.toString()?

@sb98052
Copy link

sb98052 commented Aug 9, 2018

I think what this boils down to is: any expression (not only call) that can throw or generate side effects conditionally should be temporal. Is there a way that implicit toStrings can be manipulated to do that?

Even with toString runtime calls gone, there are still cases that lead to this problem. I'm going to create separate issues for these.

An interesting one is the in binary operator, which throws if the right hand side does not have type object, because it is not created via AbstractValue.createFromTemplate. My attempt to trigger the problem however led to a different bug (#2406).

Probably, more auditing will be needed to temporalize all such cases.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants