Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ILReader gets implicit conversions subtly wrong #901

Closed
dgrunwald opened this issue Oct 7, 2017 · 5 comments · Fixed by #2993
Closed

ILReader gets implicit conversions subtly wrong #901

dgrunwald opened this issue Oct 7, 2017 · 5 comments · Fixed by #2993
Labels
Bug Decompiler The decompiler engine itself

Comments

@dgrunwald
Copy link
Member

dgrunwald commented Oct 7, 2017

Ecma-335 Partition III 1.6 Implicit argument coercion, Table 9: Signature Matching shows the possible implicit conversions possible in IL.

Issues with our current implementation:

  • We don't distinguish between int32->native int and int32->native unsigned int, thus potentially mixing up sign extension/zero extension in 64-bit executables.
  • We don't explicitly model starting/stopping GC tracking. This might cause assertions in the decompiler due to mismatched types. Fixed in 9a829a2.
  • It's unclear what happens when one branch of an if pushes int32 on the stack, and the other branch pushes native int on the stack. Where exactly does the conversion happen? Currently we emit a warning and an inconsistent ILAst, which then causes assertions due to mismatched types. (e.g. in the compiler-generated _onexit_m method in C++/CLI console apps) We have a workaround that tends to work for C++/CLI-generated code, but isn't correct in general.
@dgrunwald dgrunwald added Bug Decompiler The decompiler engine itself labels Oct 7, 2017
@dgrunwald dgrunwald added this to the 3.0 milestone Oct 7, 2017
@dgrunwald
Copy link
Member Author

dgrunwald commented Oct 7, 2017

We should test with programs like this:

  ldarg condition
  brtrue use_native_int

  ldc.i4.m1
  br after_if
use_native_int:
  ldc.i4.m1
  conv.u
after_if:
  ldc.i4.1
  add.ovf.un
  ret

Does the addition always overflow when use_native_int==false? Or only on 32-bit systems?
What if the use_native_int block is moved after the add instruction? The spec claims that stack types can be inferred in a single pass, does this mean that the behavior of add.ovf differs depending on where the use_native_int block is placed?

@dgrunwald
Copy link
Member Author

dgrunwald commented Oct 8, 2017

.method public static native int Int32OrNative(int32 val, bool use_native)
{
    ldarg.1
    brtrue use_native_int
use_i4:
    ldarg.0
    br after_if
after_if:
    ldc.i4.1
    add
    ret
use_native_int:
    ldarg.0
    conv.u
    br after_if
}
Result (64-bit):
Int32OrNative(0x7fffffff, false) = 2147483648
Int32OrNative(0x7fffffff, true) = 2147483648
Int32OrNative(-1, false) = 0
Int32OrNative(-1, true) = 4294967296

But if I comment out the conv.u, then I get:

Int32OrNative(0x7fffffff, false) = -2147483648
Int32OrNative(0x7fffffff, true) = -2147483648
Int32OrNative(-1, false) = 0
Int32OrNative(-1, true) = 0

This means the presence of the conv.u can change the bit width of an addition prior to it! This means it's impossible to assign types to instructions in a single pass (which the ILReader is currently trying to do).

Ecma-335 claims:

It shall be possible, with a single forward-pass through the CIL instruction stream for any method, to infer the exact state of the evaluation stack at every instruction (where by “state” we mean the number and type of each item on the evaluation stack).

So I guess this is just another case where the spec is just plain wrong.

@dgrunwald
Copy link
Member Author

dgrunwald commented Oct 13, 2017

The code in this gist demonstrates that the .NET runtime behavior depends on whether the runtime can detect a piece of code is unreachable at byte-code parse-time. (remove the comment from the stloc/ldloc pair to change the behavior of the program when executed with 64-bit .NET framework).

dgrunwald added a commit that referenced this issue Nov 5, 2017
With this fix we still only propagate type information along forward branches, so some of the issues in #901 remain.
@dgrunwald
Copy link
Member Author

The fix I just pushed keeps the ILVariables for 32-bit and native-size stack slots separate even when these slots meet at a branch. Instead, we introduce a conversion instruction ("stack type adjustment") after every store to the smaller stack slot.
This fixes the Int32OrNative test case in the restricted case where type information only flows across forward branches. (reading Ecma-335 gives the impression that this restricted case is the only legal case, so hopefully there isn't any code in the wild where type information needs to flow backwards)

However, the other aspects of the original issue are still a problem.

@dgrunwald
Copy link
Member Author

dgrunwald commented Jul 26, 2019

The plan here is to rewrite the ILReader to work more like the .NET bytecode importer. See dotnet/coreclr#14492 dotnet/runtime#9130 for some of the hairy details.

In summary, we need the same "reimport" process as the .NET runtime: we need to be able to import a block multiple times as we learn more about its input stack types.
There's the I4->I->I8 upgrade path; and apparently (even though Ecma-335 claims there's only one floating-point stack type) RyuJit also has a float->double reimport trigger and a 0/byref reimport trigger.

This rewrite should also use the same "reachable code only" approach as the .NET runtime. Optionally (if "removed dead code" option is used) we could even constant-fold simple conditional branches (like .NET does), which would significantly increase our ability to understand obfuscated code.

dgrunwald added a commit that referenced this issue Jun 1, 2023
…orts

This makes our logic more similar to that used by the dotnet runtime. This lets us infer correct stack types in edge cases such as #2401. It also improves support for obfuscated control flow such as #2878.
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 1, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Bug Decompiler The decompiler engine itself
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant