ILReader gets implicit conversions subtly wrong #901

dgrunwald · 2017-10-07T19:30:03Z

Ecma-335 Partition III 1.6 Implicit argument coercion, Table 9: Signature Matching shows the possible implicit conversions possible in IL.

Issues with our current implementation:

We don't distinguish between int32->native int and int32->native unsigned int, thus potentially mixing up sign extension/zero extension in 64-bit executables.
~~We don't explicitly model starting/stopping GC tracking. This might cause assertions in the decompiler due to mismatched types.~~ Fixed in 9a829a2.
It's unclear what happens when one branch of an if pushes int32 on the stack, and the other branch pushes native int on the stack. Where exactly does the conversion happen? ~~Currently we emit a warning and an inconsistent ILAst, which then causes assertions due to mismatched types. (e.g. in the compiler-generated _onexit_m method in C++/CLI console apps)~~ We have a workaround that tends to work for C++/CLI-generated code, but isn't correct in general.

The text was updated successfully, but these errors were encountered:

dgrunwald · 2017-10-07T19:41:45Z

We should test with programs like this:

  ldarg condition
  brtrue use_native_int

  ldc.i4.m1
  br after_if
use_native_int:
  ldc.i4.m1
  conv.u
after_if:
  ldc.i4.1
  add.ovf.un
  ret

Does the addition always overflow when use_native_int==false? Or only on 32-bit systems?
What if the use_native_int block is moved after the add instruction? The spec claims that stack types can be inferred in a single pass, does this mean that the behavior of add.ovf differs depending on where the use_native_int block is placed?

dgrunwald · 2017-10-08T13:58:54Z

.method public static native int Int32OrNative(int32 val, bool use_native)
{
    ldarg.1
    brtrue use_native_int
use_i4:
    ldarg.0
    br after_if
after_if:
    ldc.i4.1
    add
    ret
use_native_int:
    ldarg.0
    conv.u
    br after_if
}
Result (64-bit):
Int32OrNative(0x7fffffff, false) = 2147483648
Int32OrNative(0x7fffffff, true) = 2147483648
Int32OrNative(-1, false) = 0
Int32OrNative(-1, true) = 4294967296

But if I comment out the conv.u, then I get:

Int32OrNative(0x7fffffff, false) = -2147483648
Int32OrNative(0x7fffffff, true) = -2147483648
Int32OrNative(-1, false) = 0
Int32OrNative(-1, true) = 0

This means the presence of the conv.u can change the bit width of an addition prior to it! This means it's impossible to assign types to instructions in a single pass (which the ILReader is currently trying to do).

Ecma-335 claims:

It shall be possible, with a single forward-pass through the CIL instruction stream for any method, to infer the exact state of the evaluation stack at every instruction (where by “state” we mean the number and type of each item on the evaluation stack).

So I guess this is just another case where the spec is just plain wrong.

dgrunwald · 2017-10-13T15:32:31Z

The code in this gist demonstrates that the .NET runtime behavior depends on whether the runtime can detect a piece of code is unreachable at byte-code parse-time. (remove the comment from the stloc/ldloc pair to change the behavior of the program when executed with 64-bit .NET framework).

With this fix we still only propagate type information along forward branches, so some of the issues in #901 remain.

dgrunwald · 2017-11-05T22:31:23Z

The fix I just pushed keeps the ILVariables for 32-bit and native-size stack slots separate even when these slots meet at a branch. Instead, we introduce a conversion instruction ("stack type adjustment") after every store to the smaller stack slot.
This fixes the Int32OrNative test case in the restricted case where type information only flows across forward branches. (reading Ecma-335 gives the impression that this restricted case is the only legal case, so hopefully there isn't any code in the wild where type information needs to flow backwards)

However, the other aspects of the original issue are still a problem.

dgrunwald · 2019-07-26T11:28:57Z

The plan here is to rewrite the ILReader to work more like the .NET bytecode importer. See ~~dotnet/coreclr#14492~~ dotnet/runtime#9130 for some of the hairy details.

In summary, we need the same "reimport" process as the .NET runtime: we need to be able to import a block multiple times as we learn more about its input stack types.
There's the I4->I->I8 upgrade path; and apparently (even though Ecma-335 claims there's only one floating-point stack type) RyuJit also has a float->double reimport trigger and a 0/byref reimport trigger.

This rewrite should also use the same "reachable code only" approach as the .NET runtime. Optionally (if "removed dead code" option is used) we could even constant-fold simple conditional branches (like .NET does), which would significantly increase our ability to understand obfuscated code.

…orts This makes our logic more similar to that used by the dotnet runtime. This lets us infer correct stack types in edge cases such as #2401. It also improves support for obfuscated control flow such as #2878.

dgrunwald added Bug Decompiler The decompiler engine itself labels Oct 7, 2017

dgrunwald added this to the 3.0 milestone Oct 7, 2017

dgrunwald mentioned this issue Oct 28, 2017

Mixed mode assembly: Incompatible stack types: I vs I4 #938

Closed

dgrunwald added a commit that referenced this issue Nov 5, 2017

Fix #938: add support for I4->I stack type adjustments.

f155ca4

With this fix we still only propagate type information along forward branches, so some of the issues in #901 remain.

dgrunwald removed this from the 3.0 milestone Nov 5, 2017

dgrunwald mentioned this issue Jul 13, 2019

ILSpy cannot decompile files that have stack layout that doesn't comply with the .Net specifications, i.e. Obfuscated files #1565

Closed

dgrunwald mentioned this issue Jul 25, 2019

same as #1430 appear again #1513

Closed

dgrunwald mentioned this issue Jul 26, 2019

Unreachable code blocks #1242

Closed

siegfriedpammer mentioned this issue Sep 25, 2019

ILSpy developer meeting in Stuttgart 2019-09-27 to 2019-09-29 #1584

Closed

dgrunwald mentioned this issue Nov 27, 2019

Extra optimizations #1825

Closed

dgrunwald mentioned this issue Jan 31, 2020

Weird inconsistency in types on IL evaluation stack dotnet/runtime#9130

Closed

dgrunwald mentioned this issue May 12, 2021

Stack to variable error #2401

Closed

siegfriedpammer mentioned this issue Oct 2, 2021

Long-term evolution of the ICSharpCode.Decompiler model and pipeline #2512

Open

7 tasks

dgrunwald mentioned this issue Dec 31, 2022

Modern versions of ILSpy do not handle aggressive branching very well. #2878

Closed

dgrunwald mentioned this issue Jun 1, 2023

Refactor ILReader: only read reachable code + support reimports #2993

Merged

siegfriedpammer closed this as completed in #2993 Jun 2, 2023

github-actions bot locked as resolved and limited conversation to collaborators Sep 1, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ILReader gets implicit conversions subtly wrong #901

ILReader gets implicit conversions subtly wrong #901

dgrunwald commented Oct 7, 2017 •

edited

Loading

dgrunwald commented Oct 7, 2017 •

edited

Loading

dgrunwald commented Oct 8, 2017 •

edited

Loading

dgrunwald commented Oct 13, 2017 •

edited

Loading

dgrunwald commented Nov 5, 2017

dgrunwald commented Jul 26, 2019 •

edited by siegfriedpammer

Loading

ILReader gets implicit conversions subtly wrong #901

ILReader gets implicit conversions subtly wrong #901

Comments

dgrunwald commented Oct 7, 2017 • edited Loading

dgrunwald commented Oct 7, 2017 • edited Loading

dgrunwald commented Oct 8, 2017 • edited Loading

dgrunwald commented Oct 13, 2017 • edited Loading

dgrunwald commented Nov 5, 2017

dgrunwald commented Jul 26, 2019 • edited by siegfriedpammer Loading

dgrunwald commented Oct 7, 2017 •

edited

Loading

dgrunwald commented Oct 7, 2017 •

edited

Loading

dgrunwald commented Oct 8, 2017 •

edited

Loading

dgrunwald commented Oct 13, 2017 •

edited

Loading

dgrunwald commented Jul 26, 2019 •

edited by siegfriedpammer

Loading