Multi-register / multi-memory location support for datatypes #7265

Open
Wall-AF opened this issue Dec 7, 2024 · 4 comments
Labels
Feature: Decompiler Status: Triage Information is being gathered

Comments

@Wall-AF

Wall-AF commented Dec 7, 2024

Is your feature request related to a problem? Please describe.
I have lots of calculations on numeric values (or calculated addresses) that span two or more adjacent memory locations or registers. However, the current decompiler doesn't merge them, which makes the resulting output more difficult to interpret.

In the example assembly below (from an x86 16-bit application suite), the code from 1010:1c23 to 1010:1c33 represents a simple if (nMaxVal <= local_10) break;

                             LAB_1010_1c1e                                   XREF[2]:     1010:1c2c(j), 1010:1c33(j)  
   1010:1c1e 47           INC        DI
   1010:1c1f 66 d1 7e fa  SAR        dword ptr SS:[BP + nMaxVal],0x1

                             LAB_1010_1c23                                   XREF[2]:     1010:1c1c(j), 1010:1c15(j)  
   1010:1c23 8b 56 fc     MOV        DX,word ptr SS:[BP + nMaxVal+0x2]
   1010:1c26 8b 46 fa     MOV        AX,word ptr SS:[BP + nMaxVal]
   1010:1c29 3b 56 f4     CMP        DX,word ptr SS:[BP + local_10+0x2]
   1010:1c2c 7f f0        JG         LAB_1010_1c1e
   1010:1c2e 75 05        JNZ        LAB_1010_1c35
   1010:1c30 3b 46 f2     CMP        AX,word ptr SS:[BP + local_10]
   1010:1c33 77 e9        JA         LAB_1010_1c1e

                             LAB_1010_1c35                                   XREF[3]:     1010:1bf9(j), 1010:1c01(j), 
                                                                                          1010:1c2e(j)  
   1010:1c35 89 7e f0     MOV        word ptr SS:[BP + local_12],DI
   1010:1c38 83 7e f0 00  CMP        word ptr SS:[BP + local_12],0x0
   1010:1c3c 7f 05        JG         LAB_1010_1c43
   1010:1c3e 90           NOP
   1010:1c3f 90           NOP
   1010:1c40 eb 6d        JMP        LAB_1010_1caf
   1010:1c42 90           NOP
                             LAB_1010_1c43                                   XREF[1]:     1010:1c3c(j)  
   1010:1c43 83 7e fc 00  CMP        word ptr SS:[BP + nMaxVal+0x2],0x0

compared with this output from the decompiler:
if ((nMaxVal._2_2_ <= lVal) && ((nMaxVal._2_2_ != lVal || ((uint)nMaxVal <= uVar3 * 2 - 1)))) break;
Here even the committed name lVal (which corresponds to local_10) hasn't propagated to the assembly, and uVar3, which is defined much earlier in the function, seems to represent the low word of local_10!
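To make the intended semantics concrete, here is a minimal C sketch of the compare sequence from 1010:1c23 to 1010:1c33. It assumes nMaxVal and local_10 are signed 32-bit values stored as a high word (compared signed, via JG) and a low word (compared unsigned, via JA). The names split_le, whole_le, hi16 and lo16 are made up for illustration; the point is that the three-branch split comparison gives the same answer as the single 32-bit comparison a merged datatype would let the decompiler print.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers: split a signed 32-bit value into the two 16-bit
   words the 16-bit code keeps in memory (two's complement assumed, as on x86). */
int16_t hi16(int32_t v) { return (int16_t)((uint32_t)v >> 16); }
uint16_t lo16(int32_t v) { return (uint16_t)((uint32_t)v & 0xFFFFu); }

/* Mirrors the branches at 1010:1c23..1010:1c33. Returns true when the loop
   exits (falls through to LAB_1010_1c35), i.e. when nMaxVal <= local_10. */
bool split_le(int16_t max_hi, uint16_t max_lo, int16_t loc_hi, uint16_t loc_lo)
{
    if (max_hi > loc_hi)      /* CMP DX,[local_10+2]; JG: keep looping     */
        return false;
    if (max_hi != loc_hi)     /* JNZ: high word is smaller, so exit        */
        return true;
    return max_lo <= loc_lo;  /* CMP AX,[local_10]; JA (unsigned): loop on > */
}

/* What the listing expresses: one signed 32-bit comparison. */
bool whole_le(int32_t n_max_val, int32_t local_10)
{
    return n_max_val <= local_10;
}
```

The split form maps one-to-one onto the JG / JNZ / JA branches, which is roughly the shape of the decompiler's nested condition; whole_le is the output the issue asks for instead.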

Describe the solution you'd like
A mechanism to tell Ghidra that a pair (or more) of registers or adjacent memory locations represents a single datatype, AND a mechanism that gives Ghidra the know-how to interpret calculations using them appropriately, e.g. an addition written with something like

ADD AX,{someval}
ADC DX
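For the addition case, a minimal sketch of what an ADD AX / ADC DX pair computes, assuming a 32-bit value held in DX:AX with the high word in DX. The example above only writes ADC DX, so the high word of the addend is a hypothetical parameter add_hi here (pass 0 for an ADC DX,0 that only propagates the carry); reg_pair, add_adc and join are invented names.

```c
#include <stdint.h>

/* Hypothetical model of a 32-bit value split across two 16-bit registers. */
typedef struct {
    uint16_t dx; /* high word */
    uint16_t ax; /* low word  */
} reg_pair;

/* ADD AX,add_lo followed by ADC DX,add_hi: the carry out of the low-word
   addition is added into the high word. */
reg_pair add_adc(reg_pair a, uint16_t add_lo, uint16_t add_hi)
{
    reg_pair r;
    uint32_t low_sum = (uint32_t)a.ax + add_lo;   /* ADD AX,add_lo         */
    uint16_t carry = (uint16_t)(low_sum >> 16);   /* CF is 0 or 1 here     */
    r.ax = (uint16_t)low_sum;
    r.dx = (uint16_t)(a.dx + add_hi + carry);     /* ADC DX,add_hi (wraps) */
    return r;
}

/* Reassemble DX:AX into the single 32-bit value the decompiler could show. */
uint32_t join(reg_pair p)
{
    return ((uint32_t)p.dx << 16) | p.ax;
}
```

Seen through join, the two instructions are just one 32-bit addition, which is the interpretation the request asks Ghidra to make.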

Describe alternatives you've considered
Pulling one's hair out!

Additional context
#6090, #5900, #5806, maybe #5720, maybe #5318, maybe #5066. There could be more, but ...

@DualTachyon

Multi-register support, at least, already exists, but it is heavily broken. Some Cortex-M0 (32-bit only) firmware uses u64 integers. It sometimes works, but most of the time it gives output like the one you described.

[screenshot]

@DualTachyon

Above the circled block you can see another use of u64_rotate_left, but this time with the usual CONCAT44(...) to build the 64-bit value.
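For reference, CONCAT44(...) in Ghidra's decompiler output concatenates two 4-byte values into one 8-byte value, with the first argument as the most significant half. A minimal C equivalent, using hypothetical helper names concat44, hi32 and lo32, shows that it is just the register-pair form of a single u64:

```c
#include <stdint.h>

/* Hypothetical C equivalent of the decompiler's CONCAT44(hi, lo):
   first argument becomes the upper 32 bits, second the lower 32 bits. */
uint64_t concat44(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* The inverse split, as a 32-bit target holds a u64 in two registers. */
uint32_t hi32(uint64_t v) { return (uint32_t)(v >> 32); }
uint32_t lo32(uint64_t v) { return (uint32_t)v; }
```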

@Wall-AF
Author

Wall-AF commented Dec 8, 2024

> You can see above the circled block another use of u64_rotate_left, but this time with the usual CONCAT44(...) to build the type.

In this case param_3 seems to have the wrong type; just try changing it manually and see.

@DualTachyon

I was just showing an example (in red). Even with the correct type, the decompiler splits the u64 into two 32-bit halves and rebuilds it with a CONCAT, so the problem remains. The u64_rotate_left function itself completely fails to use u64, yet somehow returns a u64 correctly.

The handling also differs by CPU architecture. On MIPS with 32-bit registers, I get 100% failure in making u64s work. ARM is one of the few cases (maybe the only one) where it sometimes works.
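The firmware's u64_rotate_left is not shown in the thread, so as an assumption, here is a hypothetical C reimplementation (rotl64_split) of a 64-bit rotate-left done on two 32-bit halves, the way a 32-bit target such as Cortex-M0 or 32-bit MIPS has to compute it. Comparing it with a native 64-bit version (rotl64, also invented here) shows that the split code is the same operation a merged u64 datatype would let the decompiler express in one line.

```c
#include <stdint.h>

/* Hypothetical: rotate the 64-bit value hi:lo left by n bits using only
   32-bit operations, as a 32-bit CPU would. */
void rotl64_split(uint32_t *hi, uint32_t *lo, unsigned n)
{
    uint32_t h = *hi, l = *lo;
    n &= 63;
    if (n >= 32) {            /* rotating by 32 simply swaps the halves */
        uint32_t t = h;
        h = l;
        l = t;
        n -= 32;
    }
    if (n != 0) {             /* skip n == 0 to avoid an undefined shift by 32 */
        uint32_t nh = (h << n) | (l >> (32 - n));
        uint32_t nl = (l << n) | (h >> (32 - n));
        h = nh;
        l = nl;
    }
    *hi = h;
    *lo = l;
}

/* Reference: the same rotation on a native 64-bit integer. */
uint64_t rotl64(uint64_t v, unsigned n)
{
    n &= 63;
    return n ? (v << n) | (v >> (64 - n)) : v;
}
```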
