Lift/abstract implementations like C++ std::string #5504

Open
0xdevalias opened this issue Jun 5, 2024 · 2 comments
Labels

  • Component: Core (issue needs changes to the core)
  • Effort: Medium (issue should take < 1 month)
  • Impact: Medium (issue is impactful with a bad, or no, workaround)
  • Type: Enhancement (issue is a small enhancement to existing functionality)


0xdevalias commented Jun 5, 2024

What is the feature you'd like to have?

It would be cool if Binary Ninja could detect and abstract inlined implementations such as C++'s std::string (and similar types), which would make the decompiled functions that use them look much simpler.

Is your feature request related to a problem?

This afternoon I spent a while working through a bunch of size checks and memory allocations to figure out what was going on, and at the end of it I learned that most of the complexity seems to be just the internals of std::string.

Are any alternative solutions acceptable?

Unsure.

Additional Information:

Example 'raw' Function that seems to be using std::string
/* 005821b0 */  int64_t FooGUI::PrepareDialogFromDSP(struct FooGUI* this, int32_t arg2)
/* 005821b0 */  {
/* 005821da */      if ((this->field_1028 != 0 && (*(uint8_t*)(this->field_1dc0 + 0x60) & 8) == 0))
/* 005821da */      {
/* 005821dc */          struct FooGUI_unk_field_8* field_8_1 = this->field_8;
/* 00582202 */          /* tailcall */
/* 00582202 */          return FooGUI::DoDialog(this, field_8_1->field_3e228, field_8_1->field_3e230, ((uint64_t)((int32_t)arg2)));
/* 005821da */      }
/* 0058220b */      int64_t field_3e228 = this->field_8->field_3e228;
/* 00582215 */      int64_t field_3e228_len = _strlen(field_3e228);
/* 0058221e */      if (field_3e228_len < -0x10)
/* 0058221e */      {
/* 00582239 */          int64_t field_3e228_or_3e230_len_aligned_16byte_plus1;
/* 00582239 */          void* field_3e228_or_3e230_len_aligned_16byte_new;
/* 00582239 */          void* field_3e228_len_aligned_16byte_new_1;
/* 00582239 */          if (field_3e228_len >= 23)
/* 00582239 */          {
/* 00582254 */              // This line computes the allocation size for the heap-backed copy of the string, rounded to a multiple of 16 bytes. Here's a detailed breakdown:
/* 00582254 */              // 
/* 00582254 */              // 1. `field_3e228_len` is the length returned by `_strlen`, so it excludes the NUL terminator.
/* 00582254 */              // 2. `16` (`0x10`) is added to `field_3e228_len`. This is the same as adding 1 for the terminator and then 15 for rounding.
/* 00582254 */              // 3. The result is then bitwise ANDed with `0xfffffffffffffff0` (a mask with all bits set to 1 except the last 4 bits), which clears the last 4 bits of the sum.
/* 00582254 */              // 
/* 00582254 */              // Taken together, `(field_3e228_len + 16) & 0xfffffffffffffff0` rounds `field_3e228_len + 1` up to the next multiple of 16, so the buffer always has room for the string plus its terminator. For example, a length of 23 gives 32, and a length of 32 gives 48.
/* 00582254 */              // 
/* 00582254 */              // This results in `field_3e228_len_aligned_16byte` holding the allocation size that is passed to `operator new` on the next line.
/* 00582254 */              uint64_t field_3e228_len_aligned_16byte = ((field_3e228_len + 16) & 0xfffffffffffffff0);
/* 0058225b */              void* field_3e228_len_aligned_16byte_new = operator new(field_3e228_len_aligned_16byte);
/* 00582260 */              field_3e228_len_aligned_16byte_new_1 = field_3e228_len_aligned_16byte_new;
/* 00582263 */              field_3e228_or_3e230_len_aligned_16byte_new = field_3e228_len_aligned_16byte_new;
/* 0058226b */              // The bitwise OR operation will set the least significant bit (LSB) of `field_3e228_len_aligned_16byte` to `1`, regardless of its current value. Here's what happens:
/* 0058226b */              // 
/* 0058226b */              // - If the LSB of `field_3e228_len_aligned_16byte` is `0`, this operation will change it to `1`.
/* 0058226b */              // - If the LSB is already `1`, it remains `1`.
/* 0058226b */              // 
/* 0058226b */              // Effectively, this operation ensures that the resulting value in `field_3e228_or_3e230_len_aligned_16byte_plus1` is always odd, because the LSB determines whether a number is even or odd. By setting the LSB to `1`, the number becomes odd.
/* 0058226b */              // 
/* 0058226b */              // For any even number, the operation `field_3e228_or_3e230_len_aligned_16byte_plus1 = field_3e228_len_aligned_16byte | 1` would effectively end up incrementing the number by 1. This is because the bitwise OR operation with `1` sets the least significant bit (LSB) to `1`, which turns an even number (where the LSB is `0`) into an odd number (where the LSB is `1`).
/* 0058226b */              // 
/* 0058226b */              // ### Example Scenarios
/* 0058226b */              // 
/* 0058226b */              // 1. **Even number:**
/* 0058226b */              //    - `field_3e228_len_aligned_16byte` = `32` (binary: `100000`)
/* 0058226b */              //      - `32 | 1` = `33` (binary: `100001`)
/* 0058226b */              //    - The result is `33`, which is `32 + 1`.
/* 0058226b */              // 
/* 0058226b */              // 2. **Odd number:**
/* 0058226b */              //    - `field_3e228_len_aligned_16byte` = `33` (binary: `100001`)
/* 0058226b */              //      - `33 | 1` = `33` (binary: `100001`)
/* 0058226b */              //    - The result remains `33`.
/* 0058226b */              // 
/* 0058226b */              // Thus, for any even number, this operation will increment the value by 1, ensuring the resulting value is odd. For any odd number, the value remains the same.
/* 0058226b */              // 
/* 0058226b */              // --
/* 0058226b */              // 
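/* 0058226b */              // Since `field_3e228_len_aligned_16byte` is always even due to the 16-byte alignment, the bitwise OR operation with `1` will indeed increment the value by 1, making it odd. The same low bit is tested again at 0x00582297 to decide whether `operator delete` is called, so it appears to act as a "heap-allocated" flag.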
/* 0058226b */              field_3e228_or_3e230_len_aligned_16byte_plus1 = (field_3e228_len_aligned_16byte | 1);
/* 0058226f */              int64_t field_3e228_len_1 = field_3e228_len;
/* 00582239 */          }
/* 00582239 */          else
/* 00582239 */          {
/* 0058223f */              field_3e228_or_3e230_len_aligned_16byte_plus1 = ((int8_t)(field_3e228_len * 2));
/* 00582242 */              field_3e228_len_aligned_16byte_new_1 = &*(uint64_t*)((char*)field_3e228_or_3e230_len_aligned_16byte_plus1)[1];
/* 00582239 */          }
/* 00582249 */          if ((field_3e228_len >= 23 || (field_3e228_len < 23 && field_3e228_len != 0)))
/* 00582249 */          {
/* 0058227c */              _memcpy(field_3e228_len_aligned_16byte_new_1, field_3e228, field_3e228_len);
/* 00582249 */          }
/* 00582281 */          *(uint8_t*)((char*)field_3e228_len_aligned_16byte_new_1 + field_3e228_len) = 0;
/* 0058228e */          std::__1::vector<std::__...allocator<char> > > >::push_back.6444(&*(int64_t*)((char*)this + 0x3018), &field_3e228_or_3e230_len_aligned_16byte_plus1);
/* 00582297 */          if ((field_3e228_or_3e230_len_aligned_16byte_plus1 & 1) != 0)
/* 00582297 */          {
/* 0058229d */              operator delete(field_3e228_or_3e230_len_aligned_16byte_new);
/* 00582297 */          }
/* 005822a6 */          int64_t field_3e230 = this->field_8->field_3e230;
/* 005822b0 */          int64_t field_3e230_len = _strlen(field_3e230);
/* 005822b9 */          if (field_3e230_len < -0x10)
/* 005822b9 */          {
/* 005822cd */              void* field_3e230_len_aligned_16byte_new_1;
/* 005822cd */              if (field_3e230_len >= 23)
/* 005822cd */              {
/* 005822ec */                  // 
/* 005822ec */                  // Align to 16 byte boundary
/* 005822ec */                  uint64_t field_3e230_len_aligned_16byte = ((field_3e230_len + 16) & 0xfffffffffffffff0);
/* 005822f3 */                  void* field_3e230_len_aligned_16byte_new = operator new(field_3e230_len_aligned_16byte);
/* 005822f8 */                  field_3e230_len_aligned_16byte_new_1 = field_3e230_len_aligned_16byte_new;
/* 005822fb */                  field_3e228_or_3e230_len_aligned_16byte_new = field_3e230_len_aligned_16byte_new;
/* 00582303 */                  // 
/* 00582303 */                  // Since `field_3e230_len_aligned_16byte` is always even due to the 16-byte alignment, the bitwise OR operation with `1` will indeed increment the value by 1, making it odd.
/* 00582303 */                  field_3e228_or_3e230_len_aligned_16byte_plus1 = (field_3e230_len_aligned_16byte | 1);
/* 0058230b */                  int64_t field_3e230_len_1 = field_3e230_len;
/* 005822cd */              }
/* 005822cd */              else
/* 005822cd */              {
/* 005822d3 */                  field_3e228_or_3e230_len_aligned_16byte_plus1 = ((int8_t)(field_3e230_len * 2));
/* 005822d6 */                  field_3e230_len_aligned_16byte_new_1 = &*(uint64_t*)((char*)field_3e228_or_3e230_len_aligned_16byte_plus1)[1];
/* 005822cd */              }
/* 005822dd */              if ((field_3e230_len >= 23 || (field_3e230_len < 23 && field_3e230_len != 0)))
/* 005822dd */              {
/* 00582318 */                  _memcpy(field_3e230_len_aligned_16byte_new_1, field_3e230, field_3e230_len);
/* 005822dd */              }
/* 0058231d */              *(uint8_t*)((char*)field_3e230_len_aligned_16byte_new_1 + field_3e230_len) = 0;
/* 00582329 */              std::__1::vector<std::__...allocator<char> > > >::push_back.6444(&this->unk_vector, &field_3e228_or_3e230_len_aligned_16byte_plus1);
/* 00582332 */              if ((field_3e228_or_3e230_len_aligned_16byte_plus1 & 1) != 0)
/* 00582332 */              {
/* 00582338 */                  operator delete(field_3e228_or_3e230_len_aligned_16byte_new);
/* 00582332 */              }
/* 0058233d */              int32_t* maybe_str_pointer_for_dialog = this->maybe_str_pointer_for_dialog;
/* 00582344 */              int64_t field_3010 = this->field_3010;
/* 0058234e */              int32_t arg2_1;
/* 0058234e */              if (maybe_str_pointer_for_dialog != field_3010)
/* 0058234e */              {
/* 00582350 */                  arg2_1 = arg2;
/* 00582353 */                  *(uint32_t*)maybe_str_pointer_for_dialog = arg2_1;
/* 0058235b */                  // 
/* 0058235b */                  // this->maybe_str_pointer_for_dialog is updated to point to the next element in an array of int32_t. This effectively increments the pointer by one int32_t element size (4 bytes)
/* 0058235b */                  this->maybe_str_pointer_for_dialog = &maybe_str_pointer_for_dialog[1];
/* 0058234e */              }
/* 0058234e */              else
/* 0058234e */              {
/* 00582378 */                  void* field_3000 = this->field_3000;
/* 0058237b */                  void* r13_1 = ((char*)maybe_str_pointer_for_dialog - field_3000);
/* 00582381 */                  int64_t r12_4 = (r13_1 >> 2);
/* 00582391 */                  if (((r12_4 + 1) >> 0x3e) != 0)
/* 00582391 */                  {
/* 0058245a */                      std::__vector_base_common<true>::__throw_length_error();
/* 0058245a */                      /* no return */
/* 00582391 */                  }
/* 005823a1 */                  void* rbx_8 = (field_3010 - field_3000);
/* 005823ab */                  int64_t rbx_9 = (rbx_8 >> 1);
/* 005823b1 */                  if (rbx_9 < (r12_4 + 1))
/* 005823b1 */                  {
/* 005823b1 */                      rbx_9 = (r12_4 + 1);
/* 005823b1 */                  }
/* 005823c2 */                  if ((rbx_8 >> 2) >= 0x1fffffffffffffff)
/* 005823c2 */                  {
/* 005823c2 */                      rbx_9 = 0x3fffffffffffffff;
/* 005823c2 */                  }
/* 005823c9 */                  void* r14_1;
/* 005823c9 */                  if (rbx_9 == 0)
/* 005823c9 */                  {
/* 005823ee */                      r14_1 = &__macho_header;
/* 005823c9 */                  }
/* 005823c9 */                  else
/* 005823c9 */                  {
/* 005823d2 */                      if (rbx_9 > 0x3fffffffffffffff)
/* 005823d2 */                      {
/* 0058245f */                          std::__1::__throw_length_error.6399();
/* 0058245f */                          /* no return */
/* 005823d2 */                      }
/* 005823e5 */                      r14_1 = operator new((rbx_9 << 2));
/* 005823c9 */                  }
/* 005823f5 */                  arg2_1 = arg2;
/* 005823f8 */                  *(uint32_t*)((char*)r14_1 + (r12_4 << 2)) = arg2_1;
/* 00582404 */                  if (r13_1 > 0)
/* 00582404 */                  {
/* 00582412 */                      arg2_1 = _memcpy(r14_1);
/* 00582404 */                  }
/* 0058241d */                  this->field_3000 = r14_1;
/* 00582424 */                  this->maybe_str_pointer_for_dialog = (((char*)r14_1 + (r12_4 << 2)) + 4);
/* 0058242b */                  this->field_3010 = ((char*)r14_1 + (rbx_9 << 2));
/* 00582435 */                  if (field_3000 != 0)
/* 00582435 */                  {
/* 0058244c */                      /* tailcall */
/* 0058244c */                      return operator delete(field_3000);
/* 00582435 */                  }
/* 0058234e */              }
/* 00582370 */              return arg2_1;
/* 005822b9 */          }
/* 0058221e */      }
/* 00582455 */      std::__basic_string_common<true>::__throw_length_error();
/* 00582455 */      /* no return */
/* 005821b0 */  }
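
The arithmetic in the function above can be pulled out into a few small helpers. This is only a sketch, assuming the pattern is libc++'s 64-bit little-endian std::string layout (the `std::__1` namespace in the callee names suggests libc++) and assuming the `< -0x10` comparison is really an unsigned comparison against `0xfffffffffffffff0`. The helper names are hypothetical and are not libc++ or Binary Ninja identifiers:

```cpp
#include <cassert>
#include <cstdint>

// Upper bound check seen before each construction; larger lengths reach
// __throw_length_error. Assumes the decompiler's `< -0x10` is unsigned.
constexpr bool fits_in_string(std::uint64_t len) {
    return len < 0xfffffffffffffff0ULL;
}

// Lengths of 23 or more take the operator new path in the decompilation;
// shorter strings are stored inline in the string object itself.
constexpr bool needs_heap(std::uint64_t len) {
    return len >= 23;
}

// Allocation size for the heap path: len + 1 (terminator) rounded up to a
// multiple of 16.
constexpr std::uint64_t long_allocation_size(std::uint64_t len) {
    return (len + 16) & ~std::uint64_t{0xf};
}

// First word of the heap-backed form: the allocation size with the low bit
// set. The low bit is what the `& 1` test checks before operator delete.
constexpr std::uint64_t long_capacity_word(std::uint64_t len) {
    return long_allocation_size(len) | 1;
}

// First byte of the inline form: the length doubled, which keeps the low bit 0.
constexpr std::uint8_t short_size_byte(std::uint64_t len) {
    return static_cast<std::uint8_t>(len * 2);
}

// The flag test used to decide whether a buffer has to be freed.
constexpr bool is_heap_backed(std::uint64_t first_word) {
    return (first_word & 1) != 0;
}

static_assert(long_allocation_size(23) == 32, "23 chars + NUL rounds up to 32");
static_assert(long_capacity_word(23) == 33, "32 with the flag bit set");
static_assert(short_size_byte(22) == 44, "inline length is stored doubled");
static_assert(!is_heap_backed(short_size_byte(22)), "inline form has low bit 0");
```

Seen this way, the two near-identical blocks in the decompilation are the same constructor inlined twice, once per `strlen` result, which is the kind of repetition a lifting pass could key on.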

I'm not sure that the following is 100% accurate, but at least representationally I think it is closer to (and definitely simpler than) what the original code might have looked like without the low-level implementation details of std::string. It's possible it still includes low-level details that could be removed as well:

#include <string>
#include <vector>
#include <stdexcept>

int64_t FooGUI::PrepareDialogFromDSP(int32_t arg2) {
  if (this->field_1028 != 0 && (*(uint8_t*)(this->field_1dc0 + 0x60) & 8) == 0) {
    return FooGUI::DoDialog(this, this->field_8->field_3e228, this->field_8->field_3e230, arg2);
  }

  std::string str1 = this->field_8->field_3e228;
  std::string str2 = this->field_8->field_3e230;

  // Add both strings to the vector. In the decompilation the `len != 0`
  // checks only guard the inlined memcpy; push_back is called either way.
  this->__offset(0x3018).push_back(str1);  // Assuming this->__offset(0x3018) is a vector
  this->__offset(0x3018).push_back(str2);  // Assuming this->__offset(0x3018) is a vector

  // Add arg2 to the vector directly. field_3000, maybe_str_pointer_for_dialog
  // and field_3010 look like the begin / end / end-of-capacity pointers of a
  // vector of int32_t, so the pointer updates are part of push_back itself.
  this->field_3000.push_back(arg2);

  return arg2;
}
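
To make the mapping between the raw fields and the containers more concrete, here is a minimal, self-contained sketch of what the non-tailcall path might reduce to. The struct and member names (`DialogQueue`, `strings`, `ids`, `prepare`) are invented for illustration and are not taken from the binary; the assumption is that offset 0x3018 holds a vector of std::string and that `field_3000` / `maybe_str_pointer_for_dialog` / `field_3010` are the three pointers of a vector of int32_t:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the relevant members of FooGUI.
struct DialogQueue {
    std::vector<std::string> strings;  // assumed: the vector at offset 0x3018
    std::vector<std::int32_t> ids;     // assumed: field_3000 .. field_3010

    void prepare(const char* first, const char* second, std::int32_t id) {
        // Each temporary std::string corresponds to one strlen / operator new /
        // memcpy / terminator-store sequence in the decompilation, followed by
        // the conditional operator delete when the temporary is destroyed.
        strings.push_back(std::string(first));
        strings.push_back(std::string(second));

        // The fast path (store and bump the end pointer) and the slow path
        // (grow, copy, free the old buffer) are both inside push_back.
        ids.push_back(id);
    }
};
```

For example, `DialogQueue q; q.prepare("title", "a message longer than twenty-two characters", 7);` exercises both the inline and the heap-backed string paths while the source stays three lines long.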

Given this code example, we could also look at the cross references on functions such as the following, to potentially find more implementations:

  • std::__1::vector<std::__...allocator<char> > > >::push_back.6444(&p1, &p2);
  • std::__vector_base_common<true>::__throw_length_error();
  • std::__1::__throw_length_error.6399();
  • std::__basic_string_common<true>::__throw_length_error();
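
As a rough sketch of that idea, the snippet below filters a list of symbol names for the marker substrings found in the callees above. It deliberately uses no Binary Ninja API: the input is assumed to be a plain list of names exported by whatever tool is in use, and `kMarkers`, `looks_like_libcxx_marker`, and `filter_marker_symbols` are hypothetical names:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Substrings taken from the callee names listed above.
const char* const kMarkers[] = {
    "__throw_length_error",
    "__basic_string_common",
    "__vector_base_common",
};

// True if the symbol name contains any of the marker substrings.
bool looks_like_libcxx_marker(const std::string& symbol) {
    for (const char* marker : kMarkers) {
        if (symbol.find(marker) != std::string::npos) {
            return true;
        }
    }
    return false;
}

// Keep only the symbols worth checking for cross references.
std::vector<std::string> filter_marker_symbols(const std::vector<std::string>& symbols) {
    std::vector<std::string> matches;
    for (const std::string& symbol : symbols) {
        if (looks_like_libcxx_marker(symbol)) {
            matches.push_back(symbol);
        }
    }
    return matches;
}
```

Callers of the matching functions would then be candidates for containing an inlined string or vector operation, though a substring match alone says nothing about which operation was inlined.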

Further Reading

See Also

@ExecuteProtect

It would be a really beneficial feature to have but there are tons of different variants so even just the one type would be a significant undertaking imo.


0xdevalias commented Jun 7, 2024

It seems like this could be a (more complex) version of 'outlining' / 'un-inlining' as defined by these issues:


> but there are tons of different variants so even just the one type would be a significant undertaking imo

@ExecuteProtect Fair, though it would be interesting to know (even as a ballpark) how many variants there actually are, and how much more work this would be than, for example, the existing C-based 'un-inlining' implementations.

@xusheng6 added the Type: Enhancement, Component: Core, Effort: Medium, and Impact: Medium labels on Jun 11, 2024