[REQUEST] Support Intel assembly syntax #106

Open
pleroy opened this issue Jun 2, 2024 · 4 comments

@pleroy

pleroy commented Jun 2, 2024

OSACA only supports the AT&T assembly syntax. On Windows the Intel syntax is more common (especially with MSVC). There are tools out there that purport to convert one format to the other, but they are not quite ready for prime time, and it's also unclear if OSACA would need to tackle all the complexity of, say, C++ name mangling (I'd assume that a symbol is a symbol is a symbol and that there is no need to understand what it corresponds to).
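
The most visible differences between the two syntaxes are operand order (AT&T lists the destination last, Intel lists it first) and AT&T's sigils (`%` on registers, `$` on immediates). The toy converter below is only an illustration of those two rules; `att_to_intel` is a hypothetical helper, not part of OSACA or of any existing converter, and it deliberately rejects memory operands and ignores size suffixes and literal spelling (e.g. MASM's `10h`), which is where real converters get hard.

```python
def att_to_intel(line: str) -> str:
    """Rewrite a simple AT&T instruction in Intel operand order.

    Toy sketch: handles only register (%reg) and immediate ($imm) operands;
    memory operands, size suffixes and directives are not supported.
    """
    parts = line.split(None, 1)
    if not parts:
        raise ValueError("empty instruction")
    mnemonic = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    converted = []
    for op in (o.strip() for o in rest.split(",") if o.strip()):
        if op.startswith("%") or op.startswith("$"):
            converted.append(op[1:])  # drop the AT&T register/immediate sigil
        else:
            raise ValueError(f"unsupported operand: {op!r}")
    converted.reverse()  # AT&T: destination last; Intel: destination first
    return f"{mnemonic} {', '.join(converted)}" if converted else mnemonic


print(att_to_intel("add $1, %eax"))          # add eax, 1
print(att_to_intel("imul $3, %ecx, %eax"))   # imul eax, ecx, 3
```

Note that the three-operand `imul` shows why a plain reversal of the whole operand list is needed rather than just swapping the first and last operands.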

The parser seems to be pretty well structured, so I would assume that supporting a new syntax amounts to adding a new parser subclass and would be a moderate amount of work.

If you are not interested in doing it yourselves, would you accept a contribution? (Not a commitment, just exploratory at this point.)

@JanLJL
Collaborator

JanLJL commented Jun 3, 2024

Hi @pleroy ,
I don't have any experience with the MSVC compiler; I would assume there is a compiler flag to produce AT&T syntax out of the box, like the -masm=dialect flag for GCC?

Adding Intel syntax support to our parser has been on our list of TODOs for a long time; however, we haven't prioritized it so far due to limited resources. If you want to contribute, we would be more than happy to receive a PR (even just an initial approach we can build on and discuss together afterwards)!

@pleroy
Author

pleroy commented Jun 3, 2024

Hi @JanLJL -- Thanks for your quick reply.

I don't think that there is a flag to ask the MSVC compiler to produce the AT&T syntax (or if there is, it is well hidden). The relevant documentation is here and it doesn't even mention the syntax that it produces. Note incidentally that godbolt displays the Intel syntax, which is not a proof of anything, but a hint that maybe that's the only thing there is for MSVC.

I don't want to put the Intel parser on the critical path of some performance-sensitive code that I am writing, but the time will come when I will really want to do precise latency/throughput analysis. I'll try to split the work into relatively small chunks for ease of review.

@pleroy
Author

pleroy commented Nov 10, 2024

For the record, I have started working on this. It is far from complete, but I am making good progress. If someone is interested in taking a look, the work is happening in the fork https://github.com/mockingbirdnest/OSACA.

@pleroy
Copy link
Author

pleroy commented Dec 31, 2024

Hi @JanLJL -- I am ready to declare victory in my quest to add support for the Intel syntax to OSACA. I was able to use it on reasonably complex code and derived significant optimizations based on what OSACA was showing. If you are curious, you can see the optimizations in this PR and the kind of code that I am feeding to OSACA in the description of this PR.

As you can expect, the changes to add the Intel syntax are rather large and required some restructuring. You can see the diff here. Here is an overview of the changes:

  1. There is a new class, ParserX86Intel, to parse the Intel syntax. It produces instruction forms in the order of the syntax, i.e., the first operand is the destination.
  2. In order to gather properties that don't depend on the syntax, such as the properties of the registers, there is a new class ParserX86 which is the superclass of ParserX86ATT and ParserX86Intel.
  3. The matching of markers is now done on the instruction forms, because it needs to be independent of details like the syntax of numeric literals, which should only be known to the parser.
  4. There is a new phase in between parsing and semantics, the "normalization". Its purpose is to align the instructions with the data found in the ISA and architecture models. For Intel, this obviously involves swapping the operands. But it also exists for the other parsers: for AT&T, for instance, it does the mov/movq adjustment so that the rest of the code doesn't have to care about it.
  5. The semantics is largely unchanged, except that it checks that the instructions it processes have been normalized.
  6. There are, of course, extensive tests and data files.
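
To illustrate points 3 to 5, here is a minimal sketch of how parse, normalize and marker matching can fit together. Everything in it is hypothetical and is not the code in the fork: the classes `Register`, `Immediate` and `InstructionForm`, the functions `normalize_att`, `normalize_intel`, `is_start_marker` and `check_normalized`, and the assumption that the normalized order is source-first (as implied by "swapping the operands" for Intel). The marker checked is the IACA-style start marker, `mov ebx, 111` followed by `.byte 100, 103, 144`; only the `mov` part is matched here.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Register:
    name: str


@dataclass(frozen=True)
class Immediate:
    value: int  # literal spelling is resolved by the parser


Operand = Union[Register, Immediate]


@dataclass
class InstructionForm:
    mnemonic: str
    operands: List[Operand] = field(default_factory=list)
    normalized: bool = False


def parse_att_immediate(text: str) -> Immediate:
    # AT&T spelling: "$111" or "$0x6f"
    return Immediate(int(text.lstrip("$"), 0))


def parse_intel_immediate(text: str) -> Immediate:
    # MASM-style spelling: "111" or "6Fh" (trailing h means hexadecimal)
    if text.lower().endswith("h"):
        return Immediate(int(text[:-1], 16))
    return Immediate(int(text, 10))


def normalize_intel(form: InstructionForm) -> InstructionForm:
    # Intel lists the destination first: reverse to source-first order.
    return InstructionForm(form.mnemonic, list(reversed(form.operands)), True)


def normalize_att(form: InstructionForm) -> InstructionForm:
    # AT&T is already source-first; a real normalizer would also apply
    # adjustments such as mov/movq here (omitted in this sketch).
    return InstructionForm(form.mnemonic, list(form.operands), True)


def is_start_marker(form: InstructionForm) -> bool:
    # Compares operand *values*, so "$111", "111" and "6Fh" all match.
    return (form.mnemonic.lower() in ("mov", "movl")
            and form.operands == [Immediate(111), Register("ebx")])


def check_normalized(forms: List[InstructionForm]) -> None:
    # Stand-in for a semantics stage that refuses unnormalized input.
    for form in forms:
        if not form.normalized:
            raise ValueError(f"{form.mnemonic!r} has not been normalized")


att = normalize_att(
    InstructionForm("movl", [parse_att_immediate("$111"), Register("ebx")]))
intel = normalize_intel(
    InstructionForm("mov", [Register("ebx"), parse_intel_immediate("6Fh")]))
check_normalized([att, intel])
print(is_start_marker(att), is_start_marker(intel))  # True True
```

Keeping literal parsing inside each parser is what lets `is_start_marker` stay syntax-agnostic: by the time it runs, both spellings have become the same `Immediate(111)`.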

Please let me know how you want to proceed to upstream this work. There are about 2500 lines of diff (a good chunk of it for the tests), so it may be hard to review in a single PR. It should be possible to extract smaller chunks for review, although of course it will take some work to do such a split.
