Which optimization and why? #333
Binary size is a common misconception in relation to performance -- only the pages from the binary that are actually used are loaded into memory. Higher binary size does not necessarily mean worse performance or loading times. For example, the ArrayFire libraries can reach multiple gigabytes in size, yet never page more than a few megabytes of memory. Intel does this a lot with Clear Linux, too -- they compile hot sections for multiple arches, which absolutely increases binary size, but those sections that aren't used simply aren't paged in. So, I'm not worried about bigger binaries from using
Now, there are a number of processor specific flags you might consider using as well. If you use Intel, In the end, though, this is what By default, the philosophy of the overlay is to let the compiler decide what to do. |
OK, I will test some more. But first, I only use -O2 because it works better for me than -O3. Also, -O3 enables -ftree-slp-vectorize, not -ftree-vectorize. I am on a Ryzen 2600X. I will test the other options without GRAPHITE; perhaps I will be happy with that. Until now I have tested all or nothing. |
Great! Actually, it seems |
Sorry, you are right. With -O3, both -ftree-slp-vectorize and -ftree-loop-vectorize are enabled. |
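Which flags a given -O level turns on depends on the GCC version, so it is worth checking locally rather than going from memory. A minimal sketch, assuming `gcc` is on the PATH and prints `-Q --help=optimizers` output in the usual "flag, then [enabled] or [disabled]" layout; the function names `parse_optimizer_flags` and `query_gcc` are made up for this illustration:

```python
import shutil
import subprocess


def parse_optimizer_flags(output: str) -> dict[str, bool]:
    """Map each -f flag in `gcc -Q --help=optimizers` output to enabled/disabled."""
    flags = {}
    for line in output.splitlines():
        parts = line.split()
        # Lines of interest look like: "  -ftree-slp-vectorize   [enabled]".
        # Flags that take a value (e.g. "[default]") are skipped on purpose.
        if (
            len(parts) >= 2
            and parts[0].startswith("-f")
            and parts[-1] in ("[enabled]", "[disabled]")
        ):
            flags[parts[0]] = parts[-1] == "[enabled]"
    return flags


def query_gcc(level: str, gcc: str = "gcc") -> dict[str, bool]:
    """Ask the local GCC which optimizer flags a given -O level turns on."""
    result = subprocess.run(
        [gcc, "-Q", "--help=optimizers", level],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_optimizer_flags(result.stdout)


if __name__ == "__main__":
    if shutil.which("gcc") is None:
        print("gcc not found on PATH")
    else:
        o2, o3 = query_gcc("-O2"), query_gcc("-O3")
        for flag in ("-ftree-loop-vectorize", "-ftree-slp-vectorize"):
            # None means this GCC did not report the flag as enabled/disabled.
            print(f"{flag}: -O2={o2.get(flag)} -O3={o3.get(flag)}")
```

Running it prints the state of the two vectorizer flags at -O2 and -O3 for whatever compiler is installed, which settles this kind of question for a specific toolchain.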
What are you using to do your benches? |
First, LTO works very, very well! I could reduce the RAM usage of my system by about 100 MB, and I think that when I start my system it now has to load only 600 MB instead of 700 MB, which was quicker. In the benchmarks, LTO sometimes gave a small performance advantage, but never a disadvantage. I tried lame, p7zip and unrar. First I compared CFLAGS="-march=native -O2 -pipe -ftree-vectorize" against CFLAGS="-march=native -O2 ${SEMINTERPOS} ${GRAPHITE} ${IPA} ${FLTO} -fuse-linker-plugin -ftree-vectorize -pipe" and CFLAGS="-march=native ${CFLAGS} -pipe". With -O3 lame was 1% faster, but p7zip and unrar were slower. For the rest I tested only p7zip. There was no difference between CFLAGS="-march=native -O2 ${SEMINTERPOS} ${GRAPHITE} ${IPA} ${FLTO} -fuse-linker-plugin -ftree-vectorize -pipe" and CFLAGS="-march=native -O2 ${FLTO} -fuse-linker-plugin -ftree-vectorize -pipe". But then I removed GRAPHITE and it ran faster. I switched to -O3 and it was slower again. I think it is absolutely useless to build a whole system with -O3. 90% of all applications don't gain any advantage. Do you think you can copy files faster when you build coreutils or dolphin with -O3? Do you think you can edit files faster if you build nano with -O3? And the rest? It is possible that an application runs faster with -O3, but in my tests p7zip and unrar ran slower. -O3 is only useful for applications where you can see a benefit. Don't use it just because you see that the application still builds. |
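Differences of around 1% like the ones above are easy to lose in run-to-run noise, so it helps to repeat each measurement and compare medians. A rough sketch of such a harness, not the method anyone in this thread actually used; `time_command`, `summarize` and `compare` are hypothetical helper names, and the workload in the demo is a placeholder to be replaced with the real command (for example the same p7zip invocation built with two flag sets):

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run argv `runs` times; return the wall-clock time of each run in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=True,
        )
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return (median, spread), where spread is max - min of the timings."""
    return statistics.median(timings), max(timings) - min(timings)


def compare(baseline, candidate):
    """Change of the candidate median relative to the baseline median, in percent."""
    base_median, _ = summarize(baseline)
    cand_median, _ = summarize(candidate)
    return (cand_median - base_median) / base_median * 100.0


if __name__ == "__main__":
    # Placeholder workload (an assumption for the demo, not a real benchmark).
    workload = [sys.executable, "-c", "sum(range(200000))"]
    runs = time_command(workload, runs=5)
    median, spread = summarize(runs)
    print(f"median {median:.4f}s, spread {spread:.4f}s over {len(runs)} runs")
```

If the spread within one build is as large as the difference between two builds, the comparison says little either way. Timing a short invocation of a program the same way gives a crude startup measurement, though it mostly reflects the warm page cache after the first run.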
I'm wondering more about your methodology. Which benchmarks did you run exactly? Note that Clear Linux uses Where |
@InBetweenNames are there still plans to integrate clear linux's patches in this overlay? |
Yes, there are -- I've started off by first merging in their CFLAGS. The Clear Linux issue should still be open. Relevant issue is #164. Next we need to complete the refactor to support multiple configurations. The relevant issue for that is #307. The LTO patching mechanism will likely be extended to pull in Clear Linux patches after that. Perhaps a |
Hey, I told you. Can you tell me why I would be able to copy files faster when I build coreutils with -O3? And the rest? Have you ever tested anything? Please tell me which application runs faster with your optimizations, from your own tests. If I start my system and it has to load 500 MB, I think it takes longer if it has to load 800 MB. Is this in the Phoronix benchmark? I tested the things I actually use at the moment. |
@KlausMeier , I'll be honest with you -- I'm not sure what you're after here. Do you want me to justify the existence of GentooLTO as a project? What's your end goal with this issue? We can go back and forth all day about the finer points of optimization flags, but it seems like there is something else that's on your mind. |
And I think most benchmarks are bullshit for a desktop system. I don't run a database or want to compute pi. The performance of a desktop system is the time it takes to load an application. Nothing else. And there is no benchmark for this. Yes, you can optimize one application very well, but don't think your system runs faster if you use these optimizations for the whole system. |
Just trying to get your mindset here. It seems like you think GentooLTO is a waste of time? |
OK, I understand. It is better to leave. I want the best performance for a system. I told you that LTO, for example, works perfectly for me. Do you think I would use this overlay if I thought it was useless? But what about when I see that some optimizations reduce my performance? I see no benchmarks in this thread that prove any of this shit. And when I test something, this is what you tell me? I spent a great deal of time testing all your settings, and I reported my experiences. I don't see anybody else here who does any testing. If you have a problem with that, have a very nice day. |
I tested a lot. But that was not welcome. I think that you have to choose optimizations for every application separately if you want the best performance. There is no single setting for everything. But you think that I think GentooLTO is a waste of time because I spend my time optimizing it? I tested some applications with a lot of optimizations, and I told you the results. And you told me "It seems like you think GentooLTO is a waste of time?". GentooLTO is no waste of time. But the time I spent with GentooLTO was a very big waste of time for me, when I see your answer. |
You're asking questions that we just don't have the answers for yet. I never promised absolute performance. The flags that are chosen by default should make the system faster, but that doesn't mean they will! It says it right in the third line of the README. It's theoretically maximum speed. If it's not, that's a bug in GCC. It would be nice if people reported performance degradations upstream to GCC, but I'm not sure how many do. I post all upstream issues here, for reference. Most of the time, when I look upstream for a codegen related issue, it's already been reported which is nice.
The benchmarks are an open problem and people have been benchmarking, albeit sporadically, since the project started. Occasionally a new thread pops up with results. Most of the time they are encouraging, even if only by a few percent. But there's a long, long way to go. If there's one critical result that will come out of GentooLTO one day, it's how to benchmark an operating system. Where is the proof, you ask? Well, it's yet to be contributed. Certainly I'm not ready to spend time doing that. I have a PhD to finish, and GentooLTO works for my purposes in my domain-specific context. Will it work in yours? I don't know! You tell me. Let me ask you this: what is the best way of benchmarking a GentooLTO system? Is it by running a couple of
And I really appreciate that. But I want to be clear here: this is a volunteer project, and I don't expect anyone to do anything at the end of the day, really. It's nice when people contribute things. But don't think for a second that you are obligated to do anything for this project, nor is this project obligated to do anything for you, no matter how much time you spend on the project. This is not a job, and thank god for that. You should mess around with GentooLTO if you like it and are interested in it. If you're results oriented, you should adjust your expectations accordingly and use your own configuration.
Sure, if performance is your goal and none of the other stated goals of GentooLTO, then you will need to tailor your setup specifically for that. That's what
Okay, but I don't really want to spend more time arguing about the premise of the project. The goal is not "absolute performance for Gentoo", the goal is "theoretically absolute performance for gentoo, using deviations from that to file bug reports and improve open source software, including GCC". Personally, I have observed the default configuration has improved performance on my system. Do I think it'll make files copy faster? No, that's an IO-bound process, and Furthermore, I just don't have the time or resources to maintain a list of I just so happen to be one of those users that wants to run a database and compute pi, and yeah, I benefit from GentooLTO. That's why I take the time to maintain it. |
I think that there are some good points.
|
I used it. I liked it. I tested it and I reported my results. And for all of this the answer was "It seems like you think GentooLTO is a waste of time?". Now I have removed it. Today I spent two hours testing. But facts are not welcome. |
I think that you're jumping to conclusions here. |
The issue in question is #288 and it's great work that I intend to try myself when I get a chance. @KlausMeier the only reason I said that was because you seemed to be questioning the very premises of the GentooLTO project. It seemed like you were unhappy with the results and wanted me to change the project defaults to what you found to work best on your system. You wanted proof that the defaults were in fact the highest performance. Perhaps I could have worded my question better, but this is really what I was after. Ultimately though, doing this would mean shifting the project focus from bug finding to "just give me the best performance", and that's something I simply can't promise anyone. The default configuration SHOULD give the best performance but that doesn't mean it does. #288 is very interesting and I think would fit really nicely into the new GentooLTO layout. |
I SMELL TROLLING........... |
This troll account was created on May 14, 2019. Where was he before that? A new user on GitHub, just to troll here. |
I think they just didn't have a GitHub account before they were interested in GentooLTO. |
GentooLTO is the best overlay around; we need to do some benchmarks. eix phoronix
[1] "bobwya" /var/lib/layman/bobwya I'm installing it on the weekend along with Clear Linux to see how my prelinked system does against Clear Linux. |
@KlausMeier
P.S. I have my whole system LTOed. Not a single performance test has been run on my system yet. |
Going to close this issue as it seems nothing more is coming out of it. If it needs to be reopened, just comment. |
Have you tested all your optimizations and found a benefit? My standard flags are CFLAGS="-march=native -O2 -pipe -ftree-vectorize"
Then I added -flto. The binary file size was much better and the performance was a little bit better. I tested it with lame, p7zip and unrar.
Next was the full thing with -O2: CFLAGS="-march=native -O2 ${SEMINTERPOS} ${GRAPHITE} ${IPA} ${FLTO} -fuse-linker-plugin -ftree-vectorize -pipe" File size was the same but performance was a little bit worse.
At last I tested the full thing: CFLAGS="-march=native ${CFLAGS} -pipe -ftree-vectorize" The file size was bigger than with the standard optimization without LTO. For lame the performance was a little bit better, for p7zip it was a little bit worse.
I think that building a whole system with -O3 is a very big disadvantage. Do you think you can copy your files faster if you build coreutils with -O3? The opposite is the case. The binary is much bigger, takes more time to load and uses more RAM.
Yes, there are some packages which run faster with -O3 or -Ofast. But they already have these flags set in Portage. Just don't use -O3 for everything.