
Pivoting on target-cpu/arch #847

Open
lilith opened this issue Jan 9, 2017 · 27 comments

lilith commented Jan 9, 2017

We have a couple of common generations of CPU above the baseline x86_64 instruction set - namely sandybridge and haswell, with AVX and AVX2/BMI/BMI2 respectively.

LLVM-backed languages and GCC 4.9+ all support "x86-64", "sandybridge", "haswell", and "native" for the -march/--target-cpu parameters. GCC 4.8 uses the alternate identifiers corei7-avx and core-avx2 for those platforms.

These map nicely to MSVC /arch:AVX and /arch:AVX2, which is as granular as MSVC goes.

For now I'm using an extra field in .conan/settings.yml: target_cpu: [x86, x86-64, nehalem, sandybridge, haswell, native], but I need to move this down into the packages I consume as well, if I want to pivot on sandybridge/haswell support.
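As a sketch, the mapping described above could look like this (the function and table names are hypothetical; the flag spellings follow the GCC and MSVC documentation):

```python
# Hypothetical helper mapping a target_cpu value to a compiler flag.
# Table and function names are made up for illustration; the flag
# spellings come from the GCC and MSVC docs referenced above.

GCC_49_PLUS = {"x86-64": "-march=x86-64", "nehalem": "-march=nehalem",
               "sandybridge": "-march=sandybridge",
               "haswell": "-march=haswell", "native": "-march=native"}
# GCC 4.8 only understands the older spellings for AVX/AVX2 targets.
GCC_48 = {**GCC_49_PLUS, "sandybridge": "-march=corei7-avx",
          "haswell": "-march=core-avx2"}
# MSVC is only as granular as /arch:AVX and /arch:AVX2.
MSVC = {"x86-64": "", "nehalem": "",
        "sandybridge": "/arch:AVX", "haswell": "/arch:AVX2"}

def target_cpu_flag(compiler, version, cpu):
    """Return the flag enabling `cpu` for the given compiler, or None."""
    if compiler == "gcc":
        table = GCC_48 if version.startswith("4.8") else GCC_49_PLUS
    elif compiler == "msvc":
        table = MSVC
    else:
        table = GCC_49_PLUS  # clang accepts the modern GCC spellings
    return table.get(cpu)
```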

Has this come up before? Any convention to adopt?


lasote commented Jan 9, 2017

I have no previous experience managing those alternative architectures, but why are they not just additional arch setting values?

I'm not sure about the native setting either; the gcc docs say "This selects the CPU to generate code for at compilation time by determining the processor type of the compiling machine." It sounds like it should not be a setting value, because it's variable and not deterministic.

Any community feedback would be great.


lilith commented Jan 9, 2017 via email


lilith commented Mar 10, 2017

These instruction sets are actually available for both x86 and x86_64 architectures.

I think native would need special handling - it should be source-only.

I'm bumping up against this pretty hard with libjpeg-turbo. It's 2-3x faster when compiled for haswell vs baseline x86_64, but recompiling from source pushes Travis over the edge and hits the 45-minute timeout.


lasote commented Oct 22, 2017

I would like to push this for 0.29; other users have requested the same thing and it's time to establish a convention to follow. Initially only the base settings; later we can think about the build helpers to inject the needed flags, and some detection of the CPU microarchitecture (https://pypi.python.org/pypi/cpuid @fpelliccioni) to warn if a bad setting is detected. So @memsharded, let's work on it.
@nathanaeljones I think we could:

arch: 
  x86:
       microarch: [None, "nehalem", "bonnell", "sandy_bridge", "ivy_bridge", "silvermont", "haswell", "broadwell", "skylake", "goldmont", "kaby_lake", "coffee_lake"]
  x86_64:
       microarch: [None, "nehalem", "bonnell", "sandy_bridge", "ivy_bridge", "silvermont", "haswell", "broadwell", "skylake", "goldmont", "kaby_lake", "coffee_lake"]
  ppc64le:
  ppc64:
  armv6:
  armv7:
  armv7hf:
  armv8:
  sparc:
  sparcv9:
  mips:
  mips64:
  avr:

I don't like to repeat the microarchitectures, but I don't see a better approach.
The "None" allows the user to not specify the subsetting.
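If a microarch sub-setting like the one above were adopted, a profile could select it with Conan's usual dotted sub-setting syntax (hypothetical: microarch is not a real setting today, the rest follows the standard profile format):

```ini
[settings]
os=Linux
arch=x86_64
arch.microarch=haswell
compiler=gcc
compiler.version=7
compiler.libcxx=libstdc++11
build_type=Release
```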


lilith commented Oct 22, 2017

I'd also suggest that we may want to support 'native', i.e., whichever features are supported on the build machine. This value would need to disable build caching, though - all packages would need to be built from source.

Also, we may want to consider matching the gcc/llvm names as closely as possible. For GCC < 5 we'll have to map a few anyway, though.


lilith commented Oct 22, 2017

Generations are also not very specific, and may not work on mobile or low-end editions.

I started with x86_64, nehalem, sandybridge, haswell, native, but skylake should probably be added for TSX support.

I'm not sure there's much value in including tick releases unless they add new instruction sets.

@lilith lilith closed this as completed Oct 22, 2017
@lilith lilith reopened this Oct 22, 2017

lilith commented Oct 22, 2017

For LLVM:

llc -march=x86 -mattr=help
llc -march=x86-64 -mattr=help

I forget how to list the values for GCC.


tru commented Oct 23, 2017

For arm we are shipping a couple of different architectures right now:

  • armv7 hardfloat
  • armv7 softfloat
  • armv7 hardfloat + neon
  • armv7 hardfloat + thumb + neon

I think it makes sense for the armv7 platform to contain: float=["hard", "soft"], thumb=[True, False], neon=[True, False]

On some platforms we also need to set the specific FPU, like this: -mfpu=vfpv3-d16. Not sure if that is something that should be abstracted in conan though.
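As a sketch of how those sub-settings could translate into compiler flags (the function and parameter names are made up; the flag spellings are standard GCC/Clang ARM options):

```python
# Hypothetical translation of the proposed armv7 sub-settings into
# GCC/Clang flags. Parameter names mirror the suggestion above.

def armv7_flags(float_abi, thumb, neon, fpu=None):
    """Build the flag list for an armv7 configuration."""
    flags = ["-mfloat-abi=" + float_abi]   # "hard", "soft" (or "softfp")
    if thumb:
        flags.append("-mthumb")
    if fpu:                                # e.g. "vfpv3-d16", passed through
        flags.append("-mfpu=" + fpu)
    elif neon:
        flags.append("-mfpu=neon")
    return flags
```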


fpelliccioni commented Oct 23, 2017

I was thinking about this; here are some of my conclusions:

A. Relying on micro-architecture is a step forward, but what if I do the
following?

  1. g++ xxx.cpp -O3 -march=sandybridge ...

  2. g++ xxx.cpp -O3 -march=haswell ...

  3. g++ xxx.cpp -O3 -march=skylake ...

    It is likely that the resulting binaries of 1, 2 and 3 are exactly the
    same; in that case it does not make sense to differentiate them, as
    they are compatible or identical binaries/packages.

B. Some Intel micro-architectures have the same extensions as others. For
example, according to the Intel tick-tock model, in theory, Sandybridge and
Ivybridge are equivalent (with respect to instruction sets or extensions).
Therefore, it is not worth differentiating them.

I think, in both cases, what really matters is what sets of instructions
were used.
For this, I am working on a tool that examines an executable or library
(.a, .so, .dll, etc ...) and report which sets of instructions were used.

For example:

get_extensions("a.out",...) == ['MODE64', 'SSE', 'AVX']

In this way, the micro-architecture no longer matters, but what really
matters are the sets of instructions used.

I think Conan packages can have a setting to determine which extensions
the binaries use. For example:

class HelloConan(ConanFile):
    settings = "os", "compiler", "build_type", "arch", "extensions"

extensions would have to be assigned after the binary is compiled - I
imagine by creating a new method (member function) in the ConanFile class,
for example:

def set_extensions(self):
    self.extensions = get_extensions(... list of binary files ...)

On the client side, when Conan looks for a package, it can identify which
instructions are available for the processor (using the cpuid python
package, for example) and in this way find the package that best fits.

I have a demo of the tool that analyzes the executables; if the idea is of
interest/utility to the community, I could invest some time in it.
The demo for now only works for x86 and the ELF format, but it can be
extended to other architectures and formats (PE, Mach-O).


lilith commented Oct 23, 2017

cat /proc/cpuinfo shows the following flags for me:

fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
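As a sketch of the client-side check, these flags can be parsed and compared against the extensions a package needs (the REQUIRED table below is illustrative, not a complete definition of each microarchitecture):

```python
def cpu_flags(cpuinfo_text):
    """Extract the flag set from the contents of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

# Illustrative (incomplete) requirement sets, in /proc/cpuinfo spellings.
REQUIRED = {"sandybridge": {"avx"},
            "haswell": {"avx2", "bmi1", "bmi2"}}

def supports(cpuinfo_text, microarch):
    """True if the host CPU advertises every flag the microarch needs."""
    return REQUIRED[microarch] <= cpu_flags(cpuinfo_text)
```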

My questions would be

(a) how do we map these to, say, MSVC that only supports AVX and AVX2, and infers other instruction support based on those?

(b) is set math cost-prohibitive for conan?

(c) does the implementation cost of full instruction support outweigh the benefit? I would see nehalem/sandybridge/haswell/skylake/native as quite a bit simpler to implement and test. Permutations make things harder.

(d) don't we still have to specify either an instruction group or processor generation when publishing pre-compiled binaries for others' use?


lilith commented Oct 23, 2017

@fpelliccioni The tool you describe is exactly what I've been looking for to validate my binaries. With so many compilers involved in a build it can be very difficult to ensure that an unsupported instruction didn't sneak in somewhere.


lasote commented Oct 24, 2017

Some comments:
@nathanaeljones I'm not sure about native. I understand your point, but a setting should be used to determine the binary you are getting. Any ideas @memsharded? I think native should be avoided in favor of detection (or a user declaration) of a default microarchitecture in the default profile.

@fpelliccioni About:

It is likely that the resulting binaries of 1, 2 and 3 are exactly the
same; in that case it does not make sense to differentiate them, as
they are compatible or identical binaries/packages.

Yes, but if you know that your library is built exactly for those different microarchitectures, you can control it in the package_id() method to get only one binary. But conceptually, the code could be different if you build it for different microarchitectures, right?

About the instruction sets: I understand that's what really matters, but since you can support so many of them it becomes unmaintainable and overwhelming for the user. The number of combinations is practically infinite. So it looks like it can't be a setting.

Detecting the setting after building the library is a kind of chicken-and-egg problem. Right now it works like this: you declare the settings => conan builds the library and calculates a package ID. You are proposing the opposite: the settings would be determined by the built library. That affects the core model of conan, and it's not possible to do, in my opinion.


tru commented Oct 24, 2017

Detection is not possible when cross-compiling either.

@lasote lasote self-assigned this Nov 14, 2017
@lasote lasote modified the milestones: 0.29, 1.0, 0.30 Nov 20, 2017

lasote commented Nov 21, 2017

Hi all,

Given the previous experience modeling the standard of the language (still WIP), I have some observations. The main question is: should we model this with options?

class MyLib(ConanFile):
   ...    
 
   def options(self, config):
       config.add_microarchitecture() # We could add a preset list of march like the described here: https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gcc/x86-Options.html#x86-Options

  • Most users are not concerned about this, so adding an option only when needed sounds reasonable.
  • Because Visual Studio is very different ({IA32, SSE, SSE2, AVX, AVX2}), we could add different options if compiler == "Visual Studio" (options depending on settings).
  • About native, I still think it is not a good idea, because we don't model the real microprocessor as a setting, so we can't know in a deterministic way the real configuration of the generated binary.
  • About automatic detection with the @fpelliccioni library: I see it as a very interesting next step, as a tool that could be used before the build or package step to check the built binaries, but not to autodetect a setting/option.
  • About compatibility between them: for example, the instruction set of haswell is compatible with a skylake, but I think it doesn't matter. If the user is generating specialized packages for different processors, it's ok to have both of them and let the consumer specify the better one. And with the package_id() method, if a recipe creator is not providing a different binary, they could model something like: if self.options.march == "skylake": self.info.options.march = "haswell", so it stays open to binary optimizations.
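The fallback in the last bullet could be factored as a small helper usable from package_id() (hypothetical names; only the skylake → haswell example comes from the discussion above):

```python
# Hypothetical: collapse a specialized microarch onto the closest one the
# recipe actually provides binaries for, so consumers still get a match.
# The fallback table is illustrative (skylake -> haswell from the example
# above, ivy_bridge -> sandy_bridge per the tick-tock observation).
COMPAT_FALLBACK = {"skylake": "haswell", "ivy_bridge": "sandy_bridge"}

def effective_march(march, provided):
    """Walk the fallback chain until we hit a microarch we build for."""
    while march not in provided and march in COMPAT_FALLBACK:
        march = COMPAT_FALLBACK[march]
    return march

# Inside a recipe's package_id() this might be used roughly as:
#     self.info.options.march = effective_march(str(self.options.march),
#                                               {"x86-64", "haswell"})
```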


lilith commented Nov 22, 2017

I vote to strike 'native' from this feature request. It's orthogonal.

I would suggest that whichever method is selected, it should be inherited by the dependency tree, such that sub-dependencies are built with the same ISA by default.

@lasote lasote modified the milestones: 0.30, 1.0 Dec 5, 2017

lasote commented Feb 8, 2018

Are you saying that when I set environment variables in a profile, those environment variables do not affect the hash of any packages built with that profile?

Yes, exactly that. The only things that affect a package ID are the settings, the options and the requirements of the package.

@DavidZemon

I've been reading, re-reading, and triple-re-reading this thread. I tried looking at the referenced PR (#2042) but only understood a little of it. As always, so much more complex than I initially imagined it would be.

I think we all agree that the theoretical best solution is to make Conan aware of specific instruction sets. However, @lasote might be right when he said

... it becomes unmaintainable and overwhelming for the user. The number of combinations is practically infinite. So it looks like it can't be a setting.

But the other solutions are not ideal either. I would push for removal of the 1.1 milestone target and leave it blank. With no promise of a deadline, we can then all work together to come up with some kind of a solution (even if it's a major breaking API and doesn't come out until Conan v2 or v3) that minimizes the burden on the 90% of users that don't care but allows the 10% of us that do care very much to specify exactly which instruction sets should be enabled.

The C/C++ world still does not have a top-notch package manager solution, and I don't think any solution which fails to properly tackle this specific problem will ever gain the kind of popularity that Maven, NPM, and Pip have gathered. So, before Conan tries to take over the world, I think it should solve this problem in the absolute best possible way.


Solving this problem won't be easy. I think it will require that Conan is capable of mapping instruction sets to compiler flags. This will take a lot of research to provide by default a useful portion of this mapping for as many different compilers and instruction sets as possible. The mapping will also need to be user-extendable, just like the rest of settings.yml (though the mapping may reside in a different file).

I think the burden on the end-user could be eased by providing optional "families", such as i386 which would encompass a large group of instruction sets. The family i786 would then reference i686 and append SSE2 and SSE3.

It may be worth adding a boolean to the profile that says "I will accept binaries that are compatible with my CPU but do not fully utilize all of its features," or "I want to recompile any package that does not fully utilize my CPU." This would also require having a list of conflicting instruction sets or families (to prevent trying to cross-link 32- and 64-bit binaries).
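The family idea sketched above could be modeled as a recursive expansion table (the family contents below are illustrative, not authoritative; only the i786 = i686 + SSE2/SSE3 example comes from the text):

```python
# Hypothetical family table: a family expands to other families and/or
# raw instruction sets. The i786 entry mirrors the example in the text;
# the i686 contents are illustrative.
FAMILIES = {"i686": ["MMX", "SSE"],
            "i786": ["i686", "SSE2", "SSE3"]}

def expand(name, families=FAMILIES):
    """Recursively flatten a family name into its set of instruction sets."""
    if name not in families:
        return {name}            # a raw instruction set, not a family
    out = set()
    for member in families[name]:
        out |= expand(member, families)
    return out
```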

@anton-danielsson

Having something like fpu or microarch in settings.yml would be really cool.
The difference between code compiled with, for example, NEON vs VFP, or SSE vs AVX-512, can be quite large.

@jwillikers

Yeah, I think supporting the various CPU architectures out there is absolutely necessary for Conan to adequately cover binary compatibility. It seems like there should be sub-settings to cover the intricacies of specific architectures, like the FPU for armv7hf and whether or not to use thumb in this case, which ends up being a distinct ISA.

@vadixidav

I also needed to add things to my settings.yaml for embedded arm:

    arm-none-eabi-gcc:
        version: ["8.3"]
        cpu: ["cortex-m7"]
        fpu: ["fpv5-sp-d16"]
        float-abi: ["soft", "softfp", "hard"]

I don't really know if compiler is the right place to put this, but it didn't look like it would have been easy to stick it into arch either, as none of those have sub options. It might be that some of this is redundant with the arch as well, as the arch is armv7hf. However, even though the arch is armv7hf, it is still possible to build libraries using the soft ABI and even to consume them with softfp, so I don't know what the right answer is here.

I would like it if there was a more comprehensive solution for handling of the floating point here on arm, as I need to be diligent to ensure that I set the settings in accordance with the compiler flags. As you can see, I have been careful to name the settings exactly as the compiler flags to avoid problems and simplify my profile file.

In general, it seems like every capability or option on a CPU should probably have its own Conan setting if this is to be managed.


jwillikers commented Dec 2, 2021

@vadixidav That looks a lot like what I've been thinking, though I've been leaning towards putting it under arch instead of the compiler, even though it is a bit awkward that there aren't any sub-options there and it looks like the architectures are just a list. It seems like the plan might be to convert the architecture settings in arch to the necessary compiler flags in the build-system generators. It's interesting that you created a GNU Arm Embedded compiler, too. I've thought about doing so but for simplicity I've just used the gcc compiler setting and set up the compiler name in the CMakeToolchain generator.

A rough example taken from yours might look as follows.

arch:
  thumbv7em:
    fpu: [None, "fpv5-sp-d16"]
    float-abi: ["soft", "softfp", "hard"]

This is going to get tricky pretty quickly when validating that architecture options don't conflict, like selecting fpu: None and float-abi: "hard". I like how Rust lays out its Platform Support and the consistent naming there, though I don't think that naming captures all the available options one might want to configure for a specific architecture.
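That kind of conflict check could be a small validation helper, roughly like this (hypothetical names; only the fpu: None with float-abi: "hard" conflict comes from the text):

```python
# Hypothetical validation for the thumbv7em sub-settings sketched above.
class InvalidConfiguration(Exception):
    """Raised when arch sub-settings contradict each other."""

def validate_thumbv7em(fpu, float_abi):
    # A hard-float ABI passes floating-point arguments in FPU registers,
    # so it cannot work without an FPU selected.
    if float_abi == "hard" and fpu is None:
        raise InvalidConfiguration("float-abi=hard requires an fpu")
```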


robinchrist commented Dec 6, 2024

I have also stumbled over this... Here are my thoughts:

In order to keep the maintenance burden at a sensible level, I think as much responsibility as possible should be shifted to the user.
From my point of view, the most important point is that anything affecting the set of instructions used in the output binary, and the target for which the binary is optimized, goes into the calculation of the package id.

I see two possible solutions:

  1. Make this very barebones: provide a new kind of option where you can add a list of arbitrary compiler flags that will go into the calculation of the package id. This means the user will have to create a profile for each compiler that is used.
  2. Support the user a bit more: provide some standardised options (like march, mtune) instead of manually specified flags. The difficulty I see here is figuring out WHICH standardised options to provide (especially for ARM, which as mentioned above adds complexity with float-abi, fpu, etc.). Also, those options would differ between compilers (MSVC vs Clang/GCC), which could negatively affect the user experience.

Another issue in general is how the architecture/vectorization options are handled in the library itself - some libraries only provide very limited architecture options, because there is complex logic (conditional inclusion of headers, conditional setting of constants that tune the calculation, etc.). Many times this is also simply handled using, for example, #ifdef __AVX2__.
Not sure how this point affects the integration into conan...
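Option 1 essentially means folding the extra flags into the package id. A minimal sketch of that idea (this is not Conan's actual package-id algorithm; the function name is made up):

```python
import hashlib

def package_id_component(flags):
    """Order-insensitive digest of user-supplied flags, suitable for
    mixing into a package id. Sketch only, not Conan's real algorithm."""
    canonical = ",".join(sorted(flags))
    return hashlib.sha1(canonical.encode()).hexdigest()[:8]
```

Sorting before hashing makes the digest independent of the order in which the user listed the flags, so equivalent profiles map to the same binary.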


valgur commented Dec 9, 2024

I agree with @robinchrist. march and mtune sub-settings under arch would be very useful for performance-critical libraries and would avoid unnecessary ad-hoc complexity in related recipes.

@robinchrist

I agree with @robinchrist. march and mtune sub-settings under arch would be very useful for performance-critical libraries and would avoid unnecessary ad-hoc complexity in related recipes.

arch.mcpu would also be needed, as -march / -mtune work totally differently on ARM.
