MemoryMeasurementsBDX
This blog post describes the case when memory measurements on Intel Broadwell EP/EX systems are too high.
If you measure memory traffic on Intel Broadwell EP/EX with the MEM* groups, some systems return implausibly high numbers. I have received reports about this from several computing centers.
+------------------------------------------+---------+------------+-----------------+
| Event | Counter | Core 0 | Core 10 |
+------------------------------------------+---------+------------+-----------------+
| INSTR_RETIRED_ANY | FIXC0 | 2093481000 | 1528120000 |
| CPU_CLK_UNHALTED_CORE | FIXC1 | 5590396000 | 5125980000 |
| CPU_CLK_UNHALTED_REF | FIXC2 | 3975666000 | 3638550000 |
| PWR_PKG_ENERGY | PWR0 | 53.7656 | 55.2363 |
| PWR_DRAM_ENERGY | PWR3 | 11.8070 | 12.5511 |
| FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE | PMC0 | 0 | 0 |
| FP_ARITH_INST_RETIRED_SCALAR_DOUBLE | PMC1 | 379993 | 379990 |
| FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE | PMC2 | 902215000 | 902215000 |
| CAS_COUNT_RD | MBOX0C0 | 56361240 | 56586960 |
| CAS_COUNT_WR | MBOX0C1 | 28293350 | 28353280 |
| CAS_COUNT_RD | MBOX1C0 | 56474850 | 56681730 |
| CAS_COUNT_WR | MBOX1C1 | 28406420 | 28456820 |
| CAS_COUNT_RD | MBOX2C0 | 56366030 | 56582170 |
| CAS_COUNT_WR | MBOX2C1 | 28295140 | 28349200 |
| CAS_COUNT_RD | MBOX3C0 | 56362180 | 56567480 |
| CAS_COUNT_WR | MBOX3C1 | 28293360 | 28344380 |
| CAS_COUNT_RD | MBOX4C0 | 56595560 | 141008100000000 |
| CAS_COUNT_WR | MBOX4C1 | 28384190 | 141008100000000 |
| CAS_COUNT_RD | MBOX5C0 | 56596110 | 141008100000000 |
| CAS_COUNT_WR | MBOX5C1 | 28384440 | 141008100000000 |
| CAS_COUNT_RD | MBOX6C0 | 0 | 0 |
| CAS_COUNT_WR | MBOX6C1 | 0 | 0 |
| CAS_COUNT_RD | MBOX7C0 | 0 | 0 |
| CAS_COUNT_WR | MBOX7C1 | 0 | 0 |
+------------------------------------------+---------+------------+-----------------+
MBOX4 and MBOX5 should not be active. In the Core 10 column they report implausibly large values.
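One way to spot such counters before trusting the derived bandwidth is to compare every CAS count with the others. The following is only a minimal sketch and not part of LIKWID: the function name `suspicious_counters` and the threshold `factor=1000` are assumptions chosen for illustration, and the input values are the CAS_COUNT_RD numbers of the Core 10 column from the table above.

```python
from statistics import median

# CAS_COUNT_RD values of the "Core 10" column, copied from the table above.
cas_rd = {
    "MBOX0C0": 56586960,
    "MBOX1C0": 56681730,
    "MBOX2C0": 56582170,
    "MBOX3C0": 56567480,
    "MBOX4C0": 141008100000000,
    "MBOX5C0": 141008100000000,
    "MBOX6C0": 0,
    "MBOX7C0": 0,
}


def suspicious_counters(counts, factor=1000):
    """Return the counters whose value exceeds `factor` times the median
    of all nonzero counts (hypothetical heuristic, threshold is arbitrary)."""
    nonzero = [v for v in counts.values() if v > 0]
    if not nonzero:
        return []
    ref = median(nonzero)
    return sorted(name for name, v in counts.items() if v > factor * ref)


flagged = suspicious_counters(cas_rd)
print(flagged)  # ['MBOX4C0', 'MBOX5C0']
```

The heuristic relies on most channels reporting sane values. If the majority of counters were faulty, the median itself would be off and a fixed upper bound would be the safer check.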
Commonly, the problem is caused by memory controller counter registers that are only partly deactivated. In most cases, Intel Broadwell EP systems have 4 active memory channels. LIKWID does not know how many channels (PCI devices) are active, so it tests all of them and marks a device as available if all tests succeed. Besides checking the availability of the PCI devices, it also tries to read and write the counter registers. On these unreliable systems, the checks pass for 6 channels. Often the additional memory channel devices return zero, in which case they make no difference to the results.
Since there is not much LIKWID can do (besides the accessibility checks), the only way is to update the MEM* groups for the affected systems and remove the faulty memory channels:
cp <LIKWID_BASE>/share/likwid/perfgroups/broadwellEP/MEM.txt ~/.likwid/groups/broadwellEP/MEM_BDX.txt
edit ~/.likwid/groups/broadwellEP/MEM_BDX.txt
- remove unneeded registers
- update metric formulas
likwid-perfctr -g MEM_BDX ...
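The two manual editing steps (remove unneeded registers, update metric formulas) could also be scripted. The sketch below is not a LIKWID tool. It assumes the usual group file layout, where EVENTSET lines have the form `<counter> <event>` and metric formulas sum the counters with `+`. The excerpt in `GROUP_TEXT` is a shortened, hypothetical stand-in for the real MEM.txt, and `drop_mboxes` is a helper name invented for this example.

```python
import re

# Hypothetical, shortened excerpt in the layout of a LIKWID group file.
# The real MEM.txt lists more counters and metrics.
GROUP_TEXT = """\
SHORT Main memory bandwidth in MBytes/s

EVENTSET
FIXC0 INSTR_RETIRED_ANY
MBOX0C0 CAS_COUNT_RD
MBOX0C1 CAS_COUNT_WR
MBOX4C0 CAS_COUNT_RD
MBOX4C1 CAS_COUNT_WR

METRICS
Runtime (RDTSC) [s] time
Memory read bandwidth [MBytes/s] 1.0E-06*(MBOX0C0+MBOX4C0)*64.0/time
Memory write bandwidth [MBytes/s] 1.0E-06*(MBOX0C1+MBOX4C1)*64.0/time
"""


def drop_mboxes(text, faulty):
    """Remove the given MBOX indices from event lines and from sums in formulas."""
    ids = "|".join(str(i) for i in faulty)
    counter = rf"MBOX(?:{ids})C\d"
    out = []
    for line in text.splitlines():
        # Skip EVENTSET entries that program a faulty counter.
        if re.match(rf"{counter}\s", line):
            continue
        # Drop "+MBOXnCm" terms, then "MBOXnCm+" in case the faulty term came first.
        line = re.sub(rf"\+{counter}\b", "", line)
        line = re.sub(rf"\b{counter}\+", "", line)
        out.append(line)
    return "\n".join(out) + "\n"


fixed = drop_mboxes(GROUP_TEXT, [4, 5])
print(fixed)
```

A sum that consists only of faulty counters is left untouched by this sketch, so the result should still be reviewed by hand before it is saved as MEM_BDX.txt.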