IcelakeSP
With the IcelakeSP architecture, Intel introduced a generic lookup and configuration mechanism for the Uncore units, called the "PMON Discovery mechanism" in the Uncore monitoring reference guide. The guide mentions it a few times but does not describe it in detail. The main idea seems to be to expose the configuration of the performance monitoring units and their counters, called PMON blocks, at specific memory addresses so that it is machine-readable. Part of each unit-specific configuration is the "Global Status bit", which is required to map units to bit offsets in the global overflow status register. Since the mechanism is not documented and the bit offsets are unknown, LIKWID does not use the global overflow status register but only the unit-local overflow registers. To detect overflows, LIKWID therefore reads the unit-local overflow register of every unit that is part of the event set.
The unit-to-bit-offset mapping is fixed for architectures before IcelakeSP and therefore documented in the corresponding Uncore monitoring reference guides. On those architectures, LIKWID uses a single read of the global status register to find out which units overflowed and then reads only their unit-local overflow status registers to identify the overflowed counters. This usually requires fewer read operations.
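The difference between the two overflow detection schemes can be illustrated by counting register reads. The following is a minimal sketch, not LIKWID code: the function names and the unit list are hypothetical, and the registers are only modelled as counts.

```python
def reads_with_global_status(units_in_set, overflowed_units):
    """Scheme for architectures before IcelakeSP: one read of the global
    status register, then one unit-local read per unit flagged there."""
    flagged = [u for u in units_in_set if u in overflowed_units]
    return 1 + len(flagged)


def reads_with_local_status_only(units_in_set):
    """Scheme used on IcelakeSP: one unit-local read for every unit
    that is part of the event set."""
    return len(units_in_set)


# Hypothetical event set touching five units, one of which overflowed.
units = ["CBOX0", "CBOX1", "MBOX0", "MBOX1", "UBOX"]
overflowed = {"MBOX0"}

print(reads_with_global_status(units, overflowed))
print(reads_with_local_status_only(units))
```

If every unit in the event set overflowed at once, the global scheme would need one read more than the local-only scheme, which is why the saving holds in the common case rather than always.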
Intel® Icelake SP Performance groups
The input file for the events on Intel® Icelake SP can be found here.
- Core-local counters
- Socket-wide counters
- Energy counters
- Uncore management fixed-purpose counter
- Uncore management general-purpose counters
- Last Level cache counters
- Power control unit fixed-purpose counters
- Power control unit general-purpose counters
- Memory controller fixed-purpose counters
- Memory controller general-purpose counters
- Memory controller free-running counters
- UPI Link Layer counters
- M3UPI counters
- IIO general-purpose counters
- IIO fixed-purpose counters
- IRP general-purpose counters
- Mesh-2-Memory general-purpose counters
- PCIe general-purpose counters
Since the Core2 microarchitecture, Intel® provides a set of fixed-purpose counters. Each can measure only one specific event.
Counter name | Event name |
---|---|
FIXC0 | INSTR_RETIRED_ANY |
FIXC1 | CPU_CLK_UNHALTED_CORE |
FIXC2 | CPU_CLK_UNHALTED_REF |
FIXC3 | TOPDOWN_SLOTS |
Option | Argument | Description | Comment |
---|---|---|---|
anythread | N | Set bit 2+(index*4) in config register | |
kernel | N | Set bit (index*4) in config register | |
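The bit positions in the option table depend on the counter index. As a minimal sketch, assuming `index` is the number in the counter name (0 for FIXC0 up to 3 for FIXC3), the option bits can be computed as follows. The function name is hypothetical and not part of LIKWID.

```python
def fixed_counter_option_bits(index, kernel=False, anythread=False):
    """Return the config register bits set by the options of FIXC<index>."""
    if not 0 <= index <= 3:
        raise ValueError("fixed-purpose counter index must be 0..3")
    bits = 0
    if kernel:
        bits |= 1 << (index * 4)        # kernel: bit (index*4)
    if anythread:
        bits |= 1 << (2 + index * 4)    # anythread: bit 2+(index*4)
    return bits


# FIXC1 with both options sets bits 4 and 6.
print(hex(fixed_counter_option_bits(1, kernel=True, anythread=True)))  # 0x50
```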
With the Intel® Icelake microarchitecture, a new class of core-local counters was introduced, the so-called perf-metrics. They reflect the first level of the Top-down Microarchitecture Analysis tree.
Counter name | Event name |
---|---|
TMA0 | RETIRING |
TMA1 | BAD_SPECULATION |
TMA2 | FRONTEND_BOUND |
TMA3 | BACKEND_BOUND |
The events return the fraction of slots used by the event.
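Since each event returns a fraction, an absolute slot count per category can be estimated by scaling with the slot count from the fixed-purpose counter FIXC3 (TOPDOWN_SLOTS). This is a minimal sketch under the assumption that both values were measured over the same interval; the function name and the sample numbers are made up for illustration.

```python
def slots_per_category(topdown_slots, fractions):
    """Scale the per-category slot fractions by the total slot count."""
    return {name: fraction * topdown_slots for name, fraction in fractions.items()}


# Hypothetical measurement: the four first-level fractions and the slot count.
fractions = {
    "RETIRING": 0.5,
    "BAD_SPECULATION": 0.125,
    "FRONTEND_BOUND": 0.125,
    "BACKEND_BOUND": 0.25,
}
slots = slots_per_category(1_000_000, fractions)
for name, count in slots.items():
    print(f"{name}: {count:.0f} slots")
```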
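The Intel® IcelakeSP microarchitecture provides 4 general-purpose counters per hardware thread, each consisting of a config and a counter register. Four more counters (PMC4-PMC7) are only available without HyperThreading.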
Counter name | Event name |
---|---|
PMC0 | * |
PMC1 | * |
PMC2 | * |
PMC3 | * |
PMC4 | * (only available without HyperThreading) |
PMC5 | * (only available without HyperThreading) |
PMC6 | * (only available without HyperThreading) |
PMC7 | * (only available without HyperThreading) |
Option | Argument | Description | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
kernel | N | Set bit 17 in config register | |
anythread | N | Set bit 21 in config register | The anythread option is deprecated! Please check the documentation on how to use it on Icelake |
threshold | 8 bit hex value | Set bits 24-31 in config register | |
invert | N | Set bit 23 in config register | |
in_transaction | N | Set bit 32 in config register | Only available if Intel® Transactional Synchronization Extensions are available |
in_transaction_aborted | N | Set bit 33 in config register | Only counter PMC2 and only if Intel® Transactional Synchronization Extensions are available |
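The option table maps directly to bit operations on the config register. The sketch below only assembles the option bits listed in the table and leaves out the event selection itself. The names `FLAG_BITS` and `pmc_option_bits` are hypothetical and not part of LIKWID.

```python
# Bit positions of the argument-less options, taken from the table above.
FLAG_BITS = {
    "edgedetect": 18,
    "kernel": 17,
    "anythread": 21,
    "invert": 23,
    "in_transaction": 32,
    "in_transaction_aborted": 33,
}


def pmc_option_bits(flags=(), threshold=None):
    """Return the config register bits for the given options."""
    value = 0
    for flag in flags:
        value |= 1 << FLAG_BITS[flag]
    if threshold is not None:
        if not 0 <= threshold <= 0xFF:
            raise ValueError("threshold must fit into 8 bits")
        value |= threshold << 24    # threshold occupies bits 24-31
    return value


print(hex(pmc_option_bits(["edgedetect", "invert"], threshold=0x2)))  # 0x2840000
```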
The Intel® IcelakeSP microarchitecture provides one register for the current core temperature.
Counter name | Event name |
---|---|
TMP0 | TEMP_CORE |
The Intel® IcelakeSP microarchitecture provides one register for the current core voltage.
Counter name | Event name |
---|---|
VTG0 | VOLTAGE_CORE |
The Intel® IcelakeSP microarchitecture provides measurements of the current energy consumption through the RAPL interface.
Counter name | Event name |
---|---|
PWR0 | PWR_PKG_ENERGY |
PWR1 | PWR_PP0_ENERGY |
PWR2 | PWR_PP1_ENERGY (*) |
PWR3 | PWR_DRAM_ENERGY |
PWR4 | PWR_PLATFORM_ENERGY (+) |
(*) Commonly not supported (+) Often returns zeros
The Intel® Icelake X microarchitecture provides measurements of the management box in the uncore. The description from Intel®:
The UBox serves as the system configuration controller for Intel® Xeon® Processor
Scalable Memory Family
In this capacity, the UBox acts as the central unit for a variety of functions:
- The master for reading and writing physically distributed registers across using the Message Channel.
- The UBox is the intermediary for interrupt traffic, receiving interrupts from the system and dispatching interrupts to the appropriate core.
- The UBox serves as the system lock master used when quiescing the platform (e.g., Intel® UPI bus lock).
The single fixed-purpose counter counts the clock ticks of the uncore clock source. The uncore management performance counters are exposed to the operating system through the MSR interface. The name UBOX originates from the Nehalem EX uncore monitoring.
Counter name | Event name |
---|---|
UBOXFIX | UNCORE_CLOCK |
The Intel® Icelake X microarchitecture provides measurements of the management box in the uncore. The description from Intel®:
The UBox serves as the system configuration controller for Intel® Xeon® Processor
Scalable Memory Family
In this capacity, the UBox acts as the central unit for a variety of functions:
- The master for reading and writing physically distributed registers across using the Message Channel.
- The UBox is the intermediary for interrupt traffic, receiving interrupts from the system and dispatching interrupts to the appropriate core.
- The UBox serves as the system lock master used when quiescing the platform (e.g., Intel® UPI bus lock).
The uncore management performance counters are exposed to the operating system through the MSR interface. The name UBOX originates from the Nehalem EX uncore monitoring.
Counter name | Event name |
---|---|
UBOX0 | * |
UBOX1 | * |
Option | Argument | Operation | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
invert | N | Set bit 23 in config register | |
threshold | 5 bit hex value | Set bits 24-28 in config register | |
The Intel® Icelake X microarchitecture provides measurements of the LLC coherency engine in the uncore. The description from Intel®:
The LLC coherence engine and Home agent (CHA) merges the caching agent and home
agent (HA) responsibilities of the chip into a single block. In its capacity as a caching
agent the CHA manages the interface between the core the IIO devices and the last
level cache (LLC). In its capacity as a home agent the CHA manages the interface
between the LLC and the rest of the UPI coherent fabric as well as the on die memory
controller.
The LLC hardware performance counters are exposed to the operating system through the MSR interface. The maximum number of coherency engines supported by the Intel® Icelake X microarchitecture is 40. Your system may not have all CBOXes; LIKWID skips the unavailable ones in the setup phase. The name CBOX originates from the Nehalem EX uncore monitoring.
Counter name | Event name |
---|---|
CBOX<0-39>C0 | * |
CBOX<0-39>C1 | * |
CBOX<0-39>C2 | * |
CBOX<0-39>C3 | * |
Option | Argument | Operation | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
invert | N | Set bit 23 in config register | |
threshold | 8 bit hex value | Set bits 24-31 in config register | |
match0 | 26 bit hex value | Set bits 32-57 in config register named UmaskExt | Only for the events LLC_LOOKUP, TOR_INSERTS and TOR_OCCUPANCY. Check the uncore documentation for explanations |
state | 8 bit hex value | Set bits 8-15 in config register, similar to umask | Only for event LLC_LOOKUP. LLC F: 0x80, LLC M: 0x40, LLC E: 0x20, LLC S: 0x10, SF H: 0x08, SF E: 0x04, SF S: 0x02, LLC I: 0x01 |
The event LLC_LOOKUP uses the umask field for the state specification. Further filters are in a field called UmaskExt. These filters can be addressed with the MATCH0 option. For example, LLC_LOOKUP_I:CBOX0C0:MATCH0=0x8 counts cache lines in I (=invalid) state filtered by RFOs (MATCH0=0x8). Most LLC_LOOKUP events use umask/state 0xFF for all states. If you want to use multiple states, use the STATE option, e.g. STATE=0xE for all SF states.
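Both option values are plain bit masks, so they can be derived from the state codes in the option table and from the filter bit offsets (RFOs are at bit offset 3). This is a minimal sketch with hypothetical helper names; only the event string format is taken from the example in the text.

```python
# State codes for the STATE option, copied from the option table.
LLC_LOOKUP_STATES = {
    "LLC_F": 0x80, "LLC_M": 0x40, "LLC_E": 0x20, "LLC_S": 0x10,
    "SF_H": 0x08, "SF_E": 0x04, "SF_S": 0x02, "LLC_I": 0x01,
}


def state_mask(names):
    """OR together the codes of the requested states."""
    mask = 0
    for name in names:
        mask |= LLC_LOOKUP_STATES[name]
    return mask


def match0_from_bit_offsets(offsets):
    """Build a MATCH0 value from UmaskExt bit offsets."""
    value = 0
    for offset in offsets:
        value |= 1 << offset
    return value


# All snoop filter (SF) states give STATE=0xE.
print(hex(state_mask(["SF_H", "SF_E", "SF_S"])))  # 0xe

# RFO filter (bit offset 3) for the event string used in the text.
event = f"LLC_LOOKUP_I:CBOX0C0:MATCH0={hex(match0_from_bit_offsets([3]))}"
print(event)  # LLC_LOOKUP_I:CBOX0C0:MATCH0=0x8
```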
Bits of the MATCH0 option for the LLC_LOOKUP event:
Bit offset | Description |
---|---|
0 | Data reads, local or remote. Includes prefetches |
1 | All write transactions to the LLC, including writebacks to LLC and uncacheable write transactions. Does not include evict cleans or invalidates |
2 | Flush or Invalidates |
3 | RFOs, local or remote. Includes prefetches |
4 | Code reads, local or remote. Includes prefetches |
5 | Any local or remote transaction to the LLC. Includes prefetches |
6 | Any local transaction to LLC, including prefetches from Core |
7 | Any local prefetch to LLC from an LLC |
8 | Any local prefetch to LLC from Core |
9 | Snoop transactions to the LLC from a remote agent |
10 | Non-snoop transactions to the LLC from a remote agent |
11 | Transactions to locally homed addresses |
12 | Transactions to remotely homed addresses |
The events TOR_INSERTS and TOR_OCCUPANCY also use the UmaskExt field but with different width and meaning:
Bit offset | Description |
---|---|
0 | Just entries that Hit the LLC - Bit offset 0 is XORed with bit offset 1. No filtering applied if both bits are either 0 or 1 |
1 | Just entries that Missed the LLC - Bit offset 1 is XORed with bit offset 0. No filtering applied if both bits are either 0 or 1 |
2 | Filter on requests to memory mapped to DDR |
3 | Filter on requests to memory mapped to PMM |
4 | Filter on requests to memory mapped to HBM |
5 | Filter on requests to memory mapped to MMCFG space |
6 | Filter on requests to memory mapped to MMIO space |
7 | Match on Remote Node Target - Bit offset 7 is XORed with bit offset 8. No filtering applied if both bits are either 0 or 1 |
8 | Match on Local Node Target - Bit offset 8 is XORed with bit offset 7. No filtering applied if both bits are either 0 or 1 |
9 | Filter by Opcodes |
10 | Filter by PreMorphed Opcodes |
11-21 | Match on Opcode - 11b IDI Opcode w/top 2b 0x3 - Check IcelakeSP Uncore documentation |
22 | Just Match on Near Memory Cacheable Accesses - Bit offset 22 is XORed with bit offset 23. No filtering applied if both bits are either 0 or 1 |
23 | Just Match on Non Near Memory Cacheable Accesses - Bit offset 23 is XORed with bit offset 22. No filtering applied if both bits are either 0 or 1 |
24 | Match on Non-Coherent Requests |
25 | Match on ISOC Requests |
The events WRITE_NO_CREDITS and READ_NO_CREDITS use the UmaskExt field as a real extension of the default umask field. Each bit corresponds to a memory controller (0-13). The first eight are covered by the bits in umask. The other six bits are in the UmaskExt field, addressable with the MATCH0 option.
The event LLC_VICTIMS uses the MATCH0 option to differentiate between 'local only' and 'remote only' victims. If nothing is set, 'all' are counted. There are only two settings: MATCH0=0x20 for 'local only' and MATCH0=0x80 for 'remote only'.
The Intel® Icelake X microarchitecture provides measurements of the power control unit (PCU) in the uncore. The description from Intel®:
The PCU is the primary Power Controller for the Ice Lake die, responsible for
distributing power to core/uncore components and thermal management. It runs in
firmware on an internal micro-controller and coordinates the socket’s power states.
Note: Power management is not completely centralized. Many units employ their own power
saving features. Events that provide information about those features are captured in
the PMON bocks of those units. For example, Intel® UPI Link Power saving states and
Memory CKE statistics are captured in the Intel® UPI Perfmon and IMC Perfmon
respectively.
The PCU offers four fixed-purpose counters to retrieve the cycles CPU cores stay in the states C3, C6, P3 and P6. The PCU performance counters are exposed to the operating system through the MSR interface. The name WBOX originates from the Nehalem EX uncore monitoring.
Counter name | Event name |
---|---|
WBOX0FIX | CORES_IN_C3 |
WBOX1FIX | CORES_IN_C6 |
WBOX2FIX | CORES_IN_P3 |
WBOX3FIX | CORES_IN_P6 |
The Intel® Icelake X microarchitecture provides measurements of the power control unit (PCU) in the uncore. The description from Intel®:
The PCU is the primary Power Controller for the Ice Lake die, responsible for
distributing power to core/uncore components and thermal management. It runs in
firmware on an internal micro-controller and coordinates the socket’s power states.
Note: Power management is not completely centralized. Many units employ their own power
saving features. Events that provide information about those features are captured in
the PMON bocks of those units. For example, Intel® UPI Link Power saving states and
Memory CKE statistics are captured in the Intel® UPI Perfmon and IMC Perfmon
respectively.
The PCU performance counters are exposed to the operating system through the MSR interface. The name WBOX originates from the Nehalem EX uncore monitoring.
Counter name | Event name |
---|---|
WBOX0 | * |
WBOX1 | * |
WBOX2 | * |
WBOX3 | * |
Option | Argument | Operation | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
invert | N | Set bit 23 in config register | |
threshold | 5 bit hex value | Set bits 24-28 in config register | |
occ_edgedetect | N | Set bit 31 in config register | |
occ_invert | N | Set bit 30 in config register | |
The Intel® Icelake X microarchitecture provides measurements of the integrated Memory Controllers (iMC) in the uncore. The description from Intel®:
The Ice Lake integrated Memory Controller provides the interface to DRAM and
communicates to the rest of the Uncore through the Mesh2Mem block.
The memory controller also provides a variety of RAS features, such as ECC, memory
access retry, memory scrubbing, thermal throttling, mirroring, and rank sparing.
The integrated Memory Controllers performance counters are exposed to the operating system through MMIO interfaces. Each memory controller provides a set of fixed counters.
Counter name | Event name |
---|---|
MBOX<0-7>FIX | MBOX_CLOCKTICKS |
The Intel® Icelake X microarchitecture provides measurements of the integrated Memory Controllers (iMC) in the uncore. The description from Intel®:
The Ice Lake integrated Memory Controller provides the interface to DRAM and
communicates to the rest of the Uncore through the Mesh2Mem block.
The memory controller also provides a variety of RAS features, such as ECC, memory
access retry, memory scrubbing, thermal throttling, mirroring, and rank sparing.
The integrated Memory Controllers performance counters are exposed to the operating system through MMIO interfaces. Icelake supports up to 8 channels of DDR4 with 2 channels per memory controller.
Counter name | Event name |
---|---|
MBOX<0-7>C0 | * |
MBOX<0-7>C1 | * |
MBOX<0-7>C2 | * |
MBOX<0-7>C3 | * |
Option | Argument | Operation | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
invert | N | Set bit 23 in config register | |
threshold | 8 bit hex value | Set bits 24-31 in config register | |
The Intel® Icelake X microarchitecture provides measurements of the integrated Memory Controllers (iMC) in the uncore. The description from Intel®:
The Ice Lake integrated Memory Controller provides the interface to DRAM and
communicates to the rest of the Uncore through the Mesh2Mem block.
The memory controller also provides a variety of RAS features, such as ECC, memory
access retry, memory scrubbing, thermal throttling, mirroring, and rank sparing.
Besides the general-purpose counters for each memory channel, Intel® Icelake X provides free-running counters per memory controller.
Counter name | Event name |
---|---|
MDEV<0-3>C0 | DDR_READ_BYTES |
MDEV<0-3>C1 | DDR_WRITE_BYTES |
MDEV<0-3>C2 | PMM_READ_BYTES |
MDEV<0-3>C3 | PMM_WRITE_BYTES |
MDEV<0-3>C4 | IMC_DEV_CLOCKTICKS |
Mapping between MBOX<0-7>C<0-3> and MDEV<0-3>C<0-4>:
MDEV | MBOX |
---|---|
MDEV0 | MBOX<0-1> |
MDEV1 | MBOX<2-3> |
MDEV2 | MBOX<4-5> |
MDEV3 | MBOX<6-7> |
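Since every MDEV covers two consecutive MBOX units, the mapping table reduces to an integer division by two. This is a minimal sketch with hypothetical helper names, not LIKWID code.

```python
def mdev_for_mbox(mbox):
    """Return the MDEV index that covers MBOX<mbox>."""
    if not 0 <= mbox <= 7:
        raise ValueError("MBOX index must be 0..7")
    return mbox // 2


def mboxes_for_mdev(mdev):
    """Return the two MBOX indices covered by MDEV<mdev>."""
    if not 0 <= mdev <= 3:
        raise ValueError("MDEV index must be 0..3")
    return (2 * mdev, 2 * mdev + 1)


print(mdev_for_mbox(5))      # 2
print(mboxes_for_mdev(3))    # (6, 7)
```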
Counter name | Event name |
---|---|
QBOX<0-2>C0 | * |
QBOX<0-2>C1 | * |
QBOX<0-2>C2 | * |
QBOX<0-2>C3 | * |
Option | Argument | Operation | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
invert | N | Set bit 23 in config register | |
threshold | 8 bit hex value | Set bits 24-31 in config register | |
Counter name | Event name |
---|---|
SBOX<0-2>C0 | * |
SBOX<0-2>C1 | * |
SBOX<0-2>C2 | * |
SBOX<0-2>C3 | * |
Option | Argument | Description | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
invert | N | Set bit 23 in config register | |
threshold | 8 bit hex value | Set bits 24-31 in config register | |
Counter name | Event name |
---|---|
TCBOX<0-5>C0 | * |
TCBOX<0-5>C1 | * |
TCBOX<0-5>C2 | * |
TCBOX<0-5>C3 | * |
Option | Argument | Description | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
invert | N | Set bit 23 in config register | |
threshold | 12 bit hex value | Set bits 24-35 in config register | |
mask0 | 8 bit hex mask | Channel mask filter, sets bits 36-43 in config register | Check Intel® Xeon® Processor Scalable Family Uncore Reference Manual for bit fields. |
Counter name | Event name |
---|---|
IBOX<0-5>PORT0 | IIO_BANDWIDTH_IN_PORT0 |
IBOX<0-5>PORT1 | IIO_BANDWIDTH_IN_PORT1 |
IBOX<0-5>PORT2 | IIO_BANDWIDTH_IN_PORT2 |
IBOX<0-5>PORT3 | IIO_BANDWIDTH_IN_PORT3 |
IBOX<0-5>PORT4 | IIO_BANDWIDTH_IN_PORT4 |
IBOX<0-5>PORT5 | IIO_BANDWIDTH_IN_PORT5 |
IBOX<0-5>PORT6 | IIO_BANDWIDTH_IN_PORT6 |
IBOX<0-5>PORT7 | IIO_BANDWIDTH_IN_PORT7 |
Counter name | Event name |
---|---|
IBOX<0-5>C0 | * |
IBOX<0-5>C1 | * |
Option | Argument | Description | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
invert | N | Set bit 23 in config register | |
threshold | 8 bit hex value | Set bits 24-31 in config register | |
Counter name | Event name |
---|---|
M2M<0-3>C0 | * |
M2M<0-3>C1 | * |
M2M<0-3>C2 | * |
M2M<0-3>C3 | * |
Option | Argument | Description | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
invert | N | Set bit 23 in config register | |
threshold | 5 bit hex value | Set bits 24-28 in config register | |
Counter name | Event name |
---|---|
PBOX<0-5>C0 | * |
PBOX<0-5>C1 | * |
PBOX<0-5>C2 | * |
PBOX<0-5>C3 | * |
Option | Argument | Description | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
invert | N | Set bit 23 in config register | |
threshold | 5 bit hex value | Set bits 24-28 in config register | |