
AddARMSupport

Thomas Gruber edited this page Apr 3, 2020 · 1 revision

Introduction

The ARM support in LIKWID is based on the perf_event Linux interface. This required less work than for x86 chips, where a non-perf_event backend is also required. If your system does not provide any perf_event units (check with ls /sys/bus/event_source/devices/), you cannot use LIKWID on that machine.
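The ls check can also be done from C. The following is a minimal sketch assuming a Linux/POSIX system; find_pmu() is a hypothetical helper, not part of LIKWID. The directory usually lists software PMUs such as software or tracepoint as well, so it is more meaningful to look for the core PMU by name, e.g. cpu as used in the translation map later on this page (the name may differ on your system, so check the ls output first).

```c
#include <dirent.h>
#include <string.h>

/* Look for a PMU entry called `name` in a perf_event device directory such
 * as /sys/bus/event_source/devices (hypothetical helper).
 * Returns 1 if the entry exists, 0 if not, and -1 if the directory cannot
 * be opened, e.g. on a kernel without perf_event support. */
int find_pmu(const char *devices_dir, const char *name)
{
    DIR *dir = opendir(devices_dir);
    if (dir == NULL)
        return -1;

    int found = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, name) == 0) {
            found = 1;
            break;
        }
    }
    closedir(dir);
    return found;
}
```

For example, find_pmu("/sys/bus/event_source/devices", "cpu") returns 1 on a machine that exposes a PMU under that name.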

Add hardware topology information

First, LIKWID requires some IDs from the hardware to identify the platform. Although the hwloc library gathers most of the information, LIKWID additionally parses /proc/cpuinfo and reads the following fields:

  • CPU architecture: Typically 7 for ARMv7 or 8 for ARMv8
  • CPU implementer: Differentiates the chips by implementer like Marvell, NVIDIA, ...
  • CPU part: Differentiates the chips of a vendor
  • CPU variant: Rarely used but sometimes reflects the number of cores, the size of the L3 cache or similar
  • CPU revision: Just for completeness
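A minimal sketch of how these fields can be extracted from a single /proc/cpuinfo line, assuming the usual "Key<tabs>: value" layout. struct cpu_ids and parse_cpuinfo_line() are hypothetical names for illustration; LIKWID's own parser may look different. strtoul() with base 0 accepts both the hexadecimal values (0x41) and the decimal architecture number (8).

```c
#include <stdlib.h>
#include <string.h>

/* The five ID fields read from /proc/cpuinfo (hypothetical container). */
struct cpu_ids {
    unsigned long architecture;
    unsigned long implementer;
    unsigned long variant;
    unsigned long part;
    unsigned long revision;
};

/* True if the first keylen characters of line are exactly key. */
static int key_is(const char *line, size_t keylen, const char *key)
{
    return keylen == strlen(key) && strncmp(line, key, keylen) == 0;
}

/* Parse lines like "CPU implementer\t: 0x41".
 * Returns 1 if the line held one of the five ID fields, 0 otherwise. */
int parse_cpuinfo_line(const char *line, struct cpu_ids *ids)
{
    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return 0;

    /* Key length without the tabs/spaces in front of the colon */
    size_t keylen = (size_t)(colon - line);
    while (keylen > 0 && (line[keylen - 1] == ' ' || line[keylen - 1] == '\t'))
        keylen--;

    unsigned long *target;
    if (key_is(line, keylen, "CPU architecture"))
        target = &ids->architecture;
    else if (key_is(line, keylen, "CPU implementer"))
        target = &ids->implementer;
    else if (key_is(line, keylen, "CPU variant"))
        target = &ids->variant;
    else if (key_is(line, keylen, "CPU part"))
        target = &ids->part;
    else if (key_is(line, keylen, "CPU revision"))
        target = &ids->revision;
    else
        return 0;

    /* Base 0: "0x41" is parsed as hex, "8" as decimal */
    char *end;
    unsigned long value = strtoul(colon + 1, &end, 0);
    if (end == colon + 1)
        return 0;  /* no number after the colon */
    *target = value;
    return 1;
}
```

Calling it for every line of /proc/cpuinfo fills the struct; lines such as BogoMIPS or Features are ignored.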

When you have these IDs at hand (in hexadecimal format), add them to src/includes/topology.h. Here is a snippet of this file:

#define  ARMV7_FAMILY     0x7U
#define  ARMV8_FAMILY     0x8U

/* ARM vendors */
#define DEFAULT_ARM	0x41U
#define NVIDIA_ARM	0x4EU
[...]

/* ARM */
#define  ARM_CORTEX_A57     0xD07U
#define  CAV_THUNDERX	0x0A0U
[...]

The FAMILY IDs correspond to the CPU architecture, the ARM vendor IDs to the CPU implementer and the ARM IDs to the CPU part. Please use reasonable abbreviations.

With this information, the chip can be identified, but LIKWID adds some more data for the user like a chip architecture description and a short name. The description is only used for output, but the short name is used later in the performance monitoring part. Both have to be added to src/topology.c. Snippet:

[...]
static char* cavium_thunderx_str = "Cavium Thunder X (ARMv8)";
static char* arm_cortex_a57 = "ARM Cortex A57 (ARMv8)";
static char* arm_cortex_a53 = "ARM Cortex A53 (ARMv8)";
[...]
static char* short_arm8 = "arm8";
static char* short_arm8_cav_tx2 = "arm8_tx2";
static char* short_arm8_cav_tx = "arm8_tx";
[...]

So add a descriptive string and a short name here. If the vendor publishes a short name for the chip, please use it. Intel, for example, provides long and short names like Intel Cascadelake X and CLX. I failed at using them, so please be better than me ;)

The file src/topology.c contains a function topology_setName() with a set of nested switch-case statements based on the IDs we added to src/includes/topology.h before. Search for ARMV7 or ARMV8 because the function is quite long. There you add the description and short name for the new chip. Here is a snippet of it:

switch ( cpuid_info.family )
{
    case ARMV8_FAMILY:
        switch (cpuid_info.vendor)
        {
            case DEFAULT_ARM:
                switch (cpuid_info.part)
                {
                    case ARM_CORTEX_A57:
                        cpuid_info.name = arm_cortex_a57;
                        cpuid_info.short_name = short_arm8;
                        break;
                    [...]
                }
                break;

            [...]
        }
        break;
    [...]
}
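To see the nested switch in isolation, here is a self-contained sketch. struct chip_info and set_chip_name() are hypothetical, simplified stand-ins for cpuid_info and topology_setName(). The Cortex-A53 case (part ID 0xD03) is added only to show where a new case goes; that it shares the arm8 short name is an assumption.

```c
#include <stddef.h>

/* IDs as in the src/includes/topology.h snippet */
#define ARMV8_FAMILY   0x8U
#define DEFAULT_ARM    0x41U
#define ARM_CORTEX_A57 0xD07U
#define ARM_CORTEX_A53 0xD03U  /* example of a newly added part ID */

static char* arm_cortex_a57 = "ARM Cortex A57 (ARMv8)";
static char* arm_cortex_a53 = "ARM Cortex A53 (ARMv8)";
static char* short_arm8 = "arm8";

/* Simplified stand-in for LIKWID's cpuid_info (hypothetical). */
struct chip_info {
    unsigned int family;
    unsigned int vendor;
    unsigned int part;
    char *name;
    char *short_name;
};

/* Sets name and short name; returns 0 on success, -1 for an unknown chip. */
int set_chip_name(struct chip_info *info)
{
    switch (info->family)
    {
        case ARMV8_FAMILY:
            switch (info->vendor)
            {
                case DEFAULT_ARM:
                    switch (info->part)
                    {
                        case ARM_CORTEX_A57:
                            info->name = arm_cortex_a57;
                            info->short_name = short_arm8;
                            return 0;
                        case ARM_CORTEX_A53:  /* the new case */
                            info->name = arm_cortex_a53;
                            info->short_name = short_arm8;
                            return 0;
                    }
                    break;
            }
            break;
    }
    return -1;  /* name and short_name stay untouched */
}
```

If a chip falls through all cases, it has no name, which is a quick way to notice a missing case.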

With these settings, you should be able to run likwid-topology and get proper output (except for cache information). Currently, there is no interface on ARM platforms from which cache information can be gathered.

Adding support for performance monitoring

In order to get performance monitoring support for your chip, three new files are required:

  • perfmon_<name>.h: Main header for the chip
  • perfmon_<name>_counters.h: Counter/register definitions
  • perfmon_<name>_events.txt: Event definitions

Underscores or similar characters are not allowed in <name>, but please choose a name that makes the files easy to find later.

Counter/register definitions

The first file should be perfmon_<name>_counters.h. It commonly consists of three tables: a list of counters, a list of units (sets of counters using the same device) and a table that maps LIKWID types to the perf_event units. In each of the following tables, the first line is the template and the second line an example:

  • List of counters:
#define NUM_COUNTERS_<UPPERCASE_NAME> X

static RegisterMap <name>_counter_map[NUM_COUNTERS_<UPPERCASE_NAME>] = {
    // {COUNTERNAME, UNIQUE_ID, UNIT, CONFIG_REG, COUNTER_REG1, COUNTER_REG2, DEVICE_ID, OPTION_MASK},
    {"PMC0", PMC0, PMC, 0x0, 0x0, 0, 0, 0x0},  // for ARM only the COUNTERNAME, UNIQUE_ID and UNIT are of interest
};
  • List of units (Units are defined in src/includes/register_types.h but adding new types is not recommended):
static BoxMap <name>_box_map[NUM_UNITS] = {
    // [UNITNAME] = {CONTROL_REG, STATUS_REG, CLEAR_REG, STATUS_REG_OFFSET, IS_PCI, DEVICE_ID, COUNTER_WIDTH},
    [PMC] = {0, 0, 0, 0, 0, 0, 48}, // for ARM only the COUNTER_WIDTH is of interest
};
  • Translation map:
static char* <name>_translate_types[NUM_UNITS] = {
    // [UNITNAME] = "path_to_perf_event_directory_containing_the_'type'_file_and_'format'_folder",
    [PMC] = "/sys/bus/event_source/devices/cpu",
};
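A compilable sketch of the three tables for a hypothetical chip called mychip. The typedefs, enum values and the count of six counters are simplified stand-ins so that the example builds on its own; the real RegisterMap, BoxMap and unit types come from LIKWID's headers and use different field names and types. Use the number of counters your chip actually implements.

```c
#include <stdint.h>

/* Simplified stand-ins for LIKWID's unit and counter enums (hypothetical). */
typedef enum { PMC = 0, NUM_UNITS } RegisterType;
typedef enum { PMC0 = 0, PMC1, PMC2, PMC3, PMC4, PMC5 } RegisterIndex;

/* Field order follows the counter template line */
typedef struct {
    const char *key;          /* COUNTERNAME */
    RegisterIndex index;      /* UNIQUE_ID */
    RegisterType type;        /* UNIT */
    uint64_t config_reg;
    uint64_t counter_reg1;
    uint64_t counter_reg2;
    int device_id;
    uint64_t option_mask;
} RegisterMap;

/* Field order follows the unit template line */
typedef struct {
    uint64_t control_reg;
    uint64_t status_reg;
    uint64_t clear_reg;
    int status_reg_offset;
    int is_pci;
    int device_id;
    int counter_width;        /* the only field of interest on ARM */
} BoxMap;

#define NUM_COUNTERS_MYCHIP 6

/* On ARM only COUNTERNAME, UNIQUE_ID and UNIT matter; the rest stays 0 */
static RegisterMap mychip_counter_map[NUM_COUNTERS_MYCHIP] = {
    {"PMC0", PMC0, PMC, 0x0, 0x0, 0, 0, 0x0},
    {"PMC1", PMC1, PMC, 0x0, 0x0, 0, 0, 0x0},
    {"PMC2", PMC2, PMC, 0x0, 0x0, 0, 0, 0x0},
    {"PMC3", PMC3, PMC, 0x0, 0x0, 0, 0, 0x0},
    {"PMC4", PMC4, PMC, 0x0, 0x0, 0, 0, 0x0},
    {"PMC5", PMC5, PMC, 0x0, 0x0, 0, 0, 0x0},
};

/* Designated initializers index the tables by unit type */
static BoxMap mychip_box_map[NUM_UNITS] = {
    [PMC] = {0, 0, 0, 0, 0, 0, 48},
};

static char* mychip_translate_types[NUM_UNITS] = {
    [PMC] = "/sys/bus/event_source/devices/cpu",
};
```

Sizing the counter array with the NUM_COUNTERS_ define lets the compiler complain if you list more entries than declared.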

Event definitions

The most tedious work when adding a new chip is typing, copying or parsing the list of supported events. But you are lucky: you want to add an ARM chip, and these chips provide a common set of events. The list of events is a plain text file that is transformed into a header during compilation.

The format for the events is fixed:

EVENT_<EVENTNAME> <EVENT_ID> <USABLE_COUNTERS>
UMASK_<EVENTNAME> <UMASK>

An example for ARM platforms for the event INST_RETIRED:

EVENT_INST_RETIRED 0x08 PMC
UMASK_INST_RETIRED 0x00

The <USABLE_COUNTERS> field is compared to the counter names and only the beginning has to match, so PMC matches PMC0, PMC1, ... Which counters match therefore depends on how you named them in the list of counters in perfmon_<name>_counters.h.
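In LIKWID the events file is converted into a header at build time, so the following C sketch only illustrates the line format and the prefix rule described above. parse_event_line() and counter_usable() are hypothetical helpers, not LIKWID functions.

```c
#include <stdio.h>
#include <string.h>

/* Parse one "EVENT_<NAME> <ID> <USABLE_COUNTERS>" line.
 * name needs room for 64 chars, counters for 32 (including '\0').
 * Returns 1 on success, 0 for any other line (e.g. UMASK_ lines). */
int parse_event_line(const char *line, char *name, unsigned long *id, char *counters)
{
    /* %lx accepts an optional 0x prefix, so "0x08" parses as 8 */
    return sscanf(line, "EVENT_%63s %lx %31s", name, id, counters) == 3;
}

/* Prefix rule: the usable-counters string only has to match the
 * beginning of the counter name, so "PMC" matches "PMC0", "PMC1", ... */
int counter_usable(const char *usable, const char *counter_name)
{
    return strncmp(counter_name, usable, strlen(usable)) == 0;
}
```

With the INST_RETIRED example, the event name is INST_RETIRED, the ID 0x08 and the usable counters PMC, so every counter whose name starts with PMC qualifies.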

Main header file perfmon_<name>.h

This file is quite simple for ARM platforms as it contains only four lines:

#include <perfmon_<name>_events.h>
#include <perfmon_<name>_counters.h>
static int perfmon_numCounters<UPPERCASE_NAME> = NUM_COUNTERS_<UPPERCASE_NAME>;
static int perfmon_numArchEvents<UPPERCASE_NAME> = NUM_ARCH_EVENTS_<UPPERCASE_NAME>;

Registering chip in performance monitoring module

The performance module is defined in src/perfmon.c.

First, add the main header: #include <perfmon_<name>.h>. The next step is comparable to the topology_setName() function. The function is called perfmon_init_maps() and also contains a set of nested switch-case statements. Search for ARMV7 or ARMV8 as the function is quite long. Here you register, for a CPU family, vendor and part, the lists/tables defined before in perfmon_<name>.h and perfmon_<name>_counters.h.

Here is a snippet:

switch ( cpuid_info.family )
{
    [...]
    case ARMV8_FAMILY:
        switch ( cpuid_info.vendor )
        {
            case DEFAULT_ARM:
                switch (cpuid_info.part)
                {
                    case ARM_CORTEX_A57:  // the define in src/includes/topology.h
                        eventHash = a57_arch_events; // <name>_arch_events generated at compilation
                        perfmon_numArchEvents = perfmon_numArchEventsA57; // defined by you in perfmon_<name>.h
                        perfmon_numCounters = perfmon_numCountersA57; // defined by you in perfmon_<name>.h
                        counter_map = a57_counter_map; // <name>_counter_map defined in perfmon_<name>_counters.h
                        box_map = a57_box_map; // <name>_box_map defined in perfmon_<name>_counters.h
                        translate_types = a57_translate_types; // <name>_translate_types defined in perfmon_<name>_counters.h
                        break;
                    [...]
                }
                break;
            [...]
        }
        break;
    [...]
}
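The same selection pattern, reduced to a self-contained sketch. The table and counter values are hypothetical stand-ins (a plain string array instead of a RegisterMap table, six counters chosen arbitrarily), and init_maps() is not the real perfmon_init_maps(). It shows that an unknown part leaves the global maps unset, which is what happens if the new case is missing.

```c
#include <stddef.h>

/* IDs from src/includes/topology.h */
#define ARMV8_FAMILY   0x8U
#define DEFAULT_ARM    0x41U
#define ARM_CORTEX_A57 0xD07U

/* Hypothetical stand-ins for the per-chip tables */
static const char *a57_counter_map[] = {"PMC0", "PMC1", "PMC2", "PMC3", "PMC4", "PMC5"};
static int perfmon_numCountersA57 = 6;

/* Globals filled in by the selection */
static const char **counter_map = NULL;
static int perfmon_numCounters = 0;

/* Returns 0 if tables were registered, -1 for an unsupported chip. */
int init_maps(unsigned int family, unsigned int vendor, unsigned int part)
{
    switch (family)
    {
        case ARMV8_FAMILY:
            switch (vendor)
            {
                case DEFAULT_ARM:
                    switch (part)
                    {
                        case ARM_CORTEX_A57:
                            counter_map = a57_counter_map;
                            perfmon_numCounters = perfmon_numCountersA57;
                            return 0;
                    }
                    break;
            }
            break;
    }
    return -1;  /* maps stay unset */
}
```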

Finalization

Internal list of supported chip architectures

As a final step, add your chip to the print_supportedCPUs() function in src/topology.c.

Add it to README.md

Add the new chip to README.md in section https://github.com/RRZE-HPC/likwid/blob/master/README.md#supported-architectures

Create counter tables in wiki

The LIKWID wiki contains one page per supported architecture with tables of available counters, restrictions and further information. Unfortunately, I had to use HTML tables instead of Markdown tables. Copy an existing ARM architecture file to get the structure and add all information.
