Perf Symbolic Events and the RISC-V PMU Implementation

Posted by Fei Wu on February 1, 2024

Output of perf list

On an AMD Ryzen 5 5600H, perf list prints the following. There are too many events to show them all, so only a small excerpt is copied here.

  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]

  amd_iommu_0/ign_rd_wr_mmio_1ff8h/                  [Kernel PMU event]
  amd_iommu_0/int_dte_hit/                           [Kernel PMU event]

  bp_de_redirect
       [Decode Redirects]
  bp_dyn_ind_pred
       [Dynamic Indirect Predictions]

The x86 implementation

Tracing through the code shows that the (named) events in this list come from three places:

  1. Events that perf-tools probes, mainly by calling perf_event_open(); an event is listed if the system call succeeds

     perf_event_open({type=PERF_TYPE_HARDWARE, ..., config=PERF_COUNT_HW_BRANCH_INSTRUCTIONS, ...}, ...);
    
  2. Events that perf-tools finds under /sys/bus/event_source/devices; see the dynamic PMU section of the perf_event_open manual page for details

     $ ls amd_iommu_0/events/ign_rd_wr_mmio_1ff8h
     amd_iommu_0/events/ign_rd_wr_mmio_1ff8h
    
     AMD_IOMMU_EVENT_DESC(ign_rd_wr_mmio_1ff8h,    "csource=0x14"),
    
  3. Events written into the perf-tools source itself: at run time perf uses the cpuid to select the matching description files under tools/perf/pmu-events

     $ grep -Inr bp_dyn_ind_pred .
     ./arch/x86/amdzen1/branch.json:13:    "EventName": "bp_dyn_ind_pred",
     ./arch/x86/amdzen2/branch.json:13:    "EventName": "bp_dyn_ind_pred",
     ./arch/x86/amdzen4/branch.json:8:    "EventName": "bp_dyn_ind_pred",
     ./arch/x86/amdzen3/branch.json:13:    "EventName": "bp_dyn_ind_pred",
    
       {
         "EventName": "bp_dyn_ind_pred",
         "EventCode": "0x8e",
         "BriefDescription": "Dynamic indirect predictions (branch used the indirect predictor to make a prediction)."
       },
    

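The probing used for the first source can be sketched in a few lines of C. This is a minimal illustration, not perf's actual code: the helper name probe_hw_event is made up, and it simply assumes that a successful perf_event_open() call means the event is supported. Where perf access is restricted (for example by a strict kernel.perf_event_paranoid setting or inside a container), the call fails even if the hardware supports the event.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the kernel accepts the generic
 * hardware event `config` (a PERF_COUNT_HW_* value), 0 otherwise. */
static int probe_hw_event(uint64_t config)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = config;
    attr.disabled = 1;        /* only probing, never start counting */
    attr.exclude_kernel = 1;  /* user space only, needs fewer privileges */
    attr.exclude_hv = 1;

    /* glibc has no wrapper for perf_event_open, so go through syscall(2).
     * pid = 0, cpu = -1: the calling thread on any CPU; no group, no flags. */
    long fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
    if (fd < 0)
        return 0;

    close((int)fd);
    return 1;
}
```

Calling probe_hw_event(PERF_COUNT_HW_BRANCH_INSTRUCTIONS) mirrors the call shown in item 1. Note that user space passes only the generic event ID and never a hardware encoding.
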
Let us look at how each of these three cases maps a symbolic event name to the hardware event encoding.

In the first case the event encoding lives in the kernel and perf-tools does not need to know it, for example:

static const u64 amd_zen2_perfmon_event_map[PERF_COUNT_HW_MAX] =
{
        [PERF_COUNT_HW_CPU_CYCLES]              = 0x0076,
        [PERF_COUNT_HW_INSTRUCTIONS]            = 0x00c0,
        [PERF_COUNT_HW_CACHE_REFERENCES]        = 0xff60,
        [PERF_COUNT_HW_CACHE_MISSES]            = 0x0964,
        [PERF_COUNT_HW_BRANCH_INSTRUCTIONS]     = 0x00c2,
        [PERF_COUNT_HW_BRANCH_MISSES]           = 0x00c3,
        [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = 0x00a9,
};

In the second case perf-tools reads the encoding from sysfs and passes it to perf_event_open, so the kernel does not need to translate anything when perf runs:

perf_event_open({type=0xc /* PERF_TYPE_??? */, ..., config=0x14, ...}, ...);

The type (12 = 0xc) and the config value both come from sysfs:

$ cat amd_iommu_0/type 
12
$ cat amd_iommu_0/events/ign_rd_wr_mmio_1ff8h 
csource=0x14
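
The sysfs string can be turned into a config value with very little code. The sketch below uses a made-up helper, parse_event_term, and handles only a single "name=value" term such as the one above. It is an assumption of this sketch that the value can be used as config directly; perf itself also reads the PMU's format/ directory to learn which bits of config each field occupies, and that step is left out here.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split a sysfs event description such as
 * "csource=0x14" into its field name and numeric value.
 * Returns 0 on success, -1 on malformed input. */
static int parse_event_term(const char *desc, char *field, size_t field_len,
                            uint64_t *value)
{
    const char *eq = strchr(desc, '=');
    if (eq == NULL || eq == desc)
        return -1;

    size_t name_len = (size_t)(eq - desc);
    if (name_len >= field_len)
        return -1;
    memcpy(field, desc, name_len);
    field[name_len] = '\0';

    char *end;
    errno = 0;
    /* Base 0 lets strtoull accept both "0x14" and "20". */
    unsigned long long v = strtoull(eq + 1, &end, 0);
    if (errno != 0 || end == eq + 1)
        return -1;
    /* sysfs files end with a newline; accept that but nothing else,
     * so multi-term strings like "a=1,b=2" are rejected. */
    if (*end != '\0' && *end != '\n')
        return -1;

    *value = (uint64_t)v;
    return 0;
}
```

For the string "csource=0x14" the helper yields the field name csource and the value 0x14, which matches the config seen in the perf_event_open call.
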

In the third case perf-tools uses the encoding hardcoded in the JSON file and calls perf_event_open with the raw type:

perf_event_open({type=PERF_TYPE_RAW, ..., config=0x8e, ...}, ...);
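
As a rough sketch of this third path, the attribute for such a raw event could be filled in as follows. The helper name init_raw_attr is hypothetical, and the sketch assumes the JSON entry carries only an "EventCode" string; entries with further fields (a UMask, for instance) need those merged into config as well, which is not shown.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: prepare `attr` for a raw event whose encoding is
 * the "EventCode" string of a pmu-events JSON entry, e.g. "0x8e". */
static void init_raw_attr(struct perf_event_attr *attr, const char *event_code)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_RAW;
    /* Base 0 makes strtoull honor the "0x" prefix used in the JSON files. */
    attr->config = strtoull(event_code, NULL, 0);
}
```

With "0x8e" as input the resulting attr has type PERF_TYPE_RAW and config 0x8e; the kernel passes that value on to the PMU without consulting any symbolic table.
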

The RISC-V implementation

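For the second and third cases above it makes no difference whether the architecture is x86 or riscv, so we only need to look at the first case.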

static const struct sbi_pmu_event_data pmu_hw_event_map[] = {
        [PERF_COUNT_HW_BRANCH_INSTRUCTIONS]     = {.hw_gen_event = {
             SBI_PMU_HW_BRANCH_INSTRUCTIONS,
             SBI_PMU_EVENT_TYPE_HW, 0}},
        ...
};

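As you can see, the kernel does not give the event encoding directly here but delegates it to sbi. SBI_PMU_HW_BRANCH_INSTRUCTIONS and SBI_PMU_EVENT_TYPE_HW are part of the interface defined between the kernel and sbi.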
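
To make the interface concrete: in the SBI PMU extension the event type and event code are packed into one event_idx, with the type in bits [19:16] and the code in bits [15:0]. The sketch below redefines the two constants locally with a DEMO_ prefix so that it stays self-contained; the numeric values are the ones I understand the SBI specification to assign, and the helper name sbi_event_idx is made up.

```c
#include <stdint.h>

/* Local copies of the SBI PMU constants (DEMO_ prefix is ours). */
enum {
    DEMO_SBI_PMU_EVENT_TYPE_HW = 0x0,
    DEMO_SBI_PMU_HW_BRANCH_INSTRUCTIONS = 5,
};

/* event_idx layout: bits [19:16] = event type, bits [15:0] = event code. */
static uint32_t sbi_event_idx(uint32_t type, uint32_t code)
{
    return ((type & 0xfu) << 16) | (code & 0xffffu);
}
```

For the branch-instructions entry in the table above this gives an event_idx of 0x00005, which is what the kernel hands to sbi instead of a hardware encoding.
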

So how does sbi obtain the encoding? A generic sbi implementation cannot hardcode the encodings of every riscv cpu, so sbi needs the dtb to tell it this mapping, as described in SBI PMU Device Tree Bindings:

riscv,event-to-mhpmevent(Optional) - It represents an ONE-to-ONE mapping between a PMU event and the event selector value that platform expects to be written to the MHPMEVENTx CSR for that event. The mapping is encoded in a table format where each row represents an event. The first column represent the event idx where the 2nd & 3rd column represent the event selector value that should be encoded in the expected value to be written in MHPMEVENTx. This property shouldn’t encode any raw hardware event.
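
The lookup this property enables might look like the sketch below. Everything in it is illustrative: the struct, the function lookup_mhpmevent, and the table rows are made up, and real selector values are platform specific. It also assumes that the second column holds the upper 32 bits and the third column the lower 32 bits of the 64-bit selector.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of a riscv,event-to-mhpmevent table: three 32-bit cells. */
struct event_to_mhpmevent_row {
    uint32_t event_idx;  /* column 1: SBI event idx */
    uint32_t select_hi;  /* column 2: assumed upper 32 bits of the selector */
    uint32_t select_lo;  /* column 3: assumed lower 32 bits of the selector */
};

/* Made-up example rows, NOT taken from any real device tree. */
static const struct event_to_mhpmevent_row demo_table[] = {
    { 0x00005, 0x00000000, 0x00000012 },
    { 0x00006, 0x00000001, 0x00000013 },
};

/* Find the MHPMEVENTx selector for `event_idx`.
 * Returns 0 and stores the selector on success, -1 if there is no row. */
static int lookup_mhpmevent(const struct event_to_mhpmevent_row *rows,
                            size_t n, uint32_t event_idx, uint64_t *select)
{
    for (size_t i = 0; i < n; i++) {
        if (rows[i].event_idx == event_idx) {
            *select = ((uint64_t)rows[i].select_hi << 32) | rows[i].select_lo;
            return 0;
        }
    }
    return -1;
}
```

The point of the one-to-one table is that sbi can resolve a generic event with a plain search and no per-CPU code: supporting a new platform only means shipping a different dtb.
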

sifive dtb

References