RISC-V Vector on Valgrind: A Summary

Posted by Fei Wu on April 12, 2024

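Current status

As of now (2024/04/12), the status of RVV on Valgrind is:

  • Two tools are supported: nulgrind and memcheck
  • All RVV instructions are supported except the floating-point and fixed-point ones
  • It can run applications such as autovectorized CoreMark
  • Parts of the RVV memcheck logic still have room for improvement
  • If needed, the complete RVV instruction set can be supported, even if some parts would be imperfect

The latest code is at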

  • repo - https://github.com/intel/valgrind-rvv
  • branch - poc-rvv-remove-vl-from-ir

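Valgrind background

How it works

  • Valgrind has its own intermediate representation (IR)
  • Both guest code and instrumentation code (memcheck's, for example) are first expressed in IR, and the IR is finally translated into host instructions.

Valgrind flow

Translation goes through the steps below. The figure is taken from [1].

[Figure: vex]

Code flow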

LibVEX_Translate
    irsb = LibVEX_FrontEnd(vta, &res, &pxControl);
    disInstrFn = RISCV64FN(disInstr_RISCV64);
    irsb = bb_to_IR(vta->guest_extents, disInstrFn, ...);
        switch (INSN(1, 0))
            case 0b11:
                dres->len = inst_size = 4;
                ok = dis_RISCV64_standard(dres, irsb, insn, ...);
    irsb = do_iropt_BB(irsb, specHelper, preciseMemExnsFn, *pxControl, ...);
    irsb = vta->instrument1(vta->callback_opaque, irsb);
    irsb = vta->instrument2(vta->callback_opaque, irsb);
    irsb = cprop_BB(irsb);
    LibVEX_BackEnd(vta, &res, irsb, pxControl);
        switch (vta->arch_host) {
            case VexArchRISCV64:
                iselSB = RISCV64FN(iselSB_RISCV64);
                emit = CAST_TO_TYPEOF(emit)RISCV64FN(emit_RISCV64Instr);

        iselSB
            for (i = 0; i < bb->stmts_used; i++)
                iselStmt(env, bb->stmts[i]);
                switch (stmt->tag)
                    case Ist_Store:
                        if (tyd == Ity_I64) addInstr(env, RISCV64Instr_Store(RISCV64op_SD, src, addr, 0));
                        RISCV64Instr* i = LibVEX_Alloc_inline(sizeof(RISCV64Instr));
                        i->tag = RISCV64in_Store;
                        i->RISCV64in.Store.op = op;
        for (i = 0; i < rcode->arr_used; i++)
            emit
                switch (i->tag) {
                    case RISCV64in_MV:
                        UInt dst = iregEnc(i->RISCV64in.MV.dst);
                        UInt src = iregEnc(i->RISCV64in.MV.src);
                        p = emit_CR(p, 0b10, src, dst, 0b1000);
                            UShort the_insn = 0;
                            the_insn |= opcode << 0;
                            the_insn |= rs2 << 2;
                            the_insn |= rd << 7;
                            the_insn |= funct4 << 12;
                            return emit16(p, the_insn);
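
The last step of the trace, packing the fields of a compressed CR-format instruction into 16 bits, can be tried outside Valgrind. The following is a minimal standalone sketch, not the code from the repository: the names encode_cr and encode_c_mv are hypothetical, and only the field layout (op in bits 1:0, rs2 in bits 6:2, rd in bits 11:7, funct4 in bits 15:12) is taken from the emit_CR lines above.

```c
#include <stdint.h>

/* Hypothetical standalone helper mirroring the field packing shown in
   the trace: funct4[15:12] | rd[11:7] | rs2[6:2] | op[1:0]. */
uint16_t encode_cr(unsigned opcode, unsigned rs2, unsigned rd, unsigned funct4)
{
    uint16_t insn = 0;
    insn |= (uint16_t)((opcode & 0x3u)  << 0);   /* 2-bit opcode quadrant */
    insn |= (uint16_t)((rs2    & 0x1Fu) << 2);   /* 5-bit source register */
    insn |= (uint16_t)((rd     & 0x1Fu) << 7);   /* 5-bit destination register */
    insn |= (uint16_t)((funct4 & 0xFu)  << 12);  /* 4-bit function code */
    return insn;
}

/* c.mv rd, rs2 uses op = 0b10 and funct4 = 0b1000, the same constants
   that the RISCV64in_MV case passes to emit_CR. */
uint16_t encode_c_mv(unsigned rd, unsigned rs2)
{
    return encode_cr(0x2u, rs2, rd, 0x8u);
}
```

For example, encode_c_mv(10, 11) encodes "c.mv a0, a1" (x10 and x11) and yields 0x852E.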

Memcheck logic

  • Valid-value (V) bits

In short, each bit in the system has (conceptually) an associated V bit, which follows it around everywhere, even inside the CPU. Yes, all the CPU’s registers (integer, floating point, vector and condition registers) have their own V bit vectors.

  • Valid-address (A) bits

All bytes in memory, but not in the CPU, have an associated valid-address (A) bit. This indicates whether or not the program can legitimately read or write that location.
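
To make the two kinds of shadow state concrete, here is a toy sketch of a shadow memory with one A bit per byte and one V bit per data bit. It is only an illustration under stated assumptions: every name in it (ToyShadowMem, toy_load8, and so on) is hypothetical, the layout is far simpler than memcheck's real implementation, and the convention that a V bit of 1 means "undefined" is a choice made for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MEM_SIZE 64u

/* Toy shadow memory (hypothetical, not memcheck's data structure). */
typedef struct {
    uint8_t data[TOY_MEM_SIZE];   /* the program's own bytes */
    uint8_t vbits[TOY_MEM_SIZE];  /* one V bit per data bit; 1 = undefined */
    bool    abits[TOY_MEM_SIZE];  /* one A bit per byte; true = addressable */
} ToyShadowMem;

/* Start with nothing addressable and nothing defined. */
void toy_init(ToyShadowMem *m)
{
    memset(m->data, 0, sizeof m->data);
    memset(m->vbits, 0xFF, sizeof m->vbits);
    memset(m->abits, 0, sizeof m->abits);
}

/* Like a fresh allocation: the bytes become addressable but stay undefined. */
void toy_make_addressable(ToyShadowMem *m, size_t addr, size_t len)
{
    for (size_t i = 0; i < len && addr + i < TOY_MEM_SIZE; i++) {
        m->abits[addr + i] = true;
        m->vbits[addr + i] = 0xFF;
    }
}

/* A store checks the A bit, then copies the value together with its V bits.
   Returns false to signal an invalid-address error. */
bool toy_store8(ToyShadowMem *m, size_t addr, uint8_t value, uint8_t value_vbits)
{
    if (addr >= TOY_MEM_SIZE || !m->abits[addr])
        return false;
    m->data[addr]  = value;
    m->vbits[addr] = value_vbits;
    return true;
}

/* A load checks the A bit, then returns the value and the V bits that
   "follow it around". Returns false to signal an invalid-address error. */
bool toy_load8(const ToyShadowMem *m, size_t addr, uint8_t *value, uint8_t *value_vbits)
{
    if (addr >= TOY_MEM_SIZE || !m->abits[addr])
        return false;
    *value       = m->data[addr];
    *value_vbits = m->vbits[addr];
    return true;
}
```

In this sketch a load from a freshly addressable byte succeeds but returns V bits of 0xFF, whereas a load from a byte that was never made addressable fails outright. That is the distinction between the V and A bits described above.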

instrument

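RVV support

Adding ordinary instructions

There are generally the following ways to add an instruction to Valgrind. The earlier an option appears in the list, the better it is.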

  • existing Iops (IR)
  • creating a new Iop
  • a clean helper
  • a dirty helper

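Adding RVV instructions

First of all, the existing IR cannot express RVV, so the next best choice is to add new IR. Helpers are used only as a last resort, because a helper-based implementation makes instrumentation hard to do well.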
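
One way to see why helpers hurt instrumentation is to compare how definedness can be propagated in the two cases. The sketch below is a toy model, not memcheck's code: the function names are hypothetical, a V bit of 1 means "undefined" by the sketch's own convention, and the pessimistic rule for helpers is an assumption about how a tool typically treats a call whose semantics it cannot see.

```c
#include <stdint.h>

/* Toy convention for this sketch: V bit 1 = undefined, 0 = defined. */

/* When the operation is a known IR op (here a bitwise AND), the tool can
   use its semantics: a result bit is defined if both inputs are defined,
   or if either input is a defined 0. */
uint64_t vbits_and64(uint64_t a, uint64_t va, uint64_t b, uint64_t vb)
{
    uint64_t either_undefined   = va | vb;
    uint64_t a_not_defined_zero = a | va;  /* 0 only where a is a defined 0 */
    uint64_t b_not_defined_zero = b | vb;  /* 0 only where b is a defined 0 */
    return either_undefined & a_not_defined_zero & b_not_defined_zero;
}

/* When the operation is an opaque helper call, the tool does not know what
   it computes, so this sketch assumes the pessimistic rule: any undefined
   input bit makes the whole result undefined. */
uint64_t vbits_opaque_call64(uint64_t va, uint64_t vb)
{
    return ((va | vb) != 0) ? UINT64_MAX : 0;
}
```

With a = 0xF0 fully defined and the low four bits of b undefined, vbits_and64 reports a fully defined result, because those bits are ANDed with defined zeros, while vbits_opaque_call64 marks all 64 bits undefined. The same loss of precision applies to any instruction implemented behind a helper.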

Implementation challenges

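  • RVV is the first variable-length ISA to be supported in Valgrind, so there is no reference implementation
  • Valgrind's IR is fixed-size by default, but RVV is not. Adding new ISAs like this breaks some of Valgrind's original assumptions
  • RVV introduces a large number of new IR ops, and the memcheck logic for all of them has to be written from scratch. Memcheck's logic for scalar IR is already fairly complex, and for some vector instructions it is more complex still
  • RVV has concepts such as LMUL, so the size of a register (group) is variable, which complicates register allocation in the backend
  • RVV has many instructions, so the implementation effort is large. The backend currently uses a mechanism that reuses QEMU code as much as possible, which also solves the register allocation problem above, although performance will in theory be somewhat lower
  • The community wants a unified IR shared by RVV and Arm SVE, which adds further complexity. I have reservations about whether this is really necessary. There is a related discussion on the Valgrind mailing list, and this is currently the main blocker
  • For vector load/store, if an access is split into scalar operations one element at a time, a large VLEN makes the generated IR too long and breaks Valgrind's original assumptions, and this has to be handled. If it is not split, how can the memcheck logic be guaranteed?
  • Some implementation details also need gradual improvement. For example, the size of struct VexGuestRISCV64State constrains VLEN. None of these is a big problem, but each has to be resolved one by one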
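
The LMUL and VLEN points above can be made concrete with a little arithmetic. The sketch below uses the RVV relation VLMAX = LMUL * VLEN / SEW. The type and function names are hypothetical and exist only for illustration; they are not taken from the valgrind-rvv code.

```c
/* LMUL kept as a fraction so that 1/2, 1/4 and 1/8 are representable
   (hypothetical type for this sketch). */
typedef struct {
    unsigned num;
    unsigned den;
} ToyLmul;

/* VLMAX = LMUL * VLEN / SEW: the maximum number of elements that one
   vector instruction can operate on. */
unsigned toy_vlmax(unsigned vlen_bits, unsigned sew_bits, ToyLmul lmul)
{
    return (vlen_bits * lmul.num) / (sew_bits * lmul.den);
}

/* Number of architectural vector registers that make up one register
   group: LMUL when LMUL >= 1, otherwise a single register. */
unsigned toy_regs_per_group(ToyLmul lmul)
{
    return (lmul.num >= lmul.den) ? (lmul.num / lmul.den) : 1u;
}
```

With VLEN = 128, SEW = 8 and LMUL = 8, toy_vlmax gives 128 elements spread over a group of 8 registers; with VLEN = 1024 the same settings give 1024 elements. If a vector load is split into one scalar access per element, the number of IR statements grows in step with these values, which is the IR-length concern in the list above.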

References