Tracepoint syscalls

Posted by Fei Wu on January 16, 2024

Background

While trying out libbpf-bootstrap on riscv, I found that /sys/kernel/debug/tracing/events/syscalls did not exist, so the tools that depend on it could not be used.

Code

  • Definition of the syscalls tracepoints
#define SYSCALL_TRACE_EXIT_EVENT(sname)                                 \
        static struct syscall_metadata __syscall_meta_##sname;          \
        static struct trace_event_call __used                           \
          event_exit_##sname = {                                        \
                .class                  = &event_class_syscall_exit,    \
                {                                                       \
                        .name                   = "sys_exit"#sname,     \
                },                                                      \
                .event.funcs            = &exit_syscall_print_funcs,    \
                .data                   = (void *)&__syscall_meta_##sname,\
                .flags                  = TRACE_EVENT_FL_CAP_ANY,       \
        };                                                              \
        static struct trace_event_call __used                           \
          __section("_ftrace_events")                                   \
         *__event_exit_##sname = &event_exit_##sname;

#define SYSCALL_METADATA(sname, nb, ...)                        \
        static const char *types_##sname[] = {                  \
                __MAP(nb,__SC_STR_TDECL,__VA_ARGS__)            \
        };                                                      \
        static const char *args_##sname[] = {                   \
                __MAP(nb,__SC_STR_ADECL,__VA_ARGS__)            \
        };                                                      \
        SYSCALL_TRACE_ENTER_EVENT(sname);                       \
        SYSCALL_TRACE_EXIT_EVENT(sname);                        \
        static struct syscall_metadata __used                   \
          __syscall_meta_##sname = {                            \
                .name           = "sys"#sname,                  \
                .syscall_nr     = -1,   /* Filled in at boot */ \
                .nb_args        = nb,                           \
                .types          = nb ? types_##sname : NULL,    \
                .args           = nb ? args_##sname : NULL,     \
                .enter_event    = &event_enter_##sname,         \
                .exit_event     = &event_exit_##sname,          \
                .enter_fields   = LIST_HEAD_INIT(__syscall_meta_##sname.enter_fields), \
        };                                                      \
        static struct syscall_metadata __used                   \
          __section("__syscalls_metadata")                      \
         *__p_syscall_meta_##sname = &__syscall_meta_##sname;
  • Instantiating the syscalls tracepoints: matching each syscall to its metadata at boot
static __init struct syscall_metadata *
find_syscall_meta(unsigned long syscall)
{
        struct syscall_metadata **start;
        struct syscall_metadata **stop; 
        char str[KSYM_SYMBOL_LEN];


        start = __start_syscalls_metadata;
        stop = __stop_syscalls_metadata;
        kallsyms_lookup(syscall, NULL, NULL, NULL, str);

        if (arch_syscall_match_sym_name(str, "sys_ni_syscall"))
                return NULL;

        for ( ; start < stop; start++) {
                if ((*start)->name && arch_syscall_match_sym_name(str, (*start)->name))
                        return *start;
        }
        return NULL;
}
  • Why it broke
commit 08d0ce30e0e4fcb5f06c90fe40387b1ce9324833
Author: Sami Tolvanen <samitolvanen@google.com>
Date:   Mon Jul 10 18:35:46 2023 +0000

    riscv: Implement syscall wrappers
    
    Commit f0bddf50586d ("riscv: entry: Convert to generic entry") moved
    syscall handling to C code, which exposed function pointer type
    mismatches that trip fine-grained forward-edge Control-Flow Integrity
    (CFI) checks as syscall handlers are all called through the same
    syscall_t pointer type. To fix the type mismatches, implement pt_regs
    based syscall wrappers similarly to x86 and arm64.
    
    This patch is based on arm64 syscall wrappers added in commit
    4378a7d4be30 ("arm64: implement syscall wrappers"), where the main goal
    was to minimize the risk of userspace-controlled values being used
    under speculation. This may be a concern for riscv in future as well.
    
    Following other architectures, the syscall wrappers generate three
    functions for each syscall; __riscv_<compat_>sys_<name> takes a pt_regs
    pointer and extracts arguments from registers, __se_<compat_>sys_<name>
    is a sign-extension wrapper that casts the long arguments to the
    correct types for the real syscall implementation, which is named
    __do_<compat_>sys_<name>.

  • The fix: someone has already upstreamed it
commit a87e7d3e8832271ecb7d5eaaabc5b49fe25a469b
Author: Alexandre Ghiti <alexghiti@rivosinc.com>
Date:   Tue Oct 3 20:24:07 2023 +0200

    riscv: Fix ftrace syscall handling which are now prefixed with __riscv_
    
    ftrace creates entries for each syscall in the tracefs but has failed
    since commit 08d0ce30e0e4 ("riscv: Implement syscall wrappers") which
    prefixes all riscv syscalls with __riscv_.
    
    So fix this by implementing arch_syscall_match_sym_name() which allows us
    to ignore this prefix.
    
    And also ignore compat syscalls like x86/arm64 by implementing
    arch_trace_is_compat_syscall().
#define ARCH_HAS_SYSCALL_MATCH_SYM_NAME
static inline bool arch_syscall_match_sym_name(const char *sym,
                                               const char *name)
{
        /*
         * Since all syscall functions have __riscv_ prefix, we must skip it.
         * However, as we described above, we decided to ignore compat
         * syscalls, so we don't care about __riscv_compat_ prefix here.
         */
        return !strcmp(sym + 8, name);
}

Summary

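  • If the goal is to upstream patches, you need CI that catches problems like this promptly, ideally running against the relevant development branch.
  • If the main goal is just to use the tool, try a newer kernel version.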