bpf_helpers and you...

Published 2020-04-12 on Farid Zakaria's Blog

This is a republishing of a e-mail I sent out to xdp-newbies mailing list which you can find here originally published on Wed, 30 Oct 2019

This is my attempt of a continuation of David Miller’s prior e-mail bpf.h and you….

I was curious about how ebpf filters are wired and work. The heavy use of C macros makes the source code difficult for me to comprehend.

Maybe there’s an online pre-processed version of the Linux kernel?

I’m hoping others may find this exploratory-dive insightful – hopefully, it’s accurate enough.

Let’s write a very trivial ebpf filter hello_world_kern.c and have it print “hello world”

#include <linux/bpf.h>

#define __section(NAME) __attribute__((section(NAME), used))

static char _license[] __section("license") = "GPL";

/* helper functions called from eBPF programs written in C */
static int (*bpf_trace_printk)(const char *fmt, int fmt_size,
                            ...) = (void *)BPF_FUNC_trace_printk;

__section("hello_world") int hello_world_filter(struct __sk_buff *skb) {
    char msg[] = "hello world";
    bpf_debug_printk(msg, sizeof(msg));
    return 0;

If we compile the above using the below, we can inspect the LLVM IR.

clang -c -o hello_world_kern.ll -x c -S -emit-llvm hello_world_kern.c

The few lines that stand out are:

@bpf_trace_printk = internal global i32 (i8*, i32, ...)* inttoptr (i64 6 to i32 (i8*, i32, ...)*), align 8
%6 = load i32 (i8*, i32, ...)*, i32 (i8*, i32, ...)** @bpf_trace_printk, align 8
%7 = getelementptr inbounds [13 x i8], [13 x i8]* %3, i32 0, i32 0
%8 = call i32 (i8*, i32, ...) %6(i8* %7, i32 13)

The above demonstrates that the value of BPF_FUNC_trace_printk is simply the integer i64 6 and it is being cast to the function pointer (i32 (i8*, i32, ...)*))

Sure enough, we can confirm that bpf_trace_printk is the 6th value in the enumeration of known bpf bpf_helpers in bpf.h

#define __BPF_FUNC_MAPPER(FN)       \
    FN(unspec),                     \
    FN(map_lookup_elem),            \
    FN(map_update_elem),            \
    FN(map_delete_elem),            \
    FN(probe_read),                 \
    FN(ktime_get_ns),               \
    FN(trace_printk),               \

We can go even further and take this LLVM IR and generate human-readable eBPF assembly using llc

llc hello_world_kern.ll -march=bpf

Depending on the optimization level of the earlier clang call you may see different results however using -O3 we can see

call 6

Great! So we know that the call to bpf_trace_printk gets translated into a call instruction with an immediate value of 6.

How does it end up calling code within the kernel, though? Once the Verifier verifies the bytecode it calls fixup_bpf_calls which goes through all the instructions and makes the necessary adjustment to the immediate value

fixup_bpf_calls(...) {
        fn = env->ops->get_func_proto(insn->imm, env->prog);
        /* all functions that have prototype and verifier allowed
        * programs to call them, must be real in-kernel functions
        if (!fn->func) {
                "kernel subsystem misconfigured func %s#%d\n",
                func_id_name(insn->imm), insn->imm);
            return -EFAULT;
        insn->imm = fn->func - __bpf_call_base;

N.B. I haven’t deciphered how __bpf_call_base is used / works

The get_func_proto will return the function prototypes registered by every subsystem, such as in net/core/filter.c. At this point in the method, it’s a simple switch statement to get the matching function prototype given the numeric value.

I’d love to see more on the code path of how the non-JIT vs JIT instructions get handled.

For instance, for the net subsystem, I can see where the ebpf prog is invoked. Still, it’s challenging to work out how the choice of executing the function directly (in the case of JIT) vs. running it through the interpreter is handled.

eBPF is impressive, but the toolchain requires some technical depth of the Linux kernel. If you have similar posts of explaining eBPF from the ground up, share it with me.