[eBPF] Notes on strings being truncated at '\0' when passed to user space with the BCC framework

枫MapleLCG · posted 2024-3-17 14:11

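String copying under the BCC framework

eBPF lets users load programs into the kernel to collect data. For safety, Linux only allows strings to be copied through BPF helper functions. The helpers available under BCC are bpf_probe_read_kernel and bpf_probe_read_kernel_str. Here we use bpf_probe_read_kernel, which copies a fixed number of bytes instead of stopping at the first NUL, to copy "wget\0baidu.com" into the char array of a struct: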

struct data_t{
  char a[64];
};

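I found that "wget\0baidu.com" can be copied into char a[64] in full. But once the struct is passed to Python through perf_submit, only "wget" arrives, and everything after the '\0' is gone. That is not the result we want.

At that point the data has also been converted to Python's bytes type.

Locating the problem

Let's go to the BCC GitHub repository and look through the source to find where things go wrong.

First, search for perf_submit, which leads to line 982 of src/cc/frontends/clang/b_frontend_action.cc: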

        } else if (memb_name == "perf_submit") {
          string name = string(Ref->getDecl()->getName());
          string arg0 = rewriter_.getRewrittenText(expansionRange(Call->getArg(0)->getSourceRange()));
          string args_other = rewriter_.getRewrittenText(expansionRange(SourceRange(GET_BEGINLOC(Call->getArg(1)),
                                                           GET_ENDLOC(Call->getArg(2)))));
          txt = "bpf_perf_event_output(" + arg0 + ", (void *)bpf_pseudo_fd(1, " + fd + ")";
          txt += ", CUR_CPU_IDENTIFIER, " + args_other + ")";

          // e.g.
          // struct data_t { u32 pid; }; data_t data;
          // events.perf_submit(ctx, &data, sizeof(data));
          // ...
          //                       &data   ->     data    ->  typeof(data)        ->   data_t
          auto type_arg1 = Call->getArg(1)->IgnoreCasts()->getType().getTypePtr()->getPointeeType().getTypePtrOrNull();
          if (type_arg1 && type_arg1->isStructureType()) {
            auto event_type = type_arg1->getAsTagDecl();
            const auto *r = dyn_cast<RecordDecl>(event_type);
            std::vector<std::string> perf_event;

            for (auto it = r->field_begin(); it != r->field_end(); ++it) {
              // After LLVM commit aee49255074f
              // (https://github.com/llvm/llvm-project/commit/aee49255074fd4ef38d97e6e70cbfbf2f9fd0fa7)
              // array type change from `comm#char [16]` to `comm#char[16]`
              perf_event.push_back(it->getNameAsString() + "#" + it->getType().getAsString()); //"pid#u32"
            }
            fe_.perf_events_[name] = perf_event;
          }

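The interface BCC provides essentially takes the user-supplied arguments, assembles them, and calls the native BPF helper (here, bpf_perf_event_output). It also prepares for data exchange between C and Python by recording each struct field as a "name#type" string, so "char a[16]" becomes "a#char[16]".

Next we look for the places involved in the data exchange and check them one by one until we reach the source of the problem. Submitting data from the kernel to user space goes through BPF_TABLE. BCC wraps every TABLE-related operation on the Python side, so we only need to follow the table code. In the source repository, src/python/bcc/table.py is where BCC wraps TABLE, and at line 214 there is this function: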

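import ctypes as ct
import re
import sys

from bcc.libbcc import lib  # inside table.py this is a relative import from .libbcc
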
def _get_event_class(event_map):
    ct_mapping = {
        'char'              : ct.c_char,
        's8'                : ct.c_char,
        'unsigned char'     : ct.c_ubyte,
        'u8'                : ct.c_ubyte,
        'u8 *'              : ct.c_char_p,
        'char *'            : ct.c_char_p,
        'short'             : ct.c_short,
        's16'               : ct.c_short,
        'unsigned short'    : ct.c_ushort,
        'u16'               : ct.c_ushort,
        'int'               : ct.c_int,
        's32'               : ct.c_int,
        'enum'              : ct.c_int,
        'unsigned int'      : ct.c_uint,
        'u32'               : ct.c_uint,
        'long'              : ct.c_long,
        'unsigned long'     : ct.c_ulong,
        'long long'         : ct.c_longlong,
        's64'               : ct.c_longlong,
        'unsigned long long': ct.c_ulonglong,
        'u64'               : ct.c_ulonglong,
        '__int128'          : (ct.c_longlong * 2),
        'unsigned __int128' : (ct.c_ulonglong * 2),
        'void *'            : ct.c_void_p,
    }

    # handle array types e.g. "int [16]", "char[16]" or "unsigned char[16]"
    array_type = re.compile(r"(\S+(?: \S+)*) ?\[([0-9]+)\]$")

    fields = []
    num_fields = lib.bpf_perf_event_fields(event_map.bpf.module, event_map._name)
    i = 0
    while i < num_fields:
        field = lib.bpf_perf_event_field(event_map.bpf.module, event_map._name, i).decode()
        m = re.match(r"(.*)#(.*)", field)
        field_name = m.group(1)
        field_type = m.group(2)

        if re.match(r"enum .*", field_type):
            field_type = "enum"

        m = array_type.match(field_type)
        try:
            if m:
                fields.append((field_name, ct_mapping[m.group(1)] * int(m.group(2))))
            else:
                fields.append((field_name, ct_mapping[field_type]))
        except KeyError:
            # Using print+sys.exit instead of raising exceptions,
            # because exceptions are caught by the caller.
            print("Type: '%s' not recognized. Please define the data with ctypes manually."
                  % field_type, file=sys.stderr)
            sys.exit(1)
        i += 1
    return type('', (ct.Structure,), {'_fields_': fields})

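This function does one thing: it takes strings such as "name#int[16]" and "name#char[16]", matches them against a regular expression, and splits them into two groups, the name and the type. It then uses ct_mapping to map the type to a ctypes type that Python can handle, and adds the name and ctypes type as a field of a ctypes-defined structure.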
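The parsing step can be tried without BCC installed. The sketch below is not the BCC code itself: CT_MAPPING is a trimmed-down copy of ct_mapping, parse_field is a hypothetical helper name, and the descriptor strings are made up for illustration. Only the two regular expressions are taken from the function above.

```python
import ctypes as ct
import re

# Trimmed-down copy of ct_mapping; the table in table.py has many more entries.
CT_MAPPING = {
    "char": ct.c_char,
    "unsigned char": ct.c_ubyte,
    "int": ct.c_int,
    "u32": ct.c_uint,
}

# Same array pattern as in _get_event_class.
ARRAY_TYPE = re.compile(r"(\S+(?: \S+)*) ?\[([0-9]+)\]$")


def parse_field(descriptor):
    """Hypothetical helper: turn 'name#type' into a (name, ctypes type) pair."""
    name, c_type = re.match(r"(.*)#(.*)", descriptor).groups()
    m = ARRAY_TYPE.match(c_type)
    if m:
        # Array: element type multiplied by length, e.g. c_char * 64.
        return name, CT_MAPPING[m.group(1)] * int(m.group(2))
    return name, CT_MAPPING[c_type]


# Illustrative descriptors in the "name#type" form produced by the frontend.
fields = [parse_field(d) for d in ("pid#u32", "a#char[64]", "b#unsigned char[64]")]
Event = type("Event", (ct.Structure,), {"_fields_": fields})

for field_name, field_type in fields:
    print(field_name, field_type.__name__)
```

Note that "char[64]" and "unsigned char[64]" end up as arrays of different ctypes element types (c_char versus c_ubyte), which is what the rest of the post hinges on.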

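From the source of these two files we learn that BCC uses the ctypes library to exchange and convert data between Python and C. After some experiments, I found that when ctypes hands a char array inside a struct over to Python, it maps the array to Python's bytes class. It treats the array as a NUL-terminated C string, so even when the array is declared much larger than the data, reading the field returns only the bytes before the first "\0" and everything after it is dropped from the result.

Solution

The fix is simple: use c_ubyte, so that the field is mapped to a c_ubyte_Array type. In other words, change char a[64] to unsigned char a[64].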
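The truncation can be reproduced with ctypes alone. This is a minimal sketch that simulates the record coming out of the perf buffer with a bytes object; CharEvent is a hypothetical stand-in for the class BCC generates, and no BCC call is involved.

```python
import ctypes as ct


class CharEvent(ct.Structure):
    # Stand-in for the generated event class with a `char a[64]` field.
    _fields_ = [("a", ct.c_char * 64)]


payload = b"wget\0baidu.com"
# Simulate the 64-byte record delivered from the kernel (zero padded).
raw = payload.ljust(ct.sizeof(CharEvent), b"\0")
event = CharEvent.from_buffer_copy(raw)

field_view = event.a      # the field getter stops at the first NUL
full_view = bytes(event)  # raw copy of the whole structure

print(field_view)
print(full_view[:len(payload)])
```

The bytes after the NUL are still present in the structure's memory; it is only the c_char array field access that cuts them off.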

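However, the result can no longer be printed directly and needs a loop. After switching to unsigned char, ctypes receives exactly as much space as you declared. In the output, an "S + number" suffix is also appended at the end to indicate how much space the array occupies.

Why the fix works

c_char corresponds to Python's one-character bytes object, so a c_char array is read as a C string and "\0" is treated as its terminator.

c_ubyte and c_byte correspond to Python's int, so the NUL character is simply read as the number 0.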
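A minimal sketch of reading such a field back, again with a simulated record rather than a real perf buffer; UByteEvent is a hypothetical stand-in for the generated class. Stripping the trailing zero padding is an assumption that suits this payload and would also remove NUL bytes that legitimately end the data.

```python
import ctypes as ct


class UByteEvent(ct.Structure):
    # Stand-in for the generated event class with `unsigned char a[64]`.
    _fields_ = [("a", ct.c_ubyte * 64)]


payload = b"wget\0baidu.com"
event = UByteEvent.from_buffer_copy(payload.ljust(64, b"\0"))

# Element-by-element loop: every element of the array is a Python int.
looped = bytes(value for value in event.a)
# The same conversion in one step through the buffer protocol.
direct = bytes(event.a)

# The array is always the declared 64 bytes, so drop the zero padding.
text = direct.rstrip(b"\0")
print(text)
```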
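The difference in element types can be checked on standalone arrays. The sketch below uses 16-byte arrays purely for illustration. Standalone c_char arrays also expose both a NUL-terminated view (.value) and a full view (.raw), whereas reading a c_char array field of a Structure returns the truncated bytes directly.

```python
import ctypes as ct

payload = b"wget\0baidu.com"
padded = payload.ljust(16, b"\0")

chars = (ct.c_char * 16).from_buffer_copy(padded)
ubytes = (ct.c_ubyte * 16).from_buffer_copy(padded)

# Index 4 holds the embedded NUL in both arrays.
print(type(chars[4]), type(ubytes[4]))

print(chars.value)  # stops at the first NUL
print(chars.raw)    # all 16 bytes, NULs included
```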

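mmSmm · posted 2024-3-17 14:40

Thanks, OP.

ayahoostar · posted 2024-3-17 14:52

Thanks, showing my support!

soglog · posted 2024-3-17 15:07

Thanks, learned something. I appreciate you sharing such a good approach.

stu · posted 2024-3-20 21:26

Thanks, OP. Taking notes.

debug_cat · posted 2024-4-29 10:43

The whole troubleshooting process is analyzed in great detail, thank you.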
