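Applicant ID: wadahana [Application approved]
1. Applicant ID: wadahana
2. Personal email: wadahana@gmail.com
3. Original technical article: "Android bionic code analysis (1) - linker bootstrap". Notes and a summary written while studying elfhook and Android's shared-library loading process during work on a directed-traffic SDK.
bionic linker code analysis (1) - linker bootstrap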
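When Android starts a new process, it calls one of the exec family of functions (execv and friends), which traps into the kernel; the kernel then validates and loads the executable. Along with the executable, the kernel also loads /system/bin/linker. The linker then loads the dependent shared libraries and calls the executable's entry function, which completes the transfer of control. The linker is itself an ELF shared object, and its entry code is in ${bionic}/linker/arch/arm/begin.S: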
#include <private/bionic_asm.h>
ENTRY(_start)
mov r0, sp
bl __linker_init
/* linker init returns the _entry address in the main image */
bx r0
END(_start)
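In _start, the stack pointer is copied into r0 and passed as the argument to __linker_init. Once __linker_init has initialized the linker and loaded the dependent libraries, it returns the executable's entry address in r0, and the following bx r0 instruction hands control to the executable. __linker_init() is in ${bionic}/linker/linker_main.cpp: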
492 extern "C" ElfW(Addr) __linker_init(void* raw_args) {
493 KernelArgumentBlock args(raw_args);
494
495 // AT_BASE is set to 0 in the case when linker is run by iself
496 // so in order to link the linker it needs to calcuate AT_BASE
497 // using information at hand. The trick below takes advantage
498 // of the fact that the value of linktime_addr before relocations
499 // are run is an offset and this can be used to calculate AT_BASE.
500 static uintptr_t linktime_addr = reinterpret_cast<uintptr_t>(&linktime_addr);
501 ElfW(Addr) linker_addr = reinterpret_cast<uintptr_t>(&linktime_addr) - linktime_addr;
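The KernelArgumentBlock class used on line 493 is defined in ${bionic}/libc/private/KernelArgumentBlock.h. By the time the kernel has loaded the linker, it has already placed the command-line arguments, the environment variables, and the ELF auxiliary vector that follows them on the stack. raw_args is the stack pointer passed in from _start; the args object merely wraps this information and offers a set of accessors. The stack is laid out as follows: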
position            content                        size (bytes)   comment
------------------------------------------------------------------------
stack pointer ->    [ argc = number of args ]      4
                    [ argv[0] (pointer) ]          4
                    [ argv[1] (pointer) ]          4
                    [ argv[..] (pointer) ]         4 * n
                    [ argv[n - 1] (pointer) ]      4
                    [ argv[n] (pointer) ]          4              = NULL
                    [ envp[0] (pointer) ]          4
                    [ envp[1] (pointer) ]          4
                    [ envp[..] (pointer) ]         4
                    [ envp[term] (pointer) ]       4              = NULL
                    [ auxv[0] (Elf32_auxv_t) ]     8
                    [ auxv[1] (Elf32_auxv_t) ]     8
                    [ auxv[..] (Elf32_auxv_t) ]    8
                    [ auxv[term] (Elf32_auxv_t) ]  8              = AT_NULL vector
                    [ padding ]                    0 - 16
                    [ argument ASCIIZ strings ]    >= 0
                    [ environment ASCIIZ str. ]    >= 0
(0xbffffffc)        [ end marker ]                 4              = NULL (end)
(0xc0000000)        < bottom of stack >            0              (virtual)
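To make the layout concrete, here is a minimal sketch of how a wrapper in the spirit of KernelArgumentBlock could walk this block. ArgBlock, arg(), and find_env() are hypothetical names invented for illustration; they are not bionic's code. Only the word order (argc, argv, NULL, envp, NULL, auxv pairs) is taken from the table above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr uintptr_t kAtNull = 0;   // AT_NULL: terminates the auxiliary vector
constexpr uintptr_t kAtEntry = 9;  // AT_ENTRY: entry point of the executable

// Hypothetical, simplified counterpart of KernelArgumentBlock. Everything is
// read as native machine words, so on 32-bit ARM each slot is 4 bytes and each
// auxv pair is 8 bytes, as in the table above.
struct ArgBlock {
  uintptr_t argc;
  const uintptr_t* argv;  // argc string pointers, then a NULL word
  const uintptr_t* envp;  // string pointers, then a NULL word
  const uintptr_t* auxv;  // (type, value) pairs, ended by an AT_NULL pair

  explicit ArgBlock(const void* raw_args) {
    const uintptr_t* words = static_cast<const uintptr_t*>(raw_args);
    argc = words[0];
    argv = words + 1;
    envp = argv + argc + 1;  // skip argv[0..argc-1] and its NULL terminator
    const uintptr_t* p = envp;
    while (*p != 0) ++p;     // find the NULL that ends envp
    auxv = p + 1;
  }

  const char* arg(size_t i) const {
    return reinterpret_cast<const char*>(argv[i]);
  }

  // Returns the value of an environment variable, or nullptr if it is absent.
  const char* find_env(const char* key) const {
    const size_t len = std::strlen(key);
    for (const uintptr_t* e = envp; *e != 0; ++e) {
      const char* entry = reinterpret_cast<const char*>(*e);
      if (std::strncmp(entry, key, len) == 0 && entry[len] == '=') {
        return entry + len + 1;
      }
    }
    return nullptr;
  }

  // Linear scan of the auxiliary vector; returns 0 when the type is absent.
  uintptr_t getauxval(uintptr_t type) const {
    for (const uintptr_t* e = auxv; e[0] != kAtNull; e += 2) {
      if (e[0] == type) return e[1];
    }
    return 0;
  }
};
```

Constructing an ArgBlock from the raw stack pointer and calling getauxval(kAtEntry) corresponds to what line 512 of __linker_init() does with the real class.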
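The linker_addr variable defined on line 501 is the base address at which the linker file is actually mapped in memory. In Android 7 and earlier, linker_addr was read directly from AT_BASE in the ELF auxiliary vector. From Android 8 onward, it is computed with the help of the static variable linktime_addr defined on line 500. Disassembling the linker shows how the trick in these two lines arrives at linker_addr:
First, locate the implementation of __linker_init (Figure 1).
The instruction at 0xf57e computes r2 = r2 (0x78c2e) + pc (linker_addr + 0xf582) = linker_addr + 0x881B0. At run time the program counter holds base address plus offset, and because of the ARMv7 pipeline the value read from pc is the fetch address, which in Thumb state is the current instruction's address plus 4; so the pc seen here is linker_addr + 0xf582.
The next instruction, ldr r3, [r2], loads the contents of linker_addr + 0x881B0 into r3. In Figure 2 the value stored at address 0x881B0 is 0x881B0 (the relocation has not been applied yet), so r3 = 0x881B0.
Finally, the instruction at 0xf588 computes r5 = r2 (linker_addr + 0x881B0) - r3 (0x881B0) = linker_addr.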
[Figure 1: assembly instructions that compute linker_addr]
[Figure 2: in-memory value of linktime_addr]
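The arithmetic above can be replayed in a few lines. This is only a sketch that simulates the register values described in the text; compute_load_base and replay_disassembly are hypothetical helper names, and the base address used in any call is an arbitrary example value, not one taken from a real process.

```cpp
#include <cassert>
#include <cstdint>

// The trick in general form: before relocations run, the slot for
// linktime_addr still holds its link-time address (an offset from base 0),
// while &linktime_addr evaluates to the run-time address of the same slot.
constexpr uint64_t compute_load_base(uint64_t runtime_addr_of_slot,
                                     uint64_t unrelocated_slot_value) {
  return runtime_addr_of_slot - unrelocated_slot_value;
}

// Replays the three steps from the disassembly with the constants quoted in
// the text (0x78c2e, 0xf582, 0x881B0).
constexpr uint64_t replay_disassembly(uint64_t linker_addr) {
  // Value read from pc by the instruction at 0xf57e.
  const uint64_t pc = linker_addr + 0xf582;
  // r2 = 0x78c2e + pc, the run-time address of linktime_addr.
  const uint64_t r2 = 0x78c2e + pc;
  // r3 = *r2, the unrelocated contents of the slot.
  const uint64_t r3 = 0x881B0;
  // r5 = r2 - r3, the load base.
  return compute_load_base(r2, r3);
}

// The constants are self-consistent: 0x78c2e + 0xf582 == 0x881B0.
static_assert(0x78c2e + 0xf582 == 0x881B0, "offset arithmetic from the text");
```

Whatever base is fed in comes back out, which is exactly why the subtraction recovers the mapping address without consulting AT_BASE.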
503 #if defined(__clang_analyzer__)
504 // The analyzer assumes that linker_addr will always be null. Make it an
505 // unknown value so we don't have to mark N places with NOLINTs.
506 //
507 // (`+=`, rather than `=`, allows us to sidestep a potential "unused store"
508 // complaint)
509 linker_addr += reinterpret_cast<uintptr_t>(raw_args);
510 #endif
511
512 ElfW(Addr) entry_point = args.getauxval(AT_ENTRY);
513 ElfW(Ehdr)* elf_hdr = reinterpret_cast<ElfW(Ehdr)*>(linker_addr);
514 ElfW(Phdr)* phdr = reinterpret_cast<ElfW(Phdr)*>(linker_addr + elf_hdr->e_phoff);
515
516 soinfo linker_so(nullptr, nullptr, nullptr, 0, 0);
517
518 linker_so.base = linker_addr;
519 linker_so.size = phdr_table_get_load_size(phdr, elf_hdr->e_phnum);
520 linker_so.load_bias = get_elf_exec_load_bias(elf_hdr);
521 linker_so.dynamic = nullptr;
522 linker_so.phdr = phdr;
523 linker_so.phnum = elf_hdr->e_phnum;
524 linker_so.set_linker_flag();
525
526 // Prelink the linker so we can access linker globals.
527 if (!linker_so.prelink_image()) __linker_cannot_link(args.argv[0]);
528
529 // This might not be obvious... The reasons why we pass g_empty_list
530 // in place of local_group here are (1) we do not really need it, because
531 // linker is built with DT_SYMBOLIC and therefore relocates its symbols against
532 // itself without having to look into local_group and (2) allocators
533 // are not yet initialized, and therefore we cannot use linked_list.push_*
534 // functions at this point.
535 if (!linker_so.link_image(g_empty_list, g_empty_list, nullptr)) __linker_cannot_link(args.argv[0]);
536
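Lines 513-514 treat linker_addr as the linker's load address and derive pointers to the linker's Ehdr and Phdr structures; I can only assume that readers of this article are familiar with the ELF format. The soinfo object on line 516 is the class the linker uses to represent a shared library: every library loaded into memory is represented by a soinfo object. Calling dlopen twice on the same library file can create two soinfo objects, with the file mapped at two different virtual addresses. This only happens on releases after Android 6, because of android_namespace, which I will not go into here.
phdr_table_get_load_size() computes the page-aligned size of the address range spanned by all PT_LOAD segments in the program headers. When dlopen loads an ELF shared library, besides mapping the relevant header data it maps each PT_LOAD segment in the program headers into memory separately (via the ElfReader::LoadSegments() method). get_elf_exec_load_bias() computes load_bias, which equals base_addr plus (offset - vaddr) of the first PT_LOAD segment. For a shared library, the offset and vaddr of the first PT_LOAD segment are normally both 0, so load_bias equals base_addr. We have run into exceptions, however, so always use load_bias as the base address when processing addresses taken from the dynamic section.
Only the soinfo that represents the linker calls set_linker_flag() to mark itself as the linker; this flag is used in many later places to tell the linker apart from ordinary shared libraries. Note also that linker_so is created on the stack; later, get_libdl_info() allocates another soinfo on the heap and places it in the solist list.
soinfo::prelink_image() on line 527 is another important part of the linker. It parses .dynamic to find the memory location, size, and related parameters of the symbol table, string table, GOT, PLT, hash tables, and similar structures. I plan to write a separate analysis of this function. soinfo::link_image() on line 535 performs the relocations after prelink_image; it is not analyzed here either.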
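The two calculations can be sketched as follows. This is an illustrative model, not bionic's source: the Phdr struct is a hypothetical, trimmed-down program header, the 4 KiB page size is an assumption, and load_size/load_bias are stand-in names for phdr_table_get_load_size() and get_elf_exec_load_bias().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uint64_t kPageSize = 4096;  // assumption: 4 KiB pages
constexpr uint32_t kPtLoad = 1;       // PT_LOAD

// Hypothetical, trimmed-down program header with only the fields used here.
struct Phdr {
  uint32_t type;
  uint64_t offset;  // p_offset
  uint64_t vaddr;   // p_vaddr
  uint64_t memsz;   // p_memsz
};

constexpr uint64_t page_start(uint64_t a) { return a & ~(kPageSize - 1); }
constexpr uint64_t page_end(uint64_t a) { return page_start(a + kPageSize - 1); }

// Size of the page-aligned address range covered by all PT_LOAD segments.
uint64_t load_size(const Phdr* phdr, size_t count) {
  uint64_t min_vaddr = UINT64_MAX;
  uint64_t max_vaddr = 0;
  bool found = false;
  for (size_t i = 0; i < count; ++i) {
    if (phdr[i].type != kPtLoad) continue;
    found = true;
    if (phdr[i].vaddr < min_vaddr) min_vaddr = phdr[i].vaddr;
    if (phdr[i].vaddr + phdr[i].memsz > max_vaddr) {
      max_vaddr = phdr[i].vaddr + phdr[i].memsz;
    }
  }
  if (!found) return 0;
  return page_end(max_vaddr) - page_start(min_vaddr);
}

// base + offset - vaddr of the first PT_LOAD segment.
uint64_t load_bias(uint64_t base, const Phdr* phdr, size_t count) {
  for (size_t i = 0; i < count; ++i) {
    if (phdr[i].type == kPtLoad) return base + phdr[i].offset - phdr[i].vaddr;
  }
  return 0;
}
```

With a first PT_LOAD whose offset and vaddr are both 0, load_bias returns the base unchanged; a first segment with vaddr 0x1000 and offset 0 yields base - 0x1000, which is the kind of exception mentioned above.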
537 #if defined(__i386__)
538 // On x86, we can't make system calls before this point.
539 // We can't move this up because this needs to assign to a global.
540 // Note that until we call __libc_init_main_thread below we have
541 // no TLS, so you shouldn't make a system call that can fail, because
542 // it will SEGV when it tries to set errno.
543 __libc_init_sysinfo(args);
544 #endif
545
546 // Initialize the main thread (including TLS, so system calls really work).
547 __libc_init_main_thread(args);
548
549 // We didn't protect the linker's RELRO pages in link_image because we
550 // couldn't make system calls on x86 at that point, but we can now...
551 if (!linker_so.protect_relro()) __linker_cannot_link(args.argv);
552
553 // Initialize the linker's static libc's globals
554 __libc_init_globals(args);
555
556 // store argc/argv/envp to use them for calling constructors
557 g_argc = args.argc;
558 g_argv = args.argv;
559 g_envp = args.envp;
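__libc_init_main_thread() and __libc_init_globals() belong to libc and are not analyzed in depth here. soinfo::protect_relro() on line 551 ends up in _phdr_table_set_gnu_relro_prot() at line 777 of $(bionic)/linker/linker_phdr.cpp, which uses mprotect to set the memory covered by the PT_GNU_RELRO segment to PROT_READ. The function page-aligns the start and end addresses of that memory, so if the range covered by PT_GNU_RELRO is not page-aligned, the protection is over-protective: bytes that lie outside the segment but inside the same pages also become read-only.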
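The page rounding is easy to see with numbers. The sketch below only models the address arithmetic; it does not call mprotect. Range and relro_protect_range are hypothetical names, and the 4 KiB page size is an assumption.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kPageSize = 4096;  // assumption: 4 KiB pages

constexpr uint64_t page_start(uint64_t a) { return a & ~(kPageSize - 1); }
constexpr uint64_t page_end(uint64_t a) { return page_start(a + kPageSize - 1); }

// Half-open address range [start, end) that would be made read-only.
struct Range {
  uint64_t start;
  uint64_t end;
};

// Rounds the PT_GNU_RELRO segment outward to whole pages, then applies the
// load bias to turn virtual addresses into run-time addresses.
constexpr Range relro_protect_range(uint64_t load_bias, uint64_t vaddr,
                                    uint64_t memsz) {
  return Range{page_start(vaddr) + load_bias,
               page_end(vaddr + memsz) + load_bias};
}
```

For a segment at vaddr 0x3e80 with memsz 0x200, the protected range is 0x3000 to 0x5000 (plus the bias): two full pages for 512 bytes of RELRO data, which is the over-protection described above.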
561 // Initialize the linker's own global variables
562 linker_so.call_constructors();
563
564 // If the linker is not acting as PT_INTERP entry_point is equal to
565 // _start. Which means that the linker is running as an executable and
566 // already linked by PT_INTERP.
567 //
568 // This happens when user tries to run 'adb shell /system/bin/linker'
569 // see also https://code.google.com/p/android/issues/detail?id=63174
570 if (reinterpret_cast<ElfW(Addr)>(&_start) == entry_point) {
571 async_safe_format_fd(STDOUT_FILENO,
572 "This is %s, the helper program for dynamic executables.\n",
573 args.argv[0]);
574 exit(0);
575 }
576
577 init_linker_info_for_gdb(linker_addr, kLinkerPath);
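soinfo::call_constructors(), called on line 562, is at line 388 of ${bionic}/linker/linker_soinfo.cpp. It first recurses into every soinfo in the list returned by get_children(), then runs its own init_func_ function and the functions in its init_array_ list. init_func_ is the function pointed to by the ELF file's DT_INIT entry; init_array_ is the list of functions in DT_INIT_ARRAY, which holds the global functions marked with __attribute__((constructor)) in the source code as well as the constructors the compiler generates for some global variables.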
388 void soinfo::call_constructors() {
389 if (constructors_called) {
390 return;
391 }
392
393 // We set constructors_called before actually calling the constructors, otherwise it doesn't
394 // protect against recursive constructor calls. One simple example of constructor recursion
395 // is the libc debug malloc, which is implemented in libc_malloc_debug_leak.so:
396 // 1. The program depends on libc, so libc's constructor is called here.
397 // 2. The libc constructor calls dlopen() to load libc_malloc_debug_leak.so.
398 // 3. dlopen() calls the constructors on the newly created
399 // soinfo for libc_malloc_debug_leak.so.
400 // 4. The debug .so depends on libc, so CallConstructors is
401 // called again with the libc soinfo. If it doesn't trigger the early-
402 // out above, the libc constructor will be called again (recursively!).
403 constructors_called = true;
404
405 if (!is_main_executable() && preinit_array_ != nullptr) {
406 // The GNU dynamic linker silently ignores these, but we warn the developer.
407 PRINT("\"%s\": ignoring DT_PREINIT_ARRAY in shared library!", get_realpath());
408 }
409
410 get_children().for_each([] (soinfo* si) {
411 si->call_constructors();
412 });
413
414 if (!is_linker()) {
415 bionic_trace_begin((std::string("calling constructors: ") + get_realpath()).c_str());
416 }
417
418 // DT_INIT should be called before DT_INIT_ARRAY if both are present.
419 call_function("DT_INIT", init_func_, get_realpath());
420 call_array("DT_INIT_ARRAY", init_array_, init_array_count_, false, get_realpath());
421
422 if (!is_linker()) {
423 bionic_trace_end();
424 }
425 }
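The effect of setting constructors_called before recursing, and of visiting children first, can be modelled in a few lines. Module and its log parameter are hypothetical stand-ins for soinfo; pushing a name onto the log stands in for running DT_INIT followed by DT_INIT_ARRAY.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for soinfo, reduced to what the ordering needs.
struct Module {
  std::string name;
  std::vector<Module*> children;  // direct dependencies
  bool constructors_called;

  explicit Module(std::string n)
      : name(std::move(n)), constructors_called(false) {}

  void call_constructors(std::vector<std::string>& log) {
    if (constructors_called) return;
    // Set the flag before recursing, so that a dependency cycle (such as
    // libc and its debug malloc library) ends here instead of looping.
    constructors_called = true;
    for (Module* child : children) child->call_constructors(log);
    // Stands in for calling DT_INIT, then every DT_INIT_ARRAY entry.
    log.push_back(name);
  }
};
```

For an executable that depends on libc, where libc and a debug library depend on each other, the recorded order is debug library, libc, executable: dependencies are initialized before their dependents, and the cycle is visited only once.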
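Line 570 compares _start with entry_point to check whether the executable the new process is loading is the linker itself: _start is the linker's entry address, and entry_point is the AT_ENTRY value read from the auxiliary vector on line 512, that is, the executable's entry address. init_linker_info_for_gdb() on line 577 is implemented at line 167 of linker_main.cpp and fills in the linker_link_map object. link_map is a doubly linked list node type defined in $(bionic)/libc/include/link.h. A link_map stores a shared library's file name, base_addr, and the address of its .dynamic section, and the link_map list records every shared library loaded into memory. According to the comments it exists for gdb, to stay compatible with ld-linux.so. I remember reading an article, by @jmpews or some other expert, that used this structure to enumerate all loaded shared libraries!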
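Enumerating libraries through such a list amounts to an ordinary doubly linked list walk. The sketch below uses a hypothetical LinkMapNode that mirrors the link_map fields (l_addr, l_name, l_ld, l_next, l_prev) and operates on a hand-built list; how a real process obtains a pointer into the list is outside its scope.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical mirror of the link_map node layout.
struct LinkMapNode {
  uintptr_t l_addr;     // load address (bias) of the object
  const char* l_name;   // file name of the object
  const void* l_ld;     // address of its dynamic section
  LinkMapNode* l_next;
  LinkMapNode* l_prev;
};

// Starting from any node, rewinds to the head and collects every name.
std::vector<std::string> list_loaded(const LinkMapNode* node) {
  std::vector<std::string> names;
  if (node == nullptr) return names;
  while (node->l_prev != nullptr) node = node->l_prev;
  for (; node != nullptr; node = node->l_next) {
    names.push_back(node->l_name != nullptr ? node->l_name : "");
  }
  return names;
}
```

Because the list is doubly linked, the walk works from any node, which is convenient when the only pointer at hand belongs to a library in the middle of the list.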
579 // Initialize static variables. Note that in order to
580 // get correct libdl_info we need to call constructors
581 // before get_libdl_info().
582 sonext = solist = get_libdl_info(kLinkerPath, linker_so, linker_link_map);
583 g_default_namespace.add_soinfo(solist);
584
585 // We have successfully fixed our own relocations. It's safe to run
586 // the main part of the linker now.
587 args.abort_message_ptr = &g_abort_message;
588 ElfW(Addr) start_address = __linker_init_post_relocation(args);
589
590 INFO("[ Jumping to _start (%p)... ]", reinterpret_cast<void*>(start_address));
591
592 // Return the address that the calling assembly stub should jump to.
593 return start_address;
594 }
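get_libdl_info() on line 582 is implemented at line 279 of ${bionic}/linker/dlfcn.cpp, and this is one of the more interesting spots. Earlier, line 516 of __linker_init() defined the linker_so object on the stack to represent the linker's binary image; get_libdl_info() in effect copies linker_so into a new soinfo on the heap, which then becomes the head of the solist and sonext list. Before Android 7.1.2, /system/lib/libdl.so was not actually loaded into memory: the soinfo created by get_libdl_info() was named libdl.so, so dlopen, dlclose, dlsym, and dladdr were all resolved directly to symbols inside the linker. After Android 7.1.2, the soinfo created by get_libdl_info() has soname="ld-android.so", /system/lib/libdl.so is loaded into memory along with the other dependencies, and dlopen, dlclose, dlsym, and dladdr are exported by libdl.so. __linker_init_post_relocation() on line 588 is at line 215 of the same linker_main.cpp. It is the last function __linker_init() calls before returning: it creates the soinfo for the executable, calls find_libraries to load the dependent libraries, and completes the rest of the preparation the executable needs in order to run.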
215 static ElfW(Addr) __linker_init_post_relocation(KernelArgumentBlock& args) {
216 ProtectedDataGuard guard;
....
226 // Initialize system properties
227 __system_properties_init(); // may use 'environ'
....
240 g_linker_logger.ResetState();
....
277 const char* executable_path = get_executable_path();
278 soinfo* si = soinfo_alloc(&g_default_namespace, executable_path, &file_stat, 0, RTLD_GLOBAL);
279 if (si == nullptr) {
280 async_safe_fatal("Couldn't allocate soinfo: out of memory?");
281 }
282
283 /* bootstrap the link map, the main exe always needs to be first */
284 si->set_main_executable();
285 link_map* map = &(si->link_map_head);
....
339 parse_LD_LIBRARY_PATH(ldpath_env);
340 parse_LD_PRELOAD(ldpreload_env);
341
342 somain = si;
....
358 // Load ld_preloads and dependencies.
359 std::vector<const char*> needed_library_name_list;
360 size_t ld_preloads_count = 0;
361
362 for (const auto& ld_preload_name : g_ld_preload_names) {
363 needed_library_name_list.push_back(ld_preload_name.c_str());
364 ++ld_preloads_count;
365 }
366
367 for_each_dt_needed(si, [&](const char* name) {
368 needed_library_name_list.push_back(name);
369 });
370
371 const char** needed_library_names = &needed_library_name_list[0];
372 size_t needed_libraries_count = needed_library_name_list.size();
373
374 // readers_map is shared across recursive calls to find_libraries so that we
375 // don't need to re-load elf headers.
376 std::unordered_map<const soinfo*, ElfReader> readers_map;
377 if (needed_libraries_count > 0 &&
378 !find_libraries(&g_default_namespace,
379 si,
380 needed_library_names,
381 needed_libraries_count,
382 nullptr,
383 &g_ld_preloads,
384 ld_preloads_count,
385 RTLD_GLOBAL,
386 nullptr,
387 true /* add_as_children */,
388 true /* search_linked_namespaces */,
389 readers_map,
390 &namespaces)) {
391 __linker_cannot_link(g_argv[0]);
392 } else if (needed_libraries_count == 0) {
393 if (!si->link_image(g_empty_list, soinfo_list_t::make_list(si), nullptr)) {
394 __linker_cannot_link(g_argv[0]);
395 }
396 si->increment_ref_count();
397 }
....
403 si->call_pre_init_constructors();
404
405 /* After the prelink_image, the si->load_bias is initialized.
406 * For so lib, the map->l_addr will be updated in notify_gdb_of_load.
407 * We need to update this value for so exe here. So Unwind_Backtrace
408 * for some arch like x86 could work correctly within so exe.
409 */
410 map->l_addr = si->load_bias;
411 si->call_constructors();
....
454 ElfW(Addr) entry = args.getauxval(AT_ENTRY);
455 TRACE("[ Ready to execute \"%s\" @ %p ]", si->get_realpath(), reinterpret_cast<void*>(entry));
456 return entry;
457 }
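In __linker_init_post_relocation(), line 227 calls __system_properties_init() to initialize the data structures used to read and write system properties. LinkerLogger::ResetState() on line 240 reads the 'debug.ld.greylist_disabled' system property and uses it to set the global variable g_greylist_disabled; when g_greylist_disabled is true, is_greylisted() returns false immediately, which disables the greylist mechanism dlopen applies when loading libraries. Next, the soinfo object for the executable's binary image is created and stored in somain as the head of the list, and lines 339 and 340 parse the values of the LD_LIBRARY_PATH and LD_PRELOAD environment variables. Libraries named in LD_PRELOAD are added to the g_ld_preload_names string vector declared at line 110 of linker_main.cpp, while the LD_LIBRARY_PATH entries are inserted into the global g_default_namespace namespace.
Lines 358 to 372 add the libraries from g_ld_preload_names, followed by the dependencies named by the DT_NEEDED entries of the executable's ELF file, to the needed_library_name_list vector; line 378 then calls find_libraries() to load them into memory. find_libraries() is also the core function behind dlopen: it parses the library's ELF file, maps it into memory, creates the corresponding soinfo object, checks permissions, and loads the library's own dependencies. soinfo::call_pre_init_constructors() on line 403 calls the functions in the DT_PREINIT_ARRAY list that has already been parsed into the soinfo. soinfo::call_constructors() on line 411 was described earlier. With these steps done, loading of the executable is essentially complete: the executable's entry address is read from the auxiliary vector and returned to __linker_init(), the linker has finished bootstrapping, and control passes to the executable.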
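The ordering built in lines 358 to 372 (preloads first, then DT_NEEDED entries, with a count of how many leading entries are preloads) can be sketched like this. split_preload, NeededList, and build_needed_list are hypothetical names, and treating both ':' and ' ' as LD_PRELOAD separators is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits an LD_PRELOAD value into library names, skipping empty fields.
// Assumption: both ':' and ' ' act as separators.
std::vector<std::string> split_preload(const std::string& value) {
  std::vector<std::string> out;
  std::string current;
  for (char ch : value) {
    if (ch == ':' || ch == ' ') {
      if (!current.empty()) out.push_back(current);
      current.clear();
    } else {
      current.push_back(ch);
    }
  }
  if (!current.empty()) out.push_back(current);
  return out;
}

struct NeededList {
  std::vector<std::string> names;  // preloads first, then DT_NEEDED entries
  size_t ld_preloads_count;        // how many leading entries are preloads
};

NeededList build_needed_list(const std::string& ld_preload,
                             const std::vector<std::string>& dt_needed) {
  NeededList result;
  result.names = split_preload(ld_preload);
  result.ld_preloads_count = result.names.size();
  result.names.insert(result.names.end(), dt_needed.begin(), dt_needed.end());
  return result;
}
```

Putting the preloaded libraries at the front means they are loaded, and therefore searched during symbol resolution, ahead of the executable's declared dependencies, which is what makes LD_PRELOAD useful for interposing functions.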
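ID: wadahana
Email: wadahana@gmail.com
Application approved. Welcome to the 52pojie forum; we look forward to the forum being even better with you in it. Change the ID's password yourself through the email password-recovery function; please log in and change your password promptly!
After logging in, please check in on this thread within one week, otherwise the ID will be deleted.
Registration successful, reporting in.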