Please check out the sqlelf repository and give me your feedback.
I wrote in my earlier post about releasing sqlelf. I had a hunch that the existing tooling we have for interfacing with our object formats, such as ELF, is antiquated and ill-specified.
Declarative languages such as SQL are a wonderful abstraction for querying data layouts: they let you reason about what you want rather than how to get it.
Since continuing to noodle 👨‍💻 on the project, I've been pleasantly surprised at some of the introspection, specifically with respect to symbols, that I've been able to accomplish with SQL and that would have been a pain using traditional tools.
Come on a stroll with me through a few case studies I've worked through that show how SQL-guided analysis wins out.
Symbol Resolution
One of the primary data structures within the ELF file is the symbol table, especially the dynamic symbol table that allows the use of shared objects (libraries).
A typical question someone may ask themselves though is:
Which library that I load is providing function foo?
This is a worthwhile question because you would like to know not only which shared objects provide the symbol definition but also which one the dynamic linker (ld.so) chooses to bind to at runtime.
The state of the art (prior to sqlelf) for retrieving this diagnostic information is to set the LD_DEBUG environment variable and trawl through the large dump of logs it emits. 🤦
❯ LD_DEBUG=symbols,bindings /usr/bin/ruby |& head
1228310: symbol=__vdso_clock_gettime; lookup in file=linux-vdso.so.1 [0]
1228310: binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_clock_gettime' [LINUX_2.6]
1228310: symbol=__vdso_gettimeofday; lookup in file=linux-vdso.so.1 [0]
1228310: binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_gettimeofday' [LINUX_2.6]
1228310: symbol=__vdso_time; lookup in file=linux-vdso.so.1 [0]
1228310: binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_time' [LINUX_2.6]
1228310: symbol=__vdso_getcpu; lookup in file=linux-vdso.so.1 [0]
1228310: binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_getcpu' [LINUX_2.6]
1228310: symbol=__vdso_clock_getres; lookup in file=linux-vdso.so.1 [0]
1228310: binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_clock_getres' [LINUX_2.6]
Let's see how we can rethink this question as a declarative SQL statement:
SELECT caller.path as 'caller.path',
       callee.path as 'callee.path',
       caller.name,
       caller.demangled_name
FROM ELF_SYMBOLS caller
INNER JOIN ELF_SYMBOLS callee
ON
  caller.name = callee.name AND
  caller.path != callee.path AND
  caller.imported = TRUE AND
  callee.exported = TRUE
We can think of the above as asking:
Please provide all pairings of symbols where the name is the same between any two different ELF files. One of the files must export the symbol and the other must be importing it.
❯ sqlelf /usr/bin/ruby --sql "SELECT caller.path as 'caller.path',
       callee.path as 'callee.path',
       caller.name
FROM ELF_SYMBOLS caller
INNER JOIN ELF_SYMBOLS callee
ON
  caller.name = callee.name AND
  caller.path != callee.path AND
  caller.imported = TRUE AND
  callee.exported = TRUE
LIMIT 10"
┌───────────────┬────────────────────────────────────────────────┬───────────────────┐
│ caller.path   │ callee.path                                    │ name              │
│ /usr/bin/ruby │ /usr/lib/x86_64-linux-gnu/libruby-3.1.so.3.1.2 │ ruby_run_node     │
│ /usr/bin/ruby │ /usr/lib/x86_64-linux-gnu/libruby-3.1.so.3.1.2 │ ruby_init         │
│ /usr/bin/ruby │ /usr/lib/x86_64-linux-gnu/libruby-3.1.so.3.1.2 │ ruby_options      │
│ /usr/bin/ruby │ /usr/lib/x86_64-linux-gnu/libruby-3.1.so.3.1.2 │ ruby_sysinit      │
│ /usr/bin/ruby │ /usr/lib/x86_64-linux-gnu/libc.so.6            │ __stack_chk_fail  │
│ /usr/bin/ruby │ /usr/lib/x86_64-linux-gnu/libruby-3.1.so.3.1.2 │ ruby_init_stack   │
│ /usr/bin/ruby │ /usr/lib/x86_64-linux-gnu/libc.so.6            │ setlocale         │
│ /usr/bin/ruby │ /usr/lib/x86_64-linux-gnu/libc.so.6            │ __libc_start_main │
│ /usr/bin/ruby │ /usr/lib/x86_64-linux-gnu/libc.so.6            │ __libc_start_main │
│ /usr/bin/ruby │ /usr/lib/x86_64-linux-gnu/libc.so.6            │ __cxa_finalize    │
└───────────────┴────────────────────────────────────────────────┴───────────────────┘
🥳 That sure beats dealing with unstructured text. sqlelf can also emit a myriad of output formats, such as CSV or JSON.
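To get a feel for the import/export join without any ELF files at hand, here is a minimal sketch using Python's built-in sqlite3 module. The table is a simplified stand-in for sqlelf's ELF_SYMBOLS table (only the columns the query touches), and every path and symbol in it is invented for illustration.

```python
import sqlite3

# Toy stand-in for sqlelf's ELF_SYMBOLS table. Only the columns used by the
# join are modeled, and all rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE elf_symbols "
    "(path TEXT, name TEXT, imported INTEGER, exported INTEGER)"
)
conn.executemany(
    "INSERT INTO elf_symbols VALUES (?, ?, ?, ?)",
    [
        ("/bin/app", "puts", 1, 0),        # app imports puts
        ("/bin/app", "main", 0, 1),        # app defines and exports main
        ("/lib/libc.so", "puts", 0, 1),    # libc exports puts
        ("/lib/libc.so", "malloc", 0, 1),  # exported, but nobody imports it
    ],
)

# Same shape as the query above; 1 is used instead of TRUE so the sketch
# also runs on older SQLite versions.
RESOLUTION_QUERY = """
SELECT caller.path, callee.path, caller.name
FROM elf_symbols caller
INNER JOIN elf_symbols callee
  ON caller.name = callee.name
 AND caller.path != callee.path
 AND caller.imported = 1
 AND callee.exported = 1
"""

pairs = conn.execute(RESOLUTION_QUERY).fetchall()
for caller_path, callee_path, name in pairs:
    print(f"{caller_path} imports {name} from {callee_path}")
```

Only the importer/exporter pairing survives the join; symbols that are exported but never imported, such as malloc in this toy data, drop out.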
Symbol Shadowing
The reality of LD_DEBUG, however, is that its usefulness lies in knowing the final resolution of a given symbol.
ELF files are free to define and export any symbols, and oftentimes, whether by accident (benign) or maliciously, the same symbol is exported by multiple libraries.
This is what empowers tools such as LD_PRELOAD, which let users take over symbols such as malloc and replace them with alternative implementations.
The dynamic linker, according to the System V ABI, examines the symbol tables with a breadth-first search across the dependency graph of the shared object libraries.
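The breadth-first lookup can be sketched in a few lines of Python. This is a simplified model, not what ld.so actually implements: it ignores LD_PRELOAD, symbol versioning, weak symbols and similar refinements, and the dependency graph and export sets are hypothetical.

```python
from collections import deque


def load_order(root, needed):
    """Return objects in breadth-first order, starting from the executable."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        obj = queue.popleft()
        order.append(obj)
        for dep in needed.get(obj, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return order


def resolve(symbol, root, needed, exports):
    """Return the first object in load order that exports `symbol`, or None."""
    for obj in load_order(root, needed):
        if symbol in exports.get(obj, set()):
            return obj
    return None


# Hypothetical dependency edges (think DT_NEEDED) and exported symbols.
needed = {
    "app": ["libfast.so", "libc.so"],
    "libfast.so": ["libstub.so"],
}
exports = {
    "libc.so": {"malloc", "puts"},
    "libfast.so": {"compute"},
    "libstub.so": {"malloc"},  # one level deeper, so libc.so is found first
}

print(load_order("app", needed))
print(resolve("malloc", "app", needed, exports))
```

Because libstub.so sits one level deeper than libc.so, its malloc is never reached in this model. Move it up a level and ahead of libc.so in the list, and it would win instead.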
A typical question someone may ask themselves though is:
What symbols are currently shadowed in my dependency graph?
Let's see how we can rethink this question as a declarative SQL statement:
SELECT name, version, count(*) as symbol_count,
       GROUP_CONCAT(path, ':') as libraries
FROM elf_symbols
WHERE exported = TRUE
GROUP BY name, version
HAVING count(*) >= 2
We can think of the above as asking:
Please provide me all symbols (and the libraries that define them) that are exported by two or more libraries.
Any symbol here is technically being shadowed, whether on purpose or by accident.
Revisiting the same ruby example above we can see the results.
❯ sqlelf /usr/bin/ruby --recursive --sql "
SELECT name, version, count(*) as symbol_count,
       GROUP_CONCAT(path, ':') as libraries
FROM elf_symbols
WHERE exported = TRUE
GROUP BY name, version
HAVING count(*) >= 2"
┌────────────┬─────────────┬──────────────┬─────────────────────────────────────────────────────────────────────────┐
│ name       │ version     │ symbol_count │ libraries                                                               │
│ __finite   │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ __finitef  │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ __finitel  │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ __signbit  │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ __signbitf │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ __signbitl │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ copysign   │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ copysignf  │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ copysignl  │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ finite     │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ finitef    │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ finitel    │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ frexp      │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ frexpf     │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ frexpl     │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ ldexp      │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ ldexpf     │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ ldexpl     │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ modf       │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ modff      │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ modfl      │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ scalbn     │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ scalbnf    │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ scalbnl    │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
└────────────┴─────────────┴──────────────┴─────────────────────────────────────────────────────────────────────────┘
In this particular case, there is no malicious symbol shadowing, and the symbols shadowed by libm and libc are well known. In fact, on many systems, libm is a symlink to libc.
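The same grouping query can be tried on toy data with Python's built-in sqlite3 module. This is only a sketch: the table is a cut-down stand-in for sqlelf's elf_symbols table, and the paths, symbol names and version strings are invented.

```python
import sqlite3

# Cut-down stand-in for sqlelf's elf_symbols table; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE elf_symbols "
    "(path TEXT, name TEXT, version TEXT, exported INTEGER)"
)
conn.executemany(
    "INSERT INTO elf_symbols VALUES (?, ?, ?, ?)",
    [
        ("/lib/libm.so", "frexp", "V1", 1),  # exported by libm ...
        ("/lib/libc.so", "frexp", "V1", 1),  # ... and by libc: shadowed
        ("/lib/libc.so", "puts", "V1", 1),   # exported once: not shadowed
        ("/bin/app", "puts", "V1", 0),       # an import, filtered out
    ],
)

# Same shape as the shadowing query; 1 stands in for TRUE.
SHADOW_QUERY = """
SELECT name, version, count(*) AS symbol_count,
       GROUP_CONCAT(path, ':') AS libraries
FROM elf_symbols
WHERE exported = 1
GROUP BY name, version
HAVING count(*) >= 2
"""

shadowed = conn.execute(SHADOW_QUERY).fetchall()
for name, version, symbol_count, libraries in shadowed:
    print(name, version, symbol_count, libraries.split(":"))
```

Note that SQLite does not guarantee the order in which GROUP_CONCAT joins the paths, so the libraries column says who exports the symbol, not who wins the lookup.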
I have previously written, while developing shrinkwrap, about a more annoying case of shadowing that can happen with OpenMPI. It's pretty easy to accidentally get the no-op library implementation earlier in the breadth-first search and find yourself with a sequential application.
I've included a neat example in the sqlelf repository that you can play with to test shadowing symbols and see the results of sqlelf. 🕵️
If you find any of this fascinating, contribute and let's work to make accessing ELF via SQL simple and productive.