Quick insights using sqlelf

Published 2023-09-11 on Farid Zakaria's Blog

Please check out the sqlelf repository and give me your feedback.

I wrote in my earlier post about releasing sqlelf. I had a hunch that the existing tooling we have for interfacing with our object formats, such as ELF, is antiquated and ill-specified.

Declarative languages, such as SQL, are a wonderful abstraction for querying data layouts: they let you reason about what you want rather than how to get it.

Since continuing to noodle 👨‍💻 on the project, I’ve been pleasantly surprised at some of the introspection, specifically with respect to symbols, I’ve been able to accomplish with SQL that would have been a pain using traditional tools.

Come on a stroll with me through a few case studies I’ve gone through on how SQL-guided analysis wins out.

Symbol Resolution

One of the primary data structures within the ELF file is the symbol table, especially the dynamic symbol table that allows the use of shared objects (libraries).

A typical question someone may ask themselves is:

Which library that I load is providing function foo?

This is a worthwhile question because you would like to know not only which shared objects provide a definition of the symbol but also which one the dynamic linker (ld.so) chooses to bind against at runtime.

The state of the art (prior to sqlelf) for retrieving this diagnostic information is to set the LD_DEBUG environment variable and trawl through the large dump of logs it emits. 🤦

❯ LD_DEBUG=symbols,bindings /usr/bin/ruby |& head
   1228310:	symbol=__vdso_clock_gettime;  lookup in file=linux-vdso.so.1 [0]
   1228310:	binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_clock_gettime' [LINUX_2.6]
   1228310:	symbol=__vdso_gettimeofday;  lookup in file=linux-vdso.so.1 [0]
   1228310:	binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_gettimeofday' [LINUX_2.6]
   1228310:	symbol=__vdso_time;  lookup in file=linux-vdso.so.1 [0]
   1228310:	binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_time' [LINUX_2.6]
   1228310:	symbol=__vdso_getcpu;  lookup in file=linux-vdso.so.1 [0]
   1228310:	binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_getcpu' [LINUX_2.6]
   1228310:	symbol=__vdso_clock_getres;  lookup in file=linux-vdso.so.1 [0]
   1228310:	binding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: normal symbol `__vdso_clock_getres' [LINUX_2.6]
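
To give a feel for the work involved in the unstructured route, here is a minimal Python sketch that pulls the "binding file" lines out of a log like the one above. The regular expression is inferred only from the sample shown here, so treat it as an assumption rather than a documented format; the names BINDING_RE and parse_bindings are hypothetical helpers, not part of any tool.

```python
import re

# Matches the "binding file ..." lines shown above. The pattern is inferred
# from this sample only and is an assumption, not a documented format.
BINDING_RE = re.compile(
    r"binding file (?P<caller>\S+) \[\d+\] to (?P<callee>\S+) \[\d+\]: "
    r"normal symbol `(?P<symbol>[^']+)'(?: \[(?P<version>[^\]]+)\])?"
)


def parse_bindings(log_text):
    """Return one dict per binding line; every other line is ignored."""
    rows = []
    for line in log_text.splitlines():
        match = BINDING_RE.search(line)
        if match:
            rows.append(match.groupdict())
    return rows


# Two lines copied from the log excerpt above.
sample = (
    "   1228310:\tsymbol=__vdso_time;  lookup in file=linux-vdso.so.1 [0]\n"
    "   1228310:\tbinding file linux-vdso.so.1 [0] to linux-vdso.so.1 [0]: "
    "normal symbol `__vdso_time' [LINUX_2.6]\n"
)

for row in parse_bindings(sample):
    print(row["caller"], "->", row["callee"], row["symbol"], row["version"])
```

Even this small parser has to guess at the line layout, which is exactly the kind of fragility a structured query avoids.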

Let’s see how we can rethink this question as a declarative SQL statement:

SELECT caller.path as 'caller.path',
       callee.path as 'callee.path',
       caller.name,
       caller.demangled_name
FROM ELF_SYMBOLS caller
INNER JOIN ELF_SYMBOLS callee
ON
caller.name = callee.name AND
caller.path != callee.path AND
caller.imported = TRUE AND
callee.exported = TRUE

We can think of the above as asking:

Please provide all pairings of symbols where the name is the same between any two different ELF files. One of the files must export the symbol and the other must be importing it.

❯ sqlelf /usr/bin/ruby --sql "SELECT caller.path as 'caller.path',
       callee.path as 'callee.path',
       caller.name
FROM ELF_SYMBOLS caller
INNER JOIN ELF_SYMBOLS callee
ON
caller.name = callee.name AND
caller.path != callee.path AND
caller.imported = TRUE AND
callee.exported = TRUE
LIMIT 10"
┌───────────────┬────────────────────────────────────────────────┬───────────────────┐
│  caller.path  │                  callee.path                   │       name        │
│ /usr/bin/ruby │ /usr/lib/x86_64-linux-gnu/libruby-3.1.so.3.1.2 │ ruby_run_node     │
│ /usr/bin/ruby │ /usr/lib/x86_64-linux-gnu/libruby-3.1.so.3.1.2 │ ruby_init         │
│ /usr/bin/ruby │ /usr/lib/x86_64-linux-gnu/libruby-3.1.so.3.1.2 │ ruby_options      │
│ /usr/bin/ruby │ /usr/lib/x86_64-linux-gnu/libruby-3.1.so.3.1.2 │ ruby_sysinit      │
│ /usr/bin/ruby │ /usr/lib/x86_64-linux-gnu/libc.so.6            │ __stack_chk_fail  │
│ /usr/bin/ruby │ /usr/lib/x86_64-linux-gnu/libruby-3.1.so.3.1.2 │ ruby_init_stack   │
│ /usr/bin/ruby │ /usr/lib/x86_64-linux-gnu/libc.so.6            │ setlocale         │
│ /usr/bin/ruby │ /usr/lib/x86_64-linux-gnu/libc.so.6            │ __libc_start_main │
│ /usr/bin/ruby │ /usr/lib/x86_64-linux-gnu/libc.so.6            │ __libc_start_main │
│ /usr/bin/ruby │ /usr/lib/x86_64-linux-gnu/libc.so.6            │ __cxa_finalize    │
└───────────────┴────────────────────────────────────────────────┴───────────────────┘

🥳 That sure beats dealing with unstructured text. sqlelf can also emit other output formats, such as CSV or JSON.
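
To see the self-join in isolation, here is a minimal sketch using Python's built-in sqlite3 module and a toy elf_symbols table. The four-column schema and the file names (/usr/bin/app, libfoo.so) are assumptions made up for illustration; the real sqlelf table has more columns and is populated from actual ELF files.

```python
import sqlite3

# Toy stand-in for sqlelf's table: only the columns the join needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE elf_symbols "
    "(path TEXT, name TEXT, imported INTEGER, exported INTEGER)"
)
conn.executemany(
    "INSERT INTO elf_symbols VALUES (?, ?, ?, ?)",
    [
        ("/usr/bin/app", "foo", 1, 0),        # app imports foo
        ("/usr/bin/app", "main", 0, 1),       # app exports main
        ("/usr/lib/libfoo.so", "foo", 0, 1),  # libfoo exports foo
        ("/usr/lib/libfoo.so", "malloc", 1, 0),
        ("/usr/lib/libc.so.6", "malloc", 0, 1),
    ],
)

# Same shape as the query above: pair every import with every export
# of the same name that lives in a different file.
RESOLUTION_QUERY = """
SELECT caller.path, callee.path, caller.name
FROM elf_symbols caller
INNER JOIN elf_symbols callee
ON caller.name = callee.name
AND caller.path != callee.path
AND caller.imported = 1
AND callee.exported = 1
ORDER BY caller.path, caller.name
"""

pairs = conn.execute(RESOLUTION_QUERY).fetchall()
for caller_path, callee_path, name in pairs:
    print(f"{caller_path} imports {name} from {callee_path}")
```

Note that the join lists every candidate provider of a symbol. If two libraries exported foo, both pairings would appear, which is what makes the shadowing question in the next section worth asking.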

Symbol Shadowing

The reality of LD_DEBUG, however, is that its usefulness lies in knowing the final resolution of a given symbol.

ELF files are free to define and export any symbols, and oftentimes, whether by accident (benign) or maliciously, the same symbol may be exported by multiple libraries.

This is what empowers mechanisms such as LD_PRELOAD, which lets users take over symbols such as malloc and replace them with alternative implementations.

The dynamic linker, according to the System V ABI, examines the symbol tables with a breadth-first search across the dependency graph of the shared object libraries and binds to the first definition it finds.
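
The breadth-first lookup can be sketched in a few lines of Python. This is a deliberately simplified model, not how ld.so is implemented: it ignores LD_PRELOAD, symbol versions, weak symbols, and lookup scopes. The resolve function and the library names are hypothetical.

```python
from collections import deque


def resolve(symbol, root, needed, exports):
    """Return the first object, in breadth-first order from root,
    that exports symbol, or None if no object does."""
    seen = {root}
    queue = deque([root])
    while queue:
        obj = queue.popleft()
        if symbol in exports.get(obj, set()):
            return obj
        # Enqueue direct dependencies in the order they are listed.
        for dep in needed.get(obj, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return None


# Hypothetical dependency graph: object -> list of needed libraries.
needed = {
    "app": ["liba.so", "libb.so"],
    "liba.so": ["libstub.so"],
    "libb.so": ["libreal.so"],
}
# Hypothetical exports: object -> set of exported symbol names.
exports = {
    "libstub.so": {"work"},
    "libreal.so": {"work"},
    "libb.so": {"helper"},
}

print(resolve("work", "app", needed, exports))
```

Here both libstub.so and libreal.so export work, but the search visits libstub.so first because liba.so precedes libb.so in the dependency list of app, so libstub.so wins and libreal.so is shadowed.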

A typical question someone may ask themselves is:

What symbols are currently shadowed in my dependency graph?

Let’s see how we can rethink this question as a declarative SQL statement:

SELECT name, version, count(*) as symbol_count,
       GROUP_CONCAT(path, ':') as libraries
FROM elf_symbols
WHERE exported = TRUE
GROUP BY name, version
HAVING count(*) >= 2

We can think of the above as asking:

Please provide me all symbols (and the libraries that define them) that are exported by two or more libraries.

Any symbol here is technically being shadowed, whether on purpose or by accident.
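
The grouping can be tried out without any ELF files at hand. Below is a minimal sketch using Python's sqlite3 module and a toy elf_symbols table; the schema, paths, and rows are assumptions invented for illustration and only mimic the columns the query touches.

```python
import sqlite3

# Toy stand-in for sqlelf's table: only the columns the query needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE elf_symbols "
    "(path TEXT, name TEXT, version TEXT, exported INTEGER)"
)
conn.executemany(
    "INSERT INTO elf_symbols VALUES (?, ?, ?, ?)",
    [
        ("/lib/libm.so.6", "frexp", "GLIBC_2.2.5", 1),
        ("/lib/libc.so.6", "frexp", "GLIBC_2.2.5", 1),
        ("/lib/libc.so.6", "malloc", "GLIBC_2.2.5", 1),
        ("/usr/bin/app", "malloc", "GLIBC_2.2.5", 0),  # an import, not an export
    ],
)

# Same shape as the query above: group exports by (name, version) and
# keep the groups that have two or more providers.
SHADOW_QUERY = """
SELECT name, version, count(*) AS symbol_count,
       GROUP_CONCAT(path, ':') AS libraries
FROM elf_symbols
WHERE exported = 1
GROUP BY name, version
HAVING count(*) >= 2
"""

shadowed = conn.execute(SHADOW_QUERY).fetchall()
for name, version, symbol_count, libraries in shadowed:
    print(name, version, symbol_count, libraries)
```

Only frexp is reported: malloc has a single exporter, and the import row in /usr/bin/app is filtered out by the WHERE clause before grouping. The order of paths inside the concatenated libraries column is not guaranteed by SQLite, so it should not be read as the load order.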

Revisiting the same ruby example from above, we can see the results.

❯ sqlelf /usr/bin/ruby --recursive --sql "
SELECT name, version, count(*) as symbol_count,
       GROUP_CONCAT(path, ':') as libraries
FROM elf_symbols
WHERE exported = TRUE
GROUP BY name, version
HAVING count(*) >= 2"
┌────────────┬─────────────┬──────────────┬─────────────────────────────────────────────────────────────────────────┐
│    name    │   version   │ symbol_count │                                libraries                                │
│ __finite   │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ __finitef  │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ __finitel  │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ __signbit  │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ __signbitf │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ __signbitl │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ copysign   │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ copysignf  │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ copysignl  │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ finite     │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ finitef    │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ finitel    │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ frexp      │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ frexpf     │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ frexpl     │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ ldexp      │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ ldexpf     │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ ldexpl     │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ modf       │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ modff      │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ modfl      │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ scalbn     │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ scalbnf    │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
│ scalbnl    │ GLIBC_2.2.5 │ 2            │ /usr/lib/x86_64-linux-gnu/libm.so.6:/usr/lib/x86_64-linux-gnu/libc.so.6 │
└────────────┴─────────────┴──────────────┴─────────────────────────────────────────────────────────────────────────┘

In this particular case, there is no malicious symbol shadowing and the symbols shadowed by libm and libc are well known. In fact, on many systems, libm is a symlink to libc.

I have previously written, in the context of developing shrinkwrap, about a more annoying form of shadowing that can happen with OpenMPI. It’s pretty easy to accidentally get the no-op library implementation earlier in the breadth-first search and find yourself with a sequential application.

I’ve included a neat example in the sqlelf repository that you can play with to test shadowing symbols and see the results of sqlelf. 🕵️

If you find any of this fascinating, contribute and let’s work to make accessing ELF via SQL simple and productive.