Farid Zakaria's Blog

Speeding up ELF relocations for store-based systems

Since the introduction of Nix and similar store-based systems such as Guix or Spack, I have been fascinated by finding improvements that take advantage of the new paradigms they introduce. Linux distributions are traditionally dynamic in nature, with shared libraries and executables being linked at runtime. Store-based systems, however, are static in nature, with all dependencies being resolved at build time. This determinism allows not only for reproducibility but also for optimizing various aspects of our toolchain.

Work that I have written about previously shows that there are worthwhile speedups to be gained. Previously, I focused on reducing the stat storm that occurs when resolving dependencies; more recently, I have been looking at speeding up the ELF relocations that occur when executing a program.

You can check out my publication Mapping Out the HPC Dependency Chaos about the development of shrinkwrap if you are interested in the topic.

Extending the idea further, I have been looking at how we can optimize the ELF relocations that occur when executing a program. In this post, I will discuss the basics of ELF relocations and symbol resolution and how we can optimize these processes for store-based systems.


Visualizing GitHub workflow run length time

GitHub Actions have now become an integral part of many open source projects, providing a free & powerful CI system. I am surprised, however, that there is no built-in way to visualize the run length time (or other meaningful metrics) of your actions.

🕵️ I did find a few other third-party solutions that either extract the data or themselves can be added as a step to your workflow to get similar visualizations. I wanted something simpler.

I previously wrote a post about the cost of runfiles which had become evident when we noticed our GitHub Bazel build workflow had slowed down by 50x.

After landing my fix, I wanted to visualize the run length time of the action and objectively see whether my fix had worked. Trust but verify.


Hermetic, but at what cost?

tl;dr; This is a little story about how making Bazel hermetic can lead to some unexpected consequences. In this particular case, it caused our GitHub action to slow down by 50x from 1 minute to over 60 minutes.

The recommended fix was to add the following to your .bazelrc; I needed to understand why, however.

# Disabling runfiles links drastically increases performance in slow disk IO situations
# Do not build runfile trees by default. If an execution strategy relies on runfile
# symlink tree, the tree is created on-demand. See: https://github.com/bazelbuild/bazel/issues/6627
# and https://github.com/bazelbuild/bazel/commit/03246077f948f2790a83520e7dccc2625650e6df
build --nobuild_runfile_links
test --nobuild_runfile_links

# https://bazel.build/reference/command-line-reference#flag--legacy_external_runfiles
build --nolegacy_external_runfiles
test --nolegacy_external_runfiles

Abusing GitHub as a PyPI server

I did not discover or invent this trick.

I wanted to make available a Python wheel to some developers but I did not want to publish it on PyPI for a variety of reasons.

  1. I am not the original author of the code and I did not want to take credit for it.
  2. I wanted to include the git commit hash in the version number which PyPI does not allow.
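One way to carry a commit hash in a version number is a PEP 440 local version label (the part after a `+`), which public indexes such as PyPI do not accept on upload. The sketch below is only an illustration of that idea; the helper name `local_version`, the `g` prefix, and the sample hash are assumptions made up for this example, not necessarily the scheme used for the wheel described here.

```python
import re


def local_version(public_version, commit_hash, length=7):
    """Build a PEP 440 version with a local label derived from a git hash.

    Hypothetical helper for illustration: the "g" prefix and the default
    length of 7 are arbitrary choices, not a standard.
    """
    short = commit_hash[:length].lower()
    # A git commit hash is hexadecimal; reject anything else early.
    if not re.fullmatch(r"[0-9a-f]+", short):
        raise ValueError(f"not a hexadecimal commit hash: {commit_hash!r}")
    # PEP 440 local labels follow a "+" and may contain letters, digits and dots.
    return f"{public_version}+g{short}"


if __name__ == "__main__":
    # Made-up hash, used only to show the shape of the result.
    print(local_version("1.2.0", "0123456789abcdef0123456789abcdef01234567"))
```

A version such as `1.2.0+g0123456` still sorts alongside `1.2.0` for installers while making the exact source commit visible to whoever installs the wheel.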

Quick insights using sqlelf

Please check out the sqlelf repository and give me your feedback.

I wrote in my earlier post about releasing sqlelf. I had the hunch that the existing tooling we have to interface with our object formats, such as ELF, is antiquated and ill-specified.

Declarative languages, such as SQL, are such a wonderful abstraction to articulate over data layouts and let you reason about what you want rather than how to get it.

Since continuing to noodle 👨‍💻 on the project, I’ve been pleasantly surprised at some of the introspection, specifically with respect to symbols, I’ve been able to accomplish with SQL that would have been a pain using traditional tools.
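To give a flavour of what a declarative query over symbols can look like, here is a small sketch using Python's built-in `sqlite3` module. The `symbols` table, its columns, and the sample rows are hypothetical and exist only for this illustration; they are not sqlelf's actual schema. The query asks which names are exported by more than one library, a question that takes some scripting to answer with traditional tools.

```python
import sqlite3


def duplicate_symbols(rows):
    """Return (name, provider_count) for symbols exported by 2+ libraries.

    `rows` is an iterable of (library, name, exported) tuples. The table
    layout is a made-up example, not the schema sqlelf exposes.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE symbols (library TEXT, name TEXT, exported INTEGER)"
    )
    conn.executemany("INSERT INTO symbols VALUES (?, ?, ?)", rows)
    # Say *what* we want (names with several providers), not how to find it.
    query = """
        SELECT name, COUNT(DISTINCT library) AS providers
        FROM symbols
        WHERE exported = 1
        GROUP BY name
        HAVING COUNT(DISTINCT library) > 1
        ORDER BY name
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        ("liba.so", "foo", 1),
        ("libb.so", "foo", 1),
        ("liba.so", "bar", 1),
        ("libb.so", "bar", 0),  # referenced but not exported here
    ]
    print(duplicate_symbols(sample))
```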


sqlelf and 20 years of Nix

If you want to skip ahead, please check out the sqlelf repository and give me your feedback.

🎉 We are celebrating 20 years of Nix 🎉

Within those 20 years, Nix has ushered in a new paradigm for building software reliably, one that is becoming more ubiquitous in the software industry. It has inspired imitators such as Spack & Guix.

Given the concepts introduced by Nix and its willingness to eschew some fundamental Linux concepts such as the Filesystem Hierarchy Standard, I can’t help but wonder: has Nix gone far enough within those 20 years?


Making RUNPATH redundant for Nix

This post is a direct translation of Harmen Stoppel’s blog on the same subject for Nix. He has also contributed a fix to Spack.

Please check out https://github.com/fzakaria/nix-harden-needed for my solution for the Nix ecosystem.

Nix and other store-like systems (e.g. Guix or Spack) resolve all their dependencies within their store (/nix/store) to enforce hermeticity. For the most part they rely on RUNPATH, a field in the ELF executable that tells the dynamic linker where to find libraries, as opposed to searching default paths like /lib.

❯ which ruby
/nix/store/8k4sgk3bmxnj0jvcgc4wvyd8ilg0ww3y-ruby-2.7.6/bin/ruby

❯ patchelf --print-rpath $(which ruby)
/nix/store/8k4sgk3bmxnj0jvcgc4wvyd8ilg0ww3y-ruby-2.7.6/lib:/nix/store/r90cncsaa519pwqpijg7ii4rkcmwjn6h-zlib-1.2.12/lib:/nix/store/bvy2z17rzlvkx2sj7fy99ajm853yv898-glibc-2.34-210/lib
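To see why a long RUNPATH adds up, here is a minimal Python sketch of the lookup. It is a deliberately simplified model and not the real glibc algorithm: it ignores LD_LIBRARY_PATH, the ld.so cache, and hardware-capability subdirectories, and the function name `resolve_needed` is made up for this example. Each needed library is probed in every RUNPATH directory in order until one contains it, so every directory listed before the right one costs a failed filesystem lookup.

```python
import os
import tempfile


def resolve_needed(soname, runpath):
    """Return (resolved_path, failed_probes) for one needed library.

    Simplified model: walk the colon-separated RUNPATH in order and stop
    at the first directory that contains a file named `soname`.
    """
    failed = 0
    for directory in runpath.split(":"):
        candidate = os.path.join(directory, soname)
        if os.path.exists(candidate):
            return candidate, failed
        failed += 1  # a wasted lookup before reaching the right directory
    return None, failed


if __name__ == "__main__":
    # Stand-in directories mimicking the three-entry RUNPATH shown above.
    with tempfile.TemporaryDirectory() as root:
        dirs = [os.path.join(root, name) for name in ("ruby", "zlib", "glibc")]
        for d in dirs:
            os.makedirs(d)
        open(os.path.join(dirs[1], "libz.so.1"), "w").close()
        open(os.path.join(dirs[2], "libc.so.6"), "w").close()
        runpath = ":".join(dirs)
        for soname in ("libz.so.1", "libc.so.6"):
            path, failed = resolve_needed(soname, runpath)
            print(f"{soname}: {failed} failed probe(s) before a hit")
```

In this model the number of failed probes grows with both the number of needed libraries and the length of the RUNPATH, which is the cost the rest of this post is concerned with.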

I have a paper about to be published for SuperComputing 2022 (please reach out if you’d like an early copy) that demonstrates there is a non-trivial cost to continuously and needlessly searching through the RUNPATH. In fact, I have written previously about the specific costs and our tool Shrinkwrap that can avoid them.

Although Shrinkwrap is one approach for a solution, it is merely a bandaid over the existing problem.

Can systems like Nix do more to solve this problem?


Shrinkwrap: Taming dynamic shared objects

This is a blog post of a paper I have submitted for a UCSC course project.

If you are interested in the code check out https://github.com/fzakaria/shrinkwrap

One of the fundamental data management units within a Linux system is the shared object file that is loaded into memory by dynamically linked processes at startup. The mechanism by which dynamic linking is done has not changed since its inception; software, however, has become increasingly complex.


Computing all output paths for every attribute in Nixpkgs

Nix is an amazing tool; unfortunately, doing simple things can be quite challenging.

This is a little write-up of my attempt to try and accomplish what I would have thought to be a simple thing; computing all store paths for every attribute in nixpkgs.

Why would I want to do such a thing?

I had some /nix/store entries on my system and I wanted to revisit the exact nixpkgs commit with which they were built to debug something. Without this reverse index you are pretty much out of luck for figuring it out.

I want to give an early shoutout to other similar tools in this space that let you do meta searches over nixpkgs, such as Nix Package Versions and Pkgs on Nix.


Using an overlay filesystem to improve Nix CI builds

Using Nix in our CI system has been a huge boon. Through Nix we have a level of guarantee of reproducibility between our local development environment and our CI platform. 🙌

Our CI infrastructure leverages containers (don’t they all now?) for each job and we explored different solutions to reduce the cost of constantly downloading the /nix/store necessary for the build.