Farid Zakaria's Blog

JVM boot optimization via JavaIndex

Ever heard of a JarIndex? I had been doing JVM development for 10+ years and I hadn’t. Read on to discover what it is and how it can speed up your compilation and boot time. 🤓

After working on Shrinkwrap and publishing our results in Mapping Out the HPC Dependency Chaos, I started to see the Linux environment as a bit of an oddball.

Everything in Linux is structured around O(n) or O(n^2) search and lookup.

This now feels unsurprising given that everything in Linux searches across colon-separated lists (e.g. LD_LIBRARY_PATH, RUNPATH). This idiom, however, is even more pervasive and has bled into all of our languages.

The JVM, for instance, must search for classes amongst a set of directories, files or JARs set on the CLASSPATH.
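
To make the cost concrete, here is a minimal Python sketch (not JVM code) that contrasts a linear scan over a separator-delimited search path with a lookup in a prebuilt index. The helper names `find_linear` and `build_index`, the directory layout, and the file name `Foo.class` are all hypothetical and chosen only for illustration.

```python
import os
import tempfile


def find_linear(search_path, name):
    """Probe each entry of the search path in order: O(n) probes per lookup."""
    probes = 0
    for entry in search_path.split(os.pathsep):
        probes += 1
        candidate = os.path.join(entry, name)
        if os.path.isfile(candidate):
            return candidate, probes
    return None, probes


def build_index(search_path):
    """Scan every entry once and map file name -> first location providing it."""
    index = {}
    for entry in search_path.split(os.pathsep):
        if not os.path.isdir(entry):
            continue
        for name in sorted(os.listdir(entry)):
            # setdefault keeps the first hit, matching search-path precedence.
            index.setdefault(name, os.path.join(entry, name))
    return index


def demo():
    """Create five directories (hypothetical layout); only the last has the file."""
    with tempfile.TemporaryDirectory() as root:
        dirs = []
        for i in range(5):
            d = os.path.join(root, f"dir{i}")
            os.mkdir(d)
            dirs.append(d)
        target = os.path.join(dirs[-1], "Foo.class")
        open(target, "w").close()

        search_path = os.pathsep.join(dirs)
        found, probes = find_linear(search_path, "Foo.class")
        index = build_index(search_path)
        return found == target, probes, index.get("Foo.class") == target
```

In this sketch the linear lookup touches the filesystem once per path entry on every lookup, while the index pays that cost once up front and then answers each lookup with a single dictionary access.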


Bazel Knowledge: What's an Interface JAR?

I spent the day working through an upgrade of our codebase at $DAYJOB$ to Java 21 and hit Bazel issue #24138 as a result of an incorrectly produced hjar.

🤨 WTF is an hjar ?

☝️ It is the newer version of ijar !

😠 WTF is an ijar ?

Let’s discover what an ijar (Interface JAR) is and how it’s the magic sauce that makes Bazel so fast for Java.


Bazel Knowledge: mind your PATH

Have you encountered the following?

> bazel build
INFO: Invocation ID: f16c3f83-0150-494e-bd34-1a9cfb6a2e67
WARNING: Build option --incompatible_strict_action_env has changed, discarding analysis cache (this can be expensive, see https://bazel.build/advanced/performance/iteration-speed).
INFO: Analyzed target @@com_google_protobuf//:protoc (113 packages loaded, 1377 targets configured).
[483 / 845] 13 actions, 12 running
    Compiling src/google/protobuf/compiler/importer.cc; 3s disk-cache, darwin-sandbox
    Compiling src/google/protobuf/compiler/java/names.cc; 1s disk-cache, darwin-sandbox
    Compiling src/google/protobuf/compiler/java/name_resolver.cc; 1s disk-cache, darwin-sandbox
    Compiling src/google/protobuf/compiler/java/helpers.cc; 1s disk-cache, darwin-sandbox
    Compiling src/google/protobuf/compiler/objectivec/enum.cc; 1s disk-cache, darwin-sandbox
    Compiling absl/strings/cord.cc; 1s disk-cache, darwin-sandbox
    Compiling src/google/protobuf/compiler/objectivec/names.cc; 0s disk-cache, darwin-sandbox
    Compiling absl/time/internal/cctz/src/time_zone_lookup.cc; 0s disk-cache, darwin-sandbox ...

I finally had it with Bazel recompiling protoc 😤

The working title for this post: Why the #$@! does protoc keep recompiling! 🤬

If you are not interested in the story and just want to avoid recompiling protoc, try putting build --incompatible_strict_action_env in your .bazelrc.

Check out Aspect’s bazelrc guide for other good tidbits.


Bazel Knowledge: Aspects to generate Java CLASSPATH

One of the more advanced features of Bazel is the concept of an aspect.

As a very brief primer on why you may want an aspect: Bazel lets you audit and analyze the BUILD graph without performing any actual builds. It does this by constructing a “shadow graph” that your aspect can perform analysis on. This can be useful for a variety of things, such as IDE integration.

I wanted to ask a very simple question to make integration with Visual Studio Code straightforward:

“What’s the CLASSPATH I need for a particular target so that I don’t get red squigglies?”
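
To illustrate the kind of question being asked, here is a rough Python sketch of the underlying graph walk. It is not Bazel's aspect API (a real aspect would be written in Starlark); the `GRAPH` dictionary, its labels and jar paths, and the helpers `transitive_jars` and `classpath` are all made up for illustration.

```python
# Hypothetical dependency graph: target label -> (output jar, direct deps).
GRAPH = {
    "//app:main": ("app/libmain.jar", ["//lib:core", "//lib:util"]),
    "//lib:core": ("lib/libcore.jar", ["//lib:util"]),
    "//lib:util": ("lib/libutil.jar", []),
}


def transitive_jars(label, graph, seen=None):
    """Depth-first walk returning jars in first-visit order, without duplicates."""
    if seen is None:
        seen = set()
    if label in seen:
        return []
    seen.add(label)
    jar, deps = graph[label]
    jars = [jar]
    for dep in deps:
        jars.extend(transitive_jars(dep, graph, seen))
    return jars


def classpath(label, graph):
    """Join the transitive jars into a colon-separated CLASSPATH string."""
    return ":".join(transitive_jars(label, graph))
```

Calling `classpath("//app:main", GRAPH)` on this toy graph visits each target once and lists every jar the target needs at compile time, which is the information an editor needs to resolve imports.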


Bazel Knowledge: reproducible outputs

You might hear a lot about how Bazel is “reproducible” and “hermetic”, but what does that even mean? 😕

Part of what makes Bazel incredibly fast is that it skips work for portions of the graph whose inputs have not changed.

Let’s consider this simple action graph in Bazel.

Bazel Action Graph
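
The idea of skipping work when inputs are unchanged can be sketched in a few lines of Python. This is a deliberately simplified model, not Bazel's actual action cache (which takes more into account than shown here); the `ActionCache` class and its `run` method are hypothetical names used only for this example.

```python
import hashlib


class ActionCache:
    """Toy cache: re-run an action only when its command or inputs change."""

    def __init__(self):
        self._results = {}
        self.executions = 0  # how many times an action really ran

    def run(self, command, inputs, action):
        """inputs: dict of file name -> bytes; action: callable(inputs) -> output."""
        digest = hashlib.sha256()
        digest.update(command.encode() + b"\0")
        for name in sorted(inputs):  # sort so the key is order-independent
            digest.update(name.encode() + b"\0")
            digest.update(hashlib.sha256(inputs[name]).digest())
        key = digest.hexdigest()

        if key not in self._results:
            self.executions += 1
            self._results[key] = action(inputs)
        return self._results[key]


def shout(inputs):
    """Stand-in for a build step: a deterministic function of its inputs."""
    return inputs["a.txt"].upper()


cache = ActionCache()
first = cache.run("shout", {"a.txt": b"hello"}, shout)
second = cache.run("shout", {"a.txt": b"hello"}, shout)   # cache hit, no re-run
third = cache.run("shout", {"a.txt": b"hello!"}, shout)   # input changed, re-run
```

Reusing a cached result is only sound if the action produces the same output for the same inputs, which is why reproducibility matters for this kind of caching.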


Bazel Knowledge: Secret //external directory

Did you know Bazel has a secret //external package that contains all the external repositories you added to WORKSPACE.bazel or MODULE.bazel? 🤓

Let’s start with a very minimal WORKSPACE that pulls in the GNU Hello codebase.

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "gnu_hello",
    urls = ["https://ftp.gnu.org/gnu/hello/hello-2.10.tar.gz"],
    strip_prefix = "hello-2.10",
    sha256 = "31e066137a962676e89f69d1b65382de95a7ef7d914b8cb956f41ea72e0f516b",
    build_file = "//third_party:gnu_hello.BUILD",
)

Bazel Knowledge: Reference targets by output name

In an attempt to record some of the smaller knowledge gains from using Bazel, I’m hoping to write a few shorter articles. 🤓

Did you know you can reference an output file either directly by its name or by the name of the target that produced it?

load("@bazel_skylib//rules:diff_test.bzl", "diff_test")

genrule(
    name = "src_file",
    outs = ["file.txt"],
    cmd = "echo 'Hello, Bazel!' > $@",
)

diff_test(
    name = "test_equality",
    file1 = ":src_file",
    file2 = ":file.txt",
)

⚠️ If the output has the same name as the rule, Bazel will give you a warning, but everything still seems to work.

I tend to prefer matching by rule name. I’m not yet aware of any reason to prefer one over the other.


Bazel Overlay Pattern

Do you have an internal fork of a codebase you’ve added Bazel BUILD files to?

Do you want to open-source the BUILD files (+ additional files), but contributing them to the upstream project might be a bit too onerous to start with? 🤔

Continuing with my dive 🤿 into Bazel for $DAYJOB$, I wanted to touch on a pattern I’ve only ever seen employed by Google for LLVM but that I’m finding very powerful: the Bazel Overlay Pattern.


Bazel WORKSPACE chunking

I have been doing quite a lot of Bazel for $DAYJOB$, and it’s definitely got its fair share of warts.

I have my own misgivings about its migration to bzlmod and its convergence toward a standard-issue dependency-management tool.

We have yet to transition to MODULE.bazel and our codebase is quite large. As you’d expect, we hit quite a lot of diamond dependency issues, specifically with external repositories in our WORKSPACE file.

A surprising implementation detail I recently learned was how Bazel does dependency resolution for external repositories in WORKSPACE.


NixOS, Raspberry Pi & Me

I have written a lot about NixOS, so it’s no surprise that when I went to dust off my old Raspberry Pi 4, I looked to rebrand it as a new NixOS machine.

Before I even went to play with my Pi, I was unhappy with my current home-networking setup and looked to give it a refresh.

I have always had a positive experience with Ubiquiti’s line of products. I installed two new APs (access points) and set up a beautiful home rack server that is completely unnecessary, since my Internet provider is Comcast with top upload speeds of 35 Mbps 🥲.