Farid Zakaria's Blog

Bazel Knowledge: Be mindful of Build Without the Bytes (bwob)

Bazel is a pretty amazing tool, but it’s definitely full of warts, sharp edges and arcane knowledge.

The appeal to most who adopt Bazel is the ability to memoize much of the build graph if nothing has changed. Furthermore, when leveraging remote caches, build results can be shared across machines, making memoization even more effective.

This was a pretty compelling reason to adopt Bazel, but pretty soon many noticed, especially on their CI systems, lots of unnecessary data transfers for larger codebases.

😲 If the network is poor, the benefits of remote caching (memoization) can be outweighed by the cost to download the artifacts.
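To make that trade-off concrete, here is a back-of-the-envelope sketch. The function name and every number in it are hypothetical, picked only to illustrate the arithmetic; they are not measurements from any real build.

```python
def remote_cache_pays_off(artifact_mb, bandwidth_mbps, rebuild_seconds):
    """Return True when downloading a cached artifact is faster than rebuilding it."""
    # Convert megabytes to megabits before dividing by the link speed.
    download_seconds = artifact_mb * 8 / bandwidth_mbps
    return download_seconds < rebuild_seconds


# Hypothetical case: a 500 MB artifact that takes 60 s to rebuild locally.
fast_network = remote_cache_pays_off(500, bandwidth_mbps=1000, rebuild_seconds=60)
slow_network = remote_cache_pays_off(500, bandwidth_mbps=20, rebuild_seconds=60)

print(fast_network)  # download takes 4 s, so the cache wins
print(slow_network)  # download takes 200 s, so rebuilding wins
```

On the slow link the cache hit costs more than the work it saves, which is exactly the situation where skipping the download of intermediate artifacts becomes attractive.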


Faking incremental Docker loads

While testcontainers have made it simple to run containers for unit & system tests, they are not well suited for Bazel as they rely on docker pull to hydrate the Docker daemon. The pulls rely on tags, which may be rewritten, and they depend on network access and on inputs (i.e., the images themselves) that are unknown to Bazel.


Bazel Knowledge: Protobuf is the worst when it should be the best

Bazel has had support for protocol buffers (protobuf) since the beginning. With both being Google products, one would think that their integration would be seamless and the best experience. Unfortunately, I’ve found it to be one of the worst parts of the user experience with Bazel. 😔


JVM boot optimization via JarIndex

Ever heard of a JarIndex? I had been doing JVM development for 10+ years and I hadn’t. Read on to discover what it is and how it can speed up your compilation and boot time. 🤓

After having worked on Shrinkwrap and publishing our results in Mapping Out the HPC Dependency Chaos, you start to see the Linux environment as a bit of an oddball.

Everything in Linux is structured around O(n) or O(n^2) search and lookup.

This now feels unsurprising given that everything in Linux searches across colon-separated lists (e.g. LD_LIBRARY_PATH, RUNPATH). This idiom, however, is even more pervasive and has bled into all of our languages.

The JVM, for instance, must search for classes amongst a set of directories, files or JARs set on the CLASSPATH.
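The difference between probing a search path and consulting an index can be sketched in a few lines. This is a toy model over an in-memory set of file names, not the JVM's class loader and not the real JarIndex format; the directory names and helper functions are made up for illustration.

```python
import posixpath


def find_linear(search_path, name, existing_files):
    """Probe each entry of a colon-separated path in order: O(n) per lookup."""
    probes = 0
    for entry in search_path.split(":"):
        probes += 1
        candidate = posixpath.join(entry, name)
        if candidate in existing_files:
            return candidate, probes
    return None, probes


def build_index(search_path, existing_files):
    """Walk the path once and remember the first location of every file name."""
    index = {}
    for entry in search_path.split(":"):
        for path in existing_files:
            if posixpath.dirname(path) == entry:
                # setdefault keeps the earliest entry, matching search-path order.
                index.setdefault(posixpath.basename(path), path)
    return index


# Hypothetical layout: three directories, the class we want lives in the last one.
files = {"/opt/a/Bar.class", "/opt/c/Foo.class"}
search_path = "/opt/a:/opt/b:/opt/c"

location, probes = find_linear(search_path, "Foo.class", files)
index = build_index(search_path, files)

print(location, probes)    # found only after probing every entry
print(index["Foo.class"])  # a single dictionary lookup once the index exists
```

The index costs one pass up front, after which each lookup no longer depends on how many entries the path has.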


Bazel Knowledge: What's an Interface JAR?

I spent the day working through an upgrade of our codebase at $DAYJOB$ to Java 21 and hit Bazel issue #24138 as a result of an incorrectly produced hjar.

🤨 WTF is an hjar ?

☝️ It is the newer version of ijar !

😠 WTF is an ijar ?

Let’s discover what an ijar (Interface JAR) is and how it’s the magic sauce that makes Bazel so fast for Java.


Bazel Knowledge: mind your PATH

Have you encountered the following?

> bazel build
INFO: Invocation ID: f16c3f83-0150-494e-bd34-1a9cfb6a2e67
WARNING: Build option --incompatible_strict_action_env has changed, discarding analysis cache (this can be expensive, see https://bazel.build/advanced/performance/iteration-speed).
INFO: Analyzed target @@com_google_protobuf//:protoc (113 packages loaded, 1377 targets configured).
[483 / 845] 13 actions, 12 running
    Compiling src/google/protobuf/compiler/importer.cc; 3s disk-cache, darwin-sandbox
    Compiling src/google/protobuf/compiler/java/names.cc; 1s disk-cache, darwin-sandbox
    Compiling src/google/protobuf/compiler/java/name_resolver.cc; 1s disk-cache, darwin-sandbox
    Compiling src/google/protobuf/compiler/java/helpers.cc; 1s disk-cache, darwin-sandbox
    Compiling src/google/protobuf/compiler/objectivec/enum.cc; 1s disk-cache, darwin-sandbox
    Compiling absl/strings/cord.cc; 1s disk-cache, darwin-sandbox
    Compiling src/google/protobuf/compiler/objectivec/names.cc; 0s disk-cache, darwin-sandbox
    Compiling absl/time/internal/cctz/src/time_zone_lookup.cc; 0s disk-cache, darwin-sandbox ...

I finally had it with Bazel recompiling protoc 😤

The working title for this post: Why the #$@! does protoc keep recompiling! 🤬

If you are not interested in the story and just want to avoid recompiling protoc, try putting build --incompatible_strict_action_env in your .bazelrc.

Check out Aspect’s bazelrc guide for other good tidbits.
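Why would PATH matter at all? A rough mental model, sketched below, is that the environment an action runs with is part of its cache key, so two shells with different PATH values produce different keys for otherwise identical work. This is a simplified illustration and not Bazel's real key computation; the `action_key` and `effective_env` helpers, the digest string and the static PATH value are all assumptions made for the example.

```python
import hashlib
import json

# Assumed fixed value used when the strict mode is on; the real one may differ.
STATIC_PATH = "/bin:/usr/bin:/usr/local/bin"


def effective_env(client_env, strict):
    """Pick the environment an action would see in this toy model."""
    if strict:
        return {"PATH": STATIC_PATH}
    # Non-strict: the caller's PATH leaks into the action.
    return {"PATH": client_env["PATH"]}


def action_key(command, input_digests, env):
    """Hash everything that is allowed to influence the action's result."""
    payload = json.dumps(
        {"command": command, "inputs": sorted(input_digests), "env": sorted(env.items())},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


command = ["cc", "-c", "importer.cc"]
inputs = ["digest-of-importer.cc"]  # placeholder digest
terminal_env = {"PATH": "/opt/homebrew/bin:/usr/bin:/bin"}
ide_env = {"PATH": "/usr/bin:/bin"}

loose_terminal = action_key(command, inputs, effective_env(terminal_env, strict=False))
loose_ide = action_key(command, inputs, effective_env(ide_env, strict=False))
strict_terminal = action_key(command, inputs, effective_env(terminal_env, strict=True))
strict_ide = action_key(command, inputs, effective_env(ide_env, strict=True))

print(loose_terminal == loose_ide)    # False: every PATH change looks like new work
print(strict_terminal == strict_ide)  # True: the cache entry is shared
```

In this model, pinning the environment is what lets a build started from a terminal and one started from an IDE hit the same cache entry.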


Bazel Knowledge: Aspects to generate Java CLASSPATH

One of the more advanced features of Bazel is the concept of an aspect.

As a very brief primer on why you may want an aspect: Bazel lets you audit and analyze the build graph without performing any actual builds. It does this by constructing a “shadow graph” that your aspect can perform analysis on. This can be useful for a variety of things, such as IDE integration.

I wanted to ask a very simple question to make integration with Visual Studio Code straightforward:

“What’s the CLASSPATH I need for a particular target so that I don’t get red squigglies?”
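Conceptually, answering that question means walking the dependency edges of a target and collecting the JARs found along the way. The sketch below models that walk in plain Python over a hand-written graph. It is not Starlark and does not use the aspect API; the labels, the JAR paths and the `classpath_for` helper are hypothetical.

```python
def classpath_for(target, graph):
    """Collect the JARs of a target and its transitive deps, without duplicates."""
    seen = set()
    jars = []

    def visit(label):
        if label in seen:
            return  # diamond dependencies are visited only once
        seen.add(label)
        node = graph[label]
        jars.extend(node["jars"])
        for dep in node["deps"]:
            visit(dep)

    visit(target)
    return ":".join(jars)


# Hypothetical build graph: main depends on util and log, util also depends on log.
graph = {
    "//app:main": {"jars": ["bazel-bin/app/libmain.jar"], "deps": ["//lib:util", "//lib:log"]},
    "//lib:util": {"jars": ["bazel-bin/lib/libutil.jar"], "deps": ["//lib:log"]},
    "//lib:log": {"jars": ["bazel-bin/lib/liblog.jar"], "deps": []},
}

classpath = classpath_for("//app:main", graph)
print(classpath)
```

An aspect does the same kind of propagation along dependency attributes, but inside Bazel's analysis phase rather than over a dictionary.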


Bazel Knowledge: reproducible outputs

You might hear a lot about how Bazel is “reproducible” and “hermetic”, but what does that even mean ? 😕

Part of what makes Bazel incredibly fast is that it skips work: it forgoes executing portions of the graph if their inputs have not changed.

Let’s consider this simple action graph in Bazel.

Bazel Action Graph
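The skipping described above can be modelled with a tiny content-addressed cache. This is a toy sketch, not Bazel's implementation; the `ActionCache` class and the two stand-in "compile" and "link" steps are invented for the example. It also hints at why reproducible outputs matter: if a changed source still produces byte-identical output, everything downstream stays cached.

```python
import hashlib


class ActionCache:
    """Run an action only when the hash of its name and inputs is new."""

    def __init__(self):
        self.store = {}
        self.executions = 0

    def run(self, name, inputs, fn):
        hasher = hashlib.sha256(name.encode())
        for data in inputs:
            hasher.update(hashlib.sha256(data).digest())
        key = hasher.hexdigest()
        if key not in self.store:
            self.executions += 1
            self.store[key] = fn(*inputs)
        return self.store[key]


def build(cache, source):
    # Two chained actions: the output of "compile" is the input of "link".
    compiled = cache.run("compile", [source], lambda s: s.upper())
    return cache.run("link", [compiled], lambda o: b"BIN:" + o)


cache = ActionCache()
build(cache, b"hello")
after_first = cache.executions   # both actions ran

build(cache, b"hello")
after_second = cache.executions  # nothing changed, nothing ran

result = build(cache, b"Hello")
after_third = cache.executions   # compile re-ran, but its output was identical

print(after_first, after_second, after_third, result)
```

The third build re-executes only the first step: because its output bytes did not change, the "link" step is still a cache hit.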


Bazel Knowledge: Secret //external directory

Did you know Bazel creates a secret //external package that contains all the external repositories you added to WORKSPACE.bazel or MODULE.bazel ? 🤓

Let’s start with a very minimal WORKSPACE that pulls in the GNU Hello codebase.

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "gnu_hello",
    urls = ["https://ftp.gnu.org/gnu/hello/hello-2.10.tar.gz"],
    strip_prefix = "hello-2.10",
    sha256 = "31e066137a962676e89f69d1b65382de95a7ef7d914b8cb956f41ea72e0f516b",
    build_file = "//third_party:gnu_hello.BUILD",
)

Bazel Knowledge: Reference targets by output name

In an attempt to record some of the smaller knowledge gains from using Bazel, I’m hoping to write a few smaller articles. 🤓

Did you know you can reference an output file either directly by its name or by the name of the target that produced it?

load("@bazel_skylib//rules:diff_test.bzl", "diff_test")

genrule(
    name = "src_file",
    outs = ["file.txt"],
    cmd = "echo 'Hello, Bazel!' > $@",
)

diff_test(
    name = "test_equality",
    file1 = ":src_file",
    file2 = ":file.txt",
)

⚠️ If the output has the same name as the rule, Bazel will give you a warning, but everything still seems to work.

I tend to prefer matching by rule name. I’m not yet aware of any reason to prefer one over the other.