Bazel Overlay Pattern

Published 2024-08-29 on Farid Zakaria's Blog

Do you have an internal fork of a codebase you’ve added Bazel BUILD files to?

Do you want to open-source the BUILD files (+ additional files), but landing them in the upstream project might be a bit too onerous to start? 🤔

Continuing with my dive 🤿 into Bazel for $DAYJOB$, I wanted to touch on a pattern I’ve only ever seen employed by Google for LLVM but that I’m finding very powerful: the Bazel Overlay Pattern.

I first encountered this pattern in Google’s llvm-bazel repository.

With the Bazel Overlay Pattern, you can open-source the Bazel build system for a project in a separate repository from the project itself.

This is useful if the upstream project does not want to accept the BUILD files themselves, or if you want to validate that the build works in the open before proposing the change itself.

🤨 Wait… I thought Bazel has the “Bazel Registry” which already has a bunch of external projects building with Bazel.

Sort of. The BUILD files introduced into the registry either wrap the existing build system using something like rules_foreign_cc or bring in the minimal Bazel BUILD files needed to build the final target. The BUILD files offered in the registry are not suited for daily development on that project: they are missing granular build targets, test targets & other developer productivity targets (e.g. lint, format, etc.).

🤌🏼 We want to upstream BUILD files that are meant to be the “real” build system for the project.

Bazel Overlay Pattern

I have created the project bazel-overlay-example that you can check out on GitHub for reference.

Let’s run through a very minimal example to understand how this works. We have a C project, “hello_world”, with a single file.

hello_world/
└── cmd
    └── hello.c

In a separate project, “hello_world-overlay”, we create a folder called bazel-overlay whose directory structure matches that of the target project. It includes only the files we need to build our project using Bazel.

hello_world-overlay/
├── bazel-overlay
│   └── cmd
│       └── BUILD.bazel
├── BUILD.bazel
├── configure.bzl
├── overlay_directories.py
├── third_party
│   └── hello_world -> /tmp/hello_world
└── WORKSPACE

Additionally, create a reference to the original project in a directory called third_party. This would likely be a git-submodule, but it can even be a symlink or an http_archive in the WORKSPACE.bazel file.

The 💡 awesome idea for the overlay pattern leverages Bazel’s repository rules.

We’ve added two extra files: configure.bzl and overlay_directories.py.

These two files are simplified, nearly verbatim copies of the ones in Google’s llvm-bazel repository.

These two files do the “magic” 🪄.

Feel free to read the files to see how they work. They are a bit too long to include verbatim in this post. They simply iterate over the files and set up symlinks.
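To give a feel for the idea without reading the real scripts, here is a minimal sketch of the symlinking step. It is not the code from llvm-bazel or from my example repository: the function name and its three parameters are my own, and it links every file individually rather than doing anything smarter with whole directories.

```python
import os


def overlay_directories(src, overlay, target):
    """Symlink every file under `src` and `overlay` into `target`.

    Hypothetical helper for illustration only. When both trees contain
    the same relative path, the file from `overlay` wins.
    """
    # Walk the source tree first and the overlay second, so that overlay
    # files replace any source file at the same relative path.
    for root in (src, overlay):
        for dirpath, _dirnames, filenames in os.walk(root):
            rel_dir = os.path.relpath(dirpath, root)
            out_dir = os.path.normpath(os.path.join(target, rel_dir))
            os.makedirs(out_dir, exist_ok=True)
            for name in filenames:
                link = os.path.join(out_dir, name)
                if os.path.lexists(link):
                    os.remove(link)  # drop the link created from `src`
                os.symlink(os.path.abspath(os.path.join(dirpath, name)), link)
```

Calling it with third_party/hello_world as `src`, bazel-overlay as `overlay` and an empty output directory as `target` should yield a merged tree in which cmd/hello.c and cmd/BUILD.bazel sit side by side as symlinks.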

We set it up in our WORKSPACE like so:

load(":configure.bzl", "overlay_configure")

overlay_configure(
    name = "hello-world",
    overlay_path = "bazel-overlay",
    src_path = "./third_party/hello_world",
)

When you try to build the external repository @hello-world//, the repository rule will symlink all the files in the overlay_path & the src_path together.

$ tree $(bazel info output_base)/external/hello-world

/home/fmzakari/.cache/bazel/_bazel_fmzakari/738ca8ce4d1d8ce828e952fe7b9fdd95/external/hello-world
├── cmd
│   ├── BUILD.bazel -> /tmp/hello_world-overlay/bazel-overlay/cmd/BUILD.bazel
│   └── hello.c -> /tmp/hello_world-overlay/third_party/hello_world/cmd/hello.c
└── WORKSPACE

It’s as if the two repositories were merged! 😎

$ bazel run @hello-world//cmd:hello
INFO: Analyzed target @hello-world//cmd:hello (24 packages loaded, 90 targets configured).
INFO: Found 1 target...
Target @hello-world//cmd:hello up-to-date:
  bazel-bin/external/hello-world/cmd/hello
INFO: Elapsed time: 0.068s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Running command line: bazel-bin/external/hello-world/cmd/hello
Hello, World!

This is a surprisingly powerful pattern that lets you explore adding the Bazel build system for a project from a separate repository. The benefit of doing this in a separate repository vs. a branch is that it’s easy to track HEAD. If your third_party is a git-submodule, you can keep moving the submodule forward and validating that the build succeeds.

I’m moving forward with this pattern to explore upstreaming $DAYJOB$ Bazel build system to the open source repository. 🙌

❗ I recently contributed PR#22349 to Bazel, which adds an overlay concept to http_archive. It almost looks like it could do this as well, but if you had a lot of BUILD files it would be tedious to list them all out manually.

http_archive(
  name="hello_world",
  strip_prefix="hello_world-0.1.2",
  urls=["https://fake.com/hello_world.zip"],
  remote_file_urls={
    "WORKSPACE": ["https://fake.com/WORKSPACE"],
    "cmd/BUILD.bazel": ["https://fake.com/cmd/BUILD.bazel"],
  },
)
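If you did want to go that route, the listing could in principle be generated rather than typed. The sketch below is only an illustration under my own assumptions: the helper name is made up, it assumes every overlay file is hosted under a single base URL at the same relative path, and it covers only the URL mapping shown above.

```python
import os


def remote_file_urls(overlay_dir, base_url):
    """Map each overlay file's relative path to a one-element URL list.

    Hypothetical helper: assumes each file is served from
    `base_url/<relative path>`.
    """
    mapping = {}
    for dirpath, _dirnames, filenames in os.walk(overlay_dir):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), overlay_dir)
            rel = rel.replace(os.sep, "/")  # Bazel paths use forward slashes
            mapping[rel] = [base_url.rstrip("/") + "/" + rel]
    # Sort by path so the generated dictionary is stable between runs.
    return dict(sorted(mapping.items()))
```

The resulting dictionary could then be pasted into the http_archive call, but you would still have to regenerate it every time a BUILD file is added or removed.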

For now, I’m sticking with the Bazel Overlay Pattern.