Bazel Knowledge: A practical guide to depset
Published 2025-05-20 on Farid Zakaria's Blog
Bazel’s depset is a powerful construct for managing transitive dependencies efficiently. While it’s commonly used in complex rules and providers, sometimes a simple use case can illuminate its utility.
Ok… but what does that meah? 🤔
This is a companion post to Jay Conrad excellent series on writing rules. The series is excellent and I recommend you read it 🤓.
Consider this simple guide for understanding depset
. We will be writing a simple ruleset rules_graphviz
.
Graphviz is an open source graph visualization software. Graphs are defined via the DOT
language which is a grammar for defining Graphviz nodes, edges, graphs.
For instance, let’s take the simple graph G below.
digraph G {
"A" -> "B"
"A" -> "D"
"B" -> "C"
"B" -> "D"
}
Would produce the following visualization.
┌───┐ ┌───┐
│ D │ ◀── │ A │
└───┘ └───┘
▲ │
│ │
│ ▼
│ ┌───┐
└────── │ B │
└───┘
│
│
▼
┌───┐
│ C │
└───┘
Our goal would be to write a Bazel rule that let’s us model this graph purely in Bazel targets.
load("@rules_graphviz//:dot.bzl", "node")
node(
name = "A",
edges = [
":B",
":D",
],
)
node(
name = "B",
edges = [
":C",
":D",
],
)
node(name = "C")
node(name = "D")
We would like a rule that creates a text file (dot file) of the digraph
representation of all the nodes reachable from
the given target.
That means every node should know it’s dependencies (i.e. edges), and we’d like a way to traverse the whole graph.
💡 We could do this traversal with a standard algorithm (i.e. breadth-first-search) knowing only the direct edges, however, this is where depset
shines, as it’s a space and time effecient way of encoding a graph. The depset
API contains a to_list()
function making it easy to quickly iterate over the whole graph.
First, let’s define our unique provider. Providers are a way for us to attach additional metadata to every target in Bazel that they can carry along with them.
We will need two pieces of information: a fragment of text which are the immediate edges of this target and a depset which is the subgraph of targets it depends on.
GraphvizProviderInfo = provider(
doc = "A provider for graphviz",
fields = {
"fragment": "The edges of this target to it's strict dependencies",
"deps": "A depset of the dependencies of this target",
},
)
Let’s create our rule. We make it clear to Bazel that all targets provided to edges
must carry with them our provider GraphvizProviderInfo
.
Failure to add an edge which doesn’t have this provider, will be an evaluation error.
node = rule(
implementation = _node_impl,
attrs = {
"edges": attr.label_list(
doc = "Edges to other Graphviz nodes.",
providers = [GraphvizProviderInfo],
),
},
output_to_genfiles = True,
)
Now the implementation purpose is to construct each node’s fragment (i.e. direct edges) and also collect all fragment’s of each node reachable from the graph when constructing the final DOT graph.
Two key lines are when the rule constructs transitive_deps
and transitive_fragments
.
transitive_deps
- We need to construct and propagate the new
depset
for the given node. We pass the immediate edges as thedirect
dependencies and each direct dependencies owndepset
into thetransitive
attribute. This will create our graph bottom-up! transitive_fragments
- This is where the rule iterates over all reachable nodes in the graph. We could do a traditional traversal, but the appeal of
depset
is that it offers ato_list()
API that provides the traversal for us – while still giving us all the other space & time efficiencies.
def _node_impl(ctx):
# Generate the DOT fragment for the current node
fragment = '"{}"\n'.format(ctx.label)
fragment += ''.join(
['"{}" -> "{}"\n'.format(ctx.label, dep.label) for dep in ctx.attr.edges]
)
# Aggregate transitive dependencies using depset
transitive_deps = depset(
direct=ctx.attr.edges,
transitive=[dep[GraphvizProviderInfo].deps for dep in ctx.attr.edges]
)
# Concatenate all fragments from transitive dependencies
transitive_fragments = ''.join(
[dep[GraphvizProviderInfo].fragment for dep in transitive_deps.to_list()]
)
# Assemble the complete DOT content
dot_content = "digraph G {\n"
dot_content += fragment
dot_content += transitive_fragments
dot_content += "}\n"
# Declare and write the DOT file
dot_file = ctx.actions.declare_file(ctx.attr.name + ".dot")
ctx.actions.write(dot_file, dot_content)
# Return the providers
return [
DefaultInfo(files=depset([dot_file])),
GraphvizProviderInfo(fragment=fragment, deps=transitive_deps),
]
Let’s try our new rule using the targets earlier!
> bazel build //:A
Target //:A up-to-date:
bazel-bin/A.dot
> cat bazel-bin/A.dot
digraph G {
"@@//:A"
"@@//:A" -> "@@//:B"
"@@//:A" -> "@@//:D"
"@@//:C"
"@@//:D"
"@@//:B"
"@@//:B" -> "@@//:C"
"@@//:B" -> "@@//:D"
}
Huzzah! We built a small declarative graph ruleset that emits DOT files 🙌🏽.
We did so by eleveraging Bazel depset
to make the traversal efficient and propagated this information using our own custom provider
.
That was not as scary as I thought 🫣.
Update
Some feedback was provided by Peter Lobsinger over the Bazel slack that highlighted best practices from Bazel recommend trying to avoid calling to_list
whenever possible.
You can coerce a depset to a flat list using to_list(), but doing so usually results in O(N^2) cost. If at all possible, avoid any flattening of depsets except for debugging purposes. [ref]
Finally, it’s important to not retrieve the contents of the depset unnecessarily in rule implementations. One call to to_list() at the end in a binary rule is fine, since the overall cost is just O(n). It’s when many non-terminal targets try to call to_list() that quadratic behavior occurs. [ref]
We can update the rule to instead bubble up the fragment which includes the transitive edges.
The relevant change avoids calling to_list
and instead concatenates prior fragments into the current one.
def _node_impl(ctx):
# Generate the DOT fragment for the current node
fragment = '"{}"\n'.format(ctx.label)
fragment += ''.join(
['"{}" -> "{}"\n'.format(ctx.label, dep.label) for dep in ctx.attr.edges]
)
fragment += ''.join(
[dep[GraphvizProviderInfo].fragment for dep in ctx.attr.edges]
)
# Assemble the complete DOT content
dot_content = "digraph G {\n"
dot_content += fragment
dot_content += "}\n"
The downside to this approach is that nodes and edges may be duplicated in the resulting file with the current implementation. The DOT language supports duplicates, so the resulting graph is still correct albeit a bit unecessarily larger.
> bazel build //:A
Target //:A up-to-date:
bazel-bin/A.dot
> cat bazel-bin/A.dot
digraph G {
"@@//:A"
"@@//:A" -> "@@//:B"
"@@//:A" -> "@@//:D"
"@@//:B"
"@@//:B" -> "@@//:C"
"@@//:B" -> "@@//:D"
"@@//:C"
"@@//:D"
"@@//:D"
}
We could handle the duplicates in the fragment each time by stripping them out or create a new rule graph
which is the only point at which we do the full traversal and may call to_list
. However, I wanted to keep the rule as simple as possible for demonstrative purposes 🙇🏼.
Improve this page @ 961b1fc
The content for this site is
CC-BY-SA.