Bazel Knowledge: A practical guide to depset

Published 2025-05-20 on Farid Zakaria's Blog

Bazel’s depset is a powerful construct for managing transitive dependencies efficiently. While it’s commonly used in complex rules and providers, sometimes a simple use case can illuminate its utility.

Ok… but what does that meah? 🤔

This is a companion post to Jay Conrad excellent series on writing rules. The series is excellent and I recommend you read it 🤓.

Consider this simple guide for understanding depset. We will be writing a simple ruleset rules_graphviz.

Graphviz is an open source graph visualization software. Graphs are defined via the DOT language which is a grammar for defining Graphviz nodes, edges, graphs.

For instance, let’s take the simple graph G below.

digraph G {
 "A" -> "B"
 "A" -> "D"
 "B" -> "C"
 "B" -> "D"
}

Would produce the following visualization.

┌───┐     ┌───┐
│ D │ ◀── │ A │
└───┘     └───┘
  ▲         │
  │         │
  │         ▼
  │       ┌───┐
  └────── │ B │
          └───┘
            │
            │
            ▼
          ┌───┐
          │ C │
          └───┘

Our goal would be to write a Bazel rule that let’s us model this graph purely in Bazel targets.

load("@rules_graphviz//:dot.bzl", "node")

node(
  name = "A",
  edges = [
    ":B",
    ":D",
  ],
)

node(
  name = "B",
  edges = [
    ":C",
    ":D",
  ],
)

node(name = "C")

node(name = "D")

We would like a rule that creates a text file (dot file) of the digraph representation of all the nodes reachable from the given target.

That means every node should know it’s dependencies (i.e. edges), and we’d like a way to traverse the whole graph.

💡 We could do this traversal with a standard algorithm (i.e. breadth-first-search) knowing only the direct edges, however, this is where depset shines, as it’s a space and time effecient way of encoding a graph. The depset API contains a to_list() function making it easy to quickly iterate over the whole graph.

First, let’s define our unique provider. Providers are a way for us to attach additional metadata to every target in Bazel that they can carry along with them.

We will need two pieces of information: a fragment of text which are the immediate edges of this target and a depset which is the subgraph of targets it depends on.


GraphvizProviderInfo = provider(
  doc = "A provider for graphviz",
  fields = {
    "fragment": "The edges of this target to it's strict dependencies",
    "deps": "A depset of the dependencies of this target",
  },
)

Let’s create our rule. We make it clear to Bazel that all targets provided to edges must carry with them our provider GraphvizProviderInfo. Failure to add an edge which doesn’t have this provider, will be an evaluation error.

node = rule(
  implementation = _node_impl,
  attrs = {
    "edges": attr.label_list(
      doc = "Edges to other Graphviz nodes.",
      providers = [GraphvizProviderInfo],
    ),
  },
  output_to_genfiles = True,
)

Now the implementation purpose is to construct each node’s fragment (i.e. direct edges) and also collect all fragment’s of each node reachable from the graph when constructing the final DOT graph.

Two key lines are when the rule constructs transitive_deps and transitive_fragments.

transitive_deps
We need to construct and propagate the new depset for the given node. We pass the immediate edges as the direct dependencies and each direct dependencies own depset into the transitive attribute. This will create our graph bottom-up!
transitive_fragments
This is where the rule iterates over all reachable nodes in the graph. We could do a traditional traversal, but the appeal of depset is that it offers a to_list() API that provides the traversal for us – while still giving us all the other space & time efficiencies.
def _node_impl(ctx):
  # Generate the DOT fragment for the current node
  fragment = '"{}"\n'.format(ctx.label)
  fragment += ''.join(
    ['"{}" -> "{}"\n'.format(ctx.label, dep.label) for dep in ctx.attr.edges]
  )

  # Aggregate transitive dependencies using depset
  transitive_deps = depset(
    direct=ctx.attr.edges,
    transitive=[dep[GraphvizProviderInfo].deps for dep in ctx.attr.edges]
  )

  # Concatenate all fragments from transitive dependencies
  transitive_fragments = ''.join(
    [dep[GraphvizProviderInfo].fragment for dep in transitive_deps.to_list()]
  )

  # Assemble the complete DOT content
  dot_content = "digraph G {\n"
  dot_content += fragment
  dot_content += transitive_fragments
  dot_content += "}\n"

  # Declare and write the DOT file
  dot_file = ctx.actions.declare_file(ctx.attr.name + ".dot")
  ctx.actions.write(dot_file, dot_content)

  # Return the providers
  return [
    DefaultInfo(files=depset([dot_file])),
    GraphvizProviderInfo(fragment=fragment, deps=transitive_deps),
  ]

Let’s try our new rule using the targets earlier!

> bazel build //:A
Target //:A up-to-date:
  bazel-bin/A.dot

> cat bazel-bin/A.dot
digraph G {
"@@//:A"
"@@//:A" -> "@@//:B"
"@@//:A" -> "@@//:D"
"@@//:C"
"@@//:D"
"@@//:B"
"@@//:B" -> "@@//:C"
"@@//:B" -> "@@//:D"
}

Huzzah! We built a small declarative graph ruleset that emits DOT files 🙌🏽.

We did so by eleveraging Bazel depset to make the traversal efficient and propagated this information using our own custom provider.

That was not as scary as I thought 🫣.

Update

Some feedback was provided by Peter Lobsinger over the Bazel slack that highlighted best practices from Bazel recommend trying to avoid calling to_list whenever possible.

You can coerce a depset to a flat list using to_list(), but doing so usually results in O(N^2) cost. If at all possible, avoid any flattening of depsets except for debugging purposes. [ref]

Finally, it’s important to not retrieve the contents of the depset unnecessarily in rule implementations. One call to to_list() at the end in a binary rule is fine, since the overall cost is just O(n). It’s when many non-terminal targets try to call to_list() that quadratic behavior occurs. [ref]

We can update the rule to instead bubble up the fragment which includes the transitive edges.

The relevant change avoids calling to_list and instead concatenates prior fragments into the current one.

def _node_impl(ctx):
  # Generate the DOT fragment for the current node
  fragment = '"{}"\n'.format(ctx.label)
  fragment += ''.join(
    ['"{}" -> "{}"\n'.format(ctx.label, dep.label) for dep in ctx.attr.edges]
  )

  fragment += ''.join(
    [dep[GraphvizProviderInfo].fragment for dep in ctx.attr.edges]
  )

  # Assemble the complete DOT content
  dot_content = "digraph G {\n"
  dot_content += fragment
  dot_content += "}\n"

The downside to this approach is that nodes and edges may be duplicated in the resulting file with the current implementation. The DOT language supports duplicates, so the resulting graph is still correct albeit a bit unecessarily larger.

> bazel build //:A
Target //:A up-to-date:
  bazel-bin/A.dot

> cat bazel-bin/A.dot
digraph G {
"@@//:A"
"@@//:A" -> "@@//:B"
"@@//:A" -> "@@//:D"
"@@//:B"
"@@//:B" -> "@@//:C"
"@@//:B" -> "@@//:D"
"@@//:C"
"@@//:D"
"@@//:D"
}

We could handle the duplicates in the fragment each time by stripping them out or create a new rule graph which is the only point at which we do the full traversal and may call to_list. However, I wanted to keep the rule as simple as possible for demonstrative purposes 🙇🏼.


Improve this page @ 961b1fc
The content for this site is CC-BY-SA.