mbg / posts

Clang AST Matchers

One way to cope with lots of possible submitters to a project, each of own may have their own definitions of quality, is to enforce a quality bar with tooling. This means everything from running a linter to using static and runtime analysis to find dormant bugs.

Quick list of some completely reasonable things to use:

  • Linters: clang-format, flint, et al.
  • Static analysis: Coverity, scan-build, cppcheck
  • Runtime analysis: All of the sanitizers (ubsan, asan, tsan)

libtooling in llvm/clang provides an amazing library of functions to use when other tools don’t give you the checks that you want. The clang docs in general are high quality and the documentation on getting started with libtooling1 is very helpful for getting started with using the pre-built tools to find common problems.

libtooling also provides a handy list of helper functions so that you can write your own static analysis tool. There is a tutorial for set of abstract syntax tree (AST, what the compiler builds in order to understand your code and uses to determine how to turn it into a real program) matcher interface, mostly because I thought the DSL it defined was really neat.

The DSL lets you define some attibute of AST nodes you want to match on, where the attributes can be 1) the type of the node or 2) the type of any of the node’s ancestors or children in 3) any sort of combination you think is worth looking at. You can then define a callback action to perform when those nodes are found to either just log or automatically rewrite the found code.

Let’s say you wanted to build a tool to look for destructors which throw. You could write an AST matcher for a destructor with a throw statement like the following:

DeclarationMatcher ThrowDestructorMatcher =
    cxxDestructorDecl(
      hasDescendant(
        cxxThrowExpr().bind("throw")
      )).bind("destructor");

Quick explanation:

  • DeclarationMatcher means this is a Matcher<Decl> so the root node of the AST we’re going to get back will be of type Decl, which makes sense since we’re looking for cxxDestructorDecls.
  • hasDescendant belongs to a group of very neat generic matchers. We don’t need do any traversal on our own of children and subtrees, the DSL lets us specify the relationships we want.
  • The .bind() call lets us define an identifier which will map to the nodes and let us retrieve them later on in our MatchCallback (the action to perform on finding one of these).

That AST matcher isn’t generic enough though – what if the exception being thrown isn’t directly in the destructor, but instead in a function the destructor calls? We’d need to make it a little more generic:

auto Throws = hasDescendant(cxxThrowExpr().bind("throw"));
auto FunctionWhichThrows = functionDecl(Throws).bind("throw_function");
auto CallsToFunctionWhichThrows =
    hasDescendant(declRefExpr(to(FunctionWhichThrows)));
DeclarationMatcher ThrowDestructorMatcher =
    cxxDestructorDecl(anyOf(
      Throws,
      CallsToFunctionWhichThrows
    )).bind("destructor");

And we can pull out “CallsTo” as its own idea, and reuse it to detect doubly-nested calls to throws.

auto CallsTo =
    [](DeclarationMatcher dm) -> DeclarationMatcher {
      return hasDescendant(declRefExpr(to(dm)));
    };

auto CallsToCallsToFunctionWhichThrows = CallsTo(CallsToFunctionWhichThrows);

That’s so cool!

One limitation, which may not be obvious if you’re unaware of how C++ compilation works, is that the AST parsing facilities libtooling provides are scoped to a single compilation/translation-unit only. To oversimplify, the AST for a given function foo will not have access to the body of symbols which are not defined in the same place/file in which foo is defined. A trick to cope with this is making everything you’re interested in a single compilation unit similar to a precompiled header.

The AST Matchers are really cool and powerful. Playing with them made me feel like I had superpowers to go find bad patterns, whatever they may be and wherever they are. At the very least, looking for smelly code at $WORK is much simpler with this out there.


  1. The official documentation includes a pair of tutorials:

    and a very nice reference for the AST Matcher DSL. ↩︎