Macros

Frequently when writing grammars we encounter repetitive constructs that we would like to copy-and-paste. A common example is defining something like a "comma-separated list". Imagine we wanted to parse a comma-separated list of expressions (with an optional trailing comma, of course). If we had to write this out in full, it would look something like:

Exprs: Vec<Box<Expr>> = {
    Exprs "," Expr => ...,
    Expr => vec![<>],
}

Of course, this doesn't handle trailing commas, and I've omitted the action code. If we added those, it would get a bit more complicated. So far, this is fine, but then what happens when we later want a comma-separated list of terms? Do we just copy-and-paste everything?

LALRPOP offers a better option. You can define macros. In fact, LALRPOP comes with four macros builtin: *, ?, +, and (...). So you can write something like Expr? to mean "an optional Expr". This will have type Option<Box<Expr>> (since Expr alone has type Box<Expr>). Similarly, you can write Expr* or Expr+ to get a Vec<Expr> (with minimum length 0 and 1 respectively). The final macro is parentheses, which is a shorthand for creating a new nonterminal. This lets you write things like (<Expr> ",")? to mean an "optionally parse an Expr followed by a comma". Note the angle brackets around Expr: these ensures that the value of the (<Expr> ",") is the value of the expression, and not a tuple of the expression and the comma. This means that (<Expr> ",")? would have the type Option<Box<Expr>> (and not Option<(Box<Expr>, &'input str)>).

Using these operations we can define Exprs in terms of a macro Comma<T> that creates a comma-separated list of T, whatever T is (this definition appears in calculator5):

pub Exprs = Comma<Expr>; // (0)

Comma<T>: Vec<T> = { // (1)
    <mut v:(<T> ",")*> <e:T?> => match e { // (2)
        None => v,
        Some(e) => {
            v.push(e);
            v
        }
    }
};

The definition of Exprs on line (0) is fairly obvious, I think. It just uses a macro Comma<Expr>. Let's take a look then at the definition of Comma<T> on line (1). This is sort of dense, so let's unpack it. First, T is some terminal or nonterminal, but note that we can also use it as a type: when the macro is expanded, the T in the type will be replaced with "whatever the type of T is".

Next, on (2), we parse <mut v:(<T> ",")*> <e:T?>. That's a lot of symbols, so let's first remove all the angle brackets, which just serve to tell LALRPOP what values you want to propagate and which you want to discard. In that case, we have: (T ",")* T?. Hopefully you can see that this matches a comma-separated list with an optional trailing comma. Now let's add those angle-brackets back in. In the parentheses, we get (<T> ",")* -- this just means that we keep the value of the T but discard the value of the comma when we build our vector. Then we capture that vector and call it v: <mut v:(<T> ",")*>. The mut makes v mutable in the action code. Finally, we capture the optional trailing element e: <e:T?>. This means the Rust code has two variables available to it, v: Vec<T> and e: Option<T>. The action code itself should then be fairly clear -- if e is Some, it appends it to the vector and returns the result.

As another example of using macros, you may recall the precedence tiers we saw in calculator4 (Expr, Factor, etc), which had a sort of repetitive structure. You could factor that out using a macro. In this case, it's a recursive macro:

Tier<Op,NextTier>: Box<Expr> = {
    Tier<Op,NextTier> Op NextTier => Box::new(Expr::Op(<>)),
    NextTier
};

Expr = Tier<ExprOp, Factor>;
Factor = Tier<FactorOp, Term>;

ExprOp: Opcode = { // (3)
    "+" => Opcode::Add,
    "-" => Opcode::Sub,
};

FactorOp: Opcode = {
    "*" => Opcode::Mul,
    "/" => Opcode::Div,
};

And, of course, we have to add some tests to main.rs file:

#![allow(unused)]
fn main() {
use lalrpop_util::lalrpop_mod;

lalrpop_mod!(pub calculator5);

#[test]
fn calculator5() {
    let expr = calculator5::ExprsParser::new().parse("").unwrap();
    assert_eq!(&format!("{:?}", expr), "[]");

    let expr = calculator5::ExprsParser::new()
        .parse("22 * 44 + 66")
        .unwrap();
    assert_eq!(&format!("{:?}", expr), "[((22 * 44) + 66)]");

    let expr = calculator5::ExprsParser::new()
        .parse("22 * 44 + 66,")
        .unwrap();
    assert_eq!(&format!("{:?}", expr), "[((22 * 44) + 66)]");

    let expr = calculator5::ExprsParser::new()
        .parse("22 * 44 + 66, 13*3")
        .unwrap();
    assert_eq!(&format!("{:?}", expr), "[((22 * 44) + 66), (13 * 3)]");

    let expr = calculator5::ExprsParser::new()
        .parse("22 * 44 + 66, 13*3,")
        .unwrap();
    assert_eq!(&format!("{:?}", expr), "[((22 * 44) + 66), (13 * 3)]");
}
}