Handling full expressions

Now we are ready to extend our calculator to cover the full range of arithmetic expressions (well, at least the ones you learned in elementary school). Here is the next calculator example, calculator3:

use std::str::FromStr;

grammar;

pub Expr: i32 = {
    <l:Expr> "+" <r:Factor> => l + r,
    <l:Expr> "-" <r:Factor> => l - r,
    Factor,
};

Factor: i32 = {
    <l:Factor> "*" <r:Term> => l * r,
    <l:Factor> "/" <r:Term> => l / r,
    Term,
};

Term: i32 = {
    Num,
    "(" <Expr> ")",
};

Num: i32 = {
    r"[0-9]+" => i32::from_str(<>).unwrap(),
};

Perhaps the most interesting thing about this example is the way it encodes precedence. The idea of precedence of course is that in an expression like 2+3*4, we want to do the multiplication first, and then the addition. It is pretty straightforward to express precedence in your grammar by structuring it in tiers -- for example, here we have the nonterminal Expr, which covers all expressions. It consists of a series of factors that are added or subtracted from one another. A Factor is then a series of terms that are multiplied or divided. Finally, a Term is either a single number or, using parenthesis, an entire expr.

Abstracting from this example, the typical pattern for encoding precedence is to have one nonterminal per precedence level, where you begin with the operators of lowest precedence (+, -), add in the next highest precedence level (*, /), and finish with the bare "atomic" expressions like Num. Finally, you add in a parenthesized version of your top-level as an atomic expression, which lets people reset.

To see why this works, consider the two possible parse trees for something like 2+3*4:

2 + 3   *    4          2   +  3   *    4
| | |   |    |          |   |  |   |    |
| | +-Factor-+    OR    +-Expr-+   |    |
| |     |                   |      |    |
+-Expr -+                   +----Factor-+

In the first one, we give multiplication higher precedence, and in the second one, we (incorrectly) give addition higher precedence. If you look at the grammar now, you can see that the second one is impossible: a Factor cannot have an Expr as its left-hand side. This is the purpose of the tiers: to force the parser into the precedence you want.

Tiered expressions can also be generated by Lalrpop using the precedence and assoc attribute macros. These macros generate the same thing as tiered expressions do, but they can reduce code complexity when working with many levels of precedence. The above Expr grammar can be rewritten to the following using them:

pub Expr: i32 = {
    #[precedence(level="0")] // Highest precedence
    Term,
    #[precedence(level="1")] #[assoc(side="left")]
    <l:Expr> "*" <r:Expr> => l * r,
    <l:Expr> "/" <r:Expr> => l / r,
    #[precedence(level="2")] #[assoc(side="left")]
    <l:Expr> "+" <r:Expr> => l + r,
    <l:Expr> "-" <r:Expr> => l - r,
};

The precedence level specifies the order of operations starting from zero. In this example it means that 13 + 7 * 3 would be evaluated as (13 + (7 * 3)) because multiplication has a lower precedence level than addition.

By using assoc you can specify if the expression is left-associative or right-associative. This is required to make the grammar unambiguous, otherwise 1 + 2 + 3 could both be interpreted as (1 + (2 + 3)) and ((1 + 2) + 3).

Finally, note that we only write pub before the nonterminal we're interested in parsing (Expr) and not any of the helpers. Nonterminals marked pub have extra code generated, like the new() method used to access the parser from other modules. If you get a warning about an unused new() method on FooParser, drop the pub from nonterminal Foo.