Using tokens with references

When using a custom lexer, you might want tokens to hold references to the original input. This allows to use references to the input when the grammar can have arbitrary symbols such as variable names. Using references instead of copying the symbols can improve performance and memory usage of the parser.

The Lexer

We can now create a new calculator parser that can deal with symbols the same way an interpreter would deal with variables. First we need the corresponding AST :


#![allow(unused)]
fn main() {
pub enum ExprSymbol<'input>{
    NumSymbol(&'input str),
    Op(Box<ExprSymbol<'input>>, Opcode, Box<ExprSymbol<'input>>),
    Error,
}
}

Then, we need to build the tokens:


#![allow(unused)]
fn main() {
#[derive(Copy, Clone, Debug)]
pub enum Tok<'input> {
    NumSymbol(&'input str),
    FactorOp(Opcode),
    ExprOp(Opcode),
    ParenOpen,
    ParenClose,
}
}

Notice the NumSymbol type holding a reference to the original input. It represents both numbers and variable names as a slice of the original input.

Then, we can build the lexer itself.


#![allow(unused)]
fn main() {
use std::str::CharIndices;

pub struct Lexer<'input> {
    chars: std::iter::Peekable<CharIndices<'input>>,
    input: &'input str,
}

impl<'input> Lexer<'input> {
    pub fn new(input: &'input str) -> Self {
        Lexer {
            chars: input.char_indices().peekable(),
            input,
        }
    }
}
}

It needs to hold a reference to the input to put slices in the tokens.


#![allow(unused)]
fn main() {
impl<'input> Iterator for Lexer<'input> {
    type Item = Spanned<Tok<'input>, usize, ()>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            match self.chars.next() {
                Some((_, ' '))  | Some((_, '\n')) | Some((_, '\t')) => continue,
                Some((i, ')')) => return Some(Ok((i, Tok::ParenClose, i + 1))),
                Some((i, '(')) => return Some(Ok((i, Tok::ParenOpen, i + 1))),
                Some((i, '+')) => return Some(Ok((i, Tok::ExprOp(Opcode::Add), i + 1))),
                Some((i, '-')) => return Some(Ok((i, Tok::ExprOp(Opcode::Sub), i + 1))),
                Some((i, '*')) => return Some(Ok((i, Tok::FactorOp(Opcode::Mul), i + 1))),
                Some((i, '/')) => return Some(Ok((i, Tok::FactorOp(Opcode::Div), i + 1))),

                None => return None, // End of file
                Some((i,_)) => {
                    loop {
                        match self.chars.peek() {
                            Some((j, ')'))|Some((j, '('))|Some((j, '+'))|Some((j, '-'))|Some((j, '*'))|Some((j, '/'))|Some((j,' '))
                            => return Some(Ok((i, Tok::NumSymbol(&self.input[i..*j]), *j))),
                            None => return Some(Ok((i, Tok::NumSymbol(&self.input[i..]),self.input.len()))),
                            _ => {self.chars.next();},
                        }
                    }
                }
            }
        }
    }
}
}

It's quite simple, it returns any operator, and if it detects any other character, stores the beginning then continues to the next operator and sends the symbol it just parsed.

The parser

We can then take a look at the corresponding parser with a new grammar:


#![allow(unused)]
fn main() {
Term: Box<ExprSymbol<'input>> = {
    "num" => Box::new(ExprSymbol::NumSymbol(<>)),
    "(" <Expr> ")"
};
}

We need to pass the input to the parser so that the input's lifetime is known to the borrow checker when compiling the generated parser.


#![allow(unused)]
fn main() {
grammar<'input>(input: &'input str);
}

Then we just need to define the tokens the same as before :


#![allow(unused)]
fn main() {
extern {
    type Location = usize;
    type Error = ();
    
    enum Tok<'input> {
        "num" => Tok::NumSymbol(<&'input str>),
        "FactorOp" => Tok::FactorOp(<Opcode>),
        "ExprOp" => Tok::ExprOp(<Opcode>),
        "(" => Tok::ParenOpen,
        ")" => Tok::ParenClose,
    }
}
}

Calling the parser

We can finally run the parser we built:


#![allow(unused)]
fn main() {
let input = "22 * pi + 66";
let lexer = Lexer::new(input);
let expr = calculator9::ExprParser::new()
    .parse(input,lexer)
    .unwrap();
assert_eq!(&format!("{:?}", expr), "((\"22\" * \"pi\") + \"66\")");
}