Writing Simple Parser in Python

From time to time one might need to write simple language parser to  implement  some domain specific language for his application.  As always python ecosystem offers  various solutions – overview of python parser generators is available here.  In this article I’d like to describe my experiences with parsimonious package. For recent project of mine ( imap_detach –  a tool to automatically download attachment from IMAP mailbox) I needed simple expressions to specify what emails and what exact parts should be downloaded.

Requirements

I needed a parser for simple logical expressions, which use a set of predefined variables ( properties of email), like these:

Meaning: Email part is jpg image and is added as attachment and have not been seen yet

Meaning all email part where filename contains .pdf and is from jack or jim

Grammar

Parsimonious implements PEG grammar which enables to create very compact grammar definitions. Unlike some other parser generators, where grammar in expressed in Python, parsimonious  has it’s own syntax, which enables to create short and easy to overview grammar definitions:

There are  couple of things which has to be remembered, when creating grammar:

  • PEG grammar should avoid left recursion, so rules like

    Don not work and will result in recursion error ( infinite recursion).  They have to be rewritten as indirect left recursion, which might be sometime challenging.
  • PEG grammar grammar are more deterministic that context free grammars –  so there is always one rule that is matched – there is no ambiguity like in  CFG.   This is assured by deterministic selection  / – always first match is selected and by greedy repetition operators * + ?  (they always match maximum possible length from input).
    Practically this means that rules has to be more carefully designed and they need to be design in particular way to assure required priority of operators.
  • Only named rules can have some special treatment when walking AST tree ( see below).  I think this is special feature of parsimonious,  but AST contains  nodes for a part of rule – like expressions that are in brackets.   This means if I have rule like this:

    I have no control on evaluating the right part, so I rather split it into two rules.

Evaluation

Once we have the grammar, we can evaluate expressions by walking the parsed AST. Parsimonious provides nice support for this with visitor pattern . We can create subclass of parsimonious.NodeVisitor with visit_rulename methods for all (relevant) rules from the grammar.   visit method receives current node in AST and list of values from its already visited (evaluated ) children.

So here is example of NodeVisitor class, that evaluates our simple expressions:

This class evaluates expression within context of defined variable’s values (dictionary). So we can parse and evaluate expression with one method:

 Conclusion

Parsimonious is a nice compact package for creating small parsers that are easy to use. For myself I bit struggled with PEG grammar,  but that’s probably due to my unfamiliarity with this type of grammars.  Once accustomed to it one can create more complex grammars.

 

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">