Fast Incremental PEG Parsing
Zachary Yedidia and Stephen Chong
Proceedings of the 14th ACM SIGPLAN International Conference on Software Language Engineering (SLE), October 2021.

(Best Research Paper Award)

Abstract.

Incremental parsing is an integral part of code analysis performed by text editors and integrated development environments. This paper presents new methods to significantly improve the efficiency of incremental parsing for Parsing Expression Grammars (PEGs). We build on \textitIncremental Packrat Parsing, an algorithm that adapts packrat parsing to an incremental setting, by implementing the memoization table as an interval tree with special support for shifting intervals, and modifying the memoization strategy to create tree structures in the table. Our approach enables reparsing in time logarithmic in the size of the input for typical edits, compared with linear-time reparsing for Incremental Packrat Parsing. We implement our methods in a prototype called GPeg, a parsing machine for PEGs with support for dynamic parsers (an important feature for extensibility in editors). Experiments show that GPeg has strong performance (sub-5ms reparse times) across a variety of input sizes (tens to hundreds of megabytes) and grammar types (from full language grammars to minimal grammars), and compares well with existing incremental parsers. As a complete example, we implement a syntax highlighting library and prototype editor using GPeg, with optimizations for these applications.