Easy Bytecode
Posted by Kaya Kupferschmidt • Wednesday, February 28. 2007 • Category: Programming
Writing a interpreter for a custom scripting language always seems to be more complex than writing a small bytecode compiler and bytecode interpreter. At a first glance, writing a direct interpreter might be easier, but if the scripting language contains flow control (like loops, if/else statements and similar constructs involving jumps), this turns out to be false. The primary problem lies in the fact that one needs to duplicate large parts of the parser - simply for skipping over the parts of a script that are not executed (like with a conditional if-block whose conditioon turns out to be false at runtime).
Because of this insight, I began to concentrate on writing an easy-to-implement bytecode compiler that would transform a text-based script into a more machine-friendly representation. A positive side-effect of bytecode is that simple fact that it is much faster to execute than the original textual representation. The downside of this approach is the fact that it involves writing a compiler - something that sounds to be a complex and difficult task.
But after analysing the process of parsing a script, I came to the conclusion that such a compiler and its corresponding bytecode interpreter ("virtual machine") would be rather straight-forward and easy to implement, if the underlying model of the virtual machine is chosen carefully. In my opinion the best machine model (in terms of simplicity to implement a compiler and interpreter) it a purely stack-based RPN (Reverse Polish notation) machine. And such a model not only is easy to implement, but it also easy to extract the original syntax tree from the bytecode, which in turn allows further optimisation techniques as a postprocessing step (it even wouldn't be to hard to turn bytecode into native assembler).
Because of this insight, I began to concentrate on writing an easy-to-implement bytecode compiler that would transform a text-based script into a more machine-friendly representation. A positive side-effect of bytecode is that simple fact that it is much faster to execute than the original textual representation. The downside of this approach is the fact that it involves writing a compiler - something that sounds to be a complex and difficult task.
But after analysing the process of parsing a script, I came to the conclusion that such a compiler and its corresponding bytecode interpreter ("virtual machine") would be rather straight-forward and easy to implement, if the underlying model of the virtual machine is chosen carefully. In my opinion the best machine model (in terms of simplicity to implement a compiler and interpreter) it a purely stack-based RPN (Reverse Polish notation) machine. And such a model not only is easy to implement, but it also easy to extract the original syntax tree from the bytecode, which in turn allows further optimisation techniques as a postprocessing step (it even wouldn't be to hard to turn bytecode into native assembler).


