Easy Bytecode

Posted by Kaya Kupferschmidt • Wednesday, February 28. 2007 • Category: Programming
Writing a direct interpreter for a custom scripting language usually turns out to be more complex than writing a small bytecode compiler and bytecode interpreter. At first glance, a direct interpreter might seem easier, but once the scripting language contains flow control (loops, if/else statements and similar constructs involving jumps), this turns out to be false. The primary problem is that one needs to duplicate large parts of the parser simply to skip over the parts of a script that are not executed (such as a conditional if-block whose condition turns out to be false at runtime).

Because of this insight, I began to concentrate on writing an easy-to-implement bytecode compiler that transforms a text-based script into a more machine-friendly representation. A positive side effect of bytecode is that it is much faster to execute than the original textual representation. The downside of this approach is that it involves writing a compiler, something that sounds like a complex and difficult task.

But after analysing the process of parsing a script, I came to the conclusion that such a compiler and its corresponding bytecode interpreter ("virtual machine") would be rather straightforward and easy to implement if the underlying model of the virtual machine is chosen carefully. In my opinion, the best machine model (in terms of how simple the compiler and interpreter are to implement) is a purely stack-based RPN (Reverse Polish notation) machine. Such a model is not only easy to implement; it is also easy to extract the original syntax tree from the bytecode, which in turn allows further optimisation techniques as a postprocessing step (it would not even be too hard to turn bytecode into native assembler).
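To make the stack-based model concrete, here is a minimal sketch of such a virtual machine in C++. The opcode set, the `Instr` layout and the names `run` and `exampleProgram` are assumptions made up for this illustration; they are not taken from the compiler described in this post. The point is only to show that a conditional becomes a plain jump in bytecode, so nothing has to be re-parsed or skipped at runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcode set, chosen only for this illustration.
enum class Op : std::uint8_t { Push, Add, Less, JumpIfFalse, Jump, Halt };

struct Instr {
    Op op;
    int arg;  // constant for Push, target index for jumps, ignored otherwise
};

// Executes the program and returns the value on top of the stack at Halt.
int run(const std::vector<Instr>& code) {
    std::vector<int> stack;
    auto pop = [&stack]() -> int {
        assert(!stack.empty() && "malformed bytecode: stack underflow");
        int v = stack.back();
        stack.pop_back();
        return v;
    };

    std::size_t pc = 0;  // index of the next instruction
    while (pc < code.size()) {
        const Instr& in = code[pc++];
        switch (in.op) {
        case Op::Push:
            stack.push_back(in.arg);
            break;
        case Op::Add: {
            int b = pop();
            int a = pop();
            stack.push_back(a + b);
            break;
        }
        case Op::Less: {
            int b = pop();
            int a = pop();
            stack.push_back(a < b ? 1 : 0);
            break;
        }
        case Op::JumpIfFalse:
            // The false branch is skipped by changing pc, not by re-parsing.
            if (pop() == 0) pc = static_cast<std::size_t>(in.arg);
            break;
        case Op::Jump:
            pc = static_cast<std::size_t>(in.arg);
            break;
        case Op::Halt:
            return pop();
        }
    }
    return pop();
}

// Bytecode a compiler might emit for: if (lhs < 3) 10 + 5 else 99
std::vector<Instr> exampleProgram(int lhs) {
    return {
        {Op::Push, lhs},        // 0
        {Op::Push, 3},          // 1
        {Op::Less, 0},          // 2
        {Op::JumpIfFalse, 8},   // 3: jump to the else branch
        {Op::Push, 10},         // 4
        {Op::Push, 5},          // 5
        {Op::Add, 0},           // 6
        {Op::Jump, 9},          // 7: skip the else branch
        {Op::Push, 99},         // 8
        {Op::Halt, 0},          // 9
    };
}
```

Calling `run(exampleProgram(2))` takes the then-branch and `run(exampleProgram(5))` takes the else-branch. The operands appear in the instruction stream in the same postfix order an RPN calculator would expect, which is what keeps both the compiler and the interpreter loop small.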


Reflection for C++

Posted by Kaya Kupferschmidt • Monday, February 26. 2007 • Category: C++
One hot topic I am currently busy with is reflection for C++. Reflection for a programming language means that you can access all types together with their methods and members at runtime using a simple string-based interface. Such a feature especially simplifies binding scripting languages to a program, because a single wrapper can operate on the reflection information instead of each class, method or function being bound by hand. Other possible uses are remote method invocation, serialisation, XML-based configuration etc.
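As a rough illustration of what a "string-based interface" means, the following C++ sketch registers one class by hand and then reads a member and calls a method purely by name. Everything here (`Counter`, `ClassInfo`, `registry`, `invoke`, `getField`) is a hypothetical example written for this explanation; it is not the API of Magnum or of any existing reflection library, and a real framework would generate this metadata instead of having it written by hand.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// An ordinary class that knows nothing about reflection.
struct Counter {
    int value = 0;
    void increment() { ++value; }
};

// Hand-written metadata for one class: accessors looked up by name.
// Objects are passed as void* so one registry can hold many classes.
struct ClassInfo {
    std::map<std::string, std::function<int(const void*)>> intFields;
    std::map<std::string, std::function<void(void*)>> methods;
};

// Builds the (hypothetical) metadata table; a generator would emit this code.
std::map<std::string, ClassInfo> buildRegistry() {
    std::map<std::string, ClassInfo> m;
    ClassInfo& info = m["Counter"];
    info.intFields["value"] = [](const void* o) {
        return static_cast<const Counter*>(o)->value;
    };
    info.methods["increment"] = [](void* o) {
        static_cast<Counter*>(o)->increment();
    };
    return m;
}

const std::map<std::string, ClassInfo>& registry() {
    static const std::map<std::string, ClassInfo> r = buildRegistry();
    return r;
}

// Calls a no-argument method by class and method name.
// Throws std::out_of_range (from map::at) if either name is unknown.
void invoke(const std::string& cls, const std::string& method, void* obj) {
    assert(obj != nullptr);
    registry().at(cls).methods.at(method)(obj);
}

// Reads an int member by class and field name.
int getField(const std::string& cls, const std::string& field, const void* obj) {
    assert(obj != nullptr);
    return registry().at(cls).intFields.at(field)(obj);
}
```

A script binding would then only need generic calls such as `invoke("Counter", "increment", &c)` followed by `getField("Counter", "value", &c)`, rather than one hand-written wrapper per class. The sketch is deliberately limited to `int` fields and argument-free methods; handling arbitrary signatures is where most of the real work lies.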

C++ offers only some very basic runtime type information (RTTI) out of the box and lacks full reflection. There are some projects on the net that try to close this gap (most notably the Reflex framework), but none of them really seem to be as powerful, flexible and easy-to-use as their native counterparts in Java or C#.
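For comparison, this is roughly all that standard RTTI provides: `typeid` and `dynamic_cast` can answer "which class is this object?" for polymorphic types, but there is no standard way to list a class's members or call a method by name. The `Shape`, `Circle` and `Square` types are placeholders invented for this sketch.

```cpp
#include <cassert>
#include <typeinfo>

// RTTI only works on polymorphic types, hence the virtual destructor.
struct Shape {
    virtual ~Shape() = default;
};
struct Circle : Shape {
    double radius = 1.0;
};
struct Square : Shape {};

// typeid on a reference to a polymorphic type reports the dynamic type.
bool isCircle(const Shape& s) {
    return typeid(s) == typeid(Circle);
}

// dynamic_cast is a downcast checked at runtime; it yields nullptr on mismatch.
const Circle* asCircle(const Shape* s) {
    assert(s != nullptr);
    return dynamic_cast<const Circle*>(s);
}
```

Note that even after `asCircle` succeeds, the name `radius` still has to be known at compile time; looking it up from a string is exactly the part that RTTI does not cover.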

Dimajix's framework Magnum will soon contain some new modules that try to fill this gap by offering the following tools:

  • A generic Meta-Compiler based upon gccxml together with a specialised XML-based transformation language.

  • Non-intrusive, full reflection for all public elements of any C++ program given in source.

  • A Java-like scripting language built on top of the reflection together with a custom bytecode compiler.


With these tools, one can easily add generic scripting capabilities to any C++ program with only a little one-time effort. In addition, the Meta-Compiler can be used to automatically transform any type information given in C++ headers into any kind of text-based file, by providing a set of transformation rules that are applied to the C++ meta-information generated with gccxml.

The new package is not completely finished yet and some features are still missing, but work is steadily progressing. For a preview, you can simply check out the latest version of Magnum using Subversion at svn://subversion.dimajix.de/magnum.