Timo Savola

2011-01-28

Concrete Python

Concrete is an experiment on portable execution state; to make it possible to save the state of a running program at arbitrary point of execution without help from the client code, and restore it—possibly on a different kind of system. I thought that it's less frustrating to develop a custom runtime environment instead of patching an existing one, and it gives greater flexibility to play with the execution model (which is, after all, the whole point). So I decided to implement a yet another Python virtual machine.

The VM takes in a CPython 3.1 bytecode file and executes it so that the program's memory image is in a coherent state after every opcode: the image could theoretically be backed by a memory-mapped file, which could be loaded to a new VM instance. Objects' contents are kept in little-endian byte order. Pointers are 32 bits wide.

The implementation language is C++. (Haters gonna hate.) With it we get types with automatic byte order conversion on big-endian architectures and implicit reference counting of Python objects. I've also played with C++0x type inference for kicks. Oh, and the license is LGPL.

For now it supports just about enough opcodes to be able to import a built-in module, define functions, call them and add integers together. The heap allocator is really dumb. But it does it all in the Concrete way.

Further thinking

Since having to swap bytes on every object access (and reference/dereference!) on big-endian CPUs strikes as incredibly slow, another approach could be to mark the byte order used in the saved memory image and convert the objects while loading. Coupled with garbage collection, heap compaction etc. this could be convenient. Then again, with garbage collection the reference counting would be eliminated... Yet another approach could be to convert objects on demand, but that would require conditional jumps—even on the vastly more popular little-endian CPUs. But that could come as a bonus if the heap allocator implemented copy-on-write snapshots, which in turn could enable interesting new language constructs...

The plan is not to support standard Python modules (except pure-Python ones which don't depend on native ones). Instead, the idea is to use custom I/O abstractions which are applicable to the transient scenarios.