savo.la
2019-03-17
Sneaky Go interface conversion
As I was profiling my WebAssembly compiler, I noticed that the runtime.convI2I function was taking up a large portion of time.
The compiler's design prioritizes fast compilation over complex optimizations,
and the hotspot was indeed in the code which reads input - the one part that
cannot really be avoided.
But what is this convI2I? When a Go program has an assignment or function call where an interface value is converted to another one, the official Go toolchain generates code which converts it during runtime by calling convI2I:
    LEAQ type.io.ByteReader(SB), AX   // Target *interfacetype.
    MOVQ AX, (SP)
    MOVQ "".i+56(SP), AX              // Source interface
    MOVQ "".i+64(SP), CX              // value (16 bytes).
    MOVQ AX, 8(SP)
    MOVQ CX, 16(SP)
    CALL runtime.convI2I(SB)
    MOVQ 24(SP), AX
    MOVQ 32(SP), CX
    MOVQ AX, "".~r1+72(SP)            // Target interface
    MOVQ CX, "".~r1+80(SP)            // value (16 bytes).
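To make this concrete, here is a minimal sketch of Go source that contains such a conversion. The byteScanner interface and the narrow and firstByte functions are hypothetical names made up for this illustration; they are not taken from the compiler discussed here. Building it with go build -gcflags=-S prints the generated assembly, which on a toolchain like Go 1.12 should contain a convI2I call similar to the listing above. Other Go versions may generate different code.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// byteScanner is a hypothetical interface that is wider than io.ByteReader.
type byteScanner interface {
	ReadByte() (byte, error)
	UnreadByte() error
}

// narrow converts one interface type to another. The method table for
// the target interface depends on the dynamic type stored in i, so the
// conversion has to be resolved at run time.
//
//go:noinline
func narrow(i byteScanner) io.ByteReader {
	return i
}

// firstByte reads a single byte through the narrowed interface.
func firstByte(data []byte) (byte, error) {
	// Concrete type to interface: not an interface-to-interface conversion.
	var s byteScanner = bytes.NewReader(data)

	// Interface to interface: this is the conversion in question.
	r := narrow(s)
	return r.ReadByte()
}

func main() {
	b, err := firstByte([]byte{0x2a})
	fmt.Println(b, err)
}
```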
Problematic code
The source of the WebAssembly module is abstracted using a reader interface. It's one of the very few areas where an interface type is actively used by the compiler. I suppose interface conversions usually happen when some variable or struct field is initialized at the start of a routine, and that interface value is then used in tight spots without conversion. But this was different.
The input reader is accessed through loader type L which provides some helper methods. L wraps interface R which has a method set designed to be compatible with many buffered reader implementations. Some helper methods then go on to call standard library functions which take a narrower interface. As the compiler reads input a little at a time in one tight loop after another, a helper method may be doing an interface conversion at every iteration.
Example code highlighting the issue:
    import "encoding/binary"

    type R interface {
        Read([]byte) (int, error)
        ReadByte() (byte, error)
        UnreadByte() error
    }

    type L struct{ R }

    // Varuint64 reads a WebAssembly varuint64 or panics.
    func (loader L) Varuint64() uint64 {
        // Varuint happens to match encoding/binary's uvarint.
        // ReadUvarint takes an io.ByteReader which has only
        // the ReadByte method, so R needs to be converted.
        value, err := binary.ReadUvarint(loader.R)
        if err != nil {
            panic(err)
        }
        return value
    }
Optimization
I went on to reimplement the problematic methods so that they no longer make function calls which require interface conversions. convI2I and the related runtime functions completely disappeared from pprof output, and my benchmark got a lot faster:
    benchmark                 old ns/op     new ns/op     delta
    BenchmarkLoad001/Init     103382        46667         -54.86%
    BenchmarkLoad001/Code     35512149      30302264      -14.67%
    BenchmarkLoad002/Init     22052         12125         -45.02%
    BenchmarkLoad002/Code     6231188       5482958       -12.01%

    benchmark                 old MB/s     new MB/s     speedup
    BenchmarkLoad001/Init     43.96        97.39        2.22x
    BenchmarkLoad001/Code     26.80        31.41        1.17x
    BenchmarkLoad002/Init     48.61        88.41        1.82x
    BenchmarkLoad002/Code     26.28        29.87        1.14x
Init tests deserialize WebAssembly metadata structures, which is really heavy on the problematic loader methods. Code tests compile bytecode. (Data section loading is omitted as uninteresting.)
BenchmarkLoad001 compiles a small Rust program built in debug mode, weighing 7.1 MB. BenchmarkLoad002 uses the same program built in release mode, which makes it 1.6 MB. The binaries are available in the wag-bench repository. The CPU profile and benchmark were recreated for this article using Go 1.12.1.
Alternative implementation of Go interfaces
TinyGo does things differently: it doesn't do anything during interface conversion; instead, there is additional overhead during interface method calls.
I'm left to wonder if it would be possible to do away with dynamic interface construction without affecting method calls. If a Go compiler did whole-program analysis at link time, it could perhaps construct more efficient lookup tables. But it would also slow down builds and cause problems with plugins - and probably not be worthwhile.
I suppose interface conversion overhead is not a very common issue, but this code pattern might be something to look for when optimizing.