savo.la

2019-03-17

Sneaky Go interface conversion

As I was profiling my WebAssembly compiler, I noticed that the runtime.convI2I function was taking up a large portion of time. The compiler's design prioritizes fast compilation over complex optimizations, and the hotspot was indeed in the code which reads input - the one part that cannot really be avoided.

But what is this convI2I? When a Go program has an assignment or function call where a value of one interface type is converted to another interface type, the official Go toolchain generates code which performs the conversion at run time by calling convI2I:

    LEAQ    type.io.ByteReader(SB), AX  // Target *interfacetype.
    MOVQ    AX, (SP)
    MOVQ    "".i+56(SP), AX             // Source interface
    MOVQ    "".i+64(SP), CX             // value (16 bytes).
    MOVQ    AX, 8(SP)
    MOVQ    CX, 16(SP)
    CALL    runtime.convI2I(SB)
    MOVQ    24(SP), AX
    MOVQ    32(SP), CX
    MOVQ    AX, "".~r1+72(SP)           // Target interface
    MOVQ    CX, "".~r1+80(SP)           // value (16 bytes).

Problematic code

The source of the WebAssembly module is abstracted using a reader interface. It's one of the very few areas where an interface type is actively used by the compiler. I suppose interface conversions usually happen when some variable or struct field is initialized at the start of a routine, and that interface value is then used in tight spots without conversion. But this was different.

The input reader is accessed through the loader type L, which provides some helper methods. L wraps the interface R, which has a method set designed to be compatible with many buffered reader implementations. Some helper methods then go on to call standard library functions which take a narrower interface. As the compiler reads input a little at a time in one tight loop after another, a helper method may be doing an interface conversion at every iteration.

Example code highlighting the issue:

    import "encoding/binary"

    type R interface {
        Read([]byte) (int, error)
        ReadByte() (byte, error)
        UnreadByte() error
    }

    type L struct{ R }

    // Varuint64 reads a WebAssembly varuint64 or panics.
    func (loader L) Varuint64() uint64 {
        // Varuint happens to match encoding/binary's uvarint.
        // ReadUvarint takes an io.ByteReader which has only
        // the ReadByte method, so R needs to be converted.
        value, err := binary.ReadUvarint(loader.R)
        if err != nil {
            panic(err)
        }
        return value
    }

Optimization

I went on to reimplement the problematic methods so that they no longer make function calls which require interface conversions. convI2I and the related runtime functions completely disappeared from pprof output, and my benchmark got a lot faster:

benchmark                 old ns/op     new ns/op     delta
BenchmarkLoad001/Init     103382        46667         -54.86%
BenchmarkLoad001/Code     35512149      30302264      -14.67%
BenchmarkLoad002/Init     22052         12125         -45.02%
BenchmarkLoad002/Code     6231188       5482958       -12.01%

benchmark                 old MB/s     new MB/s     speedup
BenchmarkLoad001/Init     43.96        97.39        2.22x
BenchmarkLoad001/Code     26.80        31.41        1.17x
BenchmarkLoad002/Init     48.61        88.41        1.82x
BenchmarkLoad002/Code     26.28        29.87        1.14x

Init tests deserialize WebAssembly metadata structures, which is really heavy on the problematic loader methods. Code tests compile bytecode. (Data section loading is omitted as uninteresting.)

BenchmarkLoad001 compiles a small Rust program built in debug mode, weighing 7.1 MB. BenchmarkLoad002 uses the same program built in release mode, which makes it 1.6 MB. The binaries are available in the wag-bench repository.

The CPU profile and benchmark were recreated for this article using Go 1.12.1.

Alternative implementation of Go interfaces

TinyGo does things differently: it doesn't do anything during interface conversion; instead, there is additional overhead during interface method calls.
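The trade-off can be shown with a toy model. This is only a conceptual sketch and does not describe the actual data layout or generated code of either toolchain; every name in it is invented. Strategy A pays for a table lookup when converting and then makes a cheap indirect call. Strategy B converts for free and pays for a dispatch on every method call.

```go
package main

import "fmt"

// typeCode is a hypothetical identifier for a concrete type.
type typeCode int

const (
	codeCat typeCode = iota
	codeDog
)

func catSound(name string) string { return name + " says meow" }
func dogSound(name string) string { return name + " says woof" }

// Strategy A: the interface value carries a method table, which is
// looked up whenever a value is converted to another interface type.
type methodTable struct {
	sound func(name string) string
}

type tableKey struct {
	iface string // Name of the target interface.
	code  typeCode
}

var tables = map[tableKey]*methodTable{
	{"Sounder", codeCat}: {sound: catSound},
	{"Sounder", codeDog}: {sound: dogSound},
}

type tableValue struct {
	table *methodTable
	name  string
}

// convertA does its work at conversion time.
func convertA(target string, code typeCode, name string) tableValue {
	return tableValue{tables[tableKey{target, code}], name}
}

// callA is an indirect call through the table found earlier.
func callA(v tableValue) string { return v.table.sound(v.name) }

// Strategy B: the interface value carries only a type code.
type codeValue struct {
	code typeCode
	name string
}

// convertB has nothing to do.
func convertB(v codeValue) codeValue { return v }

// callB dispatches on the type code at every call.
func callB(v codeValue) string {
	switch v.code {
	case codeCat:
		return catSound(v.name)
	case codeDog:
		return dogSound(v.name)
	}
	panic("type does not implement sound")
}

func main() {
	fmt.Println(callA(convertA("Sounder", codeDog, "Rex"))) // prints "Rex says woof"
	fmt.Println(callB(convertB(codeValue{codeCat, "Tom"}))) // prints "Tom says meow"
}
```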

I'm left to wonder if it would be possible to do away with dynamic interface construction without affecting method calls. If a Go compiler did whole-program analysis at link time, it could perhaps construct more efficient lookup tables. But that would also slow down builds and cause problems with plugins, and would probably not be worthwhile.

I suppose interface conversion overhead is not a very common issue, but this code pattern might be something to look for when optimizing.


Timo Savola