The VM
agreement-suite

Run the full agreement test suite:

    print("--- Agreement ---\n", .{});
    check_both("3 + 5", 8);
    check_both("2 + 3 * 4", 14);
    check_both("(2 + 3) * 4", 20);
    check_both("100 - 58", 42);
    check_both("7 * 6", 42);
    check_both("5 > 3", 1);
    check_both("2 > 7", 0);
    check_both("42 == 42", 1);
    check_both("var x: i64 = 42; x", 42);
    check_both("var x: i64 = 5; x = x + 1; x", 6);
    check_both("var a: i64 = 10; var b: i64 = 20; a + b", 30);
    check_both("if (5 > 3) { 10 } else { 20 }", 10);
    check_both("if (2 > 7) { 10 } else { 20 }", 20);
    check_both(
        \\var s: i64 = 0; var i: i64 = 1;
        \\while (i < 6) {
        \\    s = s + i;
        \\    i = i + 1;
        \\}
        \\s
    , 15);
    check_both("fn add(a: i64, b: i64) i64 { return a + b; } add(3, 5)", 8);
    check_both("fn double(x: i64) i64 { return x * 2; } double(21)", 42);
    check_both(
        \\fn fib(n: i64) i64 {
        \\    var a: i64 = 0;
        \\    var b: i64 = 1;
        \\    var i: i64 = 0;
        \\    while (i < n) {
        \\        var t: i64 = a + b;
        \\        a = b;
        \\        b = t;
        \\        i = i + 1;
        \\    }
        \\    return b;
        \\}
        \\fib(10)
    , 89);

Eighteen tests. Arithmetic, precedence, parentheses, comparisons, variables, assignment, multiple variables, if/else both branches, while loops, function calls, and Fibonacci.

If every line says ok: two completely independent engines -- the interpreter computing directly from source, and the compiler emitting WAT that a VM executes -- agree on every answer. The compiler is correct.

This is not a proof in the mathematical sense. It's a proof in the engineering sense -- the same sense a test suite is a proof. You've tested every feature, every branch of the compiler, every instruction the VM handles. The surface area is fully covered. If you added a feature (say, modulo) and only the compiler handled it wrong, one of these tests would catch it.