Run the full agreement test suite:
print("--- Agreement ---\n", .{});
check_both("3 + 5", 8);
check_both("2 + 3 * 4", 14);
check_both("(2 + 3) * 4", 20);
check_both("100 - 58", 42);
check_both("7 * 6", 42);
check_both("5 > 3", 1);
check_both("2 > 7", 0);
check_both("42 == 42", 1);
check_both("var x: i64 = 42; x", 42);
check_both("var x: i64 = 5; x = x + 1; x", 6);
check_both("var a: i64 = 10; var b: i64 = 20; a + b", 30);
check_both("if (5 > 3) { 10 } else { 20 }", 10);
check_both("if (2 > 7) { 10 } else { 20 }", 20);
check_both(
\\var s: i64 = 0; var i: i64 = 1;
\\while (i < 6) {
\\ s = s + i;
\\ i = i + 1;
\\}
\\s
, 15);
check_both("fn add(a: i64, b: i64) i64 { return a + b; } add(3, 5)", 8);
check_both("fn double(x: i64) i64 { return x * 2; } double(21)", 42);
check_both(
\\fn fib(n: i64) i64 {
\\ var a: i64 = 0;
\\ var b: i64 = 1;
\\ var i: i64 = 0;
\\ while (i < n) {
\\ var t: i64 = a + b;
\\ a = b;
\\ b = t;
\\ i = i + 1;
\\ }
\\ return b;
\\}
\\fib(10)
, 89);
Eighteen tests. Arithmetic, precedence, parentheses, comparisons, variables, assignment, multiple variables, if/else both branches, while loops, function calls, and Fibonacci.
If every line says ok: two completely independent engines -- the interpreter computing directly from source, and the compiler emitting WAT that a VM executes -- agree on every answer. The compiler is correct.
This is not a proof in the mathematical sense. It's a proof in the engineering sense -- the same sense a test suite is a proof. You've tested every feature, every branch of the compiler, every instruction the VM handles. The surface area is fully covered. If you added a feature (say, modulo) and only the compiler handled it wrong, one of these tests would catch it.