minizig — full book

!UNDER CONSTRUCTION! Nothing works yet!

This book is a collection of problems. You solve them, one by one, and by the end you'll have built a programming language from scratch -- interpreter, compiler, virtual machine, the whole thing. The compiler will work on its own source code. A self-compiling compiler is a fantastic party trick that will impress your friends.

Minizig is a strict subset of Zig -- every program we write in minizig also compiles with the normal zig.exe. We stick to the smallest subset of Zig we can get away with, because eventually we have to write a compiler for it in itself, and we are lazy.

Minizig targets WebAssembly -- a bytecode that runs in browsers, servers, and standalone runtimes. Most books show you how to interpret or compile. We do both, and then we build a virtual machine for the bytecode ourselves, because why not.

About me:

When I was in school, at some point I had to choose between computer graphics and compilers. I chose computer graphics, and now I am a game programmer. But compilers intrigue me, and I always wanted to write one. I spend way too much time lurking in compiler communities (hello ziggit!).

About AI:

I wanted to read a list of small questions/answers that would result in writing a self-hosted compiler for a subset of Zig, and such a thing did not exist. So I asked the AI, and we wrote that together. I have not read any book about compilers, so this is a very strange experience trying to write one. In my defense, I would never have tried writing a book without some serious help, and I would be looking at youtube shorts instead.

Note:

Type the code by hand. Resist the urge to copy-paste or ask an AI to write it for you. (I know, if the thing is written by AI, why should I type it manually. It is a strange paradox). Your goal is to learn how to write a compiler, not to have a new bad compiler... The hands do the learning, not the eyes.

Anyway, the best way to read this AI written book is to be as human as you can, and try irrational detours as you progress through the steps. Have fun!

Temporary chapter name for all chapters

the-language

You don't have to know Zig to follow along. I'll tell you what you need. You should have touched Python or something similar before.

Go to ziglang.org and download the latest stable release. Extract it, put the folder in your PATH.

zig version

If you get a version number, you're good.

Create a folder for your project, cd into it, make a file minizig.zig:

const std = @import("std");
const print = std.debug.print;

pub fn main() void {
    print("hello, world\n", .{});
}

Compile:

zig build-exe minizig.zig

This produces minizig.exe on Windows, minizig elsewhere. Run it:

./minizig.exe

You should see hello, world. That's it. You have a working Zig setup.

Now try something. In main, declare two values:

const some_constant_number: i64 = 34;
var some_variable_number: i64 = 35;

First, try to modify the const:

some_constant_number = 99;

Compile. What does Zig say?

Then remove that line and try the other direction: leave the var alone. Don't assign anything to it after its declaration. Compile. What does Zig say now?

Finally, print both values using print. The second argument to print is a tuple of values, written .{ a, b }. Placeholders: {d} for an integer.

const std = @import("std");
const print = std.debug.print;

pub fn main() void {
    print("hello, world\n", .{});

    const some_constant_number: i64 = 34;
    // some_constant_number = 99;  // error: cannot assign to constant

    var some_variable_number: i64 = 35;
    some_variable_number = 42;  // a var that never changes is an error too.

    // print's second arg is a tuple of values: .{ a, b, c }.
    // {d} for a decimal integer, {s} for a string, {c} for a char.
    print("const = {d}, var = {d}\n", .{ some_constant_number, some_variable_number });
}

Some people don't like being forced to modify their variables, but I like it. The ZLS extension transforms var into const and conversely according to what you do with it, so you don't have to think about it. It's all about reading being more important than writing.

When I see a var, I have to reserve mental room for whatever might happen to it later. If a var never actually gets reassigned, I've wasted that room, which annoys me.

the-debugger

If you already know how to debug code in your editor of choice, skip this.

We're not doing printf debugging in 2026. We hover the mouse over a variable and see its current value. We press F10 repeatedly and walk through the code as it runs. The debugger is the single best tool for understanding what your program is actually doing.

Install VS Code with three extensions: Zig Language, ZLS (Zig Language Server -- autocomplete and error highlighting), and CodeLLDB (breakpoints).

Create a .vscode folder in your project with this launch.json:

{
    "version": "0.2.0",
    "configurations": [{
        "type": "lldb",
        "request": "launch",
        "name": "Debug minizig",
        "program": "${workspaceFolder}/minizig.exe",
        "preLaunchTask": "build"
    }]
}

And this tasks.json:

{
    "version": "2.0.0",
    "tasks": [{
        "label": "build",
        "type": "shell",
        "command": "zig build-exe minizig.zig -lc"
    }]
}

(The -lc we'll explain in the next problem.)

Set a breakpoint, press F5, watch the program freeze. Hover over a variable to see its value.

getting-started

We're not going to use Zig's standard library. No @import("std"), because self-hosting is about eating our own code. When we make our job easier, we make our job harder.

But then, how do we print? We grab one function from libc -- putchar, which writes a single byte -- and build everything else on it.

Replace your minizig.zig with this:

extern "c" fn putchar(c: c_int) c_int;

fn printChar(c: u8) void {
    _ = putchar(@intCast(c));
}

fn printString(s: []const u8) void {
    var i: usize = 0;
    while (i < s.len) {
        printChar(s[i]);
        i += 1;
    }
}

pub fn main() void {
    printString("hello, world\n");
}

Build with -lc so putchar resolves against libc:

zig build-exe minizig.zig -lc

Same hello, world output, no std.

We'll need three numeric types through the book:

- u8: unsigned 8-bit integer. Range 0 to 255. One byte. Mostly for individual characters.
- i64: signed 64-bit integer. Range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. Our default for any number.
- usize: an unsigned integer big enough to hold any array index or length on your machine (usually 64 bits). Use it for positions into a string, array lengths, counts. If the distinction from u64 ever matters, it doesn't in this book.

Your turn. Write printNumber(n: i64) void. It should print 42 for 42, 0 for 0, -123 for -123. No recursion.

Hint: collect digits low-to-high into a small buffer, then print the buffer in reverse. Special-case zero (otherwise the loop never runs). Handle negatives by printing '-' then flipping the sign.

@mod(value, 10) gives the last digit as an i64. @divTrunc(value, 10) drops the last digit. @intCast converts between integer types (e.g. i64 to u8).

Test it:

pub fn main() void {
    printNumber(42);    printChar('\n');
    printNumber(0);     printChar('\n');
    printNumber(-123);  printChar('\n');
    printNumber(999999);printChar('\n');
}

Use what you set up in the last problem. Put a breakpoint on the first printNumber(42) call. Press F5 -- the program freezes there. Press F11 to step into printNumber. You're now inside the function, with n showing 42. Press F10 to step over each line; hover over value, i, buf to watch them change. When you hit return, F10 pops you back out to main. Try printNumber(1234) and watch the buffer fill in reverse order, one digit per F10 press. That's the debugger doing what print can't.

fn printNumber(n: i64) void {
    var value: i64 = n;
    if (value < 0) {
        printChar('-');
        value = -value;
    }
    if (value == 0) {
        printChar('0');
        return;
    }
    var buf: [20]u8 = undefined;
    var i: usize = 0;
    while (value > 0) {
        const d: i64 = @mod(value, 10);
        buf[i] = @intCast('0' + d);
        i += 1;
        value = @divTrunc(value, 10);
    }
    while (i > 0) {
        i -= 1;
        printChar(buf[i]);
    }
}

Buffer is [20]u8 because i64 maxes out at 19 digits; plus room. undefined is fine -- we only read slots we wrote.

interpret-3-plus-4

Read three characters, compute the sum, print it.

'3' - '0' is 3 because ASCII digits are contiguous. But Zig never converts between types implicitly, and input[0] is a u8 -- subtracting one u8 from another would panic if it ever went below zero. We widen to i64 first: @as(i64, input[0]) - '0'.

For now, the input has no spaces. "3+4", not "3 + 4". Whitespace handling comes much later.

Fill in the three // YOU lines.

pub fn main() void {
    const input: [:0]const u8 = "3+4";

    // YOU: const a: i64 = ...
    // YOU: const b: i64 = ...
    // YOU: const result: i64 = ...

    printString("interpreter result for ");
    printString(input);
    printString(" : ");
    printNumber(result);
    printChar('\n');
}

const a: i64 = @as(i64, input[0]) - '0';
const b: i64 = @as(i64, input[2]) - '0';
const result: i64 = a + b;

We're ignoring input[1]. Hardcoded + for now.

In Zig, strings are arrays of bytes. The :0 in [:0]const u8 says "ends with a zero byte" -- a sentinel we'll use as our end-of-input marker later. Don't sweat the exact punctuation; just remember the strings end with a zero.

walk-wat-lines

A compiler needs to write machine instructions. When faced with the problem of adding 3 and 4, a machine writes 3 at some specific place (a register), then 4 to some other place, then adds them, and puts the result somewhere.

Dealing with that many steps just for 3+4 is annoying, and this is why we want higher level programming languages (but not too high -- we want to know what the machine is doing).

Assembly languages such as x86_64 are full of weird register and instruction names to memorize.

There is also WASM, which is a "binary instruction format for a stack-based virtual machine". You put the 2 values on the stack, and then one add instruction pops the 2 numbers, adds them, and puts the result back on the stack.

A stack is just a pile -- things go on top, come off the top. To compute 3 + 4:

                                  []
push 3:                           [3]
push 4:                           [3, 4]
add (pops 4, pops 3, pushes 7):   [7]

WASM is functionally equivalent to other assembly languages, so virtual machines can be fast on any hardware. WASM is binary, but there is WebAssembly text, which is the readable version.

wat woman meme

We'll compile our 3+4 to three lines of WebAssembly text:

i64.const 3
i64.const 4
i64.add

Each line is one instruction. Before we run them, let's just walk through the string and slice it up by \n. Print each line back to confirm it works.

pub fn main() void {
    // (interpreter section unchanged)

    const wat: [:0]const u8 =
        \\i64.const 3
        \\i64.const 4
        \\i64.add
    ;

    var wat_pos: usize = 0;
    while (wat_pos < wat.len) {
        var wat_line_end: usize = wat_pos;

        // YOU: advance wat_line_end until '\n' or end of wat.
        // YOU: print the slice wat[wat_pos..wat_line_end], then a newline.

        wat_pos = wat_line_end + 1;
    }
}

while (wat_line_end < wat.len and wat[wat_line_end] != '\n') {
    wat_line_end += 1;
}
printString(wat[wat_pos..wat_line_end]);
printChar('\n');

wat[wat_pos..wat_line_end] is a slice -- a view into the multi-line string. No copying.

vm-runs-const

Time to actually run the WAT. The stack machine has one rule: i64.const N pushes N on the stack. Let's get that one working first; i64.add comes next.

To dispatch, we'll compare each line to a known instruction name. Two helpers, both at file scope:

fn stringsEqual(
    a: []const u8,
    b: []const u8,
) bool {
    if (a.len != b.len) {
        return false;
    }
    var i: usize = 0;
    while (i < a.len) {
        if (a[i] != b[i]) {
            return false;
        }
        i += 1;
    }
    return true;
}

fn stringStartsWith(
    s: []const u8,
    prefix: []const u8,
) bool {
    if (s.len < prefix.len) {
        return false;
    }
    return stringsEqual(s[0..prefix.len], prefix);
}

stringsEqual for full matches like "i64.add". stringStartsWith for "i64.const " -- the prefix is fixed but the digit after isn't.

In main, walk the WAT and dispatch on each line. For now the WAT only has consts. If a line starts with "i64.const ", parse the last char as a digit and push. Anything else is an error.

pub fn main() void {
    // (interpreter section unchanged)

    const wat: [:0]const u8 =
        \\i64.const 3
        \\i64.const 4
    ;

    var stack: [16]i64 = undefined;
    var stack_pointer: usize = 0;

    var wat_pos: usize = 0;
    while (wat_pos < wat.len) {
        var wat_line_end: usize = wat_pos;
        while (wat_line_end < wat.len and wat[wat_line_end] != '\n') {
            wat_line_end += 1;
        }
        const line: []const u8 = wat[wat_pos..wat_line_end];

        // YOU: dispatch on line.
        //   If it starts with "i64.const ", parse line[line.len - 1] as a digit, push.
        //   Otherwise, printString("error: unknown instruction\n").

        wat_pos = wat_line_end + 1;
    }

    printString("top of stack: ");
    printNumber(stack[stack_pointer - 1]);
    printChar('\n');
}

if (stringStartsWith(line, "i64.const ")) {
    stack[stack_pointer] = @as(i64, line[line.len - 1]) - '0';
    stack_pointer += 1;
} else {
    printString("error: unknown instruction\n");
    return;
}

After two i64.const instructions, the stack holds 3, 4. The top is 4. We print it to confirm the loop pushed both values.

vm-runs-add

Add the second instruction: i64.add. It pops two values and pushes their sum.

Extend the WAT to include i64.add, then add an else if branch to the dispatch.

const wat: [:0]const u8 =
    \\i64.const 3
    \\i64.const 4
    \\i64.add
;

// ... line walker unchanged ...

if (stringStartsWith(line, "i64.const ")) {
    stack[stack_pointer] = @as(i64, line[line.len - 1]) - '0';
    stack_pointer += 1;
}
// YOU: else if line equals "i64.add", pop y, pop x, push x + y.
else {
    printString("error: unknown instruction\n");
    return;
}

Then change the final print to "vm result: " and print stack[0] -- after the add, the stack should hold one value, the answer.

} else if (stringsEqual(line, "i64.add")) {
    const y: i64 = stack[stack_pointer - 1];
    const x: i64 = stack[stack_pointer - 2];
    stack_pointer -= 2;
    stack[stack_pointer] = x + y;
    stack_pointer += 1;
}

y was pushed last, so it sits on top. Doesn't matter for +. Will matter when subtraction shows up.

vm result: 7. Same answer as the interpreter, computed by a stack machine.

compile-to-wat

Let's take a little detour to recap Zig's syntax around arrays and pointers:

- [25]u8 is an array. It is a value and it lives on the stack, just like any value.
- *u8 is a pointer to a u8.
- [*]u8 is a pointer to a u8 that is part of an array (even though the syntax is weird, this is the most normal one).
- []u8 is a combination of a [*]u8 and a size: it is a slice, a view into an array, a fat pointer.
- [:0]u8 is a slice that ends with a zero byte (length AND sentinel).
- [*:0]u8 is like [*]u8 but the array it points into ends with a zero byte. Walk until you see the zero.
- & in front of something gives you a pointer to it (or something that is some sort of pointer such as a slice).
- .* after a pointer is the opposite: it dereferences, which means "follow this pointer".

When you write "hello", Zig puts the constant bytes h e l l o 0 in the "data section of your executable" and hands you a [:0]const u8 so you can access it.

If you are feeling overwhelmed, don't worry too much. You will learn to love Zig's explicitness around pointers; it is one of its best features.

Back to WAT. We now want to generate it instead of writing it by hand. Two helpers do the heavy lifting.

The helpers below take &wat_buffer and &wat_length so they can write into our buffer and bump the length. Inside, position.* reads or writes the value the pointer points at -- the .* "unwraps" the pointer.

fn appendChar(
    buffer: [*]u8,
    position: *usize,
    c: u8,
) void {
    buffer[position.*] = c;
    position.* += 1;
}

fn appendStringLiteral(
    buffer: [*]u8,
    position: *usize,
    s: [*:0]const u8,
) void {
    var i: usize = 0;
    while (s[i] != 0) {
        appendChar(buffer, position, s[i]);
        i += 1;
    }
}

appendStringLiteral walks s until it sees the zero -- that's why s is [*:0]const u8. Every Zig string literal has that zero at the end, including the multi-line \\ ones.

We keep both paths in main and pick between them with write_wat_manually. When true, paste the WAT in by hand; when false, the compiler emits it. Either way the buffer ends up the same. Same vm result.

Fill in the else branch.

pub fn main() void {
    const input: [:0]const u8 = "3+4";

    const a: i64 = @as(i64, input[0]) - '0';
    const b: i64 = @as(i64, input[2]) - '0';
    const result: i64 = a + b;

    printString("interpreter result for ");
    printString(input);
    printString(" : ");
    printNumber(result);
    printChar('\n');

    var wat_buffer: [256]u8 = undefined;
    var wat_length: usize = 0;

    const write_wat_manually: bool = false;
    if (write_wat_manually) {
        appendStringLiteral(&wat_buffer, &wat_length,
            \\i64.const 3
            \\i64.const 4
            \\i64.add
        );
    } else {

        // YOU: emit "i64.const X\n" for input[0],
        //      "i64.const Y\n" for input[2], "i64.add\n".
    
    }

    var stack: [16]i64 = undefined;
    var stack_pointer: usize = 0;

    var wat_pos: usize = 0;
    while (wat_pos < wat_length) {
        var wat_line_end: usize = wat_pos;
        while (wat_line_end < wat_length and wat_buffer[wat_line_end] != '\n') {
            wat_line_end += 1;
        }
        const line: []const u8 = wat_buffer[wat_pos..wat_line_end];

        if (stringStartsWith(line, "i64.const ")) {
            stack[stack_pointer] = @as(i64, line[line.len - 1]) - '0';
            stack_pointer += 1;
        } else if (stringsEqual(line, "i64.add")) {
            const y: i64 = stack[stack_pointer - 1];
            const x: i64 = stack[stack_pointer - 2];
            stack_pointer -= 2;
            stack[stack_pointer] = x + y;
            stack_pointer += 1;
        } else {
            printString("error: unknown instruction\n");
            return;
        }

        wat_pos = wat_line_end + 1;
    }

    printString("vm result: ");
    printNumber(stack[0]);
    printChar('\n');
}

appendStringLiteral(&wat_buffer, &wat_length, "i64.const ");
appendChar(&wat_buffer, &wat_length, input[0]);
appendChar(&wat_buffer, &wat_length, '\n');
appendStringLiteral(&wat_buffer, &wat_length, "i64.const ");
appendChar(&wat_buffer, &wat_length, input[2]);
appendChar(&wat_buffer, &wat_length, '\n');
appendStringLiteral(&wat_buffer, &wat_length, "i64.add\n");

We don't bother converting the digit char to a number and back -- input[0] is already a valid WAT digit. Append it as a byte.

Flip write_wat_manually to true and back. Same 7 either way.

chain-interpret

Now "3+4+5" should give 12. Read the first digit, then loop: skip the +, read the next digit, add it.

Same shape will turn up everywhere.

pub fn main() void {
    const input: [:0]const u8 = "3+4+5";
    var input_pos: usize = 0;

    var result: i64 = @as(i64, input[input_pos]) - '0';
    input_pos += 1;

    // YOU: while input_pos < input.len:
    //        skip the '+', read the next digit, add it to result.

    printString("interpreter result for ");
    printString(input);
    printString(" : ");
    printNumber(result);
    printChar('\n');

    // (the WAT-toggle / VM section from compile-to-wat is still below.
    //  the hand-written WAT and the compiler still only know "3+4",
    //  so the vm result will be wrong until we fix them.)
}

while (input_pos < input.len) {
    input_pos += 1;
    const next_digit: i64 = @as(i64, input[input_pos]) - '0';
    input_pos += 1;
    result = result + next_digit;
}

The input_pos += 1 at the top skips the +. Then we read the next digit and bump past it.

interpreter result for 3+4+5 : 12. The vm result: line will still say 7 until the next two problems catch up.

chain-vm-by-hand

The VM walks every line -- it has no idea how many instructions there are. Verify that by extending the hand-written WAT to 3+4+5.

Don't change a single line of the VM. Just extend the multi-line string in the if (write_wat_manually) branch.

if (write_wat_manually) {
    appendStringLiteral(&wat_buffer, &wat_length,
        \\i64.const 3
        \\i64.const 4
        \\i64.add
    );
    // YOU: extend the WAT above so it computes 3+4+5.
} else {
    // (compiler -- still only handles "3+4" for now)
}

Set write_wat_manually = true and run.

appendStringLiteral(&wat_buffer, &wat_length,
    \\i64.const 3
    \\i64.const 4
    \\i64.add
    \\i64.const 5
    \\i64.add
);

vm result: 12. The VM was always general.

chain-compile

Compiler version of chain-interpret. Same loop shape, but the body emits text.

We restart input_pos at 0 because the interpreter already walked it to the end. After the first const-emit, loop: skip the +, emit the next const, emit add.

} else {
    input_pos = 0;
    appendStringLiteral(&wat_buffer, &wat_length, "i64.const ");
    appendChar(&wat_buffer, &wat_length, input[input_pos]);
    appendChar(&wat_buffer, &wat_length, '\n');
    input_pos += 1;

    // YOU: while input_pos < input.len:
    //        skip the '+', emit "i64.const X\n" for the next digit, emit "i64.add\n".
}

Set write_wat_manually = false and run.

while (input_pos < input.len) {
    input_pos += 1;
    appendStringLiteral(&wat_buffer, &wat_length, "i64.const ");
    appendChar(&wat_buffer, &wat_length, input[input_pos]);
    appendChar(&wat_buffer, &wat_length, '\n');
    appendStringLiteral(&wat_buffer, &wat_length, "i64.add\n");
    input_pos += 1;
}

vm result: 12. Three engines, one answer. Try flipping write_wat_manually -- both paths produce the same WAT.

four-ops-interpret

Now handle all four operators: +, -, *, /. Try "3+4*5" (left-to-right, no precedence yet -- so 35, not 23). Try "5-3", "4*7", "8/2".

For division, use @divTrunc(a, b) -- Zig wants you to be explicit about how to round.

In the interpreter loop, read the operator first, then dispatch.

while (input_pos < input.len) {
    const op: u8 = input[input_pos];
    input_pos += 1;
    const next_digit: i64 = @as(i64, input[input_pos]) - '0';
    input_pos += 1;

    // YOU: dispatch on op for '+', '-', '*', '/'.
    //      for anything else, printString("error").
}

if (op == '+') {
    result = result + next_digit;
} else if (op == '-') {
    result = result - next_digit;
} else if (op == '*') {
    result = result * next_digit;
} else if (op == '/') {
    result = @divTrunc(result, next_digit);
} else {
    printString("error: unknown operator\n");
    return;
}

9-3-1 is 5 (left-to-right: (9-3)-1). 8/2/2 is 2. Precedence -- where * and / outrank + and - -- comes much later.

four-ops-vm

The VM only knows i64.add. Extend the dispatch with i64.sub, i64.mul, and i64.div_s. Each is its own else if branch with the same pop-pop-compute-push shape.

To test, set write_wat_manually = true and try this WAT:

i64.const 8
i64.const 2
i64.div_s

Then try i64.sub (with 5 then 3) and i64.mul.

if (stringStartsWith(line, "i64.const ")) {
    stack[stack_pointer] = @as(i64, line[line.len - 1]) - '0';
    stack_pointer += 1;
} else if (stringsEqual(line, "i64.add")) {
    const y: i64 = stack[stack_pointer - 1];
    const x: i64 = stack[stack_pointer - 2];
    stack_pointer -= 2;
    stack[stack_pointer] = x + y;
    stack_pointer += 1;
}
// YOU: else if "i64.sub" -> pop, pop, push x - y
// YOU: else if "i64.mul" -> pop, pop, push x * y
// YOU: else if "i64.div_s" -> pop, pop, push @divTrunc(x, y)
else {
    printString("error: unknown instruction\n");
    return;
}

} else if (stringsEqual(line, "i64.sub")) {
    const y: i64 = stack[stack_pointer - 1];
    const x: i64 = stack[stack_pointer - 2];
    stack_pointer -= 2;
    stack[stack_pointer] = x - y;
    stack_pointer += 1;
} else if (stringsEqual(line, "i64.mul")) {
    const y: i64 = stack[stack_pointer - 1];
    const x: i64 = stack[stack_pointer - 2];
    stack_pointer -= 2;
    stack[stack_pointer] = x * y;
    stack_pointer += 1;
} else if (stringsEqual(line, "i64.div_s")) {
    const y: i64 = stack[stack_pointer - 1];
    const x: i64 = stack[stack_pointer - 2];
    stack_pointer -= 2;
    stack[stack_pointer] = @divTrunc(x, y);
    stack_pointer += 1;
}

Order matters now. Walk through 5 - 3:

1. Source order is 5, then -, then 3.
2. Push 5. Stack: [5].
3. Push 3. Stack: [5, 3] -- 3 is on top.
4. Pop into y: gets 3 (top). Pop into x: gets 5.
5. Compute x - y = 5 - 3 = 2. Push. Stack: [2].

The order in the source flows straight to the order on the stack. That's why we name y first -- it's the one on top, the one pushed last, the right-hand operand. Same shape for i64.div_s: x / y where y is the divisor. Get this wrong once and 8 / 2 becomes 2 / 8 = 0.

The repetition is real -- four branches that almost do the same thing. Hold off the urge to factor it. Once we're solid on stack semantics, a refactor is cheap.

four-ops-compile

The compiler emits the right WAT op-name based on the operator.

In the compiler's chain loop, after emitting the next const, dispatch on op.

while (input_pos < input.len) {
    const op: u8 = input[input_pos];
    input_pos += 1;
    appendStringLiteral(&wat_buffer, &wat_length, "i64.const ");
    appendChar(&wat_buffer, &wat_length, input[input_pos]);
    appendChar(&wat_buffer, &wat_length, '\n');

    // YOU: emit "i64.add\n", "i64.sub\n", "i64.mul\n", or "i64.div_s\n".
    //      for anything else, printString("error").

    input_pos += 1;
}

if (op == '+') {
    appendStringLiteral(&wat_buffer, &wat_length, "i64.add\n");
} else if (op == '-') {
    appendStringLiteral(&wat_buffer, &wat_length, "i64.sub\n");
} else if (op == '*') {
    appendStringLiteral(&wat_buffer, &wat_length, "i64.mul\n");
} else if (op == '/') {
    appendStringLiteral(&wat_buffer, &wat_length, "i64.div_s\n");
} else {
    printString("error: unknown operator\n");
    return;
}

Try "3+4*5" -> compiles to const 3, const 4, add, const 5, mul. VM runs it: 35. Same as the interpreter. Try "9-3-1", "8/2", your own.

We can only really test single-digit chains: the digit-extraction logic in the interpreter, the compiler's appendChar(input[input_pos]), and the VM's wat_buffer[wat_line_end - 1] all assume one-byte numbers. Multi-digit numbers come next.

parse-number

We've gone pretty far with our interpreter/compiler/vm setup, but we still can't handle 12+3. Let's fix that.

The line we want to replace is var result: i64 = @as(i64, input[input_pos]) - '0'; -- we need to eat the whole number, not just one digit. Wrap it in a helper. Single-digit for now; next problem you'll make it loop.

fn parseNumber(
    buffer: []const u8,
    position: *usize,
) i64 {
    const value: i64 = @as(i64, buffer[position.*]) - '0';
    position.* += 1;
    return value;
}

In the interpreter, call parseNumber(input, &input_pos) in place of the two inline digit reads. The chain-loop condition is input[input_pos] != 0 -- input is [:0]const u8, so reading at input.len returns the sentinel 0 instead of going out of bounds.

Single-digit tests still pass.

var result: i64 = parseNumber(input, &input_pos);

while (input[input_pos] != 0) {
    const op: u8 = input[input_pos];
    input_pos += 1;
    const next_number: i64 = parseNumber(input, &input_pos);

    if (op == '+') {
        result = result + next_number;
    } else if (op == '-') {
        result = result - next_number;
    } else if (op == '*') {
        result = result * next_number;
    } else if (op == '/') {
        result = @divTrunc(result, next_number);
    } else {
        printString("error: unknown operator\n");
        return;
    }
}

Renamed next_digit -> next_number since one problem from now it won't be a single digit.

multi-digit-interpret

Now grow parseNumber to actually parse multi-digit numbers, not just one.

A small helper to keep the digit check readable -- we'll reach for it more than once:

fn isDigit(c: u8) bool {
    return c >= '0' and c <= '9';
}

Hint: loop while position is in range AND the current char isDigit, doing value = value * 10 + digit each step. To parse 123: 1*10+2=12, then 12*10+3=123. Stop on anything that isn't a digit. The bounds check in the loop condition stops us from running off the end of buffer -- needed because parseNumber will be called with non-sentinel slices later (the WAT lines in the VM).

const input: [:0]const u8 = "12+3";    // 15
const input: [:0]const u8 = "100+23";  // 123
const input: [:0]const u8 = "12*12";   // 144

fn parseNumber(
    buffer: []const u8,
    position: *usize,
) i64 {
    var value: i64 = 0;
    while (position.* < buffer.len and isDigit(buffer[position.*])) {
        value = value * 10 + (@as(i64, buffer[position.*]) - '0');
        position.* += 1;
    }
    return value;
}

The interpreter call sites didn't change. That's the helper paying off.

The compiler and the VM still live in single-digit land -- they're next.

multi-digit-vm

The VM still grabs only the last byte of an i64.const line. Now that the line is i64.const 12 (or i64.const 12345), we need to parse the number after the "i64.const " prefix.

We already have parseNumber. It works on any []const u8 plus a position pointer. The number starts right after the "i64.const " prefix, so the start position is "i64.const ".len.

Replace this:

stack[stack_pointer] = @as(i64, line[line.len - 1]) - '0';

with a parseNumber call.

The other VM branches (i64.add, i64.sub, etc.) don't change at all -- they pop and push i64s and never touched the source bytes.

if (stringStartsWith(line, "i64.const ")) {
    var num_pos: usize = "i64.const ".len;
    stack[stack_pointer] = parseNumber(line, &num_pos);
    stack_pointer += 1;
}

parseNumber carrying its weight: the same helper we use to parse from the source input also parses from the WAT line. line is a plain []const u8 with no sentinel, which is exactly why parseNumber's loop condition has the position.* < buffer.len check.

multi-digit-compiler

The compiler still emits one byte per number:

appendStringLiteral(&wat_buffer, &wat_length, "i64.const ");
appendChar(&wat_buffer, &wat_length, input[input_pos]);
appendChar(&wat_buffer, &wat_length, '\n');
input_pos += 1;

Cleanest path: parse the number with parseNumber (gives us an i64), then write the digits back into the wat buffer with a new helper. Same shape as printNumber, but the output sink is appendChar(buffer, position, c) instead of printChar(c):

fn appendNumber(
    buffer: [*]u8,
    position: *usize,
    n: i64,
) void {
    var value: i64 = n;
    if (value < 0) {
        appendChar(buffer, position, '-');
        value = -value;
    }
    if (value == 0) {
        appendChar(buffer, position, '0');
        return;
    }
    var buf: [20]u8 = undefined;
    var i: usize = 0;
    while (value > 0) {
        const d: i64 = @mod(value, 10);
        buf[i] = @intCast('0' + d);
        i += 1;
        value = @divTrunc(value, 10);
    }
    while (i > 0) {
        i -= 1;
        appendChar(buffer, position, buf[i]);
    }
}

Now in the compiler, replace the byte-copy with parseNumber + appendNumber. Same fix in two places: the initial number, and the next number inside the chain loop.

// 2. THE COMPILER
var wat_buffer: [512]u8 = undefined;
var wat_length: usize = 0;

input_pos = 0;
appendStringLiteral(&wat_buffer, &wat_length, "i64.const ");
appendNumber(&wat_buffer, &wat_length, parseNumber(input, &input_pos));
appendChar(&wat_buffer, &wat_length, '\n');

while (input[input_pos] != 0) {
    const op: u8 = input[input_pos];
    input_pos += 1;

    appendStringLiteral(&wat_buffer, &wat_length, "i64.const ");
    appendNumber(&wat_buffer, &wat_length, parseNumber(input, &input_pos));
    appendChar(&wat_buffer, &wat_length, '\n');

    if (op == '+') {
        appendStringLiteral(&wat_buffer, &wat_length, "i64.add\n");
    } else if (op == '-') {
        appendStringLiteral(&wat_buffer, &wat_length, "i64.sub\n");
    } else if (op == '*') {
        appendStringLiteral(&wat_buffer, &wat_length, "i64.mul\n");
    } else if (op == '/') {
        appendStringLiteral(&wat_buffer, &wat_length, "i64.div_s\n");
    } else {
        printString("error: unknown operator\n");
        return;
    }
}

The compiler reads from input via parseNumber, writes to wat_buffer via appendNumber. One line per number on each side. The bytes of input and the bytes of wat_buffer never touch -- everything goes through i64. Try 12+3, 100+23, 12*12 and watch the VM agree with the interpreter.

skip-whitespace

Skip whitespace. Add two helpers: isSpace tells us whether a byte is whitespace, and skipSpaces advances past a run of them.

Treat any of these four bytes as whitespace: space ' ', tab '\t', newline '\n', carriage return '\r'.

Strategy: the cursor should always sit on a non-whitespace character when we're about to read something. Call skipSpaces(source, &pos) once at the start of eval, and after every consume so the next reader doesn't have to think about spaces.

testCase("  3  +  5  ", 8);
testCase("100 + 23", 123);

fn isSpace(c: u8) bool {
    return c == ' ' or c == '\n' or c == '\t' or c == '\r';
}

fn skipSpaces(
    source: []const u8,
    pos: *usize,
) void {
    var current_char: u8 = curChar(source, pos.*);
    while (isSpace(current_char)) {
        pos.* += 1;
        current_char = curChar(source, pos.*);
    }
}

curChar handles end-of-input -- past source.len it returns 0, and isSpace(0) is false, so the loop stops on its own even if the input ends with whitespace.

Three places to add skipSpaces(source, &pos);:

1. At the start of eval, before parsing begins.
2. At the end of readNumber, before returning.
3. After the pos += 1 that consumes the operator character. Easy to forget -- the operator is just one byte, so there's no readOperator helper that would skip for us.

Updated readNumber:

fn readNumber(
    source: []const u8,
    pos: *usize,
) i64 {
    var value: i64 = 0;
    var current_char: u8 = curChar(source, pos.*);
    while (isDigit(current_char)) {
        value = value * 10 + digitValue(current_char);
        pos.* += 1;
        current_char = curChar(source, pos.*);
    }
    skipSpaces(source, pos);
    return value;
}

Updated eval:

fn eval(source: []const u8) i64 {
    var pos: usize = 0;
    skipSpaces(source, &pos);

    var value: i64 = readNumber(source, &pos);

    var current_char: u8 = curChar(source, pos);
    while (current_char != 0) {
        const op: u8 = current_char;
        pos += 1;
        skipSpaces(source, &pos);
        const right: i64 = readNumber(source, &pos);

        value = switch (op) {
            '+' => value + right,
            '-' => value - right,
            '*' => value * right,
            '/' => @divTrunc(value, right),
            else => value,
        };
        current_char = curChar(source, pos);
    }

    return value;
}

parens

Parentheses. 2*(3+4) should be 14. Whatever sits inside (...) evaluates to one number, and the outer expression treats that number like any other operand.

To support this we split eval into two helpers:

- parseExpression(source, pos) -- the loop you already have, but reads operands by calling numberOrParens and stops on ) or 0.
- numberOrParens(source, pos) -- reads one operand: a number, or (, recurse into parseExpression, then expect ).

eval becomes a thin wrapper that sets up state and calls parseExpression.

    testCase("(2+3)", 5);
    testCase("2*(3+4)", 14);   // without parens: 2*3+4 = 10
    testCase("8-(3+4)", 1);    // without parens: 8-3+4 = 9
    testCase("((1+2)*(3+4))", 21);

fn numberOrParens(source: [*]const u8, pos: *usize) i64 {
    if (source[pos.*] == '(') {
        pos.* += 1;
        skipSpaces(source, pos);
        const val: i64 = parseExpression(source, pos);
        if (source[pos.*] == ')') {
            pos.* += 1;
            skipSpaces(source, pos);
        }
        return val;
    }
    return readNumber(source, pos);
}

fn parseExpression(source: [*]const u8, pos: *usize) i64 {
    var val: i64 = numberOrParens(source, pos);
    while (source[pos.*] != 0 and source[pos.*] != ')') {
        const op: u8 = source[pos.*];
        pos.* += 1;
        skipSpaces(source, pos);
        const right: i64 = numberOrParens(source, pos);
        val = switch (op) {
            '+' => val + right,
            '-' => val - right,
            '*' => val * right,
            '/' => @divTrunc(val, right),
            else => val,
        };
    }
    return val;
}

fn eval(input: []const u8) i64 {
    const source: [*]const u8 = input.ptr;
    var pos: usize = 0;
    skipSpaces(source, &pos);
    return parseExpression(source, &pos);
}

Notice something: inside the helpers we write parseExpression(source, pos) -- no &, because pos is already a *usize pointer here. We only use &pos inside eval, where pos is a local usize variable and we need to take its address to hand it out.

Also notice: 2+3*4 still gives 20 here -- parens don't fix precedence on their own. But you can write 2+(3*4) to get 14. The escape hatch.

precedence-bug

Time to face the next problem. 2+3*4 should be 14 in math. Our parser gives 20. Write a failing test:

    testCase("2+3*4", 14);

Run it. Watch FAIL print. Don't fix anything yet -- just sit with it. Why does it go wrong?

Trace parseExpression on 2+3*4 in your head. It reads 2 as a numberOrParens. It sees +. It reads the next numberOrParens. What does numberOrParens see, and what does it return?

FAIL: got 20, want 14.

Trace:

parseExpression reads 2 as numberOrParens. Sees +. Reads next numberOrParens: just 3.
Adds: 2+3 = 5. Sees *. Reads next numberOrParens: 4.
Multiplies: 5*4 = 20.

The bug: after the +, the right operand was a single numberOrParens (just 3). But the * was waiting to grab that 3 first. We need the + to wait until the * has done its work on the right side.

terms-and-factors

The parser is at 2+3*4. It just consumed the 2 and the +. It's sitting on the 3. Why can't it just add 2 + 3 right now?

Because we're in a sum, and the right operand of a sum isn't a single value -- it's a term. A term is one or more factors multiplied (or divided) together. A factor is a number, or something inside parentheses (the numberOrParens we already wrote). Even a lone 3 is a term -- a term made of one factor.

So when parseExpression reads its right operand, it should ask for a term, not just a factor. A term function reads factors and grinds through any *// chain it finds, then hands back one number. The + only happens after that.

The structure we're heading toward:

- parseExpression -- loops on +/-, asks for terms between them.
- parseTerm -- loops on *//, asks for factors between them.
- parseFactor -- a number, or a parenthesized expression. (This is what we've been calling numberOrParens -- now rename it to parseFactor. The grammar vocabulary clicks.)

For this step:

1. Rename numberOrParens to parseFactor. Update the call sites.
2. Introduce parseTerm as a passthrough -- it just returns parseFactor, no loop yet.
3. Have parseExpression call parseTerm instead of parseFactor in its operand reads.

fn parseFactor(source: [*]const u8, pos: *usize) i64 {
    if (source[pos.*] == '(') {
        pos.* += 1;
        skipSpaces(source, pos);
        const val: i64 = parseExpression(source, pos);
        if (source[pos.*] == ')') {
            pos.* += 1;
            skipSpaces(source, pos);
        }
        return val;
    }
    return readNumber(source, pos);
}

fn parseTerm(source: [*]const u8, pos: *usize) i64 {
    return parseFactor(source, pos);
}

fn parseExpression(source: [*]const u8, pos: *usize) i64 {
    var val: i64 = parseTerm(source, pos);
    while (source[pos.*] != 0 and source[pos.*] != ')') {
        const op: u8 = source[pos.*];
        pos.* += 1;
        skipSpaces(source, pos);
        const right: i64 = parseTerm(source, pos);
        val = switch (op) {
            '+' => val + right,
            '-' => val - right,
            '*' => val * right,
            '/' => @divTrunc(val, right),
            else => val,
        };
    }
    return val;
}

The bug isn't fixed yet. Stare at this for one minute and guess what you will put in parseTerm on the next page.

term-multiplies

Now make parseTerm do its job.

Move the * and / handling out of parseExpression into parseTerm. parseExpression loops only on +/-. parseTerm loops only on *// and asks parseFactor for each operand.

When parseExpression calls parseTerm for its right operand on 2+3*4, parseTerm sees 3, then *4, and returns 12. Then parseExpression adds: 2+12 = 14.

    testCase("2+3*4", 14);
    testCase("3*4+2", 14);
    testCase("2*3+4*5", 26);
    testCase("(2+3)*4", 20);
    testCase("100 - 55 * 2", -10);

fn parseTerm(source: [*]const u8, pos: *usize) i64 {
    var val: i64 = parseFactor(source, pos);
    while (source[pos.*] == '*' or source[pos.*] == '/') {
        const op: u8 = source[pos.*];
        pos.* += 1;
        skipSpaces(source, pos);
        const right: i64 = parseFactor(source, pos);
        val = switch (op) {
            '*' => val * right,
            '/' => @divTrunc(val, right),
            else => val,
        };
    }
    return val;
}

fn parseExpression(source: [*]const u8, pos: *usize) i64 {
    var val: i64 = parseTerm(source, pos);
    while (source[pos.*] == '+' or source[pos.*] == '-') {
        const op: u8 = source[pos.*];
        pos.* += 1;
        skipSpaces(source, pos);
        const right: i64 = parseTerm(source, pos);
        val = switch (op) {
            '+' => val + right,
            '-' => val - right,
            else => val,
        };
    }
    return val;
}

Three functions, three precedence layers: parseExpression -> parseTerm -> parseFactor. Each layer loops on its own operators and asks the next-tighter layer for operands. Operands come back fully resolved. Recursion does the rest.

This is recursive descent. To add another precedence level (say, ^ for power between term and factor), you insert one more function in the chain. Comparison operators come next; they slot in as a layer above expression.

comparison-operators

Add comparisons: >, <, ==, !=. They return 1 for true, 0 for false, and bind more loosely than addition.

We're adding a new precedence layer above the current expression. So:

1. Rename the current parseExpression (which loops on +/-) to parseAddSub. Update the recursive call inside parseFactor.
2. Add a new parseExpression on top that handles comparisons -- it calls parseAddSub for the left operand, then checks for a comparison operator, then calls parseAddSub again for the right.

For == and !=, check two characters: source[pos.*] and source[pos.* + 1].

    testCase("5 > 3", 1);
    testCase("2 > 7", 0);
    testCase("3 == 3", 1);
    testCase("3 != 3", 0);

Rename the old expression function to parseAddSub:

fn parseAddSub(source: [*]const u8, pos: *usize) i64 {
    var val: i64 = parseTerm(source, pos);
    while (source[pos.*] == '+' or source[pos.*] == '-') {
        const op: u8 = source[pos.*];
        pos.* += 1;
        skipSpaces(source, pos);
        const right: i64 = parseTerm(source, pos);
        val = switch (op) {
            '+' => val + right,
            '-' => val - right,
            else => val,
        };
    }
    return val;
}

Then the new parseExpression on top:

fn parseExpression(source: [*]const u8, pos: *usize) i64 {
    const left: i64 = parseAddSub(source, pos);
    const c: u8 = source[pos.*];
    if (c == '>') {
        pos.* += 1;
        skipSpaces(source, pos);
        return if (left > parseAddSub(source, pos)) @as(i64, 1) else 0;
    }
    if (c == '<') {
        pos.* += 1;
        skipSpaces(source, pos);
        return if (left < parseAddSub(source, pos)) @as(i64, 1) else 0;
    }
    if (c == '=' and source[pos.* + 1] == '=') {
        pos.* += 2;
        skipSpaces(source, pos);
        return if (left == parseAddSub(source, pos)) @as(i64, 1) else 0;
    }
    if (c == '!' and source[pos.* + 1] == '=') {
        pos.* += 2;
        skipSpaces(source, pos);
        return if (left != parseAddSub(source, pos)) @as(i64, 1) else 0;
    }
    return left;
}

Don't forget to update the recursive call in parseFactor -- when we see (, we recurse into parseExpression (the new top), not parseAddSub.

trace-precedence

Walk through the execution of parseExpression -> parseAddSub -> parseTerm -> parseFactor for the input "2 + 3 * 4". Write down each function call, which characters it consumes, and what it returns.

1. parseExpression -> parseAddSub
2. parseAddSub -> parseTerm -> parseFactor -> readNumber reads 2, returns 2. parseTerm sees + (not *), returns 2.
3. parseAddSub sees +, advances. Calls parseTerm.
4. parseTerm -> parseFactor -> readNumber reads 3, returns 3.
5. parseTerm sees *, advances. Calls parseFactor -> readNumber reads 4, returns 4.
6. parseTerm computes 3 * 4 = 12, returns 12.
7. parseAddSub computes 2 + 12 = 14, returns 14.
8. parseExpression has left = 14, sees end-of-input (no comparison operator), returns 14.

The key insight: parseTerm "grabs" the 3 * 4 before parseAddSub can see it. That's how precedence works -- lower-precedence functions call higher-precedence ones, which consume their operands first.

is-letter-read-name

Variable names. A name starts with a letter (or _), and can contain letters, digits, and underscores after.

Write two classifiers and an identifier reader.

fn isAlpha(c: u8) bool {
    return (c >= 'a' and c <= 'z') or (c >= 'A' and c <= 'Z') or c == '_';
}

fn isDigit(c: u8) bool {
    return c >= '0' and c <= '9';
}

Why split isAlpha and isDigit? A name must start with a letter, but any byte after can be a digit too. x1 is legal, 1x is not. Two small predicates keep both rules crisp.

While you're here: you can now simplify readNumber to call isDigit instead of its inline range check.

Write readIdentifier: reads a name, returns a slice into the source.

fn readIdentifier(source: [*]const u8, pos: *usize) []const u8 {
    const start = pos.*;
    while (isAlpha(source[pos.*]) or isDigit(source[pos.*])) {
        pos.* += 1;
    }
    const name = source[start..pos.*];
    skipSpaces(source, pos);
    return name;
}

source[start..pos.*] is a slice -- a view into the string. No copying, no allocation. Slices are one of Zig's best features, and we'll use them constantly.

The cleaned-up readNumber:

fn readNumber(source: [*]const u8, pos: *usize) i64 {
    var value: i64 = 0;
    while (isDigit(source[pos.*])) {
        value = value * 10 + digitValue(source[pos.*]);
        pos.* += 1;
    }
    skipSpaces(source, pos);
    return value;
}

streq-from-scratch

We need to compare strings. We're not going to use std.mem.eql -- we'll write our own. It's five lines, and we'll need something like it in our self-hosted compiler anyway.

fn stringsEqual(a: []const u8, b: []const u8) bool {
    if (a.len != b.len) {
        return false;
    }
    for (a, b) |ca, cb| {
        if (ca != cb) {
            return false;
        }
    }
    return true;
}

Test it in main: print("{}\n", .{stringsEqual("cat", "cat")}); -> true. print("{}\n", .{stringsEqual("cat", "dog")}); -> false.

We'll use stringsEqual everywhere we need to check for keywords like "var", "if", "while".

(The implementation is already in the problem text.)

variable-storage

Add variable storage: two parallel arrays and a counter. We want to avoid globals (they get nasty when functions arrive), so we keep the storage local to main and pass it around by pointer.

Declare in main:

var varNames: [256][]const u8 = undefined;
var varValues: [256]i64 = undefined;
var varCount: usize = 0;

[256] is bigger than any sane program will need. undefined means "whatever garbage is on the stack" -- we only ever read slots below varCount, so the garbage is fine.

Write two functions:

- lookupVariable -- finds a name, returns its value, or 0 if missing. Doesn't mutate anything.
- assignVariable -- finds a name and updates its slot, or adds a new one.

Both take the arrays by pointer. lookupVariable takes count by value (it only reads); assignVariable takes count: *usize because it bumps the count when adding.

fn lookupVariable(
    names: *[256][]const u8,
    values: *[256]i64,
    count: usize,
    name: []const u8,
) i64 {
    var i: usize = count;
    while (i > 0) {
        i -= 1;
        if (stringsEqual(names[i], name)) {
            return values[i];
        }
    }
    return 0;
}

fn assignVariable(
    names: *[256][]const u8,
    values: *[256]i64,
    count: *usize,
    name: []const u8,
    val: i64,
) void {
    var i: usize = 0;
    while (i < count.*) {
        if (stringsEqual(names[i], name)) {
            values[i] = val;
            return;
        }
        i += 1;
    }
    names[count.*] = name;
    values[count.*] = val;
    count.* += 1;
}

*[256][]const u8 is a pointer to an array of 256 string slices. Zig auto-dereferences for indexing -- names[i] inside the function does what you'd expect, grabbing the i-th slot.

Why search backwards in lookupVariable? So that shadowing works when we add function locals later (the newest binding for a given name wins).

Test directly from main:

pub fn main() void {
    var varNames: [256][]const u8 = undefined;
    var varValues: [256]i64 = undefined;
    var varCount: usize = 0;

    assignVariable(&varNames, &varValues, &varCount, "x", 42);
    assignVariable(&varNames, &varValues, &varCount, "y", 99);

    print("{d}\n", .{lookupVariable(&varNames, &varValues, varCount, "x")}); // 42
    print("{d}\n", .{lookupVariable(&varNames, &varValues, varCount, "y")}); // 99
    print("{d}\n", .{lookupVariable(&varNames, &varValues, varCount, "z")}); // 0
}

factor-handles-names

Now hook variables into the parser. Three sweeping changes:

1. parseFactor learns to read names. When the current byte is a letter, read an identifier and look up its value.
2. All parser functions gain three parameters (names, values, count pointers). They need to pass them down to recursive calls and to lookupVariable. This is tedious but simple plumbing -- we'll clean it up with a struct later, once it gets unbearable.
3. New functions executeStatement and runProgram. A statement is a var/const declaration, an assignment, or a bare expression. runProgram loops, executing statements until end-of-input or a }.

eval becomes a thin entry point: set up pos, skip leading spaces, call runProgram. testCase and main grow to thread the variable pointers through.

The var handler parses var name: i64 = expr; -- it reads the name, consumes : i64 (parsing the type but not using it yet), then evaluates the initializer. We throw the type away for now. When we compile to WAT later, we'll need it to emit (local $x i64); the groundwork is being laid quietly.

    testCase("var x: i64 = 42; x", 42, &varNames, &varValues, &varCount);
    testCase("var x: i64 = 5; x + 1", 6, &varNames, &varValues, &varCount);
    testCase("var x: i64 = 5; x = x + 1; x", 6, &varNames, &varValues, &varCount);
    testCase("var a: i64 = 10; var b: i64 = 20; a + b", 30, &varNames, &varValues, &varCount);

parseFactor with the name branch:

fn parseFactor(
    source: [*]const u8,
    pos: *usize,
    names: *[256][]const u8,
    values: *[256]i64,
    count: *usize,
) i64 {
    if (source[pos.*] == '(') {
        pos.* += 1;
        skipSpaces(source, pos);
        const val: i64 = parseExpression(source, pos, names, values, count);
        if (source[pos.*] == ')') {
            pos.* += 1;
            skipSpaces(source, pos);
        }
        return val;
    }
    if (isAlpha(source[pos.*])) {
        const name = readIdentifier(source, pos);
        return lookupVariable(names, values, count.*, name);
    }
    return readNumber(source, pos);
}

parseTerm, parseAddSub, parseExpression gain the same three parameters and pass them through. Nothing else about them changes.

executeStatement -- dispatches on what kind of statement we're looking at:

fn executeStatement(
    source: [*]const u8,
    pos: *usize,
    names: *[256][]const u8,
    values: *[256]i64,
    count: *usize,
) i64 {
    if (isAlpha(source[pos.*])) {
        const savedPos = pos.*;
        const firstWord = readIdentifier(source, pos);

        // var/const declaration: var name: type = expr;
        if (stringsEqual(firstWord, "var") or stringsEqual(firstWord, "const")) {
            const varName = readIdentifier(source, pos);
            if (source[pos.*] == ':') {
                pos.* += 1;
                skipSpaces(source, pos);
                _ = readIdentifier(source, pos);   // parse the type, ignore it
            }
            if (source[pos.*] == '=') {
                pos.* += 1;
                skipSpaces(source, pos);
            }
            const initValue = parseExpression(source, pos, names, values, count);
            if (source[pos.*] == ';') {
                pos.* += 1;
                skipSpaces(source, pos);
            }
            assignVariable(names, values, count, varName, initValue);
            return initValue;
        }

        // assignment: name = expr; (not == which is a comparison)
        if (source[pos.*] == '=' and source[pos.* + 1] != '=') {
            pos.* += 1;
            skipSpaces(source, pos);
            const newValue = parseExpression(source, pos, names, values, count);
            if (source[pos.*] == ';') {
                pos.* += 1;
                skipSpaces(source, pos);
            }
            assignVariable(names, values, count, firstWord, newValue);
            return newValue;
        }

        // not a keyword or assignment: rewind and treat as expression
        pos.* = savedPos;
    }

    const result = parseExpression(source, pos, names, values, count);
    if (source[pos.*] == ';') {
        pos.* += 1;
        skipSpaces(source, pos);
    }
    return result;
}

runProgram -- loops until end-of-input or }:

fn runProgram(
    source: [*]const u8,
    pos: *usize,
    names: *[256][]const u8,
    values: *[256]i64,
    count: *usize,
) i64 {
    var lastValue: i64 = 0;
    while (source[pos.*] != 0 and source[pos.*] != '}') {
        lastValue = executeStatement(source, pos, names, values, count);
    }
    return lastValue;
}

The updated eval, testCase, and main:

fn eval(
    input: []const u8,
    names: *[256][]const u8,
    values: *[256]i64,
    count: *usize,
) i64 {
    const source: [*]const u8 = input.ptr;
    var pos: usize = 0;
    skipSpaces(source, &pos);
    return runProgram(source, &pos, names, values, count);
}

fn testCase(
    input: []const u8,
    expected: i64,
    names: *[256][]const u8,
    values: *[256]i64,
    count: *usize,
) void {
    const got: i64 = eval(input, names, values, count);
    if (got == expected) {
        print("{s} = {d} ok\n", .{ input, got });
    } else {
        print("{s} = {d} FAIL want {d}\n", .{ input, got, expected });
    }
}

pub fn main() void {
    var varNames: [256][]const u8 = undefined;
    var varValues: [256]i64 = undefined;
    var varCount: usize = 0;

    testCase("var x: i64 = 42; x", 42, &varNames, &varValues, &varCount);
    testCase("var x: i64 = 5; x + 1", 6, &varNames, &varValues, &varCount);
    testCase("var x: i64 = 5; x = x + 1; x", 6, &varNames, &varValues, &varCount);
    testCase("var a: i64 = 10; var b: i64 = 20; a + b", 30, &varNames, &varValues, &varCount);
}

The &varNames, &varValues, &varCount tail on every call is painful. We'll fix it with a struct soon.

bug-missing-type

What's wrong?

var x = 5; x

Missing type annotation. Our language requires var x: i64 = 5;. executeStatement expects : right after the variable name -- that's how it knows this is a declaration, and it's where we'll read the type when we start emitting WAT. Without the :, the = 5 won't parse as a declaration.

Zig itself allows var x = 5; (type inference). We don't, and we won't. Every declaration stating its type is a minor pain to type and a big win for readability -- when you're scanning a function you can tell what every variable is without chasing the right-hand side.

if-else

Add if/else. When we see the word if, parse the condition in (), then execute or skip the { } block.

Two helpers:

- skipBlock -- jump past { ... } without executing, counting brace depth. Only needs source and pos.
- executeBlock -- execute statements inside { ... }. Needs the full parameter parade since the statements inside might touch variables.

Then add an if branch to executeStatement. From now on we'll stop spelling out the five-parameter signature in prose -- assume every parser/executor function takes source: [*]const u8, pos: *usize, names: *[256][]const u8, values: *[256]i64, count: *usize unless said otherwise.

    testCase("if (5 > 3) { 10 } else { 20 }", 10, &varNames, &varValues, &varCount);
    testCase("if (2 > 7) { 10 } else { 20 }", 20, &varNames, &varValues, &varCount);
    testCase("var x: i64 = 10; if (x > 5) { x = 99; } x", 99, &varNames, &varValues, &varCount);

fn skipBlock(source: [*]const u8, pos: *usize) void {
    if (source[pos.*] == '{') {
        pos.* += 1;
        skipSpaces(source, pos);
    }
    var depth: i32 = 1;
    while (depth > 0 and source[pos.*] != 0) {
        if (source[pos.*] == '{') depth += 1;
        if (source[pos.*] == '}') depth -= 1;
        if (depth > 0) pos.* += 1;
    }
    if (source[pos.*] == '}') {
        pos.* += 1;
        skipSpaces(source, pos);
    }
}

fn executeBlock(
    source: [*]const u8,
    pos: *usize,
    names: *[256][]const u8,
    values: *[256]i64,
    count: *usize,
) i64 {
    if (source[pos.*] == '{') {
        pos.* += 1;
        skipSpaces(source, pos);
    }
    var last: i64 = 0;
    while (source[pos.*] != '}' and source[pos.*] != 0) {
        last = executeStatement(source, pos, names, values, count);
    }
    if (source[pos.*] == '}') {
        pos.* += 1;
        skipSpaces(source, pos);
    }
    return last;
}

Inside executeStatement, right after the var/const handler, add:

        if (stringsEqual(firstWord, "if")) {
            if (source[pos.*] == '(') {
                pos.* += 1;
                skipSpaces(source, pos);
            }
            const cond = parseExpression(source, pos, names, values, count);
            if (source[pos.*] == ')') {
                pos.* += 1;
                skipSpaces(source, pos);
            }
            if (cond != 0) {
                const val = executeBlock(source, pos, names, values, count);
                if (isAlpha(source[pos.*])) {
                    const savedPos2 = pos.*;
                    const nextWord = readIdentifier(source, pos);
                    if (stringsEqual(nextWord, "else")) {
                        skipBlock(source, pos);
                    } else {
                        pos.* = savedPos2;
                    }
                }
                return val;
            } else {
                skipBlock(source, pos);
                if (isAlpha(source[pos.*])) {
                    const savedPos2 = pos.*;
                    const nextWord = readIdentifier(source, pos);
                    if (stringsEqual(nextWord, "else")) {
                        return executeBlock(source, pos, names, values, count);
                    }
                    pos.* = savedPos2;
                }
                return 0;
            }
        }

The else is optional -- we peek at the next identifier; if it's "else", we use it, otherwise we rewind pos.* and forget we ever looked.

bug-missing-braces

Why doesn't this work?

var x: i64 = 10;
if (x > 5)
    x = 99;

No braces. Our language requires if (x > 5) { x = 99; }. executeStatement looks for { after the condition -- no braces, no party.

Deliberate: it keeps the grammar simple, makes debugging easier (you can set a breakpoint inside the {}), and makes the compiler simpler. When we emit WAT later, we just emit if/end blocks. No ambiguity about what belongs to which branch.

while-loop

Add while. Save the position before the condition. Loop: re-parse the condition from the saved position each time, execute or skip the body.

    testCase(
        \\var s: i64 = 0; var i: i64 = 1;
        \\while (i < 6) {
        \\    s = s + i;
        \\    i = i + 1;
        \\}
        \\s
    , 15, &varNames, &varValues, &varCount);

Add to executeStatement's keyword chain, next to if:

        if (stringsEqual(firstWord, "while")) {
            const loopPos = pos.*;
            var last: i64 = 0;
            while (true) {
                pos.* = loopPos;
                if (source[pos.*] == '(') {
                    pos.* += 1;
                    skipSpaces(source, pos);
                }
                const cond = parseExpression(source, pos, names, values, count);
                if (source[pos.*] == ')') {
                    pos.* += 1;
                    skipSpaces(source, pos);
                }
                if (cond == 0) {
                    skipBlock(source, pos);
                    break;
                }
                last = executeBlock(source, pos, names, values, count);
            }
            return last;
        }

Notice something weird: we're re-parsing the condition string from scratch every iteration. We jump pos.* back to loopPos, re-read the characters, re-evaluate the expression. It works, but it means we're paying the parsing cost every loop iteration. This is one reason we'll eventually want a tree -- you parse once, walk the tree as many times as you like.

nested-control-flow

Test nested control flow:

    testCase(
        \\var x: i64 = 15;
        \\if (x > 10) {
        \\    if (x > 20) { 1 } else { 2 }
        \\} else {
        \\    3
        \\}
    , 2, &varNames, &varValues, &varCount);

## Part 5: Functions

This is the biggest step. A function call has to: evaluate arguments, save the caller's variables, bind arguments to parameters, execute the body, capture the return value, and restore everything. It's a lot of state management -- enough that passing five pointers around will finally break. We'll introduce a Context struct to bundle them, then keep going.

The reward: fib(10) = 89.

context-struct

Look at the signatures we've been accumulating:

fn parseExpression(
    source: [*]const u8,
    pos: *usize,
    names: *[256][]const u8,
    values: *[256]i64,
    count: *usize,
) i64 { ... }

Five parameters. Every parser function has them. Every recursive call passes them all. Every testCase ends with &varNames, &varValues, &varCount. It's silly.

And it's about to get worse. Functions need a function table, a return flag, a return value, and a stack to save/restore the caller's variables. That would add another seven or eight parameters.

So now we make a Context struct. One pointer replaces all of it. This is the first and only struct we're going to use in the whole language -- we put it off until the argument lists genuinely hurt. Now they do.

const Context = struct {
    source: [*]const u8 = undefined,
    pos: usize = 0,

    varNames: [256][]const u8 = undefined,
    varValues: [256]i64 = undefined,
    varCount: usize = 0,

    // more fields coming in the next few problems:
    // function table, return flag, call-save stack.
};

Field access on a pointer auto-dereferences, so ctx.pos works even though ctx is *Context.

Rewrite every parser, executor, and variable function to take ctx: *Context as their only argument. The small text-only helpers (skipSpaces, readNumber, readIdentifier, isSpace, etc.) keep their existing (source, pos) signature -- they don't need the whole context. Call them as skipSpaces(ctx.source, &ctx.pos).

parseFactor:

fn parseFactor(ctx: *Context) i64 {
    if (ctx.source[ctx.pos] == '(') {
        ctx.pos += 1;
        skipSpaces(ctx.source, &ctx.pos);
        const val = parseExpression(ctx);
        if (ctx.source[ctx.pos] == ')') {
            ctx.pos += 1;
            skipSpaces(ctx.source, &ctx.pos);
        }
        return val;
    }
    if (isAlpha(ctx.source[ctx.pos])) {
        const name = readIdentifier(ctx.source, &ctx.pos);
        return lookupVariable(ctx, name);
    }
    return readNumber(ctx.source, &ctx.pos);
}

The rest of the parser chain follows the same shape -- parseTerm(ctx), parseAddSub(ctx), parseExpression(ctx).

lookupVariable and assignVariable now take ctx:

fn lookupVariable(ctx: *Context, name: []const u8) i64 {
    var i: usize = ctx.varCount;
    while (i > 0) {
        i -= 1;
        if (stringsEqual(ctx.varNames[i], name)) {
            return ctx.varValues[i];
        }
    }
    return 0;
}

fn assignVariable(ctx: *Context, name: []const u8, val: i64) void {
    var i: usize = 0;
    while (i < ctx.varCount) {
        if (stringsEqual(ctx.varNames[i], name)) {
            ctx.varValues[i] = val;
            return;
        }
        i += 1;
    }
    ctx.varNames[ctx.varCount] = name;
    ctx.varValues[ctx.varCount] = val;
    ctx.varCount += 1;
}

executeStatement, executeBlock, runProgram, skipBlock all take ctx. (skipBlock only uses ctx.source and ctx.pos.)

eval sets up the source and kicks off runProgram:

fn eval(input: []const u8, ctx: *Context) i64 {
    ctx.source = input.ptr;
    ctx.pos = 0;
    skipSpaces(ctx.source, &ctx.pos);
    return runProgram(ctx);
}

testCase and main:

fn testCase(input: []const u8, expected: i64, ctx: *Context) void {
    const got = eval(input, ctx);
    if (got == expected) {
        print("{s} = {d} ok\n", .{ input, got });
    } else {
        print("{s} = {d} FAIL want {d}\n", .{ input, got, expected });
    }
}

pub fn main() void {
    var ctx = Context{};
    testCase("var x: i64 = 42; x", 42, &ctx);
    testCase("var x: i64 = 5; x + 1", 6, &ctx);
    testCase("var x: i64 = 5; x = x + 1; x", 6, &ctx);
}

Notice var ctx = Context{}; -- the struct's defaults (undefined for the arrays, 0 for pos and varCount) take care of initialization. Every call site is now (ctx) or (input, ctx). One pointer. The struct will grow silently as we add function-related fields; call sites won't change.

function-storage

Add function storage to Context. When executeStatement sees fn, read the name, the parameter list (with types), the return type, and save the body position. Skip the body without executing it.

Add these fields to Context:

fnNames: [64][]const u8 = undefined,
fnParams: [64][8][]const u8 = undefined,
fnParamCounts: [64]usize = undefined,
fnStarts: [64]usize = undefined,
fnCount: usize = 0,

Sixty-four functions, up to eight parameters each. If you need more, raise the numbers.

Add to executeStatement, alongside the var/const/if/while handlers:

        if (stringsEqual(firstWord, "fn")) {
            const fname = readIdentifier(ctx.source, &ctx.pos);
            ctx.fnNames[ctx.fnCount] = fname;
            if (ctx.source[ctx.pos] == '(') {
                ctx.pos += 1;
                skipSpaces(ctx.source, &ctx.pos);
            }
            var pc: usize = 0;
            while (ctx.source[ctx.pos] != ')' and ctx.source[ctx.pos] != 0) {
                ctx.fnParams[ctx.fnCount][pc] = readIdentifier(ctx.source, &ctx.pos);
                if (ctx.source[ctx.pos] == ':') {
                    ctx.pos += 1;
                    skipSpaces(ctx.source, &ctx.pos);
                    _ = readIdentifier(ctx.source, &ctx.pos);   // skip param type
                }
                pc += 1;
                if (ctx.source[ctx.pos] == ',') {
                    ctx.pos += 1;
                    skipSpaces(ctx.source, &ctx.pos);
                }
            }
            ctx.fnParamCounts[ctx.fnCount] = pc;
            if (ctx.source[ctx.pos] == ')') {
                ctx.pos += 1;
                skipSpaces(ctx.source, &ctx.pos);
            }
            if (isAlpha(ctx.source[ctx.pos])) {
                _ = readIdentifier(ctx.source, &ctx.pos);   // skip return type
            }
            ctx.fnStarts[ctx.fnCount] = ctx.pos;
            skipBlock(ctx);
            ctx.fnCount += 1;
            return 0;
        }

skipBlock takes ctx now -- it only reads ctx.source and advances ctx.pos, but that's enough to need the context.

We parse fn fib(n: i64) i64 { ... } and store the name, parameter names, and body position. The types are parsed and discarded -- for now. Later: (func $fib (param $n i64) (result i64) ...).

return-flag

Add return. We need a flag to say "I've returned, unwind everything" and a slot for the value. Add to Context:

returnFlag: bool = false,
returnValue: i64 = 0,

Handle return in executeStatement:

        if (stringsEqual(firstWord, "return")) {
            ctx.returnValue = parseExpression(ctx);
            if (ctx.source[ctx.pos] == ';') {
                ctx.pos += 1;
                skipSpaces(ctx.source, &ctx.pos);
            }
            ctx.returnFlag = true;
            return ctx.returnValue;
        }

Update executeBlock to bail out when the flag is set:

fn executeBlock(ctx: *Context) i64 {
    if (ctx.source[ctx.pos] == '{') {
        ctx.pos += 1;
        skipSpaces(ctx.source, &ctx.pos);
    }
    var last: i64 = 0;
    while (ctx.source[ctx.pos] != '}' and ctx.source[ctx.pos] != 0) {
        last = executeStatement(ctx);
        if (ctx.returnFlag) break;
    }
    if (ctx.source[ctx.pos] == '}') {
        ctx.pos += 1;
        skipSpaces(ctx.source, &ctx.pos);
    }
    return if (ctx.returnFlag) ctx.returnValue else last;
}

And the while handler: check ctx.returnFlag after each body execution, break out of the loop if it's set.

(The implementation is in the problem text.)

The flag is a cheap propagation mechanism. Any enclosing executor (block, while) checks it and bails. ctx travels up the call chain, so the flag is visible wherever it matters. callFunction (next problem) will reset the flag when it catches the return.

function-calls

Add function calls to parseFactor. When we see name(, it's a call. Evaluate the arguments, then hand off to callFunction which saves the caller's variables, binds parameters, runs the body, and restores everything.

The save/restore needs a stack. Add to Context:

saveNames: [32][256][]const u8 = undefined,
saveValues: [32][256]i64 = undefined,
saveCounts: [32]usize = undefined,
saveDepth: usize = 0,

Thirty-two deep. Fibonacci won't reach that without a lot of recursion -- but easy to raise.

    testCase("fn add(a: i64, b: i64) i64 { return a + b; } add(3, 5)", 8, &ctx);
    testCase("fn double(x: i64) i64 { return x * 2; } double(21)", 42, &ctx);

Update the name branch of parseFactor to look for ( after the identifier:

    if (isAlpha(ctx.source[ctx.pos])) {
        const name = readIdentifier(ctx.source, &ctx.pos);
        if (ctx.source[ctx.pos] == '(') {
            ctx.pos += 1;
            skipSpaces(ctx.source, &ctx.pos);
            var args: [8]i64 = undefined;
            var argCount: usize = 0;
            while (ctx.source[ctx.pos] != ')' and ctx.source[ctx.pos] != 0) {
                args[argCount] = parseExpression(ctx);
                argCount += 1;
                if (ctx.source[ctx.pos] == ',') {
                    ctx.pos += 1;
                    skipSpaces(ctx.source, &ctx.pos);
                }
            }
            if (ctx.source[ctx.pos] == ')') {
                ctx.pos += 1;
                skipSpaces(ctx.source, &ctx.pos);
            }
            return callFunction(ctx, name, &args, argCount);
        }
        return lookupVariable(ctx, name);
    }

callFunction:

fn callFunction(ctx: *Context, name: []const u8, args: *const [8]i64, argCount: usize) i64 {
    _ = argCount;
    // Find the function
    var fi: usize = 0;
    while (fi < ctx.fnCount) {
        if (stringsEqual(ctx.fnNames[fi], name)) break;
        fi += 1;
    }
    if (fi >= ctx.fnCount) return 0;

    // Save caller state
    const d = ctx.saveDepth;
    ctx.saveCounts[d] = ctx.varCount;
    const savedPos = ctx.pos;
    var i: usize = 0;
    while (i < ctx.varCount) {
        ctx.saveNames[d][i] = ctx.varNames[i];
        ctx.saveValues[d][i] = ctx.varValues[i];
        i += 1;
    }
    ctx.saveDepth += 1;

    // Bind parameters
    ctx.varCount = 0;
    var p: usize = 0;
    while (p < ctx.fnParamCounts[fi]) {
        assignVariable(ctx, ctx.fnParams[fi][p], args[p]);
        p += 1;
    }

    // Execute body
    ctx.pos = ctx.fnStarts[fi];
    ctx.returnFlag = false;
    const result = executeBlock(ctx);

    // Restore caller state
    ctx.saveDepth -= 1;
    ctx.pos = savedPos;
    ctx.varCount = ctx.saveCounts[d];
    var j: usize = 0;
    while (j < ctx.varCount) {
        ctx.varNames[j] = ctx.saveNames[d][j];
        ctx.varValues[j] = ctx.saveValues[d][j];
        j += 1;
    }
    ctx.returnFlag = false;
    return result;
}

This is the biggest function in the interpreter. Walk through it with a debugger on the first call to double(21). The caller's variables (none yet, first test) get snapshotted into ctx.saveNames[0], ctx.varCount resets to 0, the parameter x is bound to 21, the body runs, return x * 2; triggers ctx.returnFlag, executeBlock hands back 42, and then we restore the caller's state from the snapshot. Phew.

fibonacci-test

The big test:

    testCase(
        \\fn fib(n: i64) i64 {
        \\    var a: i64 = 0;
        \\    var b: i64 = 1;
        \\    var i: i64 = 0;
        \\    while (i < n) {
        \\        var t: i64 = a + b;
        \\        a = b;
        \\        b = t;
        \\        i = i + 1;
        \\    }
        \\    return b;
        \\}
        \\fib(10)
    , 89, &ctx);

If this prints ok 89 -- congratulations. You have a programming language. It handles arithmetic with precedence, typed variables, mandatory-brace if/else, while loops, and functions with typed parameters and return values. All from a string. All in one Zig file.

Stop and think about that. We started 27 problems ago with eval("7") returning 7. Now we have Fibonacci. That's a real distance.

(The test is the problem. If Fibonacci passes, you're done with the interpreter.)

bug-missing-return

What does this return?

fn double(x: i64) i64 {
    x * 2
}
double(21)

0. Not 42. Without return, the function body evaluates x * 2 = 42 but ctx.returnFlag is never set, so callFunction gets the block's "last value" -- but only if the block didn't bail. Actually look closer: executeBlock ends by returning if (ctx.returnFlag) ctx.returnValue else last;, so it would return 42. The 0 comes from somewhere earlier: callFunction resets ctx.returnFlag = false after the body, then returns result. So actually, you'd get 42 in this case.

Wait -- so what's the footgun? It's the OTHER case: an early bail.

fn thing(x: i64) i64 {
    if (x > 0) {
        return x;
    }
    x * 2
}

When x <= 0, the if does nothing, the block's last value is x * 2, so thing(-5) returns -10. When x > 0, the return sets the flag and short-circuits -- we get x. Two exit paths. A function with no explicit return is a function whose value is the last expression in the block -- which works in our tiny language, but it's a foot-gun in real ones. Every language has to pick a rule. Rust and Ruby return the last expression. C and Zig require return. We accept either.

full-test-reset

Run all the interpreter tests together. Each test should start fresh -- no variables or functions leaking in from a previous test. Easiest way: testCase makes a new Context every call.

fn testCase(input: []const u8, expected: i64) void {
    var ctx = Context{};
    const got = eval(input, &ctx);
    if (got == expected) {
        print("ok {d}\n", .{got});
    } else {
        print("FAIL got {d} want {d}\n", .{ got, expected });
    }
}

Main gets much quieter:

pub fn main() void {
    testCase("3+5", 8);
    testCase("var x: i64 = 42; x", 42);
    testCase(
        \\fn fib(n: i64) i64 {
        \\    var a: i64 = 0;
        \\    var b: i64 = 1;
        \\    var i: i64 = 0;
        \\    while (i < n) {
        \\        var t: i64 = a + b;
        \\        a = b;
        \\        b = t;
        \\        i = i + 1;
        \\    }
        \\    return b;
        \\}
        \\fib(10)
    , 89);
}

Every test is hermetic. The Context struct (varNames, fnNames, call-save stack, everything) is zeroed fresh. Yes, the struct is about 200KB on the stack per call. That's fine -- Zig's default stack has plenty of room, and the allocation is just a stack-pointer bump.

Run your full test list. All oks. The interpreter is complete.

## Part 6: The Compiler

Here's what we have: a parser that reads source text and computes answers. parseFactor reads a number, parseTerm handles * and /, parseAddSub handles + and -, parseExpression layers in comparisons. They call each other recursively, and values flow back up the call chain. It works beautifully.

Now here's the trick. What if, instead of computing values, those functions emitted instructions? What if parseFactor didn't return 42 but printed i64.const 42? What if parseTerm didn't multiply two values but printed i64.mul?

The parser structure wouldn't change at all. The recursion, the operator loops, the precedence -- all identical. Only the actions change. Crenshaw called this "syntax-directed translation," and it's one of the most satisfying ideas in all of computer science. You've already written the hard part -- the parser. The compiler is the parser wearing different shoes.

Let's build it.

output-buffer

Write the output buffer. We need somewhere to put the WAT instructions. A big byte array and a position, just like source and pos but for output.

var out: [1 << 20]u8 = [_]u8{0} ** (1 << 20);
var out_len: usize = 0;

Write emit_byte(b), emit_str(s), and emit_num(n) (which prints a number as decimal digits into the buffer):

fn emit_byte(b: u8) void {
    if (out_len < out.len) {
        out[out_len] = b;
        out_len += 1;
    }
}

fn emit_str(s: []const u8) void {
    for (s) |c| {
        emit_byte(c);
    }
}

fn emit_num(n: i64) void {
    if (n < 0) {
        emit_byte('-');
        emit_num(-n);
        return;
    }
    if (n >= 10) {
        emit_num(@divTrunc(n, 10));
    }
    emit_byte(@intCast(@as(u8, @intCast(@rem(n, 10))) + '0'));
}

Test it:

    out_len = 0;
    emit_str("hello ");
    emit_num(42);
    emit_byte('\n');
    print("{s}", .{out[0..out_len]});   // hello 42

This is the compiler's pen and paper. Everything it produces goes through these three functions.

Self-hosting note: when we write the compiler in its own language, these exact functions will exist there too. emit_byte, emit_s, emit_num. Simple enough to express in the language they compile.

emit-const-emit-op

Write two helpers:

fn emit_const(v: i64) void {
    emit_str("  i64.const ");
    emit_num(v);
    emit_byte('\n');
}

fn emit_op(o: []const u8) void {
    emit_str("  ");
    emit_str(o);
    emit_byte('\n');
}

c-number

Write c_number() -- reads a number from source, emits i64.const N:

fn c_number() void {
    const val: i64 = number();
    emit_const(val);
}

One line of real work. We reuse number() for parsing and do something different with the result.

c-factor-term-expression

Write c_factor(), c_term(), c_add_sub(), c_expression(). They mirror the interpreter exactly -- same structure, same operator checks, same recursion -- but emit instead of compute.

Here's the key comparison. The interpreter's term():

fn term() i64 {
    var val: i64 = factor();
    while (cur() == '*' or cur() == '/') {
        const op: u8 = cur();
        pos += 1; skip();
        const right: i64 = factor();
        if (op == '*') { val = val * right; } else { val = @divTrunc(val, right); }
    }
    return val;
}

The compiler's c_term():

fn c_term() void {
    c_factor();
    while (cur() == '*' or cur() == '/') {
        const op: u8 = cur();
        pos += 1; skip();
        c_factor();
        if (op == '*') { emit_op("i64.mul"); } else { emit_op("i64.div_s"); }
    }
}

Same structure. Where the interpreter said val = val * right, the compiler says emit_op("i64.mul"). This works because WAT is a stack machine: c_factor() emits instructions that leave a value on the stack, and i64.mul pops two values and pushes the product.

fn c_factor() void {
    if (cur() == '(') {
        pos += 1; skip();
        c_expression();
        if (cur() == ')') { pos += 1; skip(); }
        return;
    }
    if (is_letter(cur())) {
        const name: []const u8 = read_name();
        if (cur() == '(') {
            c_call(name);
            return;
        }
        if (c_is_global(name)) {
            emit_str("  global.get $");
        } else {
            emit_str("  local.get $");
        }
        emit_str(name);
        emit_byte('\n');
        return;
    }
    c_number();
}

fn c_term() void {
    c_factor();
    while (cur() == '*' or cur() == '/') {
        const op: u8 = cur();
        pos += 1; skip();
        c_factor();
        if (op == '*') { emit_op("i64.mul"); } else { emit_op("i64.div_s"); }
    }
}

fn c_add_sub() void {
    c_term();
    while (cur() == '+' or cur() == '-') {
        const op: u8 = cur();
        pos += 1; skip();
        c_term();
        if (op == '+') { emit_op("i64.add"); } else { emit_op("i64.sub"); }
    }
}

fn c_expression() void {
    c_add_sub();
    if (cur() == '>') {
        pos += 1; skip(); c_add_sub(); emit_op("i64.gt_s");
    } else if (cur() == '<') {
        pos += 1; skip(); c_add_sub(); emit_op("i64.lt_s");
    } else if (cur() == '=' and pos + 1 < source.len and source[pos + 1] == '=') {
        pos += 2; skip(); c_add_sub(); emit_op("i64.eq");
    } else if (cur() == '!' and pos + 1 < source.len and source[pos + 1] == '=') {
        pos += 2; skip(); c_add_sub(); emit_op("i64.ne");
    }
}

Notice: c_factor already checks c_is_global(name) to decide between global.get and local.get. We'll define c_is_global shortly when we set up the global tracking.

c-call

Write c_call(name) -- compile function arguments and emit call $name:

fn c_call(name: []const u8) void {
    pos += 1; skip(); // skip '('
    while (cur() != ')' and cur() != 0) {
        c_expression();
        if (cur() == ',') { pos += 1; skip(); }
    }
    if (cur() == ')') { pos += 1; skip(); }
    emit_str("  call $");
    emit_str(name);
    emit_byte('\n');
}

c-stmt-c-block

Write c_stmt() and c_block(). This is where stack hygiene matters.

WAT has strict stack discipline -- every instruction sequence must leave the stack in a predictable state. So c_stmt returns true when it leaves a value on the stack (bare expressions) and false when it doesn't (var declarations, assignments, control flow). c_block uses this to drop intermediate values and ensure exactly one value remains at the end.

fn c_stmt() bool {
    if (is_letter(cur())) {
        const save: usize = pos;
        const word: []const u8 = read_name();

        if (streq(word, "var") or streq(word, "const")) {
            const vname: []const u8 = read_name();
            if (cur() == ':') { pos += 1; skip(); _ = read_name(); }
            if (cur() == '=') { pos += 1; skip(); }
            c_expression();
            if (c_is_global(vname)) {
                emit_str("  global.set $");
            } else {
                emit_str("  local.set $");
            }
            emit_str(vname);
            emit_byte('\n');
            if (cur() == ';') { pos += 1; skip(); }
            return false;
        }

        if (streq(word, "return")) {
            c_expression();
            emit_op("return");
            if (cur() == ';') { pos += 1; skip(); }
            return false;
        }

        if (streq(word, "if")) {
            if (cur() == '(') { pos += 1; skip(); }
            c_expression();
            if (cur() == ')') { pos += 1; skip(); }
            emit_str("  if (result i64)\n");
            c_block();
            if (is_letter(cur())) {
                const s2: usize = pos;
                const w2: []const u8 = read_name();
                if (streq(w2, "else")) {
                    emit_op("else");
                    c_block();
                } else {
                    pos = s2;
                    emit_str("  else\n");
                    emit_const(0);
                }
            } else {
                emit_str("  else\n");
                emit_const(0);
            }
            emit_op("end");
            return true;  // if (result i64) leaves one value
        }

        if (streq(word, "while")) {
            emit_op("block");
            emit_op("loop");
            if (cur() == '(') { pos += 1; skip(); }
            c_expression();
            if (cur() == ')') { pos += 1; skip(); }
            emit_op("i64.eqz");
            emit_op("br_if 1");
            c_block();
            emit_op("drop");
            emit_op("br 0");
            emit_op("end");
            emit_op("end");
            return false;
        }

        if (streq(word, "fn")) {
            c_fn_def();
            return false;
        }

        // assignment: name = expr; (but not == comparison)
        if (cur() == '=' and (pos + 1 >= source.len or source[pos + 1] != '=')) {
            pos += 1; skip();
            c_expression();
            if (c_is_global(word)) {
                emit_str("  global.set $");
            } else {
                emit_str("  local.set $");
            }
            emit_str(word);
            emit_byte('\n');
            if (cur() == ';') { pos += 1; skip(); }
            return false;
        }

        pos = save; // backtrack -- expression starting with name
    }

    c_expression();
    if (cur() == ';') { pos += 1; skip(); }
    return true;  // bare expression leaves a value on the stack
}

fn c_block() void {
    if (cur() == '{') { pos += 1; skip(); }
    var last_had_value: bool = false;
    while (cur() != '}' and cur() != 0) {
        if (last_had_value) {
            emit_op("drop");
        }
        last_had_value = c_stmt();
    }
    if (!last_had_value) {
        emit_const(0);
    }
    if (cur() == '}') { pos += 1; skip(); }
}

The if handler uses if (result i64) -- both branches must produce exactly one value. c_block guarantees this. When there's no else, we synthesize one that pushes 0. The if returns true because it leaves a value on the stack.

The while handler drops the block value before br 0 (the loop-back jump). Without the drop, each iteration would pile another value on the stack.

The assignment handler checks source[pos + 1] != '=' to avoid confusing x = 5 with x == 5. Without this, x == 5 would be parsed as assignment, storing 0.

c-global-tracking

Global variable tracking for the compiler:

var c_global_names: [64][]const u8 = undefined;
var c_global_count: usize = 0;

fn c_is_global(name: []const u8) bool {
    for (0..c_global_count) |i| {
        if (streq(c_global_names[i], name)) {
            return true;
        }
    }
    return false;
}

c-local-scanning

Local variable scanning. WAT requires all (local ...) declarations at the top of a function. We scan the body before compiling it:

var local_names: [64][]const u8 = undefined;
var local_count: usize = 0;

fn scan_for_locals(start: usize, end_pos: usize) void {
    const saved: usize = pos;
    pos = start;
    local_count = 0;
    while (pos < end_pos) {
        if (is_letter(cur())) {
            const w: []const u8 = read_name();
            if (streq(w, "var") or streq(w, "const")) {
                const vname: []const u8 = read_name();
                if (cur() == ':') { pos += 1; skip(); _ = read_name(); }
                var dup: bool = false;
                for (0..local_count) |i| {
                    if (streq(local_names[i], vname)) { dup = true; break; }
                }
                if (!dup and !c_is_global(vname)) {
                    local_names[local_count] = vname;
                    local_count += 1;
                }
            }
        } else {
            pos += 1;
        }
    }
    pos = saved;
}

fn emit_locals() void {
    for (0..local_count) |i| {
        emit_str("  (local $");
        emit_str(local_names[i]);
        emit_str(" i64)\n");
    }
    emit_str("  (local $__tmp i64)\n");
}

Every function gets a $__tmp local -- the store instructions need it to swap operand order on the stack.

c-fn-def

Write c_fn_def() -- compile a function definition:

fn c_fn_def() void {
    const fname: []const u8 = read_name();
    if (cur() == '(') { pos += 1; skip(); }

    var params: [8][]const u8 = undefined;
    var ptypes: [8][]const u8 = undefined;
    var pc: usize = 0;
    while (cur() != ')' and cur() != 0) {
        params[pc] = read_name();
        if (cur() == ':') { pos += 1; skip(); ptypes[pc] = read_name(); } else { ptypes[pc] = "i64"; }
        pc += 1;
        if (cur() == ',') { pos += 1; skip(); }
    }
    if (cur() == ')') { pos += 1; skip(); }

    var rtype: []const u8 = "i64";
    if (is_letter(cur())) { rtype = read_name(); }

    const body_start: usize = pos;
    skip_block();
    const body_end: usize = pos;

    scan_for_locals(body_start, body_end);

    emit_str("(func $");
    emit_str(fname);
    for (0..pc) |i| {
        emit_str(" (param $");
        emit_str(params[i]);
        emit_str(" ");
        emit_str(ptypes[i]);
        emit_str(")");
    }
    emit_str(" (result ");
    emit_str(rtype);
    emit_str(")\n");
    emit_locals();

    pos = body_start;
    c_block();
    emit_op("drop"); // c_block always produces a value; function returns via 'return'
    emit_const(0);   // default return value if no 'return' hit

    emit_str(")\n");
}

The function body goes through c_block which produces a value. We drop it and push 0 as a default return. Functions should use explicit return expr; -- the return instruction bypasses the default.

c-builtins

Builtin handling in c_factor(). Add before the general function call:

    if (is_letter(cur())) {
        const name: []const u8 = read_name();
        if (cur() == '(') {
            // Builtins
            if (streq(name, "load8")) {
                pos += 1; skip();
                c_expression();
                if (cur() == ')') { pos += 1; skip(); }
                emit_op("i32.wrap_i64");
                emit_op("i32.load8_u");
                emit_op("i64.extend_i32_u");
                return;
            }
            if (streq(name, "store8")) {
                pos += 1; skip();
                c_expression();
                if (cur() == ',') { pos += 1; skip(); }
                c_expression();
                if (cur() == ')') { pos += 1; skip(); }
                emit_str("  local.set $__tmp\n");
                emit_str("  i32.wrap_i64\n");
                emit_str("  local.get $__tmp\n");
                emit_str("  i32.wrap_i64\n");
                emit_op("i32.store8");
                emit_const(0);
                return;
            }
            if (streq(name, "load64")) {
                pos += 1; skip();
                c_expression();
                if (cur() == ')') { pos += 1; skip(); }
                emit_op("i32.wrap_i64");
                emit_op("i64.load");
                return;
            }
            if (streq(name, "store64")) {
                pos += 1; skip();
                c_expression();
                if (cur() == ',') { pos += 1; skip(); }
                c_expression();
                if (cur() == ')') { pos += 1; skip(); }
                emit_str("  local.set $__tmp\n");
                emit_str("  i32.wrap_i64\n");
                emit_str("  local.get $__tmp\n");
                emit_op("i64.store");
                emit_const(0);
                return;
            }
            // General function call
            c_call(name);
            return;
        }
        // Variable reference
        if (c_is_global(name)) {
            emit_str("  global.get $");
        } else {
            emit_str("  local.get $");
        }
        emit_str(name);
        emit_byte('\n');
        return;
    }

c-program

Write c_program() -- the top-level function. Three passes: scan globals, compile functions, compile main.

fn c_program(input: []const u8) void {
    out_len = 0;
    c_global_count = 0;

    // Pass 1: scan for top-level globals
    source = input; pos = 0; skip();
    var depth: i32 = 0;
    while (pos < source.len) {
        if (source[pos] == '{') { depth += 1; pos += 1; continue; }
        if (source[pos] == '}') { depth -= 1; pos += 1; continue; }
        if (depth == 0 and is_letter(cur())) {
            const s: usize = pos;
            const w: []const u8 = read_name();
            if (streq(w, "var") or streq(w, "const")) {
                const vn: []const u8 = read_name();
                if (cur() == ':') { pos += 1; skip(); _ = read_name(); }
                c_global_names[c_global_count] = vn;
                c_global_count += 1;
            } else if (streq(w, "fn")) {
                while (cur() != '{' and cur() != 0) { pos += 1; }
                if (cur() == '{') { skip_block(); }
            } else {
                pos = s; pos += 1;
            }
        } else { pos += 1; }
    }

    // Emit globals
    for (0..c_global_count) |i| {
        emit_str("(global $");
        emit_str(c_global_names[i]);
        emit_str(" (mut i64) (i64.const 0))\n");
    }

    // Pass 2: compile function definitions
    source = input; pos = 0; skip();
    while (cur() != 0) {
        if (is_letter(cur())) {
            const s: usize = pos;
            const w: []const u8 = read_name();
            if (streq(w, "fn")) { pos = s; _ = c_stmt(); continue; }
            pos = s;
        }
        if (cur() == '{') { skip_block(); } else { pos += 1; }
    }

    // Pass 3: compile top-level code into main
    emit_str("(func (export \"main\") (result i64)\n");

    // Scan for top-level locals
    source = input; pos = 0; skip();
    local_count = 0;
    depth = 0;
    while (pos < source.len) {
        if (source[pos] == '{') { depth += 1; pos += 1; continue; }
        if (source[pos] == '}') { depth -= 1; pos += 1; continue; }
        if (depth == 0 and is_letter(cur())) {
            const s: usize = pos;
            const w: []const u8 = read_name();
            if (streq(w, "var") or streq(w, "const")) {
                const vn: []const u8 = read_name();
                if (cur() == ':') { pos += 1; skip(); _ = read_name(); }
                if (!c_is_global(vn)) {
                    var dup: bool = false;
                    for (0..local_count) |i| { if (streq(local_names[i], vn)) { dup = true; break; } }
                    if (!dup) { local_names[local_count] = vn; local_count += 1; }
                }
            } else if (streq(w, "fn")) {
                while (cur() != '{' and cur() != 0) { pos += 1; }
                if (cur() == '{') { skip_block(); }
            } else { pos = s; pos += 1; }
        } else { pos += 1; }
    }
    emit_locals();

    // Compile top-level statements (skip fn definitions)
    source = input; pos = 0; skip();
    var last_had_value: bool = false;
    while (cur() != 0) {
        if (is_letter(cur())) {
            const s: usize = pos;
            const w: []const u8 = read_name();
            if (streq(w, "fn")) {
                while (cur() != '{' and cur() != 0) { pos += 1; }
                skip_block();
                continue;
            }
            pos = s;
        }
        if (last_had_value) { emit_op("drop"); }
        last_had_value = c_stmt();
    }
    if (!last_had_value) { emit_const(0); }

    emit_str(")\n");
}

test-compiler-output

Test the compiler. Compile a source string and inspect the WAT:

fn compile_expr(input: []const u8) void {
    c_program(input);
    print("{s}", .{out[0..out_len]});
}

    print("--- WAT output ---\n", .{});
    compile_expr("2 + 3 * 4");

Expected output (the global/main wrapper plus three instructions and a return value):

(func (export "main") (result i64)
  (local $__tmp i64)
  i64.const 2
  i64.const 3
  i64.const 4
  i64.mul
  i64.add
)

compile-fibonacci

Compile fib(10) and inspect the output:

    c_program(
        \\fn fib(n: i64) i64 {
        \\    var a: i64 = 0; var b: i64 = 1; var i: i64 = 0;
        \\    while (i < n) { var t: i64 = a + b; a = b; b = t; i = i + 1; }
        \\    return b;
        \\}
        \\fib(10)
    );
    print("--- Fib WAT ---\n{s}\n", .{out[0..out_len]});

bug-missing-eqz

This WAT compiles but gives the wrong answer for while (i < 5). Why?

  block
  loop
    local.get $i
    i64.const 5
    i64.lt_s
    br_if 1
    ;; body
    br 0
  end
  end

Missing i64.eqz. br_if branches when the value is nonzero. i64.lt_s returns 1 when true. So br_if 1 exits when the condition is true -- backwards. i64.eqz flips it: exit when the condition is false (zero).

trace-stack-machine

Walk through the WAT for 2 + 3 * 4 on a stack machine:

| Instruction | Stack |
|-------------|-------|
| i64.const 2 | [2] |
| i64.const 3 | [2, 3] |
| i64.const 4 | [2, 3, 4] |
| i64.mul | [2, 12] |
| i64.add | [14] |

The parser's recursive descent produces the instruction order: deeper recursion (higher precedence) emits inner instructions that execute first.

design-no-ast

Why didn't we need an AST?

WAT's stack structure mirrors the parser's recursive structure. c_term() calls c_factor() first, which emits its instructions first, which means those instructions execute first on the stack. The recursive descent parser naturally produces the right instruction order. An AST would let us optimize or emit multiple formats, but for a self-hosting compiler, fewer data structures means less code means easier self-hosting. c4 works the same way -- 525 lines, no AST, single-pass.

test-compiler-globals

Test the compiler with globals. Compile this and inspect the output:

    c_program("var x: i64 = 0; fn inc() i64 { x = x + 1; return x; } inc()");
    print("{s}", .{out[0..out_len]});

You should see (global $x (mut i64) (i64.const 0)) in the output, and global.get $x / global.set $x inside the function body -- not local.get / local.set.

verify-all-tests

Verify all previous interpreter tests still pass. The compiler changes should not have broken the interpreter -- they're separate code paths. Run all 29 interpreter tests plus the global variable tests from Part 9. All ok.

What we have now: a compiler that reads the same source text as the interpreter and emits WAT instructions. It handles arithmetic with precedence, typed variables, if/else, while loops, typed functions, global variables, and memory access. The parser structure is identical to the interpreter -- we just swapped computation for emission.

What we need next: a way to run the WAT without leaving Zig. We can't rely on wat2wasm and Node.js for our test suite -- we need to verify the compiler's output automatically, in the same process. That means building a VM.

## Part 7: The VM

We have two things: an interpreter that computes answers from source text, and a compiler that produces WAT from the same source text. What we don't have is proof that the compiler is correct. The interpreter says fib(10) = 89. The compiler produces 30 lines of WAT. Do those 30 lines, when executed, also give 89?

We need a machine that runs WAT. Not a full WebAssembly runtime -- just enough to execute the instructions our compiler emits. We know exactly which instructions those are because we wrote the compiler. No surprises.

The machine is simple: a stack, some local variables, a program counter, and a handful of rules. Let's build it.

vm-split-lines

Split WAT text into lines. We'll store them in a global array and work with them by index.

var vm_lines: [4096][]const u8 = undefined;
var vm_line_count: usize = 0;

fn vm_split(wat: []const u8) void {
    vm_line_count = 0;
    var start: usize = 0;
    for (wat, 0..) |c, i| {
        if (c == '\n') {
            if (i > start) {
                vm_lines[vm_line_count] = wat[start..i];
                vm_line_count += 1;
            }
            start = i + 1;
        }
    }
    if (start < wat.len) {
        vm_lines[vm_line_count] = wat[start..wat.len];
        vm_line_count += 1;
    }
}

Test it:

    vm_split("  i64.const 2\n  i64.const 3\n  i64.add\n");
    print("lines: {d}\n", .{vm_line_count});   // 3

vm-trim-helper

Write a line-trimming helper. Our WAT lines have leading spaces. We need to strip them to compare against instruction names.

fn vm_trim(line: []const u8) []const u8 {
    var s: usize = 0;
    while (s < line.len and (line[s] == ' ' or line[s] == '\t')) {
        s += 1;
    }
    return line[s..];
}

And a helper to check if a line starts with a given prefix:

fn starts_with(hay: []const u8, needle: []const u8) bool {
    if (hay.len < needle.len) {
        return false;
    }
    for (needle, 0..) |c, i| {
        if (hay[i] != c) {
            return false;
        }
    }
    return true;
}

We'll use starts_with constantly -- starts_with(line, "i64.const "), starts_with(line, "local.get $"), etc. Each instruction has a recognizable prefix, and the operand (a number or a name) follows it.

vm-parse-int-name

Write a helper to parse a number from a string at a given offset:

fn parse_int(s: []const u8, from: usize) i64 {
    var i: usize = from;
    var neg: bool = false;
    if (i < s.len and s[i] == '-') {
        neg = true;
        i += 1;
    }
    var val: i64 = 0;
    while (i < s.len and s[i] >= '0' and s[i] <= '9') {
        val = val * 10 + digit(s[i]);
        i += 1;
    }
    return if (neg) -val else val;
}

And one to extract a name after $:

fn parse_name(s: []const u8, from: usize) []const u8 {
    var i: usize = from;
    // find the $
    while (i < s.len and s[i] != '$') {
        i += 1;
    }
    if (i < s.len) {
        i += 1; // skip $
    }
    const start: usize = i;
    while (i < s.len and s[i] != ' ' and s[i] != ')' and s[i] != '\n') {
        i += 1;
    }
    return s[start..i];
}

These are boring but necessary. The VM spends most of its time extracting numbers and names from text lines. In a real implementation, you'd use binary bytecode and skip all this parsing. We use text because we can read it and debug it.

vm-stack

The stack. An array and a pointer. Push, pop, peek.

var vm_stack: [256]i64 = [_]i64{0} ** 256;
var vm_sp: usize = 0;

fn vm_push(v: i64) void {
    vm_stack[vm_sp] = v;
    vm_sp += 1;
}

fn vm_pop() i64 {
    vm_sp -= 1;
    return vm_stack[vm_sp];
}

vm-locals

Local variables for the VM. Same pattern as the interpreter -- parallel arrays for names and values.

var vm_local_names: [64][]const u8 = undefined;
var vm_local_values: [64]i64 = [_]i64{0} ** 64;
var vm_local_count: usize = 0;

fn vm_get_local(name: []const u8) i64 {
    for (0..vm_local_count) |i| {
        if (streq(vm_local_names[i], name)) {
            return vm_local_values[i];
        }
    }
    return 0;
}

fn vm_set_local(name: []const u8, val: i64) void {
    for (0..vm_local_count) |i| {
        if (streq(vm_local_names[i], name)) {
            vm_local_values[i] = val;
            return;
        }
    }
    vm_local_names[vm_local_count] = name;
    vm_local_values[vm_local_count] = val;
    vm_local_count += 1;
}

We reuse our streq from Problem 16. No standard library, remember?

vm-exec-line

Write vm_exec_line(line) -- execute a single instruction. Start with arithmetic: i64.const N, i64.add, i64.sub, i64.mul, i64.div_s, and comparisons.

Returns true normally, false if we hit something that needs special handling (we'll use this later for control flow).

fn vm_exec_line(raw: []const u8) void {
    const line: []const u8 = vm_trim(raw);

    if (starts_with(line, "i64.const ")) {
        vm_push(parse_int(line, 10));
        return;
    }
    if (streq(line, "i64.add")) { const b: i64 = vm_pop(); const a: i64 = vm_pop(); vm_push(a + b); return; }
    if (streq(line, "i64.sub")) { const b: i64 = vm_pop(); const a: i64 = vm_pop(); vm_push(a - b); return; }
    if (streq(line, "i64.mul")) { const b: i64 = vm_pop(); const a: i64 = vm_pop(); vm_push(a * b); return; }
    if (streq(line, "i64.div_s")) { const b: i64 = vm_pop(); const a: i64 = vm_pop(); vm_push(@divTrunc(a, b)); return; }
    if (streq(line, "i64.gt_s")) { const b: i64 = vm_pop(); const a: i64 = vm_pop(); vm_push(if (a > b) @as(i64, 1) else 0); return; }
    if (streq(line, "i64.lt_s")) { const b: i64 = vm_pop(); const a: i64 = vm_pop(); vm_push(if (a < b) @as(i64, 1) else 0); return; }
    if (streq(line, "i64.eq")) { const b: i64 = vm_pop(); const a: i64 = vm_pop(); vm_push(if (a == b) @as(i64, 1) else 0); return; }
    if (streq(line, "i64.ne")) { const b: i64 = vm_pop(); const a: i64 = vm_pop(); vm_push(if (a != b) @as(i64, 1) else 0); return; }
    if (streq(line, "i64.eqz")) { const a: i64 = vm_pop(); vm_push(if (a == 0) @as(i64, 1) else 0); return; }

    if (starts_with(line, "local.get $")) {
        vm_push(vm_get_local(parse_name(line, 0)));
        return;
    }
    if (starts_with(line, "local.set $")) {
        vm_set_local(parse_name(line, 0), vm_pop());
        return;
    }
    if (starts_with(line, "(local $")) {
        // local declaration -- register the name, value starts at 0
        vm_set_local(parse_name(line, 0), 0);
        return;
    }

    // Lines we skip: (func, (param, (result, ), (export, etc.
    // Control flow (if, else, end, block, loop, br, call, return) handled by vm_run
}

Test it by feeding a few lines manually:

    vm_sp = 0;
    vm_exec_line("  i64.const 3");
    vm_exec_line("  i64.const 5");
    vm_exec_line("  i64.add");
    print("vm: {d}\n", .{vm_pop()});   // 8

That's the VM running 3 + 5 = 8 from WAT instructions. Same answer as the interpreter. We're getting somewhere.

vm-control-flow

Now for the hard part: control flow.

When the VM hits if, it pops the stack. If the value is nonzero, execution continues into the then-branch. If zero, we need to skip forward to the matching else or end. This requires scanning forward through lines, counting nesting depth.

Write vm_find_else(from) -- starting from an if line, scan forward to find the matching else at the same depth. Return the line index, or null if there's no else (just end).

Write vm_find_end(from) -- starting from an if, block, or loop line, scan forward to find the matching end.

fn vm_find_end(from: usize) usize {
    var depth: i32 = 1;
    var p: usize = from + 1;
    while (p < vm_line_count) {
        const ln: []const u8 = vm_trim(vm_lines[p]);
        if (streq(ln, "if") or starts_with(ln, "if ") or streq(ln, "block") or streq(ln, "loop")) {
            depth += 1;
        }
        if (streq(ln, "end")) {
            depth -= 1;
            if (depth == 0) {
                return p;
            }
        }
        p += 1;
    }
    return p;
}

fn vm_find_else(from: usize) ?usize {
    var depth: i32 = 1;
    var p: usize = from + 1;
    while (p < vm_line_count) {
        const ln: []const u8 = vm_trim(vm_lines[p]);
        if (streq(ln, "if") or starts_with(ln, "if ") or streq(ln, "block") or streq(ln, "loop")) {
            depth += 1;
        }
        if (streq(ln, "end")) {
            depth -= 1;
            if (depth == 0) {
                return null; // hit end before else
            }
        }
        if (depth == 1 and streq(ln, "else")) {
            return p;
        }
        p += 1;
    }
    return null;
}

These are the most subtle functions in the VM. They count nesting -- every if/block/loop increases depth, every end decreases it. An else only matches our if when the depth is back to 1 (our level). Nested ifs inside our branch have their own else/end pairs that we skip over.

vm-block-loop-br

The block/loop/br/br_if pattern for while loops.

Our compiler emits:

  block          ← outer (br target 1 = exit here)
  loop           ← inner (br target 0 = jump here)
    ;; condition
    i64.eqz
    br_if 1      ← exit the block (go past the outer end)
    ;; body
    br 0         ← jump back to loop (go to inner start)
  end            ← inner end
  end            ← outer end

The VM needs a block stack to track where to jump. When we enter a block or loop, we push its line index and type. br 0 jumps to the innermost entry. br 1 jumps to the next one out.

For block: jumping means going to the end (forward -- exit the block).
For loop: jumping means going back to the loop line (backward -- repeat).

This distinction is crucial. br 0 on a loop goes backward. br 1 on a block goes forward.

const BlockKind = enum { block, loop, ift };

var vm_block_kind: [64]BlockKind = undefined;
var vm_block_target: [64]usize = undefined;   // where to jump
var vm_block_depth: usize = 0;

vm-run

Write vm_run() -- the main execution loop. It uses a program counter pc that steps through lines. Most lines call vm_exec_line. Control flow instructions adjust pc directly.

This is the biggest function in the VM. Let's build it in stages.

var vm_pc: usize = 0;

fn vm_run_at(start_pc: usize, end_pc: usize) i64 {
    vm_pc = start_pc;
    vm_block_depth = 0;

    while (vm_pc < end_pc) {
        const line: []const u8 = vm_trim(vm_lines[vm_pc]);

        // --- Control flow ---

        if (streq(line, "if") or starts_with(line, "if ")) {
            const cond: i64 = vm_pop();
            if (cond != 0) {
                // enter then-branch
                vm_block_kind[vm_block_depth] = .ift;
                vm_block_target[vm_block_depth] = vm_find_end(vm_pc);
                vm_block_depth += 1;
                vm_pc += 1;
                continue;
            } else {
                // skip to else or end
                if (vm_find_else(vm_pc)) |else_pc| {
                    vm_block_kind[vm_block_depth] = .ift;
                    vm_block_target[vm_block_depth] = vm_find_end(vm_pc);
                    vm_block_depth += 1;
                    vm_pc = else_pc + 1;
                    continue;
                } else {
                    vm_pc = vm_find_end(vm_pc) + 1;
                    continue;
                }
            }
        }

        if (streq(line, "else")) {
            // we were in the then-branch and hit else -- skip to end
            if (vm_block_depth > 0) {
                vm_pc = vm_block_target[vm_block_depth - 1] + 1;
                vm_block_depth -= 1;
                continue;
            }
        }

        if (streq(line, "block")) {
            vm_block_kind[vm_block_depth] = .block;
            vm_block_target[vm_block_depth] = vm_find_end(vm_pc);
            vm_block_depth += 1;
            vm_pc += 1;
            continue;
        }

        if (streq(line, "loop")) {
            vm_block_kind[vm_block_depth] = .loop;
            vm_block_target[vm_block_depth] = vm_pc; // loops jump BACK
            vm_block_depth += 1;
            vm_pc += 1;
            continue;
        }

        if (streq(line, "end")) {
            if (vm_block_depth > 0) {
                vm_block_depth -= 1;
            }
            vm_pc += 1;
            continue;
        }

        if (starts_with(line, "br_if ")) {
            const depth: i64 = parse_int(line, 6);
            const cond: i64 = vm_pop();
            if (cond != 0) {
                // branch
                const target_depth: usize = vm_block_depth - 1 - @as(usize, @intCast(depth));
                if (vm_block_kind[target_depth] == .loop) {
                    vm_pc = vm_block_target[target_depth];
                    vm_block_depth = target_depth + 1;
                } else {
                    vm_pc = vm_block_target[target_depth] + 1;
                    vm_block_depth = target_depth;
                }
                continue;
            }
            vm_pc += 1;
            continue;
        }

        if (starts_with(line, "br ")) {
            const depth: i64 = parse_int(line, 3);
            const target_depth: usize = vm_block_depth - 1 - @as(usize, @intCast(depth));
            if (vm_block_kind[target_depth] == .loop) {
                vm_pc = vm_block_target[target_depth];
                vm_block_depth = target_depth + 1;
            } else {
                vm_pc = vm_block_target[target_depth] + 1;
                vm_block_depth = target_depth;
            }
            continue;
        }

        if (streq(line, "return")) {
            return vm_stack[vm_sp - 1];
        }

        // --- Everything else ---
        vm_exec_line(vm_lines[vm_pc]);
        vm_pc += 1;
    }

    // return top of stack if anything is there
    if (vm_sp > 0) {
        return vm_stack[vm_sp - 1];
    }
    return 0;
}

That's a lot of code. Let's break down the control flow logic:

if: Pop the condition. If nonzero, push a block entry and proceed into the then-branch. If zero, scan forward to else (and jump there) or end (and jump past it).

else: We got here by executing the then-branch to completion. Skip to the matching end.

block: Push a block entry. Its target is the end -- branching to a block means exiting forward.

loop: Push a block entry. Its target is the loop line itself -- branching to a loop means jumping backward.

br N: Pop N entries off the block stack. If the target is a loop, jump backward. If it's a block, jump forward past its end.

br_if N: Same as br N, but only if the popped value is nonzero.

end: Pop one block entry and continue.

return: Grab the top of stack and return immediately.

trace-vm-while

Walk through the VM executing a while loop. Here's the WAT for var i: i64 = 0; while (i < 3) { i = i + 1; } i:

  (local $i i64)
  i64.const 0
  local.set $i
  block
  loop
    local.get $i
    i64.const 3
    i64.lt_s
    i64.eqz
    br_if 1
    local.get $i
    i64.const 1
    i64.add
    local.set $i
    br 0
  end
  end
  local.get $i

Trace the first two iterations and the exit. Write down $i, the stack, and pc at each key point.

Iteration 1: $i = 0. Push 0, push 3, lt_s -> 1 (true), eqz -> 0, br_if 1 -- 0 is not nonzero, don't branch. Continue into body. Push 0, push 1, add -> 1, set $i. Now $i = 1. br 0 -- jump back to loop.

Iteration 2: $i = 1. Push 1, push 3, lt_s -> 1, eqz -> 0, br_if 1 -- don't branch. Body: 1 + 1 = 2, set $i. $i = 2. br 0 -- back to loop.

Iteration 3: $i = 2. Same. $i = 3. br 0.

Exit: $i = 3. Push 3, push 3, lt_s -> 0 (false), eqz -> 1, br_if 1 -- 1 is nonzero, branch! Jump to block target + 1 (past the outer end). Push $i -> 3. Result: 3.

The eqz flip is doing exactly what we predicted in Problem 45. Without it, the loop would exit on the first iteration.

vm-scan-functions

Function calls. We need to pre-scan the WAT for (func $name ... lines to know where each function starts and ends, and what its parameters are.

var vm_fn_names: [32][]const u8 = undefined;
var vm_fn_start: [32]usize = undefined;
var vm_fn_end: [32]usize = undefined;
var vm_fn_params: [32][8][]const u8 = undefined;
var vm_fn_param_count: [32]usize = undefined;
var vm_fn_count: usize = 0;

Write vm_scan_functions() -- scan through vm_lines, find each (func $name, extract the name and parameter names, find the matching ) that closes the function.

fn vm_scan_functions() void {
    vm_fn_count = 0;
    var i: usize = 0;
    while (i < vm_line_count) {
        const line: []const u8 = vm_trim(vm_lines[i]);
        if (starts_with(line, "(func $")) {
            const name: []const u8 = parse_name(line, 0);
            vm_fn_names[vm_fn_count] = name;
            vm_fn_start[vm_fn_count] = i + 1; // first line of body

            // extract param names
            var pc: usize = 0;
            var j: usize = i;
            // params are on the same line or nearby
            // scan this line for (param $name
            var p: usize = 0;
            while (p < line.len) {
                if (p + 7 < line.len and starts_with(line[p..], "(param $")) {
                    const pname: []const u8 = parse_name(line, p);
                    vm_fn_params[vm_fn_count][pc] = pname;
                    pc += 1;
                }
                p += 1;
            }
            vm_fn_param_count[vm_fn_count] = pc;

            // find closing )  -- a line that is just ")"
            j = i + 1;
            while (j < vm_line_count) {
                const ln: []const u8 = vm_trim(vm_lines[j]);
                if (streq(ln, ")")) {
                    break;
                }
                j += 1;
            }
            vm_fn_end[vm_fn_count] = j;
            vm_fn_count += 1;
        }

        // Also handle (func (export "main") -- no $name
        if (starts_with(line, "(func (export")) {
            vm_fn_names[vm_fn_count] = "main";
            vm_fn_start[vm_fn_count] = i + 1;
            vm_fn_param_count[vm_fn_count] = 0;
            var j: usize = i + 1;
            while (j < vm_line_count) {
                if (streq(vm_trim(vm_lines[j]), ")")) { break; }
                j += 1;
            }
            vm_fn_end[vm_fn_count] = j;
            vm_fn_count += 1;
        }

        i += 1;
    }
}

vm-function-calls

Add function calls to vm_run_at. When we hit call $name:

1. Find the function in the table
2. Pop arguments from the stack (in reverse -- last arg was pushed last)
3. Save the current state: PC, locals, block depth, stack pointer
4. Set up new locals with the parameter values
5. Run the function body
6. Restore state
7. Push the return value

We need a call stack:

var vm_call_pc: [32]usize = undefined;
var vm_call_sp: [32]usize = undefined;
var vm_call_ln: [32][64][]const u8 = undefined;
var vm_call_lv: [32][64]i64 = undefined;
var vm_call_lc: [32]usize = undefined;
var vm_call_bd: [32]usize = undefined;
var vm_call_depth: usize = 0;

Add to vm_run_at, before the vm_exec_line fallthrough:

        if (starts_with(line, "call $")) {
            const fname: []const u8 = parse_name(line, 0);
            // find function
            var fi: usize = 0;
            while (fi < vm_fn_count) {
                if (streq(vm_fn_names[fi], fname)) { break; }
                fi += 1;
            }
            if (fi < vm_fn_count) {
                // pop args
                var args: [8]i64 = undefined;
                var ai: usize = vm_fn_param_count[fi];
                while (ai > 0) {
                    ai -= 1;
                    args[ai] = vm_pop();
                }
                // save state
                const cd: usize = vm_call_depth;
                vm_call_pc[cd] = vm_pc + 1;
                vm_call_sp[cd] = vm_sp;
                vm_call_lc[cd] = vm_local_count;
                vm_call_bd[cd] = vm_block_depth;
                for (0..vm_local_count) |li| {
                    vm_call_ln[cd][li] = vm_local_names[li];
                    vm_call_lv[cd][li] = vm_local_values[li];
                }
                vm_call_depth += 1;
                // set up params as locals
                vm_local_count = 0;
                for (0..vm_fn_param_count[fi]) |pi| {
                    vm_set_local(vm_fn_params[fi][pi], args[pi]);
                }
                // run function body
                const result: i64 = vm_run_at(vm_fn_start[fi], vm_fn_end[fi]);
                // restore state
                vm_call_depth -= 1;
                vm_sp = vm_call_sp[cd];
                vm_local_count = vm_call_lc[cd];
                vm_block_depth = vm_call_bd[cd];
                for (0..vm_local_count) |li| {
                    vm_local_names[li] = vm_call_ln[cd][li];
                    vm_local_values[li] = vm_call_lv[cd][li];
                }
                // push result
                vm_push(result);
                vm_pc = vm_call_pc[cd];
                continue;
            }
            vm_pc += 1;
            continue;
        }

This is the same save/restore pattern we used in the interpreter's call_fn. Save everything, set up new locals, run the body, restore, push the result. The VM's function call is structurally identical to the interpreter's -- different data, same idea.

vm-run-wat

Write vm_run_wat(wat) -- the top-level entry point. Split lines, scan for functions, find main, run it.

fn vm_run_wat(wat: []const u8) i64 {
    vm_split(wat);
    vm_scan_functions();
    vm_sp = 0;
    vm_local_count = 0;
    vm_block_depth = 0;
    vm_call_depth = 0;

    // find main
    for (0..vm_fn_count) |i| {
        if (streq(vm_fn_names[i], "main")) {
            return vm_run_at(vm_fn_start[i], vm_fn_end[i]);
        }
    }
    return 0;
}

Test it with hand-written WAT:

    const result: i64 = vm_run_wat(
        \\(func (export "main") (result i64)
        \\  i64.const 3
        \\  i64.const 5
        \\  i64.add
        \\)
    );
    print("vm result: {d}\n", .{result});   // 8

If that prints 8, your VM runs WebAssembly. Not all of WebAssembly -- just the subset our compiler produces. But it's enough.

check-both

The moment of truth. Write check_both(src, expected):

1. Run the interpreter on src -> get the interpreter's answer
2. Run the compiler on src -> get WAT text
3. Run the VM on the WAT -> get the VM's answer
4. Compare both answers to expected

fn check_both(input: []const u8, expected: i64) void {
    // Interpret
    source = input;
    pos = 0;
    num_vars = 0;
    fn_count = 0;
    return_flag = false;
    save_depth = 0;
    skip();
    const interp: i64 = program();

    // Compile
    c_program(input);

    // Run VM
    const compiled: i64 = vm_run_wat(out[0..out_len]);

    if (interp == expected and compiled == expected) {
        print("  ok {d}\n", .{expected});
    } else {
        print("  FAIL interp={d} compiled={d} want={d}\n", .{ interp, compiled, expected });
    }
}

agreement-suite

Run the full agreement test suite:

    print("--- Agreement ---\n", .{});
    check_both("3 + 5", 8);
    check_both("2 + 3 * 4", 14);
    check_both("(2 + 3) * 4", 20);
    check_both("100 - 58", 42);
    check_both("7 * 6", 42);
    check_both("5 > 3", 1);
    check_both("2 > 7", 0);
    check_both("42 == 42", 1);
    check_both("var x: i64 = 42; x", 42);
    check_both("var x: i64 = 5; x = x + 1; x", 6);
    check_both("var a: i64 = 10; var b: i64 = 20; a + b", 30);
    check_both("if (5 > 3) { 10 } else { 20 }", 10);
    check_both("if (2 > 7) { 10 } else { 20 }", 20);
    check_both(
        \\var s: i64 = 0; var i: i64 = 1;
        \\while (i < 6) {
        \\    s = s + i;
        \\    i = i + 1;
        \\}
        \\s
    , 15);
    check_both("fn add(a: i64, b: i64) i64 { return a + b; } add(3, 5)", 8);
    check_both("fn double(x: i64) i64 { return x * 2; } double(21)", 42);
    check_both(
        \\fn fib(n: i64) i64 {
        \\    var a: i64 = 0;
        \\    var b: i64 = 1;
        \\    var i: i64 = 0;
        \\    while (i < n) {
        \\        var t: i64 = a + b;
        \\        a = b;
        \\        b = t;
        \\        i = i + 1;
        \\    }
        \\    return b;
        \\}
        \\fib(10)
    , 89);

Eighteen tests. Arithmetic, precedence, parentheses, comparisons, variables, assignment, multiple variables, if/else both branches, while loops, function calls, and Fibonacci.

If every line says ok: two completely independent engines -- the interpreter computing directly from source, and the compiler emitting WAT that a VM executes -- agree on every answer. The compiler is correct.

This is not a proof in the mathematical sense. It's a proof in the engineering sense -- the same sense a test suite is a proof. You've tested every feature, every branch of the compiler, every instruction the VM handles. The surface area is fully covered. If you added a feature (say, modulo) and only the compiler handled it wrong, one of these tests would catch it.

bug-vm-while

You compile a program and get ok for everything except while loops -- they all return 0. You check the WAT output and it looks correct. What's likely wrong in the VM?

The most common bug is getting br_if backwards -- forgetting i64.eqz in the compiler, or mishandling the branch direction in the VM. If br_if 1 always triggers (because eqz was omitted), the loop exits immediately. Another common bug: br 0 jumping to the block instead of the loop -- which exits instead of repeating.

Set a breakpoint in the br_if handler. Check: is the popped value 0 or 1? Does it branch or fall through? Trace one iteration manually. The bug is almost always in the interaction between eqz, br_if, and the block/loop distinction.

design-why-vm

Why build a VM instead of using wat2wasm and Node.js?

Three reasons.

First, speed of testing. The agreement test runs in-process -- compile, execute, compare, all in one zig build-exe && ./lang. No spawning processes, no writing temp files, no reading stdout. A hundred tests run in milliseconds.

Second, understanding. Writing the VM forces you to understand what every WAT instruction does. You can't fake i64.eqz -- you have to implement it. If you got the compiler wrong, the VM often fails in a way that reveals exactly what went wrong. The VM is a debugging tool as much as a verification tool.

Third, self-hosting. Eventually we'll write the compiler in its own language. The compiled output will be WAT. To verify it, we'll need to run it. If the VM is part of our Zig host, we can run the self-hosted compiler's output directly. No external tools in the loop.

What we have now: a complete pipeline. Source text goes into the interpreter and the compiler. The interpreter produces a number. The compiler produces WAT. The VM executes the WAT and produces a number. The numbers match. For fib(10), both say 89.

The interpreter tells you what the program means. The compiler encodes that meaning as instructions. The VM follows those instructions. Agreement -- across all inputs, across all features -- is the proof that the encoding is faithful.

Next up: strings, output buffers, and reading stdin. The features the compiler needs to compile itself.

## Part 8: Feeding the Snake

We have a compiler. It reads source text and emits WAT. We have a VM that runs the WAT and gets the same answers as the interpreter. Everything agrees. So what's left?

The compiler is written in Zig. We want it written in our language. For that, we need to look at what the compiler does and ask: can our language do that?

The compiler does three things:
1. Reads bytes from source text: source[pos], cur(), skip()
2. Compares strings to recognize keywords: streq(word, "var")
3. Writes bytes to an output buffer: emit_byte('('), emit_str("i64.const ")

Our language has integers, variables, if/else, while, functions. It does not have byte-level memory access or string literals. Without those, the compiler can't read its input or write its output.

Time to add them.

### Memory

WebAssembly has linear memory -- a flat byte array that programs can read from and write to. Our language will have the same thing: a big array of bytes, accessed by address.

We'll add two builtin functions:
- load8(addr) -- read one byte from address addr
- store8(addr, val) -- write byte val to address addr

That's the entire memory model. One byte at a time. It's enough to build string comparison, output buffers, and everything else the compiler needs.

interpreter-heap

Add the heap to the interpreter. A global byte array, 1MB:

var heap: [1 << 20]u8 = [_]u8{0} ** (1 << 20);

Add load8 and store8 as builtin functions. When the interpreter sees load8(addr) or store8(addr, val) in factor() (where function calls are handled), it executes them directly instead of looking for a user-defined function.

In the interpreter's call path (inside factor(), where we handle name(), add builtin checks before searching user-defined functions:

    if (streq(name, "load8")) {
        pos += 1; skip();
        const addr: i64 = expression();
        if (cur() == ')') { pos += 1; skip(); }
        return @as(i64, heap[@intCast(addr)]);
    }
    if (streq(name, "store8")) {
        pos += 1; skip();
        const addr: i64 = expression();
        if (cur() == ',') { pos += 1; skip(); }
        const val: i64 = expression();
        if (cur() == ')') { pos += 1; skip(); }
        heap[@intCast(addr)] = @intCast(val);
        return 0;
    }
    if (streq(name, "load64")) {
        pos += 1; skip();
        const addr: i64 = expression();
        if (cur() == ')') { pos += 1; skip(); }
        const a: usize = @intCast(addr);
        var val: i64 = 0;
        for (0..8) |i| { val = val | (@as(i64, heap[a + i]) << @intCast(i * 8)); }
        return val;
    }
    if (streq(name, "store64")) {
        pos += 1; skip();
        const addr: i64 = expression();
        if (cur() == ',') { pos += 1; skip(); }
        const val: i64 = expression();
        if (cur() == ')') { pos += 1; skip(); }
        const a: usize = @intCast(addr);
        const v: u64 = @bitCast(val);
        for (0..8) |i| { heap[a + i] = @intCast((v >> @intCast(i * 8)) & 0xff); }
        return 0;
    }

Test:

    check("store8(1000, 65); load8(1000)", 65);
    check("store64(2000, 123456); load64(2000)", 123456);
    check("store64(2000, 0 - 1); load64(2000)", -1);

load8/store8 handle single bytes. load64/store64 handle full 64-bit integers -- 8 bytes, little-endian, just like WebAssembly's i64.load/i64.store. We need both: bytes for character data, 64-bit words for arrays of integers. The self-hosted compiler's function table, parameter lists, and string table all store i64 values.

compiler-memory-ops

Add load8, store8, load64, store64 to the compiler. In c_factor(), where we handle function calls, check for builtins before emitting call $name.

For load8(addr): compile the address, emit i32.wrap_i64, i32.load8_u, i64.extend_i32_u.
For store8(addr, val): compile both, emit the store sequence with a $__tmp local to swap order.
For load64(addr): compile the address, emit i32.wrap_i64, i64.load.
For store64(addr, val): compile both, emit the 64-bit store sequence.

    if (streq(name, "load8")) {
        pos += 1; skip();
        c_expression();
        if (cur() == ')') { pos += 1; skip(); }
        emit_op("i32.wrap_i64");
        emit_op("i32.load8_u");
        emit_op("i64.extend_i32_u");
        return;
    }
    if (streq(name, "store8")) {
        pos += 1; skip();
        c_expression();
        if (cur() == ',') { pos += 1; skip(); }
        c_expression();
        if (cur() == ')') { pos += 1; skip(); }
        // stack: [addr, val]
        // store8 needs: addr (i32), val (i32)
        emit_str("  i32.wrap_i64\n");  // convert val to i32
        emit_str("  local.set $__tmp\n");
        emit_str("  i32.wrap_i64\n");  // convert addr to i32
        emit_str("  local.get $__tmp\n");
        emit_op("i32.store8");
        emit_const(0); // store8 returns 0
        return;
    }
    if (streq(name, "load64")) {
        pos += 1; skip();
        c_expression();
        if (cur() == ')') { pos += 1; skip(); }
        emit_op("i32.wrap_i64");
        emit_op("i64.load");
        return;
    }
    if (streq(name, "store64")) {
        pos += 1; skip();
        c_expression();
        if (cur() == ',') { pos += 1; skip(); }
        c_expression();
        if (cur() == ')') { pos += 1; skip(); }
        emit_str("  local.set $__tmp\n");
        emit_str("  i32.wrap_i64\n");
        emit_str("  local.get $__tmp\n");
        emit_op("i64.store");
        emit_const(0);
        return;
    }

We need a scratch local $__tmp for the store operations -- WAT's store instructions expect the address under the value on the stack, but our compiler pushes them in argument order (address first, value second). The temp local lets us swap them. Add (local $__tmp i64) to every function in c_program() -- it's one wasted local but saves us from tracking which functions use store.

This is a little ugly. Real compilers solve this more elegantly. But it works, and working is what matters right now.

vm-memory-ops

Add load8 and store8 to the VM. When vm_exec_line sees these instructions, operate on a VM-side memory array:

var vm_memory: [1 << 20]u8 = [_]u8{0} ** (1 << 20);

    if (streq(line, "i32.load8_u")) {
        const addr: i64 = vm_pop();
        vm_push(@as(i64, vm_memory[@intCast(addr)]));
        return;
    }
    if (streq(line, "i32.store8")) {
        const val: i64 = vm_pop();
        const addr: i64 = vm_pop();
        vm_memory[@intCast(addr)] = @intCast(val);
        return;
    }
    if (streq(line, "i64.load")) {
        const addr: usize = @intCast(vm_pop());
        var val: i64 = 0;
        for (0..8) |i| { val = val | (@as(i64, vm_memory[addr + i]) << @intCast(i * 8)); }
        vm_push(val);
        return;
    }
    if (streq(line, "i64.store")) {
        const val: i64 = vm_pop();
        const addr: usize = @intCast(vm_pop());
        const v: u64 = @bitCast(val);
        for (0..8) |i| { vm_memory[addr + i] = @intCast((v >> @intCast(i * 8)) & 0xff); }
        return;
    }
    if (streq(line, "drop")) {
        _ = vm_pop();
        return;
    }
    if (streq(line, "i32.wrap_i64")) {
        // truncate to 32 bits -- for our purposes, a no-op on small values
        return;
    }
    if (streq(line, "i64.extend_i32_u")) {
        // extend to 64 bits -- also a no-op for us
        return;
    }

Test agreement:

    check_both("store8(1000, 65); load8(1000)", 65);
    check_both("store8(100, 72); store8(101, 105); load8(100) * 256 + load8(101)", 72 * 256 + 105);

Both engines agree. Memory works in the interpreter, the compiler, and the VM.

### Strings

Here's the plan. A string literal like "hello" does two things:
1. Stores the bytes h, e, l, l, o somewhere in memory
2. Evaluates to two values: the address where the bytes start, and the length

We'll use a convention: string data lives at high addresses (starting at 60000), and a string expression evaluates to (addr * 65536 + len) -- the address and length packed into a single i64. We can extract them: addr = val / 65536, len = val % 65536.

(In a real language you'd use two values or a struct. Packing them into one i64 is a hack, but it keeps our language simple -- we don't need tuples or multiple return values.)

string-literals

Add string literals to the interpreter's parser. In factor(), when we see ", read characters until the closing ", store them on the heap starting at address 60000, and return the packed (addr, len) value.

var string_offset: usize = 60000;

In factor():

    if (cur() == '"') {
        pos += 1; // skip opening quote
        const addr: usize = string_offset;
        while (cur() != '"' and cur() != 0) {
            heap[string_offset] = cur();
            string_offset += 1;
            pos += 1;
        }
        if (cur() == '"') { pos += 1; skip(); }
        const len: usize = string_offset - addr;
        return @as(i64, addr) * 65536 + @as(i64, len);
    }

Test:

    check("\"hello\" / 65536", 60000);          // addr = 60000
    check("\"hello\" - \"hello\" / 65536 * 65536", 5);   // len = 5

OK, that's not pretty. Let's add some helpers to make strings usable.

string-builtins

Add str_addr(s) and str_len(s) as builtins that extract the address and length from a packed string value. Add print_str(s) that writes the bytes to stderr via Zig's print.

    if (streq(name, "str_len")) {
        pos += 1; skip();
        const val: i64 = expression();
        if (cur() == ')') { pos += 1; skip(); }
        return @rem(val, 65536);
    }
    if (streq(name, "str_addr")) {
        pos += 1; skip();
        const val: i64 = expression();
        if (cur() == ')') { pos += 1; skip(); }
        return @divTrunc(val, 65536);
    }
    if (streq(name, "print_str")) {
        pos += 1; skip();
        const val: i64 = expression();
        if (cur() == ')') { pos += 1; skip(); }
        const addr: usize = @intCast(@divTrunc(val, 65536));
        const len: usize = @intCast(@rem(val, 65536));
        print("{s}", .{heap[addr..addr + len]});
        return 0;
    }

Now:

    check("str_len(\"hello\")", 5);
    check("str_len(\"\")", 0);
    check("str_addr(\"hello\")", 60000);

streq-in-language

The crucial one: write a streq_mem function in our language that compares two packed string values byte by byte. This is the equivalent of the streq we wrote in Zig -- but running in our language.

    check(
        \\fn streq_mem(a: i64, b: i64) i64 {
        \\    var a_addr: i64 = a / 65536;
        \\    var a_len: i64 = a - a_addr * 65536;
        \\    var b_addr: i64 = b / 65536;
        \\    var b_len: i64 = b - b_addr * 65536;
        \\    if (a_len != b_len) { return 0; }
        \\    var i: i64 = 0;
        \\    while (i < a_len) {
        \\        if (load8(a_addr + i) != load8(b_addr + i)) { return 0; }
        \\        i = i + 1;
        \\    }
        \\    return 1;
        \\}
        \\streq_mem("cat", "cat")
    , 1);

    check(
        \\fn streq_mem(a: i64, b: i64) i64 {
        \\    var a_addr: i64 = a / 65536;
        \\    var a_len: i64 = a - a_addr * 65536;
        \\    var b_addr: i64 = b / 65536;
        \\    var b_len: i64 = b - b_addr * 65536;
        \\    if (a_len != b_len) { return 0; }
        \\    var i: i64 = 0;
        \\    while (i < a_len) {
        \\        if (load8(a_addr + i) != load8(b_addr + i)) { return 0; }
        \\        i = i + 1;
        \\    }
        \\    return 1;
        \\}
        \\streq_mem("cat", "dog")
    , 0);

Stop and look at that. We just wrote string comparison in our own language. load8 reads bytes from memory. The while loop walks both strings. != checks each pair. This is the same logic as our Zig streq, expressed in the language we built. The language is starting to be useful for real work.

compiler-strings

Add string literals to the compiler. When c_factor() sees ", it needs to:

1. Read the string bytes
2. Store them in a string table (a compile-time data structure)
3. Emit i64.const with the packed (addr, len) value

The string table will later become WAT (data ...) sections.

var str_table: [256][]const u8 = undefined;
var str_addrs: [256]usize = undefined;
var str_count: usize = 0;
var str_next_addr: usize = 60000;

In c_factor():

    if (cur() == '"') {
        pos += 1;
        const start: usize = pos;
        while (cur() != '"' and cur() != 0) { pos += 1; }
        const text: []const u8 = source[start..pos];
        if (cur() == '"') { pos += 1; skip(); }

        // Add to string table
        const addr: usize = str_next_addr;
        str_table[str_count] = text;
        str_addrs[str_count] = addr;
        str_count += 1;
        str_next_addr += text.len;

        const packed: i64 = @as(i64, @intCast(addr)) * 65536 + @as(i64, @intCast(text.len));
        emit_const(packed);
        return;
    }

string-data-sections

Emit string data sections at the start of c_program(). After all function definitions and before the main function, emit:

(memory (export "memory") 1)
(data (i32.const 60000) "hello")
(data (i32.const 60005) "world")

Each string gets a (data ...) section that writes its bytes into linear memory at the right address.

fn emit_string_table() void {
    emit_str("(memory (export \"memory\") 1)\n");
    for (0..str_count) |i| {
        emit_str("(data (i32.const ");
        emit_num(@intCast(str_addrs[i]));
        emit_str(") \"");
        // emit bytes, escaping non-printable chars
        for (str_table[i]) |c| {
            if (c >= 32 and c < 127 and c != '"' and c != '\\') {
                emit_byte(c);
            } else {
                emit_byte('\\');
                emit_byte("0123456789abcdef"[c >> 4]);
                emit_byte("0123456789abcdef"[c & 0xf]);
            }
        }
        emit_str("\")\n");
    }
}

Call emit_string_table() from c_program() after emitting function definitions.

compiler-string-builtins

Add string builtins to the compiler. str_len, str_addr, and print_str need WAT implementations.

str_len(s) compiles to: s, then i64.const 65536, then i64.rem_s.
str_addr(s) compiles to: s, then i64.const 65536, then i64.div_s.

For print_str, we'll import a host function from the runtime:

(import "env" "print_str" (func $print_str (param i32) (param i32)))

The import goes at the top of the module. The compiler decomposes the packed string into (addr, len) and calls the imported function.

vm-data-sections

Update the VM to handle data sections and the print_str import.

When the VM sees (data (i32.const ADDR) "..."), it copies the string bytes into vm_memory at the given address.

When the VM hits call $print_str, it pops addr and len, reads bytes from vm_memory, and prints them.

fn vm_load_data() void {
    for (0..vm_line_count) |i| {
        const line: []const u8 = vm_trim(vm_lines[i]);
        if (starts_with(line, "(data (i32.const ")) {
            const addr_start: usize = 17; // after "(data (i32.const "
            const addr: usize = @as(usize, @intCast(parse_int(line, addr_start)));
            // find the string content between quotes
            var q1: usize = 0;
            while (q1 < line.len and line[q1] != '"') { q1 += 1; }
            q1 += 1; // skip first quote
            var q2: usize = q1;
            while (q2 < line.len and line[q2] != '"') {
                if (line[q2] == '\\' and q2 + 2 < line.len) {
                    // hex escape
                    const hi: u8 = hex_val(line[q2 + 1]);
                    const lo: u8 = hex_val(line[q2 + 2]);
                    vm_memory[addr + (q2 - q1)] = hi * 16 + lo;
                    q2 += 3;
                } else {
                    vm_memory[addr + (q2 - q1)] = line[q2];
                    q2 += 1;
                }
            }
        }
    }
}

fn hex_val(c: u8) u8 {
    if (c >= '0' and c <= '9') { return c - '0'; }
    if (c >= 'a' and c <= 'f') { return c - 'a' + 10; }
    return 0;
}

Call vm_load_data() from vm_run_wat() after vm_scan_functions().

agreement-strings

Agreement test for strings:

    check_both("str_len(\"hello\")", 5);
    check_both("str_len(\"\")", 0);
    check_both("store8(100, 65); load8(100)", 65);

And the big one -- streq_mem running through both engines:

    check_both(
        \\fn streq_mem(a: i64, b: i64) i64 {
        \\    var a_addr: i64 = a / 65536;
        \\    var a_len: i64 = a - a_addr * 65536;
        \\    var b_addr: i64 = b / 65536;
        \\    var b_len: i64 = b - b_addr * 65536;
        \\    if (a_len != b_len) { return 0; }
        \\    var i: i64 = 0;
        \\    while (i < a_len) {
        \\        if (load8(a_addr + i) != load8(b_addr + i)) { return 0; }
        \\        i = i + 1;
        \\    }
        \\    return 1;
        \\}
        \\streq_mem("hello", "hello")
    , 1);

If this passes -- a string comparison function, written in our language, compiled to WAT, executed by the VM, producing the same result as the interpreter -- then the language is powerful enough to do real string processing. It can compare keywords. It can check operators. It can parse.

### The taste of self-hosting

Let's write a tiny compiler in our language. Not the full thing -- just enough to compile 3 + 5 into WAT. A proof that the snake's mouth is reaching its tail.

emit-wat-in-language

Write a program in our language that outputs WAT for a hardcoded expression:

    check(
        \\fn emit_byte(b: i64) i64 {
        \\    store8(50000 + out_len, b);
        \\    out_len = out_len + 1;
        \\    return 0;
        \\}
        \\fn emit_s(s: i64) i64 {
        \\    var addr: i64 = s / 65536;
        \\    var len: i64 = s - addr * 65536;
        \\    var i: i64 = 0;
        \\    while (i < len) {
        \\        emit_byte(load8(addr + i));
        \\        i = i + 1;
        \\    }
        \\    return 0;
        \\}
        \\var out_len: i64 = 0;
        \\emit_s("  i64.const 3\n");
        \\emit_s("  i64.const 5\n");
        \\emit_s("  i64.add\n");
        \\out_len
    , 39);

That program writes WAT instructions into memory starting at address 50000. The emit_byte and emit_s functions are the compiler's output mechanism -- the same emit_byte and emit_str we wrote in Zig, now expressed in our own language.

The test checks that 39 bytes were written (three lines of WAT). To verify the content, we could read back the bytes with load8 -- or we could trust the string tests and move on.

c-number-in-language

Now the real thing. Write a program in our language that:
1. Takes a character from memory (simulating input)
2. Recognizes a digit
3. Emits i64.const N

This is c_number() -- in our language:

    check(
        \\var src_pos: i64 = 0;
        \\var out_len: i64 = 0;
        \\fn cur() i64 { return load8(src_pos); }
        \\fn emit_byte(b: i64) i64 {
        \\    store8(50000 + out_len, b);
        \\    out_len = out_len + 1;
        \\    return 0;
        \\}
        \\fn emit_s(s: i64) i64 {
        \\    var a: i64 = s / 65536;
        \\    var l: i64 = s - a * 65536;
        \\    var i: i64 = 0;
        \\    while (i < l) { emit_byte(load8(a + i)); i = i + 1; }
        \\    return 0;
        \\}
        \\fn emit_digit(d: i64) i64 {
        \\    emit_byte(48 + d);
        \\    return 0;
        \\}
        \\fn c_number() i64 {
        \\    var val: i64 = 0;
        \\    while (cur() >= 48) {
        \\        if (cur() > 57) { return val; }
        \\        val = val * 10 + (cur() - 48);
        \\        src_pos = src_pos + 1;
        \\    }
        \\    emit_s("  i64.const ");
        \\    emit_digit(val);
        \\    emit_byte(10);
        \\    return val;
        \\}
        \\store8(0, 55);
        \\c_number()
    , 7);

We stored 55 (the ASCII code for '7') at address 0, then called c_number(). It read the digit, emitted i64.const 7\n to the output buffer at address 50000, and returned 7.

That's the compiler's number parser, running in our language, reading from memory, writing to memory. It works.

design-whats-missing

What's still missing for full self-hosting?

Three things:

1. Multi-digit emit_num -- our emit_digit only handles single digits. We need a recursive or loop-based version that prints arbitrary numbers as decimal. This is the emit_num from Problem 30, rewritten in our language.

2. Stdin reading -- the self-hosted compiler needs to read source code from stdin into memory. In WAT, this would be an imported function: (import "env" "read_stdin" (func $read_stdin (result i32))) that fills a memory buffer.

3. The rest of the parser -- c_factor, c_term, c_expression, c_stmt, all the keyword comparisons using streq_mem. Each is a direct translation from Zig to our language. The structure is identical; only the syntax changes.

None of these require new language features. Everything the compiler does -- character access, string comparison, output buffering, recursive descent parsing -- is expressible with what we already have: integers, memory read/write, if/else, while, and functions.

The language is big enough. The snake can reach its tail.

emit-num-in-language

Write emit_num in our language -- a function that prints an arbitrary i64 as decimal digits into the output buffer:

    check(
        \\var out_len: i64 = 0;
        \\fn emit_byte(b: i64) i64 {
        \\    store8(50000 + out_len, b);
        \\    out_len = out_len + 1;
        \\    return 0;
        \\}
        \\fn emit_num(n: i64) i64 {
        \\    if (n < 0) {
        \\        emit_byte(45);
        \\        emit_num(0 - n);
        \\        return 0;
        \\    }
        \\    if (n > 9) {
        \\        emit_num(n / 10);
        \\    }
        \\    emit_byte(48 + n - n / 10 * 10);
        \\    return 0;
        \\}
        \\emit_num(12345);
        \\out_len
    , 5);

Five bytes written: '1', '2', '3', '4', '5'. The function is recursive -- it divides by 10 to get the higher digits first, then emits the remainder. n - n / 10 * 10 is modulo without the % operator (which our language doesn't have, keeping things simple for the self-hosted compiler to handle).

That's the last piece of the emit puzzle. emit_byte, emit_s, and emit_num -- in our language -- can write any WAT instruction the compiler produces.

We've crossed a threshold. The language now has everything the compiler needs:

| Compiler needs | Language feature |
|---------------|-----------------|
| Read source bytes | load8(addr) |
| Check character class | cur() >= 48 (is digit?) |
| Compare keywords | streq_mem(word, "var") |
| Write WAT output | emit_byte, emit_s, emit_num |
| Recursive descent | Functions with return |
| Track position | var pos: i64 = 0; |
| Skip/scan blocks | while with brace counting |

Every function in our Zig compiler -- c_factor, c_term, c_expression, c_stmt, c_program -- uses only these features. The translation from Zig to our language is mechanical: change the syntax, replace character literals with ASCII codes, use load8/store8 instead of array indexing.

The self-hosted compiler is within reach. It's a big program -- several hundred lines in our language -- but it's not a fundamentally different program. It's the same compiler we already wrote, in a different language. The one we built.

bug-globals-revealed

Run this test:

    check("var x: i64 = 42; fn get() i64 { return x; } get()", 42);

It prints FAIL got=0 want=42. The function get() can't see x.

Why? Look at call_fn, line by line. When we call get():

1. Save all variables: save_vn[0][0] = "x", save_vv[0][0] = 42
2. Set num_vars = 0 -- wipe the entire variable table
3. Bind parameters (none for get)
4. Execute body: return x; -> getVar("x") -> no variables! Returns 0.
5. Restore: put x = 42 back

The function can't see x because call_fn hides it. Every function starts with a blank slate -- it can only see its own parameters and local variables. That was fine for fib(10) where everything is passed as parameters. But the self-hosted compiler has global state -- src_pos, out_pos, src_len -- shared across dozens of functions. Without globals, the compiler is impossible.

We need two kinds of variables: locals (saved and restored on each function call, like now) and globals (visible everywhere, never saved or restored).

global-storage

Add global variable storage to the interpreter:

var global_names: [256][]const u8 = undefined;
var global_values: [256]i64 = undefined;
var num_globals: usize = 0;
var in_function: bool = false;

Write setGlobal(name, val) and getGlobal(name):

fn setGlobal(name: []const u8, val: i64) void {
    for (0..num_globals) |i| {
        if (streq(global_names[i], name)) {
            global_values[i] = val;
            return;
        }
    }
    global_names[num_globals] = name;
    global_values[num_globals] = val;
    num_globals += 1;
}

fn getGlobal(name: []const u8) i64 {
    for (0..num_globals) |i| {
        if (streq(global_names[i], name)) {
            return global_values[i];
        }
    }
    return 0;
}

getvar-checks-globals

Update getVar to check locals first, then globals:

fn getVar(name: []const u8) i64 {
    // Check locals first (innermost scope wins)
    var i: usize = num_vars;
    while (i > 0) {
        i -= 1;
        if (streq(var_names[i], name)) {
            return var_values[i];
        }
    }
    // Then globals
    return getGlobal(name);
}

Update setVar the same way -- check locals, then globals, then create a new local:

fn setVar(name: []const u8, val: i64) void {
    // Check locals first
    for (0..num_vars) |i| {
        if (streq(var_names[i], name)) {
            var_values[i] = val;
            return;
        }
    }
    // Check globals
    for (0..num_globals) |i| {
        if (streq(global_names[i], name)) {
            global_values[i] = val;
            return;
        }
    }
    // Create new local
    var_names[num_vars] = name;
    var_values[num_vars] = val;
    num_vars += 1;
}

Now setVar finds existing variables in either scope. If x is a global, assigning x = 99 inside a function updates the global. If y is a local, it updates the local. If neither exists, it creates a new local.

exec-stmt-globals

Update exec_stmt to create globals at the top level. The var handler:

        if (streq(word, "var") or streq(word, "const")) {
            const vname: []const u8 = read_name();
            if (cur() == ':') { pos += 1; skip(); _ = read_name(); }
            if (cur() == '=') { pos += 1; skip(); }
            const val: i64 = expression();
            if (cur() == ';') { pos += 1; skip(); }
            if (!in_function) {
                setGlobal(vname, val);
            } else {
                setVar(vname, val);
            }
            return val;
        }

And set in_function = true in call_fn, restore it after:

fn call_fn(name: []const u8, args: *const [8]i64, ac: usize) i64 {
    _ = ac;
    // ... find function ...
    // Save caller's LOCAL state only
    const d: usize = save_depth;
    save_nv[d] = num_vars;
    const saved_pos: usize = pos;
    const was_in_function: bool = in_function;
    for (0..num_vars) |i| {
        save_vn[d][i] = var_names[i];
        save_vv[d][i] = var_values[i];
    }
    save_depth += 1;
    num_vars = 0;
    in_function = true;
    // Bind parameters as locals
    for (0..fn_pcounts[fi]) |i| {
        setVar(fn_params[fi][i], args[i]);
    }
    // Execute
    pos = fn_starts[fi];
    return_flag = false;
    const result: i64 = exec_block();
    // Restore locals only -- globals untouched
    save_depth -= 1;
    pos = saved_pos;
    num_vars = save_nv[d];
    in_function = was_in_function;
    for (0..num_vars) |i| {
        var_names[i] = save_vn[d][i];
        var_values[i] = save_vv[d][i];
    }
    return_flag = false;
    return result;
}

The key line: globals are never saved or restored. They survive function calls. Locals are saved on entry and restored on exit, exactly as before.

check-resets-globals

Update check to reset globals:

fn check(input: []const u8, expected: i64) void {
    source = input;
    pos = 0;
    num_vars = 0;
    num_globals = 0;   // ← add this
    fn_count = 0;
    return_flag = false;
    save_depth = 0;
    in_function = false;  // ← add this
    skip();
    const got: i64 = program();
    // ...
}

Now test:

    check("var x: i64 = 42; fn get() i64 { return x; } get()", 42);

ok 42. The function sees the global.

test-global-mutation

Test global mutation:

    check("var x: i64 = 0; fn inc() i64 { x = x + 1; return x; } inc(); inc(); inc()", 3);

Each call to inc() reads x, adds 1, stores it back -- in the global table. After three calls, x is 3.

    check(
        \\var counter: i64 = 0;
        \\fn bump() i64 {
        \\    counter = counter + 1;
        \\    return counter;
        \\}
        \\bump();
        \\bump();
        \\bump();
        \\counter
    , 3);

This is the pattern the self-hosted compiler uses: src_pos, out_pos, and src_len are globals, modified by scanner and emitter functions, visible everywhere.

test-emit-num-globals

Test that emit_num recursion works with globals. This was broken before -- the recursive call would save and restore out_pos, undoing the inner call's writes.

    check(
        \\var out_pos: i64 = 0;
        \\fn emit_byte(b: i64) i64 {
        \\    store8(50000 + out_pos, b);
        \\    out_pos = out_pos + 1;
        \\    return 0;
        \\}
        \\fn emit_num(n: i64) i64 {
        \\    if (n > 9) { emit_num(n / 10); }
        \\    emit_byte(48 + n - n / 10 * 10);
        \\    return 0;
        \\}
        \\emit_num(12345);
        \\out_pos
    , 5);

Five bytes written. emit_num(12345) recurses four levels deep. Each level calls emit_byte which increments out_pos. Because out_pos is a global, the increment persists through all the recursive calls.

With the old (broken) scoping, each call_fn would save out_pos and restore it on return. The inner calls would write bytes, but out_pos would reset to 0 after each return. Total bytes written: 1 (only the last digit). Now it's 5. That's the fix.

compiler-globals

Add globals to the compiler. We need:

1. A list of global variable names (detected during the first pass)
2. (global $name (mut i64) (i64.const 0)) declarations in the output
3. global.get $name / global.set $name instead of local.get / local.set

Add a global tracking table:

var c_global_names: [64][]const u8 = undefined;
var c_global_count: usize = 0;

fn c_is_global(name: []const u8) bool {
    for (0..c_global_count) |i| {
        if (streq(c_global_names[i], name)) {
            return true;
        }
    }
    return false;
}

compiler-scan-globals

Scan for globals in c_program(). During the first pass (which finds function definitions), also collect top-level var declarations:

At the top of c_program(), before scanning for functions:

    // Scan for top-level globals
    c_global_count = 0;
    source = input;
    pos = 0;
    skip();
    var brace_depth: i32 = 0;
    while (pos < source.len) {
        if (source[pos] == '{') { brace_depth += 1; pos += 1; continue; }
        if (source[pos] == '}') { brace_depth -= 1; pos += 1; continue; }
        if (brace_depth == 0 and is_letter(cur())) {
            const s: usize = pos;
            const w: []const u8 = read_name();
            if (streq(w, "var") or streq(w, "const")) {
                const vn: []const u8 = read_name();
                if (cur() == ':') { pos += 1; skip(); _ = read_name(); }
                c_global_names[c_global_count] = vn;
                c_global_count += 1;
            } else if (streq(w, "fn")) {
                // skip entire fn (will be handled in function pass)
                while (cur() != '{' and cur() != 0) { pos += 1; }
                if (cur() == '{') { skip_block(); }
            } else {
                pos = s;
                pos += 1;
            }
        } else {
            pos += 1;
        }
    }

Then emit the global declarations:

    for (0..c_global_count) |i| {
        emit_str("(global $");
        emit_str(c_global_names[i]);
        emit_str(" (mut i64) (i64.const 0))\n");
    }

compiler-global-get-set

Update c_factor() to use global.get for globals:

Where we emit local.get $name, change to:

    if (c_is_global(name)) {
        emit_str("  global.get $");
    } else {
        emit_str("  local.get $");
    }
    emit_str(name);
    emit_byte('\n');

Update c_stmt() for var declarations and assignment -- use global.set for globals:

    // In the var handler:
    c_expression();
    if (c_is_global(vname)) {
        emit_str("  global.set $");
    } else {
        emit_str("  local.set $");
    }
    emit_str(vname);
    emit_byte('\n');

    // In the assignment handler:
    c_expression();
    if (c_is_global(word)) {
        emit_str("  global.set $");
    } else {
        emit_str("  local.set $");
    }
    emit_str(word);
    emit_byte('\n');

Also update scan_for_locals to skip globals -- they shouldn't appear as (local ...) declarations:

    if (!dup and !c_is_global(vname)) {
        local_names[local_count] = vname;
        local_count += 1;
    }

vm-globals

Add globals to the VM:

var vm_global_names: [64][]const u8 = undefined;
var vm_global_values: [64]i64 = undefined;
var vm_global_count: usize = 0;

fn vm_get_global(name: []const u8) i64 {
    for (0..vm_global_count) |i| {
        if (streq(vm_global_names[i], name)) {
            return vm_global_values[i];
        }
    }
    return 0;
}

fn vm_set_global(name: []const u8, val: i64) void {
    for (0..vm_global_count) |i| {
        if (streq(vm_global_names[i], name)) {
            vm_global_values[i] = val;
            return;
        }
    }
    vm_global_names[vm_global_count] = name;
    vm_global_values[vm_global_count] = val;
    vm_global_count += 1;
}

Add instruction handling in vm_exec_line:

    if (starts_with(line, "global.get $")) {
        vm_push(vm_get_global(parse_name(line, 0)));
        return;
    }
    if (starts_with(line, "global.set $")) {
        vm_set_global(parse_name(line, 0), vm_pop());
        return;
    }

And in vm_scan_functions (or a new vm_scan_globals), parse (global $name ...) declarations:

    if (starts_with(line, "(global $")) {
        vm_set_global(parse_name(line, 0), 0);
    }

Critically, globals are NOT saved/restored in the VM's function call handler. The call handler saves vm_local_names/vm_local_values/vm_local_count only. Globals persist.

Reset in vm_run_wat:

    vm_global_count = 0;

agreement-globals

Agreement tests with globals:

    check_both("var x: i64 = 42; fn get() i64 { return x; } get()", 42);
    check_both("var x: i64 = 0; fn inc() i64 { x = x + 1; return x; } inc(); inc(); inc()", 3);
    check_both(
        \\var counter: i64 = 0;
        \\fn bump() i64 {
        \\    counter = counter + 1;
        \\    return counter;
        \\}
        \\bump();
        \\bump();
        \\bump();
        \\counter
    , 3);

Three tests. Two engines. Same answers. Globals work in the interpreter, the compiler, and the VM.

test-compiler-pattern

The real test -- the self-hosted compiler pattern. Multiple globals, multiple functions accessing them:

    check_both(
        \\var src_pos: i64 = 0;
        \\var out_pos: i64 = 0;
        \\fn cur() i64 {
        \\    return load8(src_pos);
        \\}
        \\fn emit_byte(b: i64) i64 {
        \\    store8(50000 + out_pos, b);
        \\    out_pos = out_pos + 1;
        \\    return 0;
        \\}
        \\fn skip() i64 {
        \\    while (cur() == 32) {
        \\        src_pos = src_pos + 1;
        \\    }
        \\    return 0;
        \\}
        \\store8(0, 32);
        \\store8(1, 32);
        \\store8(2, 55);
        \\skip();
        \\emit_byte(cur());
        \\out_pos
    , 1);

This simulates the self-hosted compiler: source bytes at address 0 (two spaces then '7'), skip() advances src_pos past the spaces, cur() reads the '7' (55), emit_byte writes it to the output buffer. All functions share src_pos and out_pos through global variables.

If this passes in both engines: the foundational mechanism of the self-hosted compiler -- shared global state across scanner and emitter functions -- works correctly.

bug-variable-shadowing

This program gives different results in the interpreter vs what you'd expect. Why?

var x: i64 = 10;
fn foo() i64 {
    var x: i64 = 99;
    return x;
}
foo()

It returns 99, not 10. Inside foo(), var x: i64 = 99; creates a local variable named x (because we're inside a function). return x; finds the local x first (locals are checked before globals). The global x is shadowed but not modified.

This is correct behavior -- local variables shadow globals with the same name. It's how most languages work. But it's a trap if you accidentally reuse a name. Our language doesn't warn about shadowing. Something to keep in mind when writing the self-hosted compiler: don't name a local the same as a global.

Globals are the single biggest feature needed for self-hosting. Every function in the self-hosted compiler -- cur(), skip(), number(), read_name(), emit_byte(), emit_s(), emit_num(), every c_* function -- reads or writes global state. Without globals, none of them can communicate. With globals, they all share the same src_pos, out_pos, and src_len, just like the Zig versions share Zig globals.

The pattern is now proven: declare globals at the top level, access them from any function, mutations are visible everywhere. The interpreter, compiler, and VM all agree. We can move on.

## Part 10: Completing the Language

Our language is a subset of Zig. Every program we write should compile with both our compiler and the Zig compiler. This chapter adds the remaining operators and features to close the gap.

### Comparison operators

interp-ge-le

Add >= and <= to the interpreter's expression(). Check for two-character operators before single-character ones. >= returns 1 if left >= right, else 0.

compiler-ge-le

Add >= and <= to the compiler (i64.ge_s, i64.le_s) and VM. Agreement tests:

    check_both("5 >= 5", 1);
    check_both("5 >= 6", 0);
    check_both("3 <= 3", 1);
    check_both("3 <= 2", 0);

### Logical operators

interp-and-or

Add and and or to the interpreter. These are new precedence levels -- or binds loosest, and binds tighter, both looser than comparisons. Rename expression() to comparison(). Write and_expr() and or_expr(). The new expression() calls or_expr():

expression -> or_expr -> and_expr -> comparison -> add_sub -> term -> factor

and: both sides nonzero -> 1, else 0. or: either side nonzero -> 1, else 0. Not short-circuiting -- both sides always evaluate.

compiler-and-or

Add and/or to the compiler. Normalize each side to 0/1 with i64.const 0 / i64.ne, then use bitwise i64.and / i64.or. Add these to the VM. Agreement tests:

    check_both("1 and 1", 1);
    check_both("1 and 0", 0);
    check_both("0 or 1", 1);
    check_both("5 > 3 and 10 > 7", 1);
    check_both("5 >= 5 and 5 <= 5", 1);

### Character literals

char-literals

Add character literals to the interpreter and compiler. In factor() / c_factor(), when we see a single quote, read the byte inside and return/emit its integer value. 'a' is 97, '0' is 48. Agreement tests:

    check_both("'a'", 97);
    check_both("'0'", 48);
    check_both("'z' - 'a'", 25);

In the self-hosted compiler, detect single quotes with cur() == 39 (since character literals don't exist yet during bootstrap).

### Modulo

modulo-operator

Add % to term() and c_term(), same precedence as * and /. Maps to i64.rem_s in WAT. Add i64.rem_s to the VM.

    check_both("10 % 3", 1);
    check_both("17 % 5", 2);

### Compound assignment

compound-assignment

Add +=, -=, *= to exec_stmt() and c_stmt(). When we see name followed by +=, desugar to name = name + expr.

Interpreter -- after reading a name, before the regular = check:

        if (cur() == '+' and pos + 1 < source.len and source[pos + 1] == '=') {
            pos += 2; skip();
            const val: i64 = expression();
            if (cur() == ';') { pos += 1; skip(); }
            setVar(word, getVar(word) + val);
            return getVar(word);
        }

Compiler -- emit get, compile right side, emit add, emit set:

        if (cur() == '+' and pos + 1 < source.len and source[pos + 1] == '=') {
            pos += 2; skip();
            if (c_is_global(word)) { emit_str("  global.get $"); } else { emit_str("  local.get $"); }
            emit_str(word); emit_byte('\n');
            c_expression();
            emit_op("i64.add");
            if (c_is_global(word)) { emit_str("  global.set $"); } else { emit_str("  local.set $"); }
            emit_str(word); emit_byte('\n');
            if (cur() == ';') { pos += 1; skip(); }
            return false;
        }

Same pattern for -= (with sub) and *= (with mul).

    check_both("var x: i64 = 10; x += 5; x", 15);
    check_both("var x: i64 = 10; x -= 3; x", 7);
    check_both("var x: i64 = 10; x *= 2; x", 20);

191 occurrences of += 1 in the Zig code. After this, every one compiles in both languages.

### Break

break-statement

Add break to the interpreter with a global break_flag (like return_flag). Add to the compiler as emit_op("br 1") -- unconditional exit from the block/loop pair. The VM already handles br 1.

    check_both(
        \\var i: i64 = 0;
        \\while (i < 100) {
        \\    if (i == 5) { break; }
        \\    i += 1;
        \\}
        \\i
    , 5);

### For loops (range only)

We support for (start..end) |name| { body } -- Zig's range-based for loop. This covers all 33 for loops in our code. We do NOT support array iteration (for (array) |item|).

for-loop-interpreter

Add for to the interpreter:

        if (streq(word, "for")) {
            if (cur() == '(') { pos += 1; skip(); }
            const start_val: i64 = expression();
            if (cur() == '.') { pos += 1; }
            if (cur() == '.') { pos += 1; }
            skip();
            const end_val: i64 = expression();
            if (cur() == ')') { pos += 1; skip(); }
            if (cur() == '|') { pos += 1; skip(); }
            const loop_var: []const u8 = read_name();
            if (cur() == '|') { pos += 1; skip(); }

            const body_pos: usize = pos;
            var i: i64 = start_val;
            while (i < end_val) {
                pos = body_pos;
                setVar(loop_var, i);
                _ = exec_block();
                if (return_flag or break_flag) { break; }
                i += 1;
            }
            if (break_flag) { break_flag = false; }
            if (pos <= body_pos) { pos = body_pos; skip_block(); }
            return 0;
        }

The two dots are parsed as two separate . characters. The |name| capture is the pipe characters with a variable name between them. Inside the loop, name is set as a local variable for each iteration.

for-loop-compiler

Add for to the compiler. Emit block/loop/br_if/br with auto-init and auto-increment. Store the end bound in $__for_end:

        if (streq(word, "for")) {
            if (cur() == '(') { pos += 1; skip(); }
            c_expression();          // start
            if (cur() == '.') { pos += 1; }
            if (cur() == '.') { pos += 1; }
            skip();
            c_expression();          // end
            emit_str("  local.set $__for_end\n");
            if (cur() == ')') { pos += 1; skip(); }
            if (cur() == '|') { pos += 1; skip(); }
            const loop_var: []const u8 = read_name();
            if (cur() == '|') { pos += 1; skip(); }
            // init
            emit_str("  local.set $"); emit_str(loop_var); emit_byte('\n');
            emit_op("block");
            emit_op("loop");
            // condition: i >= end -> exit
            emit_str("  local.get $"); emit_str(loop_var); emit_byte('\n');
            emit_str("  local.get $__for_end\n");
            emit_op("i64.ge_s");
            emit_op("br_if 1");
            c_block();
            emit_op("drop");
            // increment
            emit_str("  local.get $"); emit_str(loop_var); emit_byte('\n');
            emit_op("i64.const 1");
            emit_op("i64.add");
            emit_str("  local.set $"); emit_str(loop_var); emit_byte('\n');
            emit_op("br 0");
            emit_op("end");
            emit_op("end");
            return false;
        }

Add (local $__for_end i64) to emit_locals(). Update scan_for_locals to detect |name| in for statements and register the capture as a local.

Agreement tests:

    check_both("var s: i64 = 0; for (0..5) |i| { s += i; } s", 10);
    check_both("var s: i64 = 0; for (0..3) |i| { for (0..4) |j| { s += 1; } } s", 12);

### true and false

true-false-keywords

Add true and false as keywords in factor() and c_factor(). After reading a name, check before treating it as a variable:

    if (streq(name, "true")) { return 1; }
    if (streq(name, "false")) { return 0; }

In the compiler, emit i64.const 1 or i64.const 0. In Zig, these are bool values. In our language, they're the integers 1 and 0. The behavior is identical when used with and, or, and if.

    check_both("true", 1);
    check_both("false", 0);
    check_both("true and false", 0);
    check_both("true or false", 1);

### Arrays

array-declarations

Array declarations. When the var handler sees : [N] in the type position, allocate N x 8 bytes from the heap (starting at address 300000). The variable's value is the base address. Parse and accept = undefined (ignore it -- our heap is zeroed). Accept any type name after ] (not just i64 -- also usize, bool, etc., all treated as i64).

Track which variables are arrays with a parallel flag. Set during declaration, check during indexing.

array-read

Array read. In factor(), when we see name[, check if it's an array variable. If yes: load64(base + idx * 8). If no: packed string byte access load8(addr + idx).

array-write

Array write. In exec_stmt(), when we see name[, parse index, ], =, value. Write with store64(base + idx * 8, val). Also handle arr[i] += val.

compiler-arrays

Add arrays to the compiler. Array globals get pre-assigned addresses from a counter (c_next_array). Track array names. arr[i] read emits: get base, index * 8, add, i32.wrap_i64, i64.load. arr[i] = val write emits the store sequence with $__tmp.

agreement-arrays

Agreement tests:

    check_both(
        \\var arr: [10]i64 = undefined;
        \\arr[0] = 42; arr[1] = 99;
        \\arr[0] + arr[1]
    , 141);

payoff-var-storage

The payoff -- variable storage from the Zig interpreter, now valid in both languages:

var var_names: [256]i64 = undefined;
var var_values: [256]i64 = undefined;
var num_vars: i64 = 0;

fn setVar(name: i64, val: i64) i64 {
    for (0..num_vars) |i| {
        if (streq_mem(var_names[i], name) == 1) {
            var_values[i] = val;
            return 0;
        }
    }
    var_names[num_vars] = name;
    var_values[num_vars] = val;
    num_vars += 1;
    return 0;
}

Same arrays, same for loop, same +=, same indexing. Compiles under both.

design-not-supported

What Zig features do we deliberately NOT support?

Array iteration (for (array) |item|): our arrays don't carry length. Use for (0..count) |i| { arr[i] ... }.

@intCast / @as: type casting. Not needed -- everything is i64.

Pointers: restructure shared code to pass values, not addresses.

Complex initializers ([_]i64{0} ** 256): our arrays zero-initialize via the heap. Accept = undefined.

The rule: add a feature only when avoiding it would change the code's structure, not merely its spelling.

## Part 11: The Compiler, in Its Own Language

We've proven every piece individually. streq_mem compares strings. emit_byte, emit_s, emit_num write output. The scanner reads source bytes. Now we assemble them into a complete compiler -- written in the language we built, using every feature we added in Part 10.

Every function here compiles under both our compiler and the Zig compiler. That's the test: one source, two compilers, same result.

### I/O

io-builtins

Add read_input and print_char to the interpreter, compiler, and VM.

In the interpreter: read_input() copies the source into heap at address 0 and returns the length. print_char(c) writes one byte to output.

In the compiler: these become imported WAT functions. Add to c_program():

    emit_str("(import \"env\" \"print_char\" (func $print_char (param i64)))\n");
    emit_str("(import \"env\" \"read_input\" (func $read_input (result i64)))\n");
    emit_str("(memory (export \"memory\") 10)\n");

In the VM: call $print_char pops and prints the byte. call $read_input loads test data into vm_memory and pushes the length.

### The scanner

self-char-classification

Character classification. Now that we have and, or, >=, <=, and character literals, these are clean single-line conditions:

fn is_space(c: i64) i64 {
    if (c == ' ' or c == '\n' or c == '\r' or c == '\t') { return 1; }
    return 0;
}

fn is_letter(c: i64) i64 {
    if (c >= 'a' and c <= 'z') { return 1; }
    if (c >= 'A' and c <= 'Z') { return 1; }
    if (c == '_') { return 1; }
    return 0;
}

fn is_digit(c: i64) i64 {
    if (c >= '0' and c <= '9') { return 1; }
    return 0;
}

fn is_alnum(c: i64) i64 {
    if (is_letter(c) == 1 or is_digit(c) == 1) { return 1; }
    return 0;
}

Compare with the Zig interpreter's versions -- they're nearly identical. is_digit is c >= '0' and c <= '9' in both languages now.

self-scanner

The core scanner:

var src_pos: i64 = 0;
var src_len: i64 = 0;

fn cur() i64 {
    if (src_pos >= src_len) { return 0; }
    return load8(src_pos);
}

fn skip() i64 {
    while (is_space(cur()) == 1) { src_pos += 1; }
    return 0;
}

fn number() i64 {
    var val: i64 = 0;
    while (is_digit(cur()) == 1) {
        val = val * 10 + (cur() - '0');
        src_pos += 1;
    }
    skip();
    return val;
}

fn read_name() i64 {
    var start: i64 = src_pos;
    while (is_alnum(cur()) == 1) { src_pos += 1; }
    var len: i64 = src_pos - start;
    skip();
    return start * 65536 + len;
}

cur() - '0' instead of cur() - 48. src_pos += 1 instead of src_pos = src_pos + 1. The scanner reads like the Zig version.

self-streq-emit

String comparison and emit:

fn streq_mem(a: i64, b: i64) i64 {
    if (len(a) != len(b)) { return 0; }
    for (0..len(a)) |i| {
        if (a[i] != b[i]) { return 0; }
    }
    return 1;
}

var out_pos: i64 = 0;

fn emit_byte(b: i64) i64 {
    store8(100000 + out_pos, b);
    out_pos += 1;
    return 0;
}

fn emit_s(s: i64) i64 {
    for (0..len(s)) |i| {
        emit_byte(s[i]);
    }
    return 0;
}

fn emit_num(n: i64) i64 {
    if (n < 0) { emit_byte('-'); emit_num(0 - n); return 0; }
    if (n > 9) { emit_num(n / 10); }
    emit_byte('0' + n % 10);
    return 0;
}

streq_mem uses a[i] slice indexing, len(a) for length, and for loops. No manual address arithmetic. emit_s iterates with for instead of while with a counter. emit_num uses '0' + n % 10 -- character literal and modulo, both earning their keep.

### The expression compiler

self-c-factor

c_factor -- numbers, variables, parentheses, string literals, character literals, function calls:

fn c_factor() i64 {
    if (cur() == '(') {
        src_pos += 1; skip();
        c_expression();
        if (cur() == ')') { src_pos += 1; skip(); }
        return 0;
    }
    if (cur() == '\'') {
        src_pos += 1;
        var ch: i64 = cur();
        src_pos += 1;
        if (cur() == '\'') { src_pos += 1; skip(); }
        emit_s("  i64.const "); emit_num(ch); emit_byte('\n');
        return 0;
    }
    if (cur() == '"') {
        src_pos += 1;
        var sstart: i64 = src_pos;
        while (cur() != '"') { src_pos += 1; }
        var slen: i64 = src_pos - sstart;
        src_pos += 1; skip();
        emit_s("  i64.const ");
        emit_num(sstart * 65536 + slen);
        emit_byte('\n');
        add_str(sstart, slen);
        return 0;
    }
    if (is_digit(cur()) == 1) {
        emit_s("  i64.const "); emit_num(number()); emit_byte('\n');
        return 0;
    }
    if (is_letter(cur()) == 1) {
        var name: i64 = read_name();
        if (streq_mem(name, "true") == 1) { emit_s("  i64.const 1\n"); return 0; }
        if (streq_mem(name, "false") == 1) { emit_s("  i64.const 0\n"); return 0; }
        if (cur() == '(') {
            c_call(name);
            return 0;
        }
        if (is_global(name) == 1) { emit_s("  global.get $"); }
        else { emit_s("  local.get $"); }
        emit_s(name); emit_byte('\n');
        return 0;
    }
    return 0;
}

Character literals ('(' instead of 40), true/false keywords, += on src_pos. The function handles every kind of atomic expression.

self-c-term-expression

c_term, c_add_sub, c_expression:

fn c_term() i64 {
    c_factor();
    while (cur() == '*' or cur() == '/' or cur() == '%') {
        var op: i64 = cur();
        src_pos += 1; skip();
        c_factor();
        if (op == '*') { emit_s("  i64.mul\n"); }
        if (op == '/') { emit_s("  i64.div_s\n"); }
        if (op == '%') { emit_s("  i64.rem_s\n"); }
    }
    return 0;
}

fn c_add_sub() i64 {
    c_term();
    while (cur() == '+' or cur() == '-') {
        var op: i64 = cur();
        src_pos += 1; skip();
        c_term();
        if (op == '+') { emit_s("  i64.add\n"); }
        if (op == '-') { emit_s("  i64.sub\n"); }
    }
    return 0;
}

fn c_comparison() i64 {
    c_add_sub();
    if (cur() == '>' and load8(src_pos + 1) == '=') {
        src_pos += 2; skip(); c_add_sub(); emit_s("  i64.ge_s\n");
    }
    if (cur() == '<' and load8(src_pos + 1) == '=') {
        src_pos += 2; skip(); c_add_sub(); emit_s("  i64.le_s\n");
    }
    if (cur() == '>') { src_pos += 1; skip(); c_add_sub(); emit_s("  i64.gt_s\n"); }
    if (cur() == '<') { src_pos += 1; skip(); c_add_sub(); emit_s("  i64.lt_s\n"); }
    if (cur() == '=' and load8(src_pos + 1) == '=') {
        src_pos += 2; skip(); c_add_sub(); emit_s("  i64.eq\n");
    }
    if (cur() == '!' and load8(src_pos + 1) == '=') {
        src_pos += 2; skip(); c_add_sub(); emit_s("  i64.ne\n");
    }
    return 0;
}

fn c_and_expr() i64 {
    c_comparison();
    while (is_letter(cur()) == 1) {
        var s: i64 = src_pos;
        var w: i64 = read_name();
        if (streq_mem(w, "and") == 1) {
            emit_s("  i64.const 0\n  i64.ne\n");
            c_comparison();
            emit_s("  i64.const 0\n  i64.ne\n  i64.and\n");
        } else { src_pos = s; break; }
    }
    return 0;
}

fn c_or_expr() i64 {
    c_and_expr();
    while (is_letter(cur()) == 1) {
        var s: i64 = src_pos;
        var w: i64 = read_name();
        if (streq_mem(w, "or") == 1) {
            emit_s("  i64.const 0\n  i64.ne\n");
            c_and_expr();
            emit_s("  i64.const 0\n  i64.ne\n  i64.or\n");
        } else { src_pos = s; break; }
    }
    return 0;
}

fn c_expression() i64 {
    c_or_expr();
    return 0;
}

cur() == '*' or cur() == '/' -- inline condition using or, no helper function needed. The full precedence chain: expression -> or_expr -> and_expr -> comparison -> add_sub -> term -> factor. The self-hosted compiler handles every operator the Zig compiler handles.

### The statement compiler

self-c-stmt

c_stmt -- returns 1 if it left a value on the stack, 0 if not:

fn c_stmt() i64 {
    if (is_letter(cur()) == 1) {
        var save: i64 = src_pos;
        var word: i64 = read_name();

        if (streq_mem(word, "var") == 1 or streq_mem(word, "const") == 1) {
            var vname: i64 = read_name();
            if (cur() == ':') { src_pos += 1; skip(); read_name(); }
            if (cur() == '=') { src_pos += 1; skip(); }
            c_expression();
            if (is_global(vname) == 1) { emit_s("  global.set $"); }
            else { emit_s("  local.set $"); }
            emit_s(vname); emit_byte('\n');
            if (cur() == ';') { src_pos += 1; skip(); }
            return 0;
        }

        if (streq_mem(word, "return") == 1) {
            c_expression();
            emit_s("  return\n");
            if (cur() == ';') { src_pos += 1; skip(); }
            return 0;
        }

        if (streq_mem(word, "if") == 1) {
            if (cur() == '(') { src_pos += 1; skip(); }
            c_expression();
            if (cur() == ')') { src_pos += 1; skip(); }
            emit_s("  if (result i64)\n");
            c_block();
            if (is_letter(cur()) == 1) {
                var s2: i64 = src_pos;
                var w2: i64 = read_name();
                if (streq_mem(w2, "else") == 1) {
                    emit_s("  else\n");
                    c_block();
                } else {
                    src_pos = s2;
                    emit_s("  else\n  i64.const 0\n");
                }
            } else {
                emit_s("  else\n  i64.const 0\n");
            }
            emit_s("  end\n");
            return 1;
        }

        if (streq_mem(word, "while") == 1) {
            emit_s("  block\n  loop\n");
            if (cur() == '(') { src_pos += 1; skip(); }
            c_expression();
            if (cur() == ')') { src_pos += 1; skip(); }
            emit_s("  i64.eqz\n  br_if 1\n");
            c_block();
            emit_s("  drop\n  br 0\n  end\n  end\n");
            return 0;
        }

        if (streq_mem(word, "for") == 1) {
            if (cur() == '(') { src_pos += 1; skip(); }
            c_expression();
            if (cur() == '.') { src_pos += 1; }
            if (cur() == '.') { src_pos += 1; }
            skip();
            c_expression();
            emit_s("  local.set $__for_end\n");
            if (cur() == ')') { src_pos += 1; skip(); }
            if (cur() == '|') { src_pos += 1; skip(); }
            var loop_var: i64 = read_name();
            if (cur() == '|') { src_pos += 1; skip(); }
            emit_s("  local.set $"); emit_s(loop_var); emit_byte('\n');
            emit_s("  block\n  loop\n");
            emit_s("  local.get $"); emit_s(loop_var); emit_byte('\n');
            emit_s("  local.get $__for_end\n  i64.ge_s\n  br_if 1\n");
            c_block();
            emit_s("  drop\n");
            emit_s("  local.get $"); emit_s(loop_var); emit_byte('\n');
            emit_s("  i64.const 1\n  i64.add\n");
            emit_s("  local.set $"); emit_s(loop_var); emit_byte('\n');
            emit_s("  br 0\n  end\n  end\n");
            return 0;
        }

        if (streq_mem(word, "break") == 1) {
            if (cur() == ';') { src_pos += 1; skip(); }
            emit_s("  br 1\n");
            return 0;
        }

        if (streq_mem(word, "fn") == 1) {
            c_fn();
            return 0;
        }

        // compound assignment: name += expr
        if (cur() == '+' and load8(src_pos + 1) == '=') {
            src_pos += 2; skip();
            if (is_global(word) == 1) { emit_s("  global.get $"); }
            else { emit_s("  local.get $"); }
            emit_s(word); emit_byte('\n');
            c_expression();
            emit_s("  i64.add\n");
            if (is_global(word) == 1) { emit_s("  global.set $"); }
            else { emit_s("  local.set $"); }
            emit_s(word); emit_byte('\n');
            if (cur() == ';') { src_pos += 1; skip(); }
            return 0;
        }
        if (cur() == '-' and load8(src_pos + 1) == '=') {
            src_pos += 2; skip();
            if (is_global(word) == 1) { emit_s("  global.get $"); }
            else { emit_s("  local.get $"); }
            emit_s(word); emit_byte('\n');
            c_expression();
            emit_s("  i64.sub\n");
            if (is_global(word) == 1) { emit_s("  global.set $"); }
            else { emit_s("  local.set $"); }
            emit_s(word); emit_byte('\n');
            if (cur() == ';') { src_pos += 1; skip(); }
            return 0;
        }

        // assignment: name = expr (not ==)
        if (cur() == '=' and load8(src_pos + 1) != '=') {
            src_pos += 1; skip();
            c_expression();
            if (is_global(word) == 1) { emit_s("  global.set $"); }
            else { emit_s("  local.set $"); }
            emit_s(word); emit_byte('\n');
            if (cur() == ';') { src_pos += 1; skip(); }
            return 0;
        }

        src_pos = save;
    }
    c_expression();
    if (cur() == ';') { src_pos += 1; skip(); }
    return 1;
}

Every keyword check uses streq_mem with string literals. var and const are handled together with or. for parses the (start..end) |name| syntax. break emits br 1. Compound assignment (+=, -=) emits get-compute-set. The = vs == check uses load8(src_pos + 1) != '='.

self-c-block

c_block:

fn c_block() i64 {
    if (cur() == '{') { src_pos += 1; skip(); }
    var last_had_value: i64 = 0;
    while (cur() != '}') {
        if (cur() == 0) { return 0; }
        if (last_had_value == 1) { emit_s("  drop\n"); }
        last_had_value = c_stmt();
    }
    if (last_had_value == 0) { emit_s("  i64.const 0\n"); }
    if (cur() == '}') { src_pos += 1; skip(); }
    return 0;
}

self-c-call-fn

c_call and c_fn:

fn c_call(name: i64) i64 {
    src_pos += 1; skip();
    while (cur() != ')') {
        if (cur() == 0) { return 0; }
        c_expression();
        if (cur() == ',') { src_pos += 1; skip(); }
    }
    if (cur() == ')') { src_pos += 1; skip(); }
    emit_s("  call $"); emit_s(name); emit_byte('\n');
    return 0;
}

fn c_fn() i64 {
    var fname: i64 = read_name();
    emit_s("(func $"); emit_s(fname);

    if (cur() == '(') { src_pos += 1; skip(); }
    while (cur() != ')') {
        if (cur() == 0) { return 0; }
        var pname: i64 = read_name();
        emit_s(" (param $"); emit_s(pname);
        if (cur() == ':') {
            src_pos += 1; skip();
            var ptype: i64 = read_name();
            emit_s(" "); emit_s(ptype);
        }
        emit_s(")");
        if (cur() == ',') { src_pos += 1; skip(); }
    }
    if (cur() == ')') { src_pos += 1; skip(); }

    if (is_letter(cur()) == 1) {
        var rtype: i64 = read_name();
        emit_s(" (result "); emit_s(rtype); emit_s(")");
    }
    emit_byte('\n');

    var body_start: i64 = src_pos;
    scan_locals();
    src_pos = body_start;
    c_block();
    emit_s("  drop\n  i64.const 0\n)\n");
    return 0;
}

self-helpers

Helper functions:

fn skip_past_block() i64 {
    while (cur() != '{') {
        if (cur() == 0) { return 0; }
        src_pos += 1;
    }
    src_pos += 1; skip();
    var depth: i64 = 1;
    while (depth > 0) {
        if (cur() == 0) { return 0; }
        if (cur() == '{') { depth += 1; }
        if (cur() == '}') { depth -= 1; }
        if (depth > 0) { src_pos += 1; }
    }
    if (cur() == '}') { src_pos += 1; skip(); }
    return 0;
}

fn skip_stmt() i64 {
    while (cur() != ';') {
        if (cur() == 0) { return 0; }
        if (cur() == '{') { skip_past_block(); return 0; }
        src_pos += 1;
    }
    src_pos += 1; skip();
    return 0;
}

fn scan_locals() i64 {
    var save: i64 = src_pos;
    if (cur() == '{') { src_pos += 1; skip(); }
    var depth: i64 = 0;
    while (cur() != 0) {
        if (cur() == '{') { depth += 1; }
        if (cur() == '}') {
            if (depth == 0) {
                emit_s("  (local $__tmp i64)\n");
                emit_s("  (local $__for_end i64)\n");
                src_pos = save;
                return 0;
            }
            depth -= 1;
        }
        if (is_letter(cur()) == 1) {
            var w: i64 = read_name();
            if (streq_mem(w, "var") == 1 or streq_mem(w, "const") == 1) {
                var vname: i64 = read_name();
                if (cur() == ':') { src_pos += 1; skip(); read_name(); }
                if (is_global(vname) == 0) {
                    emit_s("  (local $"); emit_s(vname); emit_s(" i64)\n");
                }
            }
            if (streq_mem(w, "for") == 1) {
                // skip to |name| and register the capture variable
                while (cur() != '|' and cur() != 0) { src_pos += 1; }
                if (cur() == '|') { src_pos += 1; skip(); }
                var capname: i64 = read_name();
                if (cur() == '|') { src_pos += 1; skip(); }
                emit_s("  (local $"); emit_s(capname); emit_s(" i64)\n");
            }
        } else {
            src_pos += 1;
        }
    }
    emit_s("  (local $__tmp i64)\n");
    emit_s("  (local $__for_end i64)\n");
    src_pos = save;
    return 0;
}

scan_locals now also detects for loops and registers their capture variables as locals. It emits $__tmp and $__for_end for every function.

self-global-tracking

Global variable tracking using arrays:

var global_names: [64]i64 = undefined;
var g_count: i64 = 0;

fn add_global(name: i64) i64 {
    global_names[g_count] = name;
    g_count += 1;
    return 0;
}

fn is_global(name: i64) i64 {
    for (0..g_count) |i| {
        if (streq_mem(global_names[i], name) == 1) { return 1; }
    }
    return 0;
}

global_names[g_count] = name -- array write. global_names[i] -- array read. for (0..g_count) |i| -- range loop. No manual store64/load64 address arithmetic.

self-string-table

String data table using arrays:

var str_addrs: [256]i64 = undefined;
var str_lens: [256]i64 = undefined;
var str_count: i64 = 0;

fn add_str(addr: i64, slen: i64) i64 {
    str_addrs[str_count] = addr;
    str_lens[str_count] = slen;
    str_count += 1;
    return 0;
}

fn emit_string_data() i64 {
    for (0..str_count) |i| {
        emit_s("(data (i32.const ");
        emit_num(str_addrs[i]);
        emit_s(") \"");
        for (0..str_lens[i]) |j| {
            emit_byte(load8(str_addrs[i] + j));
        }
        emit_s("\")\n");
    }
    return 0;
}

Two parallel arrays instead of packed 16-byte records. str_addrs[i] reads cleanly. Nested for loops iterate the table and the bytes.

self-c-program

c_program -- the top-level compiler:

fn c_program() i64 {
    emit_s("(module\n");
    emit_s("(memory (export \"memory\") 10)\n");

    // scan for top-level globals
    src_pos = 0; skip();
    var depth: i64 = 0;
    while (src_pos < src_len) {
        if (load8(src_pos) == '{') { depth += 1; src_pos += 1; }
        if (load8(src_pos) == '}') { depth -= 1; src_pos += 1; }
        if (depth == 0 and is_letter(cur()) == 1) {
            var s: i64 = src_pos;
            var w: i64 = read_name();
            if (streq_mem(w, "var") == 1 or streq_mem(w, "const") == 1) {
                var vn: i64 = read_name();
                if (cur() == ':') { src_pos += 1; skip(); read_name(); }
                add_global(vn);
            } else {
                if (streq_mem(w, "fn") == 1) { skip_past_block(); }
                else { src_pos = s; src_pos += 1; }
            }
        } else { src_pos += 1; }
    }

    // emit global declarations
    for (0..g_count) |gi| {
        emit_s("(global $"); emit_s(global_names[gi]);
        emit_s(" (mut i64) (i64.const 0))\n");
    }

    // compile function definitions
    src_pos = 0; skip();
    while (cur() != 0) {
        if (is_letter(cur()) == 1) {
            var s: i64 = src_pos;
            var w: i64 = read_name();
            if (streq_mem(w, "fn") == 1) { c_fn(); }
            else { src_pos = s; skip_stmt(); }
        } else { src_pos += 1; }
    }

    // compile top-level code into main
    emit_s("(func (export \"main\") (result i64)\n");
    emit_s("  (local $__tmp i64)\n");
    emit_s("  (local $__for_end i64)\n");
    src_pos = 0; skip();
    var last_v: i64 = 0;
    while (cur() != 0) {
        if (is_letter(cur()) == 1) {
            var s: i64 = src_pos;
            var w: i64 = read_name();
            if (streq_mem(w, "fn") == 1) {
                skip_past_block();
            } else {
                src_pos = s;
                if (last_v == 1) { emit_s("  drop\n"); }
                last_v = c_stmt();
            }
        } else {
            if (last_v == 1) { emit_s("  drop\n"); }
            last_v = c_stmt();
        }
    }
    if (last_v == 0) { emit_s("  i64.const 0\n"); }
    emit_s(")\n");
    emit_string_data();
    emit_s(")\n");
    return 0;
}

self-entry-point

The entry point:

fn main() i64 {
    src_len = read_input();
    c_program();
    for (0..out_pos) |i| {
        print_char(load8(100000 + i));
    }
    return 0;
}
main()

read_input loads source code into memory at address 0. c_program parses it and writes WAT into memory at 100000. The for loop prints every byte to stdout.

self-feature-coverage

Count the features. Every Part 10 addition appears in the self-hosted compiler:

| Feature | Where it appears |
|---------|-----------------|
| >= <= | c_comparison: i64.ge_s, i64.le_s |
| and or | c_and_expr, c_or_expr, is_space, c_term conditions |
| 'a' | c_factor char literal handler, emit_num's '0' |
| % | emit_num: n % 10, c_term: i64.rem_s |
| += -= | c_stmt compound assignment, every counter increment |
| break | c_stmt handler, c_and_expr/c_or_expr backtrack |
| for (0..n) \|i\| | streq_mem, emit_s, emit_string_data, c_program, main |
| true false | c_factor keyword check |
| var arr: [N]i64 | global_names, str_addrs, str_lens |
| arr[i] | global_names[i], str_addrs[i], str_lens[i] |
| s[i] len(s) | streq_mem: a[i] != b[i], len(a) |

The self-hosted compiler is the proof that Part 10 works. Every feature was added because this chapter uses it.

## Part 12: The Snake Eats Its Tail

assemble-compiler

Assemble the complete compiler. Create a file called compiler.lang containing, in order:

1. All the helper functions (is_space, is_letter, is_digit, is_alnum)
2. All the scanner functions (cur, skip, number, read_name, streq_mem)
3. All the emitter functions (emit_byte, emit_s, emit_num)
4. All the compiler functions (c_factor, c_term, c_add_sub, c_expression, c_stmt, c_block, c_fn, c_call, scan_locals, skip_past_block, skip_stmt, c_program, add_str, emit_string_data, is_global, add_global)
5. The global variables (src_pos, src_len, out_pos)
6. The main function

Every function in this file uses only features our language supports: i64 integers, if/else with braces, while, functions with typed parameters, load8/store8, and string literals. Nothing else.

test-self-three-plus-five

Test the self-hosted compiler on a simple expression. Feed it "3 + 5" as input and check the WAT output.

We can do this entirely in the Zig test harness:

fn test_self_hosted(compiler_src: []const u8, input: []const u8) void {
    // Step 1: Compile the compiler from our language to WAT
    c_program(compiler_src);
    const compiler_wat: []u8 = out[0..out_len];

    // Step 2: Run the compiler WAT in the VM, with 'input' as its stdin
    // Load input into vm_memory at address 0
    for (input, 0..) |c, i| { vm_memory[i] = c; }
    // Set up read_input to return the length
    const result_wat: []const u8 = vm_run_compiler(compiler_wat, input.len);

    // Step 3: The VM's output (in vm_memory at 100000+) is the compiled WAT of 'input'
    print("Self-hosted output:\n{s}\n", .{result_wat});
}

The output should be:

(module
(memory (export "memory") 1)
(func (export "main") (result i64)
  i64.const 3
  i64.const 5
  i64.add
)
)

If it is: the self-hosted compiler, running as WAT inside our VM, produced correct WAT for 3 + 5. The compiler compiled.

test-self-fibonacci

Test the self-hosted compiler on fib(10). Feed it the Fibonacci source. The output should match what our Zig compiler produces.

bootstrap-test

The bootstrap.

Here's the moment everything has been building toward. Four commands:

# Step 1: Compile the compiler with our Zig implementation
zig build-exe minizig.zig && ./minizig < compiler.lang > compiler.wat

# Step 2: Assemble to WASM
wat2wasm compiler.wat -o compiler.wasm

# Step 3: Run the WASM compiler on its own source
node run.js compiler.wasm < compiler.lang > compiler2.wat

# Step 4: Compare
diff compiler.wat compiler2.wat

run.js is a small Node.js script that loads the WASM module with I/O imports:

const fs = require('fs');
const wasm = fs.readFileSync(process.argv[2]);
const input = fs.readFileSync('/dev/stdin');

WebAssembly.instantiate(wasm, {
    env: {
        read_input: () => {
            for (let i = 0; i < input.length; i++)
                mem[i] = input[i];
            return input.length;
        },
        print_char: (c) => process.stdout.write(Buffer.from([Number(c)])),
    }
}).then(({ instance }) => {
    mem = new Uint8Array(instance.exports.memory.buffer);
    instance.exports.main();
});

the-empty-diff

Run it.

$ diff compiler.wat compiler2.wat
$

Empty. No output. The diff found nothing.

The compiler -- compiled to WebAssembly by the Zig host -- and the same compiler -- compiled to WebAssembly by itself running as WebAssembly -- produced identical output when given the same input.

The snake has eaten its tail.

### What just happened

Let's trace the full path:

1. We wrote eval("3+5") and got 8. (Problem 4)
2. We added a cursor, multi-digit numbers, whitespace, and precedence. (Problems 6–12)
3. We added typed variables, if/else, while, functions. (Problems 15–27)
4. We wrote a compiler that emits WAT instead of computing values. Same parser, different actions. (Problems 30–44)
5. We built a VM that executes WAT, proving the compiler correct through agreement testing. (Problems 48–62)
6. We added memory and strings so the language could manipulate bytes. (Problems 65–75)
7. We wrote compiler primitives in our own language and showed they worked. (Problems 76–79)
8. We assembled those primitives into a complete compiler -- in our language. (Problems 84–92)
9. We compiled the compiler with the Zig host, ran it as WASM, fed it its own source, and got identical output. (Problems 93–97)

Ninety-seven problems. One file. One language. A compiler that compiles itself.

### Why this matters

Self-hosting isn't just a party trick. It's a proof of completeness.

If the compiler can compile itself, it means the language is powerful enough to express its own implementation. Every feature the compiler uses -- arithmetic, string comparison, memory access, recursive descent, output buffering -- is a feature the language provides. There are no hidden dependencies, no magic behind the curtain. The language stands on its own feet.

It's also a practical tool. Once you have a self-hosting compiler, you can improve the language using the language itself. Add a feature to the compiler source, compile it with the old compiler, and now you have a new compiler that supports the new feature. Bootstrap again and the new compiler compiles itself with the new feature. The language grows itself.

c4 did this in 525 lines of C. We did it in a few hundred lines of our language, targeting WebAssembly instead of custom bytecode. The approach is different -- c4 uses single-pass compilation with no tree, just like we do -- but the principle is the same. The snake must be just long enough to eat itself. Not longer. Every unnecessary feature is code you have to compile, which is code you have to write, which is code the compiler has to handle. Minimalism isn't just aesthetic. It's engineering.

# Appendices

## Appendix A: A Simple Allocator

Everything in our language lives at fixed addresses. Arrays start at 300000. The output buffer starts at 100000. The compiler's string table starts at 200000. We chose these addresses by hand and made sure they don't overlap.

That works for a compiler. It doesn't work for a program that needs to create data structures whose size depends on the input. A text editor doesn't know how big the file is until it opens it. A web server doesn't know how many connections it'll handle. These programs need to request memory at runtime -- and eventually give it back.

That's what an allocator does. alloc(size) returns a pointer to size bytes of usable memory. free(ptr) returns those bytes to the pool. Between the two, a program can build linked lists, hash maps, trees, resizable arrays -- any data structure that grows and shrinks.

We're going to build one. No new language features. Everything we need -- load64, store64, pointer arithmetic, linked lists via addresses -- already exists.

### The bump allocator

The simplest possible allocator: a pointer that moves forward.

bump-allocator

Write a bump allocator:

var heap_start: i64 = 500000;
var heap_ptr: i64 = 500000;
var heap_limit: i64 = 900000;

fn alloc(size: i64) i64 {
    var ptr: i64 = heap_ptr;
    heap_ptr += (size + 7) / 8 * 8;
    if (heap_ptr > heap_limit) { return 0; }
    return ptr;
}

That's it. alloc(100) returns 500000 and advances heap_ptr to 500104 (rounded up to a multiple of 8 for alignment). The next alloc(50) returns 500104 and advances to 500160. Each call gets a fresh, non-overlapping region.

The (size + 7) / 8 * 8 rounds up to 8-byte alignment. WAT's i64.load and i64.store require aligned addresses. Without rounding, alloc(3) followed by alloc(8) would place an i64 at a misaligned address.

Test:

    check(
        \\var heap_ptr: i64 = 500000;
        \\var heap_limit: i64 = 900000;
        \\fn alloc(size: i64) i64 {
        \\    var ptr: i64 = heap_ptr;
        \\    heap_ptr += (size + 7) / 8 * 8;
        \\    if (heap_ptr > heap_limit) { return 0; }
        \\    return ptr;
        \\}
        \\var a: i64 = alloc(100);
        \\var b: i64 = alloc(200);
        \\b - a
    , 104);  // 100 rounded up to 104 (next multiple of 8)

The bump allocator is fast (one addition per call), simple (six lines), and useful for programs that only allocate and never free -- compilers, for instance. Our self-hosted compiler could use this instead of pre-assigned addresses.

But it can't free. Memory only grows. For long-running programs, we need something better.

### Block headers

To free memory, we need to know how big each allocation was. We store this information in a header -- 8 bytes immediately before the returned pointer.

  Header (8 bytes)     Payload (size bytes)
  ┌────────────────┬────────────────────────────┐
  │  block_size    │  usable memory             │
  └────────────────┴────────────────────────────┘
  ^                ^
  block address    pointer returned by alloc()

alloc(size) allocates size + 8 bytes, writes the total block size into the header, and returns a pointer to the payload. free(ptr) reads the header at ptr - 8 to learn the block size.

block-headers

Add headers to the bump allocator:

fn alloc(size: i64) i64 {
    var total: i64 = (size + 7) / 8 * 8 + 8;
    var block: i64 = heap_ptr;
    heap_ptr += total;
    if (heap_ptr > heap_limit) { return 0; }
    store64(block, total);
    return block + 8;
}

alloc(100) allocates 112 bytes (100 rounded to 104, plus 8 for the header). It writes 112 at the block address and returns block + 8.

Given a pointer ptr, the block size is load64(ptr - 8).

Test:

    check(
        \\var heap_ptr: i64 = 500000;
        \\var heap_limit: i64 = 900000;
        \\fn alloc(size: i64) i64 {
        \\    var total: i64 = (size + 7) / 8 * 8 + 8;
        \\    var block: i64 = heap_ptr;
        \\    heap_ptr += total;
        \\    if (heap_ptr > heap_limit) { return 0; }
        \\    store64(block, total);
        \\    return block + 8;
        \\}
        \\var p: i64 = alloc(100);
        \\load64(p - 8)
    , 112);  // 100 -> 104 aligned + 8 header = 112

### The free list

When we free a block, we add it to a free list -- a linked list of available blocks. Each free block stores a pointer to the next free block in its payload area (which isn't being used):

  Free block:
  ┌────────────────┬────────────────┬─────────────────┐
  │  block_size    │  next_free     │  (unused)        │
  └────────────────┴────────────────┴─────────────────┘

free_list is a global that points to the first free block (or 0 if the list is empty).

free-list

Write free():

var free_list: i64 = 0;

fn free(ptr: i64) i64 {
    var block: i64 = ptr - 8;
    store64(block + 8, free_list);
    free_list = block;
    return 0;
}

Two lines of real work. We write the current list head into the freed block's payload, then make the freed block the new head. It's a linked-list prepend using raw memory.

alloc-with-reuse

Update alloc() to search the free list before bump-allocating:

fn alloc(size: i64) i64 {
    var total: i64 = (size + 7) / 8 * 8 + 8;

    // Search the free list for a large enough block
    var prev: i64 = 0;
    var curr: i64 = free_list;
    while (curr != 0) {
        var block_size: i64 = load64(curr);
        if (block_size >= total) {
            // Remove from free list
            var next: i64 = load64(curr + 8);
            if (prev == 0) {
                free_list = next;
            } else {
                store64(prev + 8, next);
            }
            return curr + 8;
        }
        prev = curr;
        curr = load64(curr + 8);
    }

    // No free block found -- bump allocate
    var block: i64 = heap_ptr;
    heap_ptr += total;
    if (heap_ptr > heap_limit) { return 0; }
    store64(block, total);
    return block + 8;
}

First-fit strategy: walk the free list, take the first block that's large enough. If nothing fits, fall back to bump allocation.

Test -- alloc, free, alloc again reuses the freed block:

    check(
        \\var heap_ptr: i64 = 500000;
        \\var heap_limit: i64 = 900000;
        \\var free_list: i64 = 0;
        \\fn alloc(size: i64) i64 {
        \\    var total: i64 = (size + 7) / 8 * 8 + 8;
        \\    var prev: i64 = 0;
        \\    var curr: i64 = free_list;
        \\    while (curr != 0) {
        \\        var block_size: i64 = load64(curr);
        \\        if (block_size >= total) {
        \\            var next: i64 = load64(curr + 8);
        \\            if (prev == 0) { free_list = next; }
        \\            else { store64(prev + 8, next); }
        \\            return curr + 8;
        \\        }
        \\        prev = curr;
        \\        curr = load64(curr + 8);
        \\    }
        \\    var block: i64 = heap_ptr;
        \\    heap_ptr += total;
        \\    if (heap_ptr > heap_limit) { return 0; }
        \\    store64(block, total);
        \\    return block + 8;
        \\}
        \\fn free(ptr: i64) i64 {
        \\    var block: i64 = ptr - 8;
        \\    store64(block + 8, free_list);
        \\    free_list = block;
        \\    return 0;
        \\}
        \\var a: i64 = alloc(100);
        \\var b: i64 = alloc(200);
        \\free(a);
        \\var c: i64 = alloc(50);
        \\c == a
    , 1);  // c reuses a's freed block

a was freed and its 112-byte block went onto the free list. When we allocate 50 bytes (total 56 with header), the free list has a 112-byte block -- large enough. We reuse it. c == a confirms they got the same address.

### A dynamic array

Now let's use the allocator to build something useful: a growable array. Fixed-size arrays require knowing the size at declaration time. A dynamic array starts small and grows as needed -- like Zig's ArrayList, Python's list, or C++'s vector.

dynamic-array-design

Design a dynamic array using three values: ptr (pointer to the data), len (number of elements), cap (capacity -- how many elements fit before growing). Pack these into three consecutive i64 values in memory:

fn da_new() i64 {
    var da: i64 = alloc(24);
    store64(da, 0);       // ptr = null
    store64(da + 8, 0);   // len = 0
    store64(da + 16, 0);  // cap = 0
    return da;
}

fn da_len(da: i64) i64 { return load64(da + 8); }

fn da_get(da: i64, i: i64) i64 {
    return load64(load64(da) + i * 8);
}

dynamic-array-push

Write da_push -- append an element, growing if necessary:

fn da_push(da: i64, val: i64) i64 {
    var len: i64 = load64(da + 8);
    var cap: i64 = load64(da + 16);
    if (len >= cap) {
        // Grow: double capacity (minimum 4)
        var new_cap: i64 = cap * 2;
        if (new_cap < 4) { new_cap = 4; }
        var new_ptr: i64 = alloc(new_cap * 8);
        var old_ptr: i64 = load64(da);
        // Copy old elements
        for (0..len) |i| {
            store64(new_ptr + i * 8, load64(old_ptr + i * 8));
        }
        // Free old buffer (if any)
        if (old_ptr != 0) { free(old_ptr); }
        store64(da, new_ptr);
        store64(da + 16, new_cap);
    }
    store64(load64(da) + len * 8, val);
    store64(da + 8, len + 1);
    return 0;
}

This is the classic amortized-doubling strategy. Most pushes are O(1) -- just store and increment. Occasionally the array is full and we allocate a new buffer, copy everything, and free the old one. The amortized cost per push is O(1).

Test:

    // (include alloc, free, da_new, da_push, da_get, da_len from above)
    check(
        \\ ... allocator and dynamic array functions ...
        \\var arr: i64 = da_new();
        \\da_push(arr, 10);
        \\da_push(arr, 20);
        \\da_push(arr, 30);
        \\da_push(arr, 40);
        \\da_push(arr, 50);
        \\da_get(arr, 0) + da_get(arr, 4)
    , 60);  // 10 + 50

Five elements pushed. The array started at capacity 0, grew to 4 on the first push, then grew to 8 on the fifth push. da_get(arr, 0) returns 10. da_get(arr, 4) returns 50.

dynamic-array-pop

Write da_pop -- remove and return the last element:

fn da_pop(da: i64) i64 {
    var len: i64 = load64(da + 8);
    if (len == 0) { return 0; }
    len -= 1;
    store64(da + 8, len);
    return load64(load64(da) + len * 8);
}

Test:

    // push 10, 20, 30; pop gives 30, 20, 10
    check(
        \\ ... allocator and dynamic array functions ...
        \\var arr: i64 = da_new();
        \\da_push(arr, 10);
        \\da_push(arr, 20);
        \\da_push(arr, 30);
        \\var a: i64 = da_pop(arr);
        \\var b: i64 = da_pop(arr);
        \\a * 100 + b
    , 3020);  // 30 * 100 + 20

### Coalescing

Our allocator has a problem: fragmentation. If you allocate and free many small blocks, the free list fills up with tiny pieces that can't satisfy larger requests -- even though the total free memory is plenty.

The fix is coalescing: when freeing a block, check if the block immediately after it in memory is also free. If so, merge them into one larger block.

free-with-coalescing

Write free with forward coalescing:

fn free(ptr: i64) i64 {
    var block: i64 = ptr - 8;
    var size: i64 = load64(block);

    // Check if the next block in memory is on the free list
    var next_block: i64 = block + size;
    var prev: i64 = 0;
    var curr: i64 = free_list;
    while (curr != 0) {
        if (curr == next_block) {
            // Adjacent! Merge: absorb next block's size
            size += load64(curr);
            store64(block, size);
            // Remove next_block from free list
            var after: i64 = load64(curr + 8);
            if (prev == 0) { free_list = after; }
            else { store64(prev + 8, after); }
            break;
        }
        prev = curr;
        curr = load64(curr + 8);
    }

    // Add merged (or unmerged) block to free list
    store64(block + 8, free_list);
    free_list = block;
    return 0;
}

When we free block A, we check if block B (at address A + A's size) is on the free list. If it is, we remove B from the free list, add B's size to A, and put the merged block on the free list. One large block instead of two small ones.

Test -- allocate three blocks, free the middle one, free the first one, they should coalesce:

    check(
        \\ ... allocator with coalescing free ...
        \\var a: i64 = alloc(32);
        \\var b: i64 = alloc(32);
        \\var c: i64 = alloc(32);
        \\free(b);
        \\free(a);
        \\var d: i64 = alloc(72);
        \\d == a
    , 1);

a and b were adjacent in memory. Freeing b puts it on the free list. Freeing a checks if the block after a (which is b) is free -- it is. They merge into one 80-byte block (40 + 40). When we allocate 72 bytes (total 80 with header), the merged block fits perfectly.

design-why-not-builtin

Why did we build the allocator in our language instead of adding alloc as a builtin?

Because we could. The allocator uses load64, store64, pointer arithmetic, and a linked list -- all things the language already supports. Making it a builtin would hide the mechanism. Making it a library function proves the language is powerful enough for systems programming.

This is the same philosophy as streq -- we wrote string comparison instead of importing it. And emit_num -- we wrote number-to-string conversion instead of calling a runtime function. The language builds its own tools. Each tool validates the language.

In production, you'd want the allocator to be fast (a buddy allocator, slab allocator, or similar). What we built here is slow (linear free-list search) but correct, and correct is what matters for understanding. The reader now knows what malloc does -- because they wrote it.

What we built: a complete memory allocator in 50 lines of the language we created from scratch. Bump allocation, free-list allocation, coalescing. A dynamic array built on top of it. No new language features, no builtins, no magic -- just load64, store64, and arithmetic.

The language is no longer just a compiler's implementation language. It's a systems programming language. A simple one, with rough edges, but real. You can build data structures in it. You can manage memory in it. You can write programs that don't know their size at compile time.

Not bad for a language that started with eval("3+5").

## Appendix B: Floating Point

Until now, our language has had one type: i64. Numbers, booleans, packed strings, array addresses, function return values -- all integers. This made the compiler simple: emit i64.add, always. No type checking, no type inference, no decisions.

Now we break that rule. We add f64 -- 64-bit floating point. This is a bigger change than it sounds. With two types, the compiler has to answer a new question for every operation: which add? i64.add or f64.add? The variable tables need to track types. The WAT output emits (local $x f64) instead of (local $x i64). The VM's stack carries values that might be integers or might be floats.

One type was simplicity. Two types is expressiveness. Let's see what it costs.

### Float literals

A float literal is a number with a decimal point. 3.14 is float. 3 is integer. The dot is the signal.

float-literal-parsing

Add float literal parsing to the interpreter's factor(). When number() reads digits and hits a . followed by more digits, it's a float.

We store floats as their IEEE 754 bit pattern inside an i64, using Zig's @bitCast:

fn parse_float(whole: i64) i64 {
    var f: f64 = @floatFromInt(whole);
    pos += 1; // skip the '.'
    var divisor: f64 = 10.0;
    while (cur() >= '0' and cur() <= '9') {
        const digit: f64 = @floatFromInt(cur() - '0');
        f += digit / divisor;
        divisor *= 10.0;
        pos += 1;
    }
    skip();
    return @bitCast(f);
}

In factor(), after calling number(), check if the next character is .:

    // In the number/expression path:
    const val: i64 = number();
    if (cur() == '.' and pos + 1 < source.len and source[pos + 1] >= '0' and source[pos + 1] <= '9') {
        return parse_float(val);
    }

The returned i64 holds the bit pattern of a float. 3.14 returns the 64 bits of IEEE 754 3.14. To the variable storage system, it's just a number. The interpretation depends on the variable's declared type.

float-type-tracking

Track variable types. Add a type flag:

var var_types: [256]u8 = [_]u8{0} ** 256;    // 0 = i64, 1 = f64
var global_types: [256]u8 = [_]u8{0} ** 256;

When exec_stmt handles var x: f64 = 3.14;, set the type flag:

    // After reading the type name:
    const type_name: []const u8 = read_name();
    const is_float: bool = streq(type_name, "f64");
    // ... later, when storing:
    if (is_float) { var_types[num_vars] = 1; }

Write helpers to check types:

fn isFloat(name: []const u8) bool {
    for (0..num_vars) |i| {
        if (streq(var_names[i], name) and var_types[i] == 1) { return true; }
    }
    for (0..num_globals) |i| {
        if (streq(global_names[i], name) and global_types[i] == 1) { return true; }
    }
    return false;
}

### Float arithmetic

The interpreter needs to know which operations are float and which are integer. We track a "current expression type" -- integer by default, float if any operand is float.

float-arithmetic

Modify term() to handle float arithmetic:

var expr_is_float: bool = false;

fn factor() i64 {
    // ... existing code ...
    if (is_letter(cur())) {
        const name: []const u8 = read_name();
        if (isFloat(name)) { expr_is_float = true; }
        // ...
    }
    // float literal detected -> expr_is_float = true
}

fn term() i64 {
    var val: i64 = factor();
    while (cur() == '*' or cur() == '/' or cur() == '%') {
        const op: u8 = cur();
        pos += 1; skip();
        const right: i64 = factor();
        if (expr_is_float) {
            const a: f64 = @bitCast(val);
            const b: f64 = @bitCast(right);
            if (op == '*') { val = @bitCast(a * b); }
            else if (op == '/') { val = @bitCast(a / b); }
            else { val = @bitCast(@rem(a, b)); }
        } else {
            if (op == '*') { val = val * right; }
            else if (op == '/') { val = @divTrunc(val, right); }
            else { val = @rem(val, right); }
        }
    }
    return val;
}

Same structure as before. The if (expr_is_float) branch bitcasts to f64, operates, bitcasts back. The else branch does integer math as before.

Apply the same pattern to add_sub() and comparison().

Test -- we need a helper to compare float results since bitcast i64 values aren't human-readable:

fn check_float(input: []const u8, expected: f64) void {
    source = input; pos = 0; num_vars = 0; num_globals = 0;
    fn_count = 0; return_flag = false; in_function = false;
    save_depth = 0; expr_is_float = false; skip();
    const got: i64 = program();
    const got_f: f64 = @bitCast(got);
    if (@abs(got_f - expected) < 0.0001) {
        print("  ok {d:.4}\n", .{got_f});
    } else {
        print("  FAIL got={d:.4} want={d:.4}\n", .{ got_f, expected });
    }
}

    check_float("3.14", 3.14);
    check_float("2.0 + 3.0", 5.0);
    check_float("6.0 * 7.0", 42.0);
    check_float("22.0 / 7.0", 3.142857);
    check_float("var r: f64 = 2.5; r * r * 3.14159", 19.6349);

### Type conversion

Integer and float are separate worlds. 3 + 2.5 doesn't automatically work -- the 3 is an i64 and the 2.5 is an f64 bit pattern. We need explicit conversion.

float-conversion

Add conversion builtins:

    if (streq(name, "f64_from_i64")) {
        pos += 1; skip();
        const val: i64 = expression();
        if (cur() == ')') { pos += 1; skip(); }
        const f: f64 = @floatFromInt(val);
        expr_is_float = true;
        return @bitCast(f);
    }
    if (streq(name, "i64_from_f64")) {
        pos += 1; skip();
        const was_float: bool = expr_is_float;
        expr_is_float = true;
        const val: i64 = expression();
        if (cur() == ')') { pos += 1; skip(); }
        const f: f64 = @bitCast(val);
        expr_is_float = was_float;
        return @intFromFloat(f);
    }

f64_from_i64(42) converts integer 42 to float 42.0 (as a bitcast i64). i64_from_f64(3.7) truncates to integer 3.

Test:

    check_float("f64_from_i64(42)", 42.0);
    check("i64_from_f64(3.7)", 3);
    check("i64_from_f64(2.0 + 0.9)", 2);  // truncates, not rounds
    check_float("f64_from_i64(355) / f64_from_i64(113)", 3.14159);

The last test: 355/113 is a famous rational approximation of pi. With integer division it gives 3. With float division it gives 3.14159...

### Compiling floats

The compiler needs to emit the right WAT instructions based on types.

compiler-float-types

Track types in the compiler. Maintain a parallel array of type flags for globals and locals. When c_factor() encounters a float variable or float literal, set a flag. When c_term() / c_add_sub() emit arithmetic, choose based on the flag:

var c_expr_is_float: bool = false;

fn c_factor() void {
    // Float literal
    if (cur() >= '0' and cur() <= '9') {
        const val: i64 = number();
        if (cur() == '.' and pos + 1 < source.len and source[pos + 1] >= '0') {
            // Parse float, emit f64.const
            c_expr_is_float = true;
            emit_str("  f64.const ");
            emit_num(val);
            emit_byte('.');
            pos += 1;
            while (cur() >= '0' and cur() <= '9') {
                emit_byte(cur());
                pos += 1;
            }
            emit_byte('\n');
            skip();
            return;
        }
        emit_const(val);
        return;
    }
    // Variable reference
    if (is_letter(cur())) {
        const name: []const u8 = read_name();
        if (c_is_float(name)) { c_expr_is_float = true; }
        // ... emit local.get/global.get as before
    }
}

fn c_term() void {
    c_factor();
    while (cur() == '*' or cur() == '/' or cur() == '%') {
        const op: u8 = cur();
        pos += 1; skip();
        c_factor();
        if (c_expr_is_float) {
            if (op == '*') { emit_op("f64.mul"); }
            else if (op == '/') { emit_op("f64.div"); }
        } else {
            if (op == '*') { emit_op("i64.mul"); }
            else if (op == '/') { emit_op("i64.div_s"); }
            else { emit_op("i64.rem_s"); }
        }
    }
}

Same pattern for c_add_sub() and c_comparison().

For local declarations, emit the right type:

    if (c_is_float(local_names[i])) {
        emit_str("  (local $"); emit_str(local_names[i]); emit_str(" f64)\n");
    } else {
        emit_str("  (local $"); emit_str(local_names[i]); emit_str(" i64)\n");
    }

For globals:

    emit_str("(global $"); emit_str(name);
    if (is_float) {
        emit_str(" (mut f64) (f64.const 0))\n");
    } else {
        emit_str(" (mut i64) (i64.const 0))\n");
    }

compiler-float-conversion

Add conversion builtins to the compiler:

    if (streq(name, "f64_from_i64")) {
        // compile argument (i64 expression)
        pos += 1; skip();
        c_expression();
        if (cur() == ')') { pos += 1; skip(); }
        emit_op("f64.convert_i64_s");
        c_expr_is_float = true;
        return;
    }
    if (streq(name, "i64_from_f64")) {
        pos += 1; skip();
        const was: bool = c_expr_is_float;
        c_expr_is_float = true;
        c_expression();
        if (cur() == ')') { pos += 1; skip(); }
        emit_op("i64.trunc_f64_s");
        c_expr_is_float = was;
        return;
    }

WAT instructions: f64.convert_i64_s converts a signed i64 to f64. i64.trunc_f64_s truncates an f64 to a signed i64.

vm-float-instructions

Add float instructions to the VM. Each one bitcasts the i64 stack values to f64, operates, and bitcasts back:

    if (streq(line, "f64.add")) {
        const b: f64 = @bitCast(vm_pop());
        const a: f64 = @bitCast(vm_pop());
        vm_push(@bitCast(a + b));
        return;
    }
    if (streq(line, "f64.sub")) {
        const b: f64 = @bitCast(vm_pop());
        const a: f64 = @bitCast(vm_pop());
        vm_push(@bitCast(a - b));
        return;
    }
    if (streq(line, "f64.mul")) {
        const b: f64 = @bitCast(vm_pop());
        const a: f64 = @bitCast(vm_pop());
        vm_push(@bitCast(a * b));
        return;
    }
    if (streq(line, "f64.div")) {
        const b: f64 = @bitCast(vm_pop());
        const a: f64 = @bitCast(vm_pop());
        vm_push(@bitCast(a / b));
        return;
    }
    if (starts_with(line, "f64.const ")) {
        // Parse the float from the text
        const f: f64 = parse_f64_from_line(line, 10);
        vm_push(@bitCast(f));
        return;
    }
    if (streq(line, "f64.convert_i64_s")) {
        const i: i64 = vm_pop();
        const f: f64 = @floatFromInt(i);
        vm_push(@bitCast(f));
        return;
    }
    if (streq(line, "i64.trunc_f64_s")) {
        const f: f64 = @bitCast(vm_pop());
        const i: i64 = @intFromFloat(f);
        vm_push(i);
        return;
    }

Also add float comparisons: f64.gt, f64.lt, f64.ge, f64.le, f64.eq, f64.ne. These return i64 (1 or 0), not f64 -- a comparison's result is always boolean.

    if (streq(line, "f64.gt")) {
        const b: f64 = @bitCast(vm_pop());
        const a: f64 = @bitCast(vm_pop());
        vm_push(if (a > b) @as(i64, 1) else 0);
        return;
    }
    // ... same pattern for lt, ge, le, eq, ne

float-agreement

Float agreement tests:

    // Direct float arithmetic
    check_both_float("2.0 + 3.0", 5.0);
    check_both_float("6.0 * 7.0", 42.0);
    check_both_float("22.0 / 7.0", 3.142857);

    // Float variables
    check_both_float("var x: f64 = 2.5; x * x", 6.25);

    // Conversion
    check_both_float("f64_from_i64(42)", 42.0);
    check_both("i64_from_f64(3.7)", 3);

    // Float in functions
    check_both_float(
        \\fn circle_area(r: f64) f64 {
        \\    return r * r * 3.14159;
        \\}
        \\circle_area(5.0)
    , 78.5397);

The function circle_area takes and returns f64. The compiler emits (func $circle_area (param $r f64) (result f64) -- real WAT float types. The VM executes float multiplication. Both engines agree.

### Practical example

newtons-sqrt

Newton's method for square root -- an algorithm that needs float arithmetic:

    check_both_float(
        \\fn sqrt(n: f64) f64 {
        \\    var guess: f64 = n / 2.0;
        \\    for (0..20) |i| {
        \\        guess = (guess + n / guess) / 2.0;
        \\    }
        \\    return guess;
        \\}
        \\sqrt(2.0)
    , 1.41421);

Twenty iterations of Newton's method, starting from n/2. Each iteration: guess = (guess + n/guess) / 2. Converges to sqrt(2) = 1.41421356... in about 5 iterations, but 20 is safe.

This is real numerical computation. The same algorithm, the same source code, running through both the interpreter (using @bitCast to pack/unpack floats) and the compiler (emitting f64.div, f64.add, f64.mul). Agreement proves both paths handle floating point correctly.

design-why-f64

Why f64 and not f32?

Three reasons.

First, WAT's f64 maps to Zig's f64, JavaScript's number, and C's double. It's the natural size. f32 would require explicit truncation in most host environments.

Second, f64 has 53 bits of significand -- enough to exactly represent every 32-bit integer. This means f64_from_i64(n) is lossless for values up to 2^53. With f32 (24 bits of significand), even f32_from_i64(16777217) loses precision. Since our integer type is 64-bit, f64 is the safe companion.

Third, f64.const 3.14 in WAT is 8 bytes, same as i64.const. No size savings from f32. The VM stack is already 8 bytes per slot. Everything aligns naturally at 64 bits.

trace-float-stack

Walk through the WAT for 2.0 + 3.0 * 4.0:

| Instruction | Stack (as f64) |
|-------------|----------------|
| f64.const 2.0 | [2.0] |
| f64.const 3.0 | [2.0, 3.0] |
| f64.const 4.0 | [2.0, 3.0, 4.0] |
| f64.mul | [2.0, 12.0] |
| f64.add | [14.0] |

Identical structure to the i64 trace from Problem 44. Precedence works the same way -- c_term grabs the 3.0 * 4.0 first. The only difference is the instruction prefix: f64 instead of i64. Same parser, same recursion, same stack machine. Two types, one architecture.

What we added: f64 as a second type. Float literals with decimal points. Float arithmetic, comparisons, and conversion. Type tracking in the compiler to emit the right WAT instructions. The VM handling floats via bitcast.

What we learned: adding a second type touches everything. The parser (literal detection), the interpreter (dispatch on type), the compiler (instruction selection, local/global declarations), the VM (new instruction handlers), and the test harness (float comparison). One type was simple. Two types is real.

But the architecture didn't change. The parser is still recursive descent. The compiler is still syntax-directed. The VM is still a line-by-line stack machine. Types added a dimension to every decision, but they didn't change the decisions. f64.add is just i64.add wearing different shoes -- the same metaphor we started with.

## Summary

| Part | What it adds | Problems |
|------|-------------|----------|
| 1 | Parse and eval "3+5" | 1–5 |
| 2 | Cursor, chains, multi-digit, whitespace, precedence | 6–14 |
| 3 | Typed variables, streq, assignment | 15–19 |
| 4 | If/else (mandatory braces), while | 20–23 |
| 5 | Typed functions, return, fib(10) | 24–29 |
| 6 | Compiler: parse -> WAT (syntax-directed, stack-clean) | 30–47 |
| 7 | VM: execute WAT, agreement testing | 48–64 |
| 8 | Memory (load8/store8/load64/store64), strings | 65–79 |
| 9 | Global variables | 80–93 |
| 10 | Completing the language (Zig subset) | 94–111 |
| 11 | The compiler in its own language | 112–126 |
| 12 | The bootstrap, the diff, the proof | 127–131 |
| A | A simple allocator | A1–A9 |
| B | Floating point (f64) | B1–B11 |

You started by making "3+5" give 8.

You ended with a compiler that compiles itself.

The proof is an empty diff -- two engines, same output, on every input, including the compiler's own source code.

Not bad for one file.