minizig — full book

A number

the-language

Go to ziglang.org and download the latest stable release of Zig. It's a single zip/tar file -- no installer, no setup wizard. Extract it somewhere and remember where you put it.

Now open a terminal. On Windows, press Win+R, type cmd, hit Enter. On macOS, open Terminal from Applications. On Linux, you know what to do.

In the terminal, navigate to the folder where you extracted Zig and check that it works:

zig version

If you get a version number, you're good. If you get "command not found", Zig isn't in your PATH yet -- look up how to add a folder to PATH on your operating system.

Now create a folder for your project. This is where you'll work for the rest of the book. Navigate into it:

mkdir minizig
cd minizig

Create a file called minizig.zig. Use any text editor -- Notepad, nano, anything. Find the "Hello World" example in the Zig documentation and paste it in. One thing: change !void to just void -- we don't use Zig's error handling in this book.

Compile it:

zig build-exe minizig.zig

This produces an executable called minizig (or minizig.exe on Windows). Run it:

./minizig

On Windows, type minizig.exe instead. You should see output.

That's it. You have a working Zig setup.

const std = @import("std");
const print = std.debug.print;

pub fn main() void {
    print("hello\n", .{});
}
getting-started

If you already know how to debug code in your editor of choice, skip this section.

We're not doing printf debugging in 2026. We hover the mouse over a variable and see its current value. We press F10 repeatedly and take a walk through the code as it runs. The debugger is the single best tool for understanding what your program is actually doing -- much better than guessing from a few print lines.

Install VS Code with three extensions: Zig Language, ZLS (Zig Language Server -- gives you autocomplete and error highlighting as you type), and CodeLLDB (lets you debug with breakpoints).

Create a .vscode folder in your project directory with this launch.json:

{
    "version": "0.2.0",
    "configurations": [{
        "type": "lldb",
        "request": "launch",
        "name": "Debug minizig",
        "program": "${workspaceFolder}/minizig.exe",
        "preLaunchTask": "build"
    }]
}

And this tasks.json:

{
    "version": "2.0.0",
    "tasks": [{
        "label": "build",
        "type": "shell",
        "command": "zig build-exe minizig.zig"
    }]
}

(On macOS/Linux, drop the .exe from the program path.)

Now replace your main with this:

pub fn main() void {
    const thing: i64 = 42;
    print("thing = {d}\n", .{thing});
}

thing is a const -- it can't change. Click in the gutter left of the print line -- a red dot appears. That's a breakpoint. Press F5. The program freezes at the dot. Hover over thing -- you see 42.

Now change const to var, and add a line before the print that sets thing to 99. Press F5 again. Step through with F10 and watch the value change in the debugger.

You'll live in this debugger for the rest of the book. Step through code with F10, step into functions with F11, hover over anything to inspect it. When something goes wrong (and it will), this is how you find out why.

const std = @import("std");
const print = std.debug.print;

pub fn main() void {
    var thing: i64 = 42;
    thing = 99;
    print("thing = {d}\n", .{thing});
}
hello-arithmetic

Make the program print 3 + 5 = 8.

Zig's print uses {d} as a placeholder for numbers. Values go in .{ a, b }.

pub fn main() !void {
    print("{d} + {d} = {d}\n", .{ 3, 5, 3 + 5 });
}

Not much of a language yet. But the cycle works: edit, save, zig build-exe minizig.zig && ./minizig. You'll do this hundreds of times. It takes about a second.

eval-single-digit

Write a function eval that takes a string and returns a number. For now, just read the first character and convert it to a digit.

A string in Zig has type []const u8 -- a slice of bytes. Characters are numbers: '3' is 51, '0' is 48. So '3' - '0' gives you the integer 3. To get an i64, cast: @as(i64, input[0] - '0').

    print("{d}\n", .{eval("7")});   // 7
fn eval(input: []const u8) i64 {
    return @as(i64, input[0] - '0');
}

One character. One digit. It's absurd. But it runs, and it's correct. We'll grow it.

check-harness

Write check(input, expected) -- evaluates the input, prints ok or FAIL. We'll use this for every test from here on.

Use this main:

pub fn main() void {
    check("7", 7);
    check("3", 9);
}

One should pass, one should fail.

fn check(input: []const u8, expected: i64) void {
    const got: i64 = eval(input);
    if (got == expected) {
        print("{s} = {d} ok\n", .{ input, got });
    } else {
        print("{s} = {d} FAIL want {d}\n", .{ input, got, expected });
    }
}

{s} prints a string.

Two style notes that will hold for the rest of the book:

Braces around every branch. Even single-statement ifs get { ... }. Zig allows you to omit them sometimes, but uniform syntax is easier to read and edit.

Every variable declaration carries its type. Even when Zig can infer it (const got: i64 = eval(input) instead of const got = eval(input)). We're being anal about "reading is more important than writing" even more than Andrew Kelley.

eval-two-digits-plus

Make eval handle "3+5": character 0 is the left digit, character 1 is the +, character 2 is the right digit. Add them.

    check("3+5", 8);
fn eval(input: []const u8) i64 {
    const a: i64 = @as(i64, input[0] - '0');
    const b: i64 = @as(i64, input[2] - '0');
    return a + b;
}

We're ignoring the operator. Hardcoded addition. Let's fix that.

four-operators-switch

The @as(i64, c - '0') pattern is starting to repeat. Add a tiny helper to cut the noise:

fn digit(c: u8) i64 {
    return @as(i64, c - '0');
}

Then digit(input[0]) instead of @as(i64, input[0] - '0').

Handle all four operators. Read input[1] and use a switch. In your main add this:

    check("3+5", 8);
    check("9-3", 6);
    check("4*7", 28);
    check("8/2", 4);

In Zig switch works like this:

const month: i64 = 2;
const month_string: []const u8 = switch (month) {
    0 => "Jan",
    1 => "Feb",
    else => "got bored typing this already",
};

For division, use @divTrunc(a, b) -- Zig insists you be explicit about truncation.

fn eval(input: []const u8) i64 {
    const a: i64 = digit(input[0]);
    const op: u8 = input[1];
    const b: i64 = digit(input[2]);
    return switch (op) {
        '+' => a + b,
        '-' => a - b,
        '*' => a * b,
        '/' => @divTrunc(a, b),
        else => 0,
    };
}

Four oks. We can evaluate any single-operator expression on two single digits. It's not much -- but think about what just happened. A string went in. A number came out. That's the whole game. Everything from here is just making the string more interesting.

A real parser

cursor-and-pos

We want to read a string one character at a time. To do that we need two things: somewhere to keep the string, and a cursor that points into it.

Make two globals: source: []const u8 (the string we're reading) and pos: usize (the current index -- usize is the type Zig uses for indices and lengths). Yes, globals are bad, somebody on the internet will be very upset. We're being lazy. We'll keep being lazy with this trick the whole way through. Don't do this in your day job.

Then write cur() that returns the byte at source[pos], or 0 if we've walked off the end. The 0 is our end-of-input sentinel -- we'll lean on it constantly.

Use this main to try it out:

pub fn main() void {
    
    source = "hi!";
    pos = 0;

    print("pos {d}: {c}\n", .{ pos, cur() });   // 0: h
    pos += 1;
    
    print("pos {d}: {c}\n", .{ pos, cur() });   // 1: i
    pos += 1;
    
    print("pos {d}: {c}\n", .{ pos, cur() });   // 2: !
    pos += 1;
    
    print("fell off the end, cur() = {d}\n", .{cur()}); // 0
}
var source: []const u8 = "";
var pos: usize = 0;

fn cur() u8 {
    if (pos >= source.len) {
        return 0;
    }
    return source[pos];
}
eval-with-cursor

Rewrite eval to use the cursor. Read digit, advance, read op, advance, read digit, advance. The same four checks should still pass.

fn eval(input: []const u8) i64 {
    source = input;
    pos = 0;
    const a: i64 = digit(cur());
    pos += 1;
    const op: u8 = cur();
    pos += 1;
    const b: i64 = digit(cur());
    pos += 1;
    return switch (op) {
        '+' => a + b,
        '-' => a - b,
        '*' => a * b,
        '/' => @divTrunc(a, b),
        else => 0,
    };
}

Same answers. But now we can move.

walk-with-while

Manually bumping pos for every character is tedious. We want a loop.

Zig's while works like this:

var n: i64 = 0;
while (n < 3) {
    print("{d}\n", .{n});
    n += 1;
}

I know Zig also supports placing the n += 1 next to the condition, but I don't feel like supporting that in minizig.

Now use it on source. Walk through with pos += 1, printing each character and its position. Stop when cur() == 0. Then check that cur() is still 0 once you've fallen off.

Suggested string: "hello!". Output should be pos 0: h, pos 1: e, ..., pos 5: !.

pub fn main() void {
    source = "hello!";
    pos = 0;
    while (cur() != 0) {
        print("pos {d}: {c}\n", .{ pos, cur() });
        pos += 1;
    }
    print("fell off the end, cur() = {d}\n", .{cur()});
}
chained-operators

Handle chains: "3+5+2" -> 10. After reading the first number, loop: while there's a character left, read the operator and the next number, apply it.

    check("3+5+2", 10);
    check("9-3-1", 5);
fn eval(input: []const u8) i64 {
    source = input;
    pos = 0;
    var val: i64 = digit(cur());
    pos += 1;
    while (cur() != 0) {
        const op: u8 = cur();
        pos += 1;
        const right: i64 = digit(cur());
        pos += 1;
        val = switch (op) {
            '+' => val + right,
            '-' => val - right,
            '*' => val * right,
            '/' => @divTrunc(val, right),
            else => val,
        };
    }
    return val;
}

Look at that -- we went from fixed positions to a loop, and suddenly we can handle arbitrarily long expressions.

multi-digit-number

Multi-digit numbers. Write number() that reads digits in a loop, building up a value: multiply the running total by 10, add the digit.

To parse 345: 3 * 10 + 4 = 34, then 34 * 10 + 5 = 345. Stop when the next char isn't a digit.

    check("42", 42);
    check("100+23", 123);
    check("12*12", 144);
fn number() i64 {
    var val: i64 = 0;
    while (cur() >= '0' and cur() <= '9') {
        val = val * 10 + digit(cur());
        pos += 1;
    }
    return val;
}

Replace the single-digit reads in eval with calls to number():

fn eval(input: []const u8) i64 {
    source = input;
    pos = 0;

    var val: i64 = number();
    while (cur() != 0) {
        const op: u8 = cur();
        pos += 1;
        const right: i64 = number();
        val = switch (op) {
            '+' => val + right,
            '-' => val - right,
            '*' => val * right,
            '/' => @divTrunc(val, right),
            else => val,
        };
    }
    return val;
}

Watch out: there is no pos += 1 after the number() calls. The single-character version (digit(cur())) needed pos += 1 after each read to advance past that one character. But number() reads multiple characters and is responsible for leaving pos at the next unread byte.

This is the rule from now on: a function that reads one character expects the caller to advance. A function that reads a token (number(), later read_name(), etc.) advances on its way out. Adding a stray pos += 1 after a token-reader will skip the next character -- a classic source of "my parser ate the operator" bugs.

skip-whitespace

Skip whitespace. Write skip() that advances past whitespace characters. Treat any of these four bytes as whitespace: space ' ', tab '\t', newline '\n', carriage return '\r'.

Strategy: we want the cursor to always be sitting on a non-whitespace character when we're about to read something. So we call skip() once at the start of eval to handle leading whitespace, and after every consume so the next reader doesn't have to think about spaces.

    check("  3  +  5  ", 8);
    check("100 + 23", 123);
fn is_whitespace(c: u8) bool {
    return c == ' ' or c == '\n' or c == '\t' or c == '\r';
}

fn skip() void {
    while (is_whitespace(cur())) {
        pos += 1;
    }
}

You might worry about an input that ends with whitespace -- the loop keeps advancing pos past the last character. But cur() already handles end-of-input: once pos >= source.len, it returns 0, and is_whitespace(0) is false, so the loop stops on its own. No bounds check needed here.

Three places to add skip();:

1. At the start of eval, before parsing begins.
2. At the end of number(), before returning.
3. After the pos += 1 that consumes the operator character. Easy to forget -- the operator is just one byte, so there's no read_operator() helper that would skip for us.

parens

Parentheses. 2*(3+4) should be 14. Whatever sits inside (...) evaluates to one number, and the outer expression treats that number like any other operand.

To support this we split eval into two helpers:

- expression() -- the loop you already have, but reads operands by calling numberOrParens() and stops on ).
- numberOrParens() -- reads one operand: a number, or (, recurse into expression(), then expect ).

eval() becomes a thin wrapper that sets up state and calls expression().

    check("(2+3)", 5);
    check("2*(3+4)", 14);  // without parens: 2*3+4 = 10
    check("8-(3+4)", 1);   // without parens: 8-3+4 = 9
    check("((1+2)*(3+4))", 21);
fn numberOrParens() i64 {
    if (cur() == '(') {
        pos += 1; skip();
        const val: i64 = expression();
        if (cur() == ')') { pos += 1; skip(); }
        return val;
    }
    return number();
}

fn expression() i64 {
    var val: i64 = numberOrParens();
    while (cur() != 0 and cur() != ')') {
        const op: u8 = cur();
        pos += 1; skip();
        const right: i64 = numberOrParens();
        val = switch (op) {
            '+' => val + right,
            '-' => val - right,
            '*' => val * right,
            '/' => @divTrunc(val, right),
            else => val,
        };
    }
    return val;
}

fn eval(input: []const u8) i64 {
    source = input;
    pos = 0;
    skip();
    return expression();
}

Notice that 2+3*4 still gives 20 here -- parens don't fix precedence on their own. But you can write 2+(3*4) to get 14. The escape hatch.

precedence-bug

Time to face the next problem. 2+3*4 should be 14 in math. Our parser gives 20. Write a failing test:

    check("2+3*4", 14);

Run it. Watch FAIL print. Don't fix anything yet -- just sit with it. Why does it go wrong?

Trace expression() on 2+3*4 in your head. It reads 2 as a numberOrParens. It sees +. It reads the next numberOrParens. What does numberOrParens() see, and what does it return?

FAIL: got 20, want 14.

Trace:

expression() reads 2. Sees +. Reads next numberOrParens: just 3.
Adds: 2+3 = 5. Sees *. Reads next numberOrParens: 4.
Multiplies: 5*4 = 20.

The bug: after the +, the right operand was a single numberOrParens (just 3). But the * was waiting to grab that 3 first. We need the + to wait until the * has done its work on the right side.

terms-and-factors

The parser is at 2+3*4. It just consumed the 2 and the +. It's sitting on the 3. Why can't it just add 2 + 3 right now?

Because we're in a sum, and the right operand of a sum isn't a single value -- it's a term. A term is one or more factors multiplied (or divided) together. A factor is a number, or something inside parentheses (the numberOrParens we already wrote). Even a lone 3 is a term -- a term made of one factor.

So when the expression() reads its right operand, it should ask for a term, not just a factor. A term function reads factors and grinds through any *// chain it finds, then hands back one number. The + only happens after that.

The structure we're heading toward:

- expression -- loops on +/-, asks for terms between them.
- term -- loops on *//, asks for factors between them.
- factor -- a number, or a parenthesized expression. (This is what we've been calling numberOrParens -- now rename it to factor. The grammar vocabulary clicks.)

For this step:
1. Rename numberOrParens() to factor(). Update the call sites in expression().
2. Introduce term() as a passthrough -- it just returns factor(), no loop yet.
3. Have expression() call term() instead of factor() in its operand reads.

fn factor() i64 {
    if (cur() == '(') {
        pos += 1; skip();
        const val: i64 = expression();
        if (cur() == ')') { pos += 1; skip(); }
        return val;
    }
    return number();
}

fn term() i64 {
    return factor();
}

fn expression() i64 {
    var val: i64 = term();
    while (cur() != 0 and cur() != ')') {
        const op: u8 = cur();
        pos += 1; skip();
        const right: i64 = term();
        val = switch (op) {
            '+' => val + right,
            '-' => val - right,
            '*' => val * right,
            '/' => @divTrunc(val, right),
            else => val,
        };
    }
    return val;
}

The bug isn't fixed yet. Stare at this for one minute and guess what you will put in term() in the next page.

term-multiplies

Now make term() do its job.

Move the * and / handling out of expression() into term(). expression() loops only on +/-. term() loops only on *// and asks factor() for each operand.

When expression() calls term() for its right operand on 2+3*4, term() sees 3, then *4, and returns 12. Then expression() adds: 2+12 = 14.

    check("2+3*4", 14);
    check("3*4+2", 14);
    check("2*3+4*5", 26);
    check("(2+3)*4", 20);
    check("100 - 55 * 2", -10);
fn term() i64 {
    var val: i64 = factor();
    while (cur() == '*' or cur() == '/') {
        const op: u8 = cur();
        pos += 1; skip();
        const right: i64 = factor();
        val = switch (op) {
            '*' => val * right,
            '/' => @divTrunc(val, right),
            else => val,
        };
    }
    return val;
}

fn expression() i64 {
    var val: i64 = term();
    while (cur() == '+' or cur() == '-') {
        const op: u8 = cur();
        pos += 1; skip();
        const right: i64 = term();
        val = switch (op) {
            '+' => val + right,
            '-' => val - right,
            else => val,
        };
    }
    return val;
}

Three functions, three precedence layers: expression -> term -> factor. Each layer loops on its own operators and asks the next-tighter layer for operands. Operands come back fully resolved. Recursion does the rest.

This is recursive descent. To add another precedence level (say, ^ for power between term and factor), you insert one more function in the chain. Comparison operators come next; they slot in as a layer above expression.

comparison-operators

Add comparisons: >, <, ==, !=. They return 1 for true, 0 for false, and bind more loosely than addition.

For == and !=, check two characters: cur() and source[pos+1].

    check("5 > 3", 1);
    check("2 > 7", 0);
    check("3 == 3", 1);
    check("3 != 3", 0);

Update expression() to check for comparison operators after calling add_sub():

fn expression() i64 {
    var val: i64 = add_sub();
    if (cur() == '>') {
        pos += 1;
        skip();
        return if (val > add_sub()) @as(i64, 1) else 0;
    }
    if (cur() == '<') {
        pos += 1;
        skip();
        return if (val < add_sub()) @as(i64, 1) else 0;
    }
    if (cur() == '=' and pos + 1 < source.len and source[pos + 1] == '=') {
        pos += 2;
        skip();
        return if (val == add_sub()) @as(i64, 1) else 0;
    }
    if (cur() == '!' and pos + 1 < source.len and source[pos + 1] == '=') {
        pos += 2;
        skip();
        return if (val != add_sub()) @as(i64, 1) else 0;
    }
    return val;
}
trace-precedence

Walk through the execution of expression() -> add_sub() -> term() -> factor() for the input "2 + 3 * 4". Write down each function call, which characters it consumes, and what it returns.

1. expression() -> add_sub()
2. add_sub() -> term() -> factor() -> number() reads 2, returns 2. term() sees + (not *), returns 2.
3. add_sub() sees +, advances. Calls term().
4. term() -> factor() -> number() reads 3, returns 3.
5. term() sees *, advances. Calls factor() -> number() reads 4, returns 4.
6. term() computes 3 * 4 = 12, returns 12.
7. add_sub() computes 2 + 12 = 14, returns 14.

The key insight: term() "grabs" the 3 * 4 before add_sub() can see it. That's how precedence works -- lower-precedence functions call higher-precedence ones, which consume their operands first.

Variables

is-letter-read-name

Write is_letter(c) and read_name(). A name is letters, digits, and underscores. read_name returns a slice into the source string.

fn is_letter(c: u8) bool {
    return (c >= 'a' and c <= 'z') or (c >= 'A' and c <= 'Z') or c == '_';
}

fn read_name() []const u8 {
    const start: usize = pos;
    while (pos < source.len and (is_letter(source[pos]) or
        (source[pos] >= '0' and source[pos] <= '9')))
    {
        pos += 1;
    }
    const name: []const u8 = source[start..pos];
    skip();
    return name;
}

source[start..pos] is a slice -- a view into the string. No copying, no allocation. Slices are one of Zig's best features, and we'll use them constantly.

streq-from-scratch

We need to compare strings. We're not going to use std.mem.eql -- we'll write our own. It's five lines and we'll need something like it in our self-hosted compiler anyway.

fn streq(a: []const u8, b: []const u8) bool {
    if (a.len != b.len) {
        return false;
    }
    for (a, b) |ca, cb| {
        if (ca != cb) {
            return false;
        }
    }
    return true;
}

Test it in main: print("{}\n", .{streq("cat", "cat")}); -> true. print("{}\n", .{streq("cat", "dog")}); -> false.

We'll use streq everywhere we need to check for keywords like "var", "if", "while".

variable-storage

Add variable storage: two parallel arrays for names and values, a counter, getVar and setVar.

var var_names: [256][]const u8 = undefined;
var var_values: [256]i64 = undefined;
var num_vars: usize = 0;

fn getVar(name: []const u8) i64 {
    var i: usize = num_vars;
    while (i > 0) {
        i -= 1;
        if (streq(var_names[i], name)) {
            return var_values[i];
        }
    }
    return 0;
}

fn setVar(name: []const u8, val: i64) void {
    for (0..num_vars) |i| {
        if (streq(var_names[i], name)) {
            var_values[i] = val;
            return;
        }
    }
    var_names[num_vars] = name;
    var_values[num_vars] = val;
    num_vars += 1;
}
factor-handles-names

Update factor() to handle names. When cur() is a letter, read_name() and return getVar(name). (We'll add function calls here later.)

Write exec_stmt() -- the statement executor. It checks for keywords (var, if, while, fn, return), assignment, or falls back to expression. Write program() -- a loop calling exec_stmt().

The var handler parses var name: i64 = expr; -- it reads the name, consumes : i64 (parsing the type but not using it yet), then evaluates the initializer.

    check("var x: i64 = 42; x", 42);
    check("var x: i64 = 5; x + 1", 6);
    check("var x: i64 = 5; x = x + 1; x", 6);
    check("var a: i64 = 10; var b: i64 = 20; a + b", 30);
fn exec_stmt() i64 {
    if (is_letter(cur())) {
        const save: usize = pos;
        const word: []const u8 = read_name();

        if (streq(word, "var") or streq(word, "const")) {
            const vname: []const u8 = read_name();
            if (cur() == ':') { pos += 1; skip(); _ = read_name(); } // parse type, ignore
            if (cur() == '=') { pos += 1; skip(); }
            const val: i64 = expression();
            if (cur() == ';') { pos += 1; skip(); }
            setVar(vname, val);
            return val;
        }

        // assignment: name = expr; (but not == comparison)
        if (cur() == '=' and (pos + 1 >= source.len or source[pos + 1] != '=')) {
            pos += 1;
            skip();
            const val: i64 = expression();
            if (cur() == ';') { pos += 1; skip(); }
            setVar(word, val);
            return val;
        }

        // backtrack -- it's an expression starting with a name
        pos = save;
    }
    const val: i64 = expression();
    if (cur() == ';') { pos += 1; skip(); }
    return val;
}

fn program() i64 {
    var last: i64 = 0;
    while (cur() != 0 and cur() != '}') {
        last = exec_stmt();
    }
    return last;
}

We parse : i64 and throw it away. Feels wasteful? It's not. When we compile to WAT, we'll emit (local $x i64) using that type. The groundwork is being laid quietly.

bug-missing-type

What's wrong?

var x = 5; x

Missing type annotation. Our language requires var x: i64 = 5;. The parser expects : after the variable name. Without it, the = 5 won't parse correctly. This is intentional -- every variable declaration says what it is, which matters when the compiler needs to know what WAT type to emit.

Control flow

if-else

Add if/else. When we see the word if, parse the condition in (), then execute or skip the { } block.

Two helpers:
- skip_block() -- jump past { ... } without executing, counting brace depth
- exec_block() -- execute statements inside { ... }

    check("if (5 > 3) { 10 } else { 20 }", 10);
    check("if (2 > 7) { 10 } else { 20 }", 20);
    check("var x: i64 = 10; if (x > 5) { x = 99; } x", 99);
fn skip_block() void {
    if (cur() == '{') { pos += 1; skip(); }
    var depth: i32 = 1;
    while (depth > 0 and cur() != 0) {
        if (cur() == '{') { depth += 1; }
        if (cur() == '}') { depth -= 1; }
        if (depth > 0) { pos += 1; }
    }
    if (cur() == '}') { pos += 1; skip(); }
}

fn exec_block() i64 {
    if (cur() == '{') { pos += 1; skip(); }
    var last: i64 = 0;
    while (cur() != '}' and cur() != 0) {
        last = exec_stmt();
        if (return_flag) { break; }
    }
    if (cur() == '}') { pos += 1; skip(); }
    return if (return_flag) return_val else last;
}

Add to exec_stmt, right after the var handler:

        if (streq(word, "if")) {
            if (cur() == '(') { pos += 1; skip(); }
            const cond: i64 = expression();
            if (cur() == ')') { pos += 1; skip(); }
            if (cond != 0) {
                const val: i64 = exec_block();
                if (is_letter(cur())) {
                    const s2: usize = pos;
                    const w2: []const u8 = read_name();
                    if (streq(w2, "else")) { skip_block(); } else { pos = s2; }
                }
                return val;
            } else {
                skip_block();
                if (is_letter(cur())) {
                    const s2: usize = pos;
                    const w2: []const u8 = read_name();
                    if (streq(w2, "else")) { return exec_block(); }
                    pos = s2;
                }
                return 0;
            }
        }
bug-missing-braces

Why doesn't this work?

var x: i64 = 10;
if (x > 5)
    x = 99;

No braces. Our language requires if (x > 5) { x = 99; }. The parser looks for { after the condition. No braces, no party. This is a deliberate choice: it keeps the grammar simple, makes debugging easier (you can set a breakpoint inside the {}), and -- crucially -- makes the compiler simpler. When we emit WAT, we just emit if/end blocks. No ambiguity about what belongs to which branch.

while-loop

Add while. Save the position before the condition. Loop: re-parse the condition from the saved position each time, execute or skip the body.

    check(
        \\var s: i64 = 0; var i: i64 = 1;
        \\while (i < 6) {
        \\    s = s + i;
        \\    i = i + 1;
        \\}
        \\s
    , 15);
        if (streq(word, "while")) {
            const loop_pos: usize = pos;
            var last: i64 = 0;
            while (true) {
                pos = loop_pos;
                if (cur() == '(') { pos += 1; skip(); }
                const cond: i64 = expression();
                if (cur() == ')') { pos += 1; skip(); }
                if (cond == 0) { skip_block(); break; }
                last = exec_block();
                if (return_flag) { break; }
            }
            return last;
        }

Notice something weird: we're re-parsing the condition string from scratch every iteration. We jump pos back to loop_pos, re-read the characters, re-evaluate the expression. It works, but it means we're paying the parsing cost every loop iteration. This is one reason we'll eventually want a tree -- you parse once, then walk the tree as many times as you like.

nested-control-flow

Test nested control flow:

    check(
        \\var x: i64 = 15;
        \\if (x > 10) {
        \\    if (x > 20) { 1 } else { 2 }
        \\} else {
        \\    3
        \\}
    , 2);

## Part 5: Functions

This is the biggest step. A function call has to: evaluate arguments, save the caller's variables, bind arguments to parameters, execute the body, capture the return value, and restore everything. It's a lot of state management. But the reward is fib(10) = 89.

Functions

function-storage

Add function storage. When exec_stmt sees fn, read the name, the parameter list (with types), the return type, and save the body position. Skip the body without executing it.

var fn_names: [64][]const u8 = undefined;
var fn_params: [64][8][]const u8 = undefined;
var fn_pcounts: [64]usize = undefined;
var fn_starts: [64]usize = undefined;
var fn_count: usize = 0;

Add to exec_stmt:

        if (streq(word, "fn")) {
            const fname: []const u8 = read_name();
            fn_names[fn_count] = fname;
            if (cur() == '(') { pos += 1; skip(); }
            var pc: usize = 0;
            while (cur() != ')' and cur() != 0) {
                fn_params[fn_count][pc] = read_name();
                if (cur() == ':') { pos += 1; skip(); _ = read_name(); }
                pc += 1;
                if (cur() == ',') { pos += 1; skip(); }
            }
            fn_pcounts[fn_count] = pc;
            if (cur() == ')') { pos += 1; skip(); }
            if (is_letter(cur())) { _ = read_name(); } // return type
            fn_starts[fn_count] = pos;
            skip_block();
            fn_count += 1;
            return 0;
        }

We parse fn fib(n: i64) i64 { ... } and store the name, parameters, and body position. The types are parsed and discarded -- for now. Later: (func $fib (param $n i64) (result i64) ...).

return-flag

Add return. Global flag, global value:

var return_flag: bool = false;
var return_val: i64 = 0;
        if (streq(word, "return")) {
            return_val = expression();
            if (cur() == ';') { pos += 1; skip(); }
            return_flag = true;
            return return_val;
        }
function-calls

Add function calls to factor(). When we see name(, it's a call.

    check("fn add(a: i64, b: i64) i64 { return a + b; } add(3, 5)", 8);
    check("fn double(x: i64) i64 { return x * 2; } double(21)", 42);

In factor(), where we handle names:

    if (is_letter(cur())) {
        const name: []const u8 = read_name();
        if (cur() == '(') {
            pos += 1;
            skip();
            var args: [8]i64 = undefined;
            var ac: usize = 0;
            while (cur() != ')' and cur() != 0) {
                args[ac] = expression();
                ac += 1;
                if (cur() == ',') { pos += 1; skip(); }
            }
            if (cur() == ')') { pos += 1; skip(); }
            return call_fn(name, &args, ac);
        }
        return getVar(name);
    }
var save_vn: [32][256][]const u8 = undefined;
var save_vv: [32][256]i64 = undefined;
var save_nv: [32]usize = undefined;
var save_depth: usize = 0;

fn call_fn(name: []const u8, args: *const [8]i64, ac: usize) i64 {
    _ = ac;
    // Find function
    var fi: usize = 0;
    while (fi < fn_count) {
        if (streq(fn_names[fi], name)) { break; }
        fi += 1;
    }
    if (fi >= fn_count) { return 0; }
    // Save caller state
    const d: usize = save_depth;
    save_nv[d] = num_vars;
    const saved_pos: usize = pos;
    for (0..num_vars) |i| {
        save_vn[d][i] = var_names[i];
        save_vv[d][i] = var_values[i];
    }
    save_depth += 1;
    // Bind parameters
    num_vars = 0;
    for (0..fn_pcounts[fi]) |i| {
        setVar(fn_params[fi][i], args[i]);
    }
    // Execute body
    pos = fn_starts[fi];
    return_flag = false;
    const result: i64 = exec_block();
    // Restore
    save_depth -= 1;
    pos = saved_pos;
    num_vars = save_nv[d];
    for (0..num_vars) |i| {
        var_names[i] = save_vn[d][i];
        var_values[i] = save_vv[d][i];
    }
    return_flag = false;
    return result;
}
fibonacci-test

The big test:

    check(
        \\fn fib(n: i64) i64 {
        \\    var a: i64 = 0;
        \\    var b: i64 = 1;
        \\    var i: i64 = 0;
        \\    while (i < n) {
        \\        var t: i64 = a + b;
        \\        a = b;
        \\        b = t;
        \\        i = i + 1;
        \\    }
        \\    return b;
        \\}
        \\fib(10)
    , 89);

If this prints ok 89 -- congratulations. You have a programming language. It handles arithmetic with precedence, typed variables, mandatory-brace if/else, while loops, and functions with typed parameters and return values. All from a string. All in one Zig file.

Stop and think about that for a second. The fib function calls itself conceptually (through the while loop and variable manipulation), and our save/restore mechanism keeps everything straight. We started 27 problems ago with eval("7") returning 7. Now we have Fibonacci. That's a real distance.

bug-missing-return

What does this return?

fn double(x: i64) i64 {
    x * 2
}
double(21)

0. Not 42. Without return, the function body evaluates x * 2 = 42 but return_flag is never set, so call_fn returns 0. You need return x * 2;. This is a real footgun -- and every language has to decide what to do about it. We keep it simple: if you want a value back, say return.

full-test-reset

Run all tests together with proper state reset:

fn check(input: []const u8, expected: i64) void {
    source = input;
    pos = 0;
    num_vars = 0;
    fn_count = 0;
    return_flag = false;
    save_depth = 0;
    skip();
    const got: i64 = program();
    if (got == expected) {
        print("  ok {d}\n", .{got});
    } else {
        print("  FAIL got={d} want={d}\n", .{ got, expected });
    }
}

Ten tests. Ten oks. The interpreter is complete.

## Part 6: The Compiler

Here's what we have: a parser that reads source text and computes answers. factor() reads a number, term() handles * and /, expression() handles + and -. They call each other recursively, and values flow back up the call chain. It works beautifully.

Now here's the trick. What if, instead of computing values, those functions emitted instructions? What if factor() didn't return 42 but printed i64.const 42? What if term() didn't multiply two values but printed i64.mul?

The parser structure wouldn't change at all. The recursion, the operator loops, the precedence -- all identical. Only the actions change. Crenshaw called this "syntax-directed translation," and it's one of the most satisfying ideas in all of computer science. You've already written the hard part -- the parser. The compiler is the parser wearing different shoes.

Let's build it.

The Compiler

output-buffer

Write the output buffer. We need somewhere to put the WAT instructions. A big byte array and a position, just like source and pos but for output.

var out: [1 << 20]u8 = [_]u8{0} ** (1 << 20);
var out_len: usize = 0;

Write emit_byte(b), emit_str(s), and emit_num(n) (which prints a number as decimal digits into the buffer):

fn emit_byte(b: u8) void {
    if (out_len < out.len) {
        out[out_len] = b;
        out_len += 1;
    }
}

fn emit_str(s: []const u8) void {
    for (s) |c| {
        emit_byte(c);
    }
}

fn emit_num(n: i64) void {
    if (n < 0) {
        emit_byte('-');
        emit_num(-n);
        return;
    }
    if (n >= 10) {
        emit_num(@divTrunc(n, 10));
    }
    emit_byte(@intCast(@as(u8, @intCast(@rem(n, 10))) + '0'));
}

Test it:

    out_len = 0;
    emit_str("hello ");
    emit_num(42);
    emit_byte('\n');
    print("{s}", .{out[0..out_len]});   // hello 42

This is the compiler's pen and paper. Everything it produces goes through these three functions.

Self-hosting note: when we write the compiler in its own language, these exact functions will exist there too. emit_byte, emit_s, emit_num. Simple enough to express in the language they compile.

emit-const-emit-op

Write two helpers:

fn emit_const(v: i64) void {
    emit_str("  i64.const ");
    emit_num(v);
    emit_byte('\n');
}

fn emit_op(o: []const u8) void {
    emit_str("  ");
    emit_str(o);
    emit_byte('\n');
}
c-number

Write c_number() -- reads a number from source, emits i64.const N:

fn c_number() void {
    const val: i64 = number();
    emit_const(val);
}

One line of real work. We reuse number() for parsing and do something different with the result.

c-factor-term-expression

Write c_factor(), c_term(), c_add_sub(), c_expression(). They mirror the interpreter exactly -- same structure, same operator checks, same recursion -- but emit instead of compute.

Here's the key comparison. The interpreter's term():

fn term() i64 {
    var val: i64 = factor();
    while (cur() == '*' or cur() == '/') {
        const op: u8 = cur();
        pos += 1; skip();
        const right: i64 = factor();
        if (op == '*') { val = val * right; } else { val = @divTrunc(val, right); }
    }
    return val;
}

The compiler's c_term():

fn c_term() void {
    c_factor();
    while (cur() == '*' or cur() == '/') {
        const op: u8 = cur();
        pos += 1; skip();
        c_factor();
        if (op == '*') { emit_op("i64.mul"); } else { emit_op("i64.div_s"); }
    }
}

Same structure. Where the interpreter said val = val * right, the compiler says emit_op("i64.mul"). This works because WAT is a stack machine: c_factor() emits instructions that leave a value on the stack, and i64.mul pops two values and pushes the product.

fn c_factor() void {
    if (cur() == '(') {
        pos += 1; skip();
        c_expression();
        if (cur() == ')') { pos += 1; skip(); }
        return;
    }
    if (is_letter(cur())) {
        const name: []const u8 = read_name();
        if (cur() == '(') {
            c_call(name);
            return;
        }
        if (c_is_global(name)) {
            emit_str("  global.get $");
        } else {
            emit_str("  local.get $");
        }
        emit_str(name);
        emit_byte('\n');
        return;
    }
    c_number();
}

fn c_term() void {
    c_factor();
    while (cur() == '*' or cur() == '/') {
        const op: u8 = cur();
        pos += 1; skip();
        c_factor();
        if (op == '*') { emit_op("i64.mul"); } else { emit_op("i64.div_s"); }
    }
}

fn c_add_sub() void {
    c_term();
    while (cur() == '+' or cur() == '-') {
        const op: u8 = cur();
        pos += 1; skip();
        c_term();
        if (op == '+') { emit_op("i64.add"); } else { emit_op("i64.sub"); }
    }
}

fn c_expression() void {
    c_add_sub();
    if (cur() == '>') {
        pos += 1; skip(); c_add_sub(); emit_op("i64.gt_s");
    } else if (cur() == '<') {
        pos += 1; skip(); c_add_sub(); emit_op("i64.lt_s");
    } else if (cur() == '=' and pos + 1 < source.len and source[pos + 1] == '=') {
        pos += 2; skip(); c_add_sub(); emit_op("i64.eq");
    } else if (cur() == '!' and pos + 1 < source.len and source[pos + 1] == '=') {
        pos += 2; skip(); c_add_sub(); emit_op("i64.ne");
    }
}

Notice: c_factor already checks c_is_global(name) to decide between global.get and local.get. We'll define c_is_global shortly when we set up the global tracking.

c-call

Write c_call(name) -- compile function arguments and emit call $name:

fn c_call(name: []const u8) void {
    pos += 1; skip(); // skip '('
    while (cur() != ')' and cur() != 0) {
        c_expression();
        if (cur() == ',') { pos += 1; skip(); }
    }
    if (cur() == ')') { pos += 1; skip(); }
    emit_str("  call $");
    emit_str(name);
    emit_byte('\n');
}
c-stmt-c-block

Write c_stmt() and c_block(). This is where stack hygiene matters.

WAT has strict stack discipline -- every instruction sequence must leave the stack in a predictable state. So c_stmt returns true when it leaves a value on the stack (bare expressions) and false when it doesn't (var declarations, assignments, control flow). c_block uses this to drop intermediate values and ensure exactly one value remains at the end.

fn c_stmt() bool {
    if (is_letter(cur())) {
        const save: usize = pos;
        const word: []const u8 = read_name();

        if (streq(word, "var") or streq(word, "const")) {
            const vname: []const u8 = read_name();
            if (cur() == ':') { pos += 1; skip(); _ = read_name(); }
            if (cur() == '=') { pos += 1; skip(); }
            c_expression();
            if (c_is_global(vname)) {
                emit_str("  global.set $");
            } else {
                emit_str("  local.set $");
            }
            emit_str(vname);
            emit_byte('\n');
            if (cur() == ';') { pos += 1; skip(); }
            return false;
        }

        if (streq(word, "return")) {
            c_expression();
            emit_op("return");
            if (cur() == ';') { pos += 1; skip(); }
            return false;
        }

        if (streq(word, "if")) {
            if (cur() == '(') { pos += 1; skip(); }
            c_expression();
            if (cur() == ')') { pos += 1; skip(); }
            emit_str("  if (result i64)\n");
            c_block();
            if (is_letter(cur())) {
                const s2: usize = pos;
                const w2: []const u8 = read_name();
                if (streq(w2, "else")) {
                    emit_op("else");
                    c_block();
                } else {
                    pos = s2;
                    emit_str("  else\n");
                    emit_const(0);
                }
            } else {
                emit_str("  else\n");
                emit_const(0);
            }
            emit_op("end");
            return true;  // if (result i64) leaves one value
        }

        if (streq(word, "while")) {
            emit_op("block");
            emit_op("loop");
            if (cur() == '(') { pos += 1; skip(); }
            c_expression();
            if (cur() == ')') { pos += 1; skip(); }
            emit_op("i64.eqz");
            emit_op("br_if 1");
            c_block();
            emit_op("drop");
            emit_op("br 0");
            emit_op("end");
            emit_op("end");
            return false;
        }

        if (streq(word, "fn")) {
            c_fn_def();
            return false;
        }

        // assignment: name = expr; (but not == comparison)
        if (cur() == '=' and (pos + 1 >= source.len or source[pos + 1] != '=')) {
            pos += 1; skip();
            c_expression();
            if (c_is_global(word)) {
                emit_str("  global.set $");
            } else {
                emit_str("  local.set $");
            }
            emit_str(word);
            emit_byte('\n');
            if (cur() == ';') { pos += 1; skip(); }
            return false;
        }

        pos = save; // backtrack -- expression starting with name
    }

    c_expression();
    if (cur() == ';') { pos += 1; skip(); }
    return true;  // bare expression leaves a value on the stack
}

fn c_block() void {
    if (cur() == '{') { pos += 1; skip(); }
    var last_had_value: bool = false;
    while (cur() != '}' and cur() != 0) {
        if (last_had_value) {
            emit_op("drop");
        }
        last_had_value = c_stmt();
    }
    if (!last_had_value) {
        emit_const(0);
    }
    if (cur() == '}') { pos += 1; skip(); }
}

The if handler uses if (result i64) -- both branches must produce exactly one value. c_block guarantees this. When there's no else, we synthesize one that pushes 0. The if returns true because it leaves a value on the stack.

The while handler drops the block value before br 0 (the loop-back jump). Without the drop, each iteration would pile another value on the stack.

The assignment handler checks source[pos + 1] != '=' to avoid confusing x = 5 with x == 5. Without this, x == 5 would be parsed as assignment, storing 0.

c-global-tracking

Global variable tracking for the compiler:

var c_global_names: [64][]const u8 = undefined;
var c_global_count: usize = 0;

fn c_is_global(name: []const u8) bool {
    for (0..c_global_count) |i| {
        if (streq(c_global_names[i], name)) {
            return true;
        }
    }
    return false;
}
c-local-scanning

Local variable scanning. WAT requires all (local ...) declarations at the top of a function. We scan the body before compiling it:

var local_names: [64][]const u8 = undefined;
var local_count: usize = 0;

fn scan_for_locals(start: usize, end_pos: usize) void {
    const saved: usize = pos;
    pos = start;
    local_count = 0;
    while (pos < end_pos) {
        if (is_letter(cur())) {
            const w: []const u8 = read_name();
            if (streq(w, "var") or streq(w, "const")) {
                const vname: []const u8 = read_name();
                if (cur() == ':') { pos += 1; skip(); _ = read_name(); }
                var dup: bool = false;
                for (0..local_count) |i| {
                    if (streq(local_names[i], vname)) { dup = true; break; }
                }
                if (!dup and !c_is_global(vname)) {
                    local_names[local_count] = vname;
                    local_count += 1;
                }
            }
        } else {
            pos += 1;
        }
    }
    pos = saved;
}

fn emit_locals() void {
    for (0..local_count) |i| {
        emit_str("  (local $");
        emit_str(local_names[i]);
        emit_str(" i64)\n");
    }
    emit_str("  (local $__tmp i64)\n");
}

Every function gets a $__tmp local -- the store instructions need it to swap operand order on the stack.

c-fn-def

Write c_fn_def() -- compile a function definition:

fn c_fn_def() void {
    const fname: []const u8 = read_name();
    if (cur() == '(') { pos += 1; skip(); }

    var params: [8][]const u8 = undefined;
    var ptypes: [8][]const u8 = undefined;
    var pc: usize = 0;
    while (cur() != ')' and cur() != 0) {
        params[pc] = read_name();
        if (cur() == ':') { pos += 1; skip(); ptypes[pc] = read_name(); } else { ptypes[pc] = "i64"; }
        pc += 1;
        if (cur() == ',') { pos += 1; skip(); }
    }
    if (cur() == ')') { pos += 1; skip(); }

    var rtype: []const u8 = "i64";
    if (is_letter(cur())) { rtype = read_name(); }

    const body_start: usize = pos;
    skip_block();
    const body_end: usize = pos;

    scan_for_locals(body_start, body_end);

    emit_str("(func $");
    emit_str(fname);
    for (0..pc) |i| {
        emit_str(" (param $");
        emit_str(params[i]);
        emit_str(" ");
        emit_str(ptypes[i]);
        emit_str(")");
    }
    emit_str(" (result ");
    emit_str(rtype);
    emit_str(")\n");
    emit_locals();

    pos = body_start;
    c_block();
    emit_op("drop"); // c_block always produces a value; function returns via 'return'
    emit_const(0);   // default return value if no 'return' hit

    emit_str(")\n");
}

The function body goes through c_block which produces a value. We drop it and push 0 as a default return. Functions should use explicit return expr; -- the return instruction bypasses the default.

c-builtins

Builtin handling in c_factor(). Add before the general function call:

    if (is_letter(cur())) {
        const name: []const u8 = read_name();
        if (cur() == '(') {
            // Builtins
            if (streq(name, "load8")) {
                pos += 1; skip();
                c_expression();
                if (cur() == ')') { pos += 1; skip(); }
                emit_op("i32.wrap_i64");
                emit_op("i32.load8_u");
                emit_op("i64.extend_i32_u");
                return;
            }
            if (streq(name, "store8")) {
                pos += 1; skip();
                c_expression();
                if (cur() == ',') { pos += 1; skip(); }
                c_expression();
                if (cur() == ')') { pos += 1; skip(); }
                emit_str("  local.set $__tmp\n");
                emit_str("  i32.wrap_i64\n");
                emit_str("  local.get $__tmp\n");
                emit_str("  i32.wrap_i64\n");
                emit_op("i32.store8");
                emit_const(0);
                return;
            }
            if (streq(name, "load64")) {
                pos += 1; skip();
                c_expression();
                if (cur() == ')') { pos += 1; skip(); }
                emit_op("i32.wrap_i64");
                emit_op("i64.load");
                return;
            }
            if (streq(name, "store64")) {
                pos += 1; skip();
                c_expression();
                if (cur() == ',') { pos += 1; skip(); }
                c_expression();
                if (cur() == ')') { pos += 1; skip(); }
                emit_str("  local.set $__tmp\n");
                emit_str("  i32.wrap_i64\n");
                emit_str("  local.get $__tmp\n");
                emit_op("i64.store");
                emit_const(0);
                return;
            }
            // General function call
            c_call(name);
            return;
        }
        // Variable reference
        if (c_is_global(name)) {
            emit_str("  global.get $");
        } else {
            emit_str("  local.get $");
        }
        emit_str(name);
        emit_byte('\n');
        return;
    }
c-program

Write c_program() -- the top-level function. Three passes: scan globals, compile functions, compile main.

fn c_program(input: []const u8) void {
    out_len = 0;
    c_global_count = 0;

    // Pass 1: scan for top-level globals
    source = input; pos = 0; skip();
    var depth: i32 = 0;
    while (pos < source.len) {
        if (source[pos] == '{') { depth += 1; pos += 1; continue; }
        if (source[pos] == '}') { depth -= 1; pos += 1; continue; }
        if (depth == 0 and is_letter(cur())) {
            const s: usize = pos;
            const w: []const u8 = read_name();
            if (streq(w, "var") or streq(w, "const")) {
                const vn: []const u8 = read_name();
                if (cur() == ':') { pos += 1; skip(); _ = read_name(); }
                c_global_names[c_global_count] = vn;
                c_global_count += 1;
            } else if (streq(w, "fn")) {
                while (cur() != '{' and cur() != 0) { pos += 1; }
                if (cur() == '{') { skip_block(); }
            } else {
                pos = s; pos += 1;
            }
        } else { pos += 1; }
    }

    // Emit globals
    for (0..c_global_count) |i| {
        emit_str("(global $");
        emit_str(c_global_names[i]);
        emit_str(" (mut i64) (i64.const 0))\n");
    }

    // Pass 2: compile function definitions
    source = input; pos = 0; skip();
    while (cur() != 0) {
        if (is_letter(cur())) {
            const s: usize = pos;
            const w: []const u8 = read_name();
            if (streq(w, "fn")) { pos = s; _ = c_stmt(); continue; }
            pos = s;
        }
        if (cur() == '{') { skip_block(); } else { pos += 1; }
    }

    // Pass 3: compile top-level code into main
    emit_str("(func (export \"main\") (result i64)\n");

    // Scan for top-level locals
    source = input; pos = 0; skip();
    local_count = 0;
    depth = 0;
    while (pos < source.len) {
        if (source[pos] == '{') { depth += 1; pos += 1; continue; }
        if (source[pos] == '}') { depth -= 1; pos += 1; continue; }
        if (depth == 0 and is_letter(cur())) {
            const s: usize = pos;
            const w: []const u8 = read_name();
            if (streq(w, "var") or streq(w, "const")) {
                const vn: []const u8 = read_name();
                if (cur() == ':') { pos += 1; skip(); _ = read_name(); }
                if (!c_is_global(vn)) {
                    var dup: bool = false;
                    for (0..local_count) |i| { if (streq(local_names[i], vn)) { dup = true; break; } }
                    if (!dup) { local_names[local_count] = vn; local_count += 1; }
                }
            } else if (streq(w, "fn")) {
                while (cur() != '{' and cur() != 0) { pos += 1; }
                if (cur() == '{') { skip_block(); }
            } else { pos = s; pos += 1; }
        } else { pos += 1; }
    }
    emit_locals();

    // Compile top-level statements (skip fn definitions)
    source = input; pos = 0; skip();
    var last_had_value: bool = false;
    while (cur() != 0) {
        if (is_letter(cur())) {
            const s: usize = pos;
            const w: []const u8 = read_name();
            if (streq(w, "fn")) {
                while (cur() != '{' and cur() != 0) { pos += 1; }
                skip_block();
                continue;
            }
            pos = s;
        }
        if (last_had_value) { emit_op("drop"); }
        last_had_value = c_stmt();
    }
    if (!last_had_value) { emit_const(0); }

    emit_str(")\n");
}
test-compiler-output

Test the compiler. Compile a source string and inspect the WAT:

fn compile_expr(input: []const u8) void {
    c_program(input);
    print("{s}", .{out[0..out_len]});
}
    print("--- WAT output ---\n", .{});
    compile_expr("2 + 3 * 4");

Expected output (the global/main wrapper plus three instructions and a return value):

(func (export "main") (result i64)
  (local $__tmp i64)
  i64.const 2
  i64.const 3
  i64.const 4
  i64.mul
  i64.add
)
compile-fibonacci

Compile fib(10) and inspect the output:

    c_program(
        \\fn fib(n: i64) i64 {
        \\    var a: i64 = 0; var b: i64 = 1; var i: i64 = 0;
        \\    while (i < n) { var t: i64 = a + b; a = b; b = t; i = i + 1; }
        \\    return b;
        \\}
        \\fib(10)
    );
    print("--- Fib WAT ---\n{s}\n", .{out[0..out_len]});
bug-missing-eqz

This WAT compiles but gives the wrong answer for while (i < 5). Why?

  block
  loop
    local.get $i
    i64.const 5
    i64.lt_s
    br_if 1
    ;; body
    br 0
  end
  end

Missing i64.eqz. br_if branches when the value is nonzero. i64.lt_s returns 1 when true. So br_if 1 exits when the condition is true -- backwards. i64.eqz flips it: exit when the condition is false (zero).

trace-stack-machine

Walk through the WAT for 2 + 3 * 4 on a stack machine:

| Instruction | Stack |
|-------------|-------|
| i64.const 2 | [2] |
| i64.const 3 | [2, 3] |
| i64.const 4 | [2, 3, 4] |
| i64.mul | [2, 12] |
| i64.add | [14] |

The parser's recursive descent produces the instruction order: deeper recursion (higher precedence) emits inner instructions that execute first.

design-no-ast

Why didn't we need an AST?

WAT's stack structure mirrors the parser's recursive structure. c_term() calls c_factor() first, which emits its instructions first, which means those instructions execute first on the stack. The recursive descent parser naturally produces the right instruction order. An AST would let us optimize or emit multiple formats, but for a self-hosting compiler, fewer data structures means less code means easier self-hosting. c4 works the same way -- 525 lines, no AST, single-pass.

test-compiler-globals

Test the compiler with globals. Compile this and inspect the output:

    c_program("var x: i64 = 0; fn inc() i64 { x = x + 1; return x; } inc()");
    print("{s}", .{out[0..out_len]});

You should see (global $x (mut i64) (i64.const 0)) in the output, and global.get $x / global.set $x inside the function body -- not local.get / local.set.

verify-all-tests

Verify all previous interpreter tests still pass. The compiler changes should not have broken the interpreter -- they're separate code paths. Run all 29 interpreter tests plus the global variable tests from Part 9. All ok.

What we have now: a compiler that reads the same source text as the interpreter and emits WAT instructions. It handles arithmetic with precedence, typed variables, if/else, while loops, typed functions, global variables, and memory access. The parser structure is identical to the interpreter -- we just swapped computation for emission.

What we need next: a way to run the WAT without leaving Zig. We can't rely on wat2wasm and Node.js for our test suite -- we need to verify the compiler's output automatically, in the same process. That means building a VM.

## Part 7: The VM

We have two things: an interpreter that computes answers from source text, and a compiler that produces WAT from the same source text. What we don't have is proof that the compiler is correct. The interpreter says fib(10) = 89. The compiler produces 30 lines of WAT. Do those 30 lines, when executed, also give 89?

We need a machine that runs WAT. Not a full WebAssembly runtime -- just enough to execute the instructions our compiler emits. We know exactly which instructions those are because we wrote the compiler. No surprises.

The machine is simple: a stack, some local variables, a program counter, and a handful of rules. Let's build it.

The VM

vm-split-lines

Split WAT text into lines. We'll store them in a global array and work with them by index.

var vm_lines: [4096][]const u8 = undefined;
var vm_line_count: usize = 0;

fn vm_split(wat: []const u8) void {
    vm_line_count = 0;
    var start: usize = 0;
    for (wat, 0..) |c, i| {
        if (c == '\n') {
            if (i > start) {
                vm_lines[vm_line_count] = wat[start..i];
                vm_line_count += 1;
            }
            start = i + 1;
        }
    }
    if (start < wat.len) {
        vm_lines[vm_line_count] = wat[start..wat.len];
        vm_line_count += 1;
    }
}

Test it:

    vm_split("  i64.const 2\n  i64.const 3\n  i64.add\n");
    print("lines: {d}\n", .{vm_line_count});   // 3
vm-trim-helper

Write a line-trimming helper. Our WAT lines have leading spaces. We need to strip them to compare against instruction names.

fn vm_trim(line: []const u8) []const u8 {
    var s: usize = 0;
    while (s < line.len and (line[s] == ' ' or line[s] == '\t')) {
        s += 1;
    }
    return line[s..];
}

And a helper to check if a line starts with a given prefix:

fn starts_with(hay: []const u8, needle: []const u8) bool {
    if (hay.len < needle.len) {
        return false;
    }
    for (needle, 0..) |c, i| {
        if (hay[i] != c) {
            return false;
        }
    }
    return true;
}

We'll use starts_with constantly -- starts_with(line, "i64.const "), starts_with(line, "local.get $"), etc. Each instruction has a recognizable prefix, and the operand (a number or a name) follows it.

vm-parse-int-name

Write a helper to parse a number from a string at a given offset:

fn parse_int(s: []const u8, from: usize) i64 {
    var i: usize = from;
    var neg: bool = false;
    if (i < s.len and s[i] == '-') {
        neg = true;
        i += 1;
    }
    var val: i64 = 0;
    while (i < s.len and s[i] >= '0' and s[i] <= '9') {
        val = val * 10 + digit(s[i]);
        i += 1;
    }
    return if (neg) -val else val;
}

And one to extract a name after $:

fn parse_name(s: []const u8, from: usize) []const u8 {
    var i: usize = from;
    // find the $
    while (i < s.len and s[i] != '$') {
        i += 1;
    }
    if (i < s.len) {
        i += 1; // skip $
    }
    const start: usize = i;
    while (i < s.len and s[i] != ' ' and s[i] != ')' and s[i] != '\n') {
        i += 1;
    }
    return s[start..i];
}

These are boring but necessary. The VM spends most of its time extracting numbers and names from text lines. In a real implementation, you'd use binary bytecode and skip all this parsing. We use text because we can read it and debug it.

vm-stack

The stack. An array and a pointer. Push, pop, peek.

var vm_stack: [256]i64 = [_]i64{0} ** 256;
var vm_sp: usize = 0;

fn vm_push(v: i64) void {
    vm_stack[vm_sp] = v;
    vm_sp += 1;
}

fn vm_pop() i64 {
    vm_sp -= 1;
    return vm_stack[vm_sp];
}
vm-locals

Local variables for the VM. Same pattern as the interpreter -- parallel arrays for names and values.

var vm_local_names: [64][]const u8 = undefined;
var vm_local_values: [64]i64 = [_]i64{0} ** 64;
var vm_local_count: usize = 0;

fn vm_get_local(name: []const u8) i64 {
    for (0..vm_local_count) |i| {
        if (streq(vm_local_names[i], name)) {
            return vm_local_values[i];
        }
    }
    return 0;
}

fn vm_set_local(name: []const u8, val: i64) void {
    for (0..vm_local_count) |i| {
        if (streq(vm_local_names[i], name)) {
            vm_local_values[i] = val;
            return;
        }
    }
    vm_local_names[vm_local_count] = name;
    vm_local_values[vm_local_count] = val;
    vm_local_count += 1;
}

We reuse our streq from Problem 16. No standard library, remember?

vm-exec-line

Write vm_exec_line(line) -- execute a single instruction. Start with arithmetic: i64.const N, i64.add, i64.sub, i64.mul, i64.div_s, and comparisons.

Returns true normally, false if we hit something that needs special handling (we'll use this later for control flow).

fn vm_exec_line(raw: []const u8) void {
    const line: []const u8 = vm_trim(raw);

    if (starts_with(line, "i64.const ")) {
        vm_push(parse_int(line, 10));
        return;
    }
    if (streq(line, "i64.add")) { const b: i64 = vm_pop(); const a: i64 = vm_pop(); vm_push(a + b); return; }
    if (streq(line, "i64.sub")) { const b: i64 = vm_pop(); const a: i64 = vm_pop(); vm_push(a - b); return; }
    if (streq(line, "i64.mul")) { const b: i64 = vm_pop(); const a: i64 = vm_pop(); vm_push(a * b); return; }
    if (streq(line, "i64.div_s")) { const b: i64 = vm_pop(); const a: i64 = vm_pop(); vm_push(@divTrunc(a, b)); return; }
    if (streq(line, "i64.gt_s")) { const b: i64 = vm_pop(); const a: i64 = vm_pop(); vm_push(if (a > b) @as(i64, 1) else 0); return; }
    if (streq(line, "i64.lt_s")) { const b: i64 = vm_pop(); const a: i64 = vm_pop(); vm_push(if (a < b) @as(i64, 1) else 0); return; }
    if (streq(line, "i64.eq")) { const b: i64 = vm_pop(); const a: i64 = vm_pop(); vm_push(if (a == b) @as(i64, 1) else 0); return; }
    if (streq(line, "i64.ne")) { const b: i64 = vm_pop(); const a: i64 = vm_pop(); vm_push(if (a != b) @as(i64, 1) else 0); return; }
    if (streq(line, "i64.eqz")) { const a: i64 = vm_pop(); vm_push(if (a == 0) @as(i64, 1) else 0); return; }

    if (starts_with(line, "local.get $")) {
        vm_push(vm_get_local(parse_name(line, 0)));
        return;
    }
    if (starts_with(line, "local.set $")) {
        vm_set_local(parse_name(line, 0), vm_pop());
        return;
    }
    if (starts_with(line, "(local $")) {
        // local declaration -- register the name, value starts at 0
        vm_set_local(parse_name(line, 0), 0);
        return;
    }

    // Lines we skip: (func, (param, (result, ), (export, etc.
    // Control flow (if, else, end, block, loop, br, call, return) handled by vm_run
}

Test it by feeding a few lines manually:

    vm_sp = 0;
    vm_exec_line("  i64.const 3");
    vm_exec_line("  i64.const 5");
    vm_exec_line("  i64.add");
    print("vm: {d}\n", .{vm_pop()});   // 8

That's the VM running 3 + 5 = 8 from WAT instructions. Same answer as the interpreter. We're getting somewhere.

vm-control-flow

Now for the hard part: control flow.

When the VM hits if, it pops the stack. If the value is nonzero, execution continues into the then-branch. If zero, we need to skip forward to the matching else or end. This requires scanning forward through lines, counting nesting depth.

Write vm_find_else(from) -- starting from an if line, scan forward to find the matching else at the same depth. Return the line index, or null if there's no else (just end).

Write vm_find_end(from) -- starting from an if, block, or loop line, scan forward to find the matching end.

fn vm_find_end(from: usize) usize {
    var depth: i32 = 1;
    var p: usize = from + 1;
    while (p < vm_line_count) {
        const ln: []const u8 = vm_trim(vm_lines[p]);
        if (streq(ln, "if") or starts_with(ln, "if ") or streq(ln, "block") or streq(ln, "loop")) {
            depth += 1;
        }
        if (streq(ln, "end")) {
            depth -= 1;
            if (depth == 0) {
                return p;
            }
        }
        p += 1;
    }
    return p;
}

fn vm_find_else(from: usize) ?usize {
    var depth: i32 = 1;
    var p: usize = from + 1;
    while (p < vm_line_count) {
        const ln: []const u8 = vm_trim(vm_lines[p]);
        if (streq(ln, "if") or starts_with(ln, "if ") or streq(ln, "block") or streq(ln, "loop")) {
            depth += 1;
        }
        if (streq(ln, "end")) {
            depth -= 1;
            if (depth == 0) {
                return null; // hit end before else
            }
        }
        if (depth == 1 and streq(ln, "else")) {
            return p;
        }
        p += 1;
    }
    return null;
}

These are the most subtle functions in the VM. They count nesting -- every if/block/loop increases depth, every end decreases it. An else only matches our if when the depth is back to 1 (our level). Nested ifs inside our branch have their own else/end pairs that we skip over.

vm-block-loop-br

The block/loop/br/br_if pattern for while loops.

Our compiler emits:

  block          ← outer (br target 1 = exit here)
  loop           ← inner (br target 0 = jump here)
    ;; condition
    i64.eqz
    br_if 1      ← exit the block (go past the outer end)
    ;; body
    br 0         ← jump back to loop (go to inner start)
  end            ← inner end
  end            ← outer end

The VM needs a block stack to track where to jump. When we enter a block or loop, we push its line index and type. br 0 jumps to the innermost entry. br 1 jumps to the next one out.

For block: jumping means going to the end (forward -- exit the block).
For loop: jumping means going back to the loop line (backward -- repeat).

This distinction is crucial. br 0 on a loop goes backward. br 1 on a block goes forward.

const BlockKind = enum { block, loop, ift };

var vm_block_kind: [64]BlockKind = undefined;
var vm_block_target: [64]usize = undefined;   // where to jump
var vm_block_depth: usize = 0;
vm-run

Write vm_run() -- the main execution loop. It uses a program counter pc that steps through lines. Most lines call vm_exec_line. Control flow instructions adjust pc directly.

This is the biggest function in the VM. Let's build it in stages.

var vm_pc: usize = 0;

fn vm_run_at(start_pc: usize, end_pc: usize) i64 {
    vm_pc = start_pc;
    vm_block_depth = 0;

    while (vm_pc < end_pc) {
        const line: []const u8 = vm_trim(vm_lines[vm_pc]);

        // --- Control flow ---

        if (streq(line, "if") or starts_with(line, "if ")) {
            const cond: i64 = vm_pop();
            if (cond != 0) {
                // enter then-branch
                vm_block_kind[vm_block_depth] = .ift;
                vm_block_target[vm_block_depth] = vm_find_end(vm_pc);
                vm_block_depth += 1;
                vm_pc += 1;
                continue;
            } else {
                // skip to else or end
                if (vm_find_else(vm_pc)) |else_pc| {
                    vm_block_kind[vm_block_depth] = .ift;
                    vm_block_target[vm_block_depth] = vm_find_end(vm_pc);
                    vm_block_depth += 1;
                    vm_pc = else_pc + 1;
                    continue;
                } else {
                    vm_pc = vm_find_end(vm_pc) + 1;
                    continue;
                }
            }
        }

        if (streq(line, "else")) {
            // we were in the then-branch and hit else -- skip to end
            if (vm_block_depth > 0) {
                vm_pc = vm_block_target[vm_block_depth - 1] + 1;
                vm_block_depth -= 1;
                continue;
            }
        }

        if (streq(line, "block")) {
            vm_block_kind[vm_block_depth] = .block;
            vm_block_target[vm_block_depth] = vm_find_end(vm_pc);
            vm_block_depth += 1;
            vm_pc += 1;
            continue;
        }

        if (streq(line, "loop")) {
            vm_block_kind[vm_block_depth] = .loop;
            vm_block_target[vm_block_depth] = vm_pc; // loops jump BACK
            vm_block_depth += 1;
            vm_pc += 1;
            continue;
        }

        if (streq(line, "end")) {
            if (vm_block_depth > 0) {
                vm_block_depth -= 1;
            }
            vm_pc += 1;
            continue;
        }

        if (starts_with(line, "br_if ")) {
            const depth: i64 = parse_int(line, 6);
            const cond: i64 = vm_pop();
            if (cond != 0) {
                // branch
                const target_depth: usize = vm_block_depth - 1 - @as(usize, @intCast(depth));
                if (vm_block_kind[target_depth] == .loop) {
                    vm_pc = vm_block_target[target_depth];
                    vm_block_depth = target_depth + 1;
                } else {
                    vm_pc = vm_block_target[target_depth] + 1;
                    vm_block_depth = target_depth;
                }
                continue;
            }
            vm_pc += 1;
            continue;
        }

        if (starts_with(line, "br ")) {
            const depth: i64 = parse_int(line, 3);
            const target_depth: usize = vm_block_depth - 1 - @as(usize, @intCast(depth));
            if (vm_block_kind[target_depth] == .loop) {
                vm_pc = vm_block_target[target_depth];
                vm_block_depth = target_depth + 1;
            } else {
                vm_pc = vm_block_target[target_depth] + 1;
                vm_block_depth = target_depth;
            }
            continue;
        }

        if (streq(line, "return")) {
            return vm_stack[vm_sp - 1];
        }

        // --- Everything else ---
        vm_exec_line(vm_lines[vm_pc]);
        vm_pc += 1;
    }

    // return top of stack if anything is there
    if (vm_sp > 0) {
        return vm_stack[vm_sp - 1];
    }
    return 0;
}

That's a lot of code. Let's break down the control flow logic:

if: Pop the condition. If nonzero, push a block entry and proceed into the then-branch. If zero, scan forward to else (and jump there) or end (and jump past it).

else: We got here by executing the then-branch to completion. Skip to the matching end.

block: Push a block entry. Its target is the end -- branching to a block means exiting forward.

loop: Push a block entry. Its target is the loop line itself -- branching to a loop means jumping backward.

br N: Pop N entries off the block stack. If the target is a loop, jump backward. If it's a block, jump forward past its end.

br_if N: Same as br N, but only if the popped value is nonzero.

end: Pop one block entry and continue.

return: Grab the top of stack and return immediately.

trace-vm-while

Walk through the VM executing a while loop. Here's the WAT for var i: i64 = 0; while (i < 3) { i = i + 1; } i:

  (local $i i64)
  i64.const 0
  local.set $i
  block
  loop
    local.get $i
    i64.const 3
    i64.lt_s
    i64.eqz
    br_if 1
    local.get $i
    i64.const 1
    i64.add
    local.set $i
    br 0
  end
  end
  local.get $i

Trace the first two iterations and the exit. Write down $i, the stack, and pc at each key point.

Iteration 1: $i = 0. Push 0, push 3, lt_s -> 1 (true), eqz -> 0, br_if 1 -- 0 is not nonzero, don't branch. Continue into body. Push 0, push 1, add -> 1, set $i. Now $i = 1. br 0 -- jump back to loop.

Iteration 2: $i = 1. Push 1, push 3, lt_s -> 1, eqz -> 0, br_if 1 -- don't branch. Body: 1 + 1 = 2, set $i. $i = 2. br 0 -- back to loop.

Iteration 3: $i = 2. Same. $i = 3. br 0.

Exit: $i = 3. Push 3, push 3, lt_s -> 0 (false), eqz -> 1, br_if 1 -- 1 is nonzero, branch! Jump to block target + 1 (past the outer end). Push $i -> 3. Result: 3.

The eqz flip is doing exactly what we predicted in Problem 45. Without it, the loop would exit on the first iteration.

vm-scan-functions

Function calls. We need to pre-scan the WAT for (func $name ... lines to know where each function starts and ends, and what its parameters are.

var vm_fn_names: [32][]const u8 = undefined;
var vm_fn_start: [32]usize = undefined;
var vm_fn_end: [32]usize = undefined;
var vm_fn_params: [32][8][]const u8 = undefined;
var vm_fn_param_count: [32]usize = undefined;
var vm_fn_count: usize = 0;

Write vm_scan_functions() -- scan through vm_lines, find each (func $name, extract the name and parameter names, find the matching ) that closes the function.

fn vm_scan_functions() void {
    vm_fn_count = 0;
    var i: usize = 0;
    while (i < vm_line_count) {
        const line: []const u8 = vm_trim(vm_lines[i]);
        if (starts_with(line, "(func $")) {
            const name: []const u8 = parse_name(line, 0);
            vm_fn_names[vm_fn_count] = name;
            vm_fn_start[vm_fn_count] = i + 1; // first line of body

            // extract param names
            var pc: usize = 0;
            var j: usize = i;
            // params are on the same line or nearby
            // scan this line for (param $name
            var p: usize = 0;
            while (p < line.len) {
                if (p + 7 < line.len and starts_with(line[p..], "(param $")) {
                    const pname: []const u8 = parse_name(line, p);
                    vm_fn_params[vm_fn_count][pc] = pname;
                    pc += 1;
                }
                p += 1;
            }
            vm_fn_param_count[vm_fn_count] = pc;

            // find closing )  -- a line that is just ")"
            j = i + 1;
            while (j < vm_line_count) {
                const ln: []const u8 = vm_trim(vm_lines[j]);
                if (streq(ln, ")")) {
                    break;
                }
                j += 1;
            }
            vm_fn_end[vm_fn_count] = j;
            vm_fn_count += 1;
        }

        // Also handle (func (export "main") -- no $name
        if (starts_with(line, "(func (export")) {
            vm_fn_names[vm_fn_count] = "main";
            vm_fn_start[vm_fn_count] = i + 1;
            vm_fn_param_count[vm_fn_count] = 0;
            var j: usize = i + 1;
            while (j < vm_line_count) {
                if (streq(vm_trim(vm_lines[j]), ")")) { break; }
                j += 1;
            }
            vm_fn_end[vm_fn_count] = j;
            vm_fn_count += 1;
        }

        i += 1;
    }
}
vm-function-calls

Add function calls to vm_run_at. When we hit call $name:

1. Find the function in the table
2. Pop arguments from the stack (in reverse -- last arg was pushed last)
3. Save the current state: PC, locals, block depth, stack pointer
4. Set up new locals with the parameter values
5. Run the function body
6. Restore state
7. Push the return value

We need a call stack:

var vm_call_pc: [32]usize = undefined;
var vm_call_sp: [32]usize = undefined;
var vm_call_ln: [32][64][]const u8 = undefined;
var vm_call_lv: [32][64]i64 = undefined;
var vm_call_lc: [32]usize = undefined;
var vm_call_bd: [32]usize = undefined;
var vm_call_depth: usize = 0;

Add to vm_run_at, before the vm_exec_line fallthrough:

        if (starts_with(line, "call $")) {
            const fname: []const u8 = parse_name(line, 0);
            // find function
            var fi: usize = 0;
            while (fi < vm_fn_count) {
                if (streq(vm_fn_names[fi], fname)) { break; }
                fi += 1;
            }
            if (fi < vm_fn_count) {
                // pop args
                var args: [8]i64 = undefined;
                var ai: usize = vm_fn_param_count[fi];
                while (ai > 0) {
                    ai -= 1;
                    args[ai] = vm_pop();
                }
                // save state
                const cd: usize = vm_call_depth;
                vm_call_pc[cd] = vm_pc + 1;
                vm_call_sp[cd] = vm_sp;
                vm_call_lc[cd] = vm_local_count;
                vm_call_bd[cd] = vm_block_depth;
                for (0..vm_local_count) |li| {
                    vm_call_ln[cd][li] = vm_local_names[li];
                    vm_call_lv[cd][li] = vm_local_values[li];
                }
                vm_call_depth += 1;
                // set up params as locals
                vm_local_count = 0;
                for (0..vm_fn_param_count[fi]) |pi| {
                    vm_set_local(vm_fn_params[fi][pi], args[pi]);
                }
                // run function body
                const result: i64 = vm_run_at(vm_fn_start[fi], vm_fn_end[fi]);
                // restore state
                vm_call_depth -= 1;
                vm_sp = vm_call_sp[cd];
                vm_local_count = vm_call_lc[cd];
                vm_block_depth = vm_call_bd[cd];
                for (0..vm_local_count) |li| {
                    vm_local_names[li] = vm_call_ln[cd][li];
                    vm_local_values[li] = vm_call_lv[cd][li];
                }
                // push result
                vm_push(result);
                vm_pc = vm_call_pc[cd];
                continue;
            }
            vm_pc += 1;
            continue;
        }

This is the same save/restore pattern we used in the interpreter's call_fn. Save everything, set up new locals, run the body, restore, push the result. The VM's function call is structurally identical to the interpreter's -- different data, same idea.

vm-run-wat

Write vm_run_wat(wat) -- the top-level entry point. Split lines, scan for functions, find main, run it.

fn vm_run_wat(wat: []const u8) i64 {
    vm_split(wat);
    vm_scan_functions();
    vm_sp = 0;
    vm_local_count = 0;
    vm_block_depth = 0;
    vm_call_depth = 0;

    // find main
    for (0..vm_fn_count) |i| {
        if (streq(vm_fn_names[i], "main")) {
            return vm_run_at(vm_fn_start[i], vm_fn_end[i]);
        }
    }
    return 0;
}

Test it with hand-written WAT:

    const result: i64 = vm_run_wat(
        \\(func (export "main") (result i64)
        \\  i64.const 3
        \\  i64.const 5
        \\  i64.add
        \\)
    );
    print("vm result: {d}\n", .{result});   // 8

If that prints 8, your VM runs WebAssembly. Not all of WebAssembly -- just the subset our compiler produces. But it's enough.

check-both

The moment of truth. Write check_both(src, expected):

1. Run the interpreter on src -> get the interpreter's answer
2. Run the compiler on src -> get WAT text
3. Run the VM on the WAT -> get the VM's answer
4. Compare both answers to expected

fn check_both(input: []const u8, expected: i64) void {
    // Interpret
    source = input;
    pos = 0;
    num_vars = 0;
    fn_count = 0;
    return_flag = false;
    save_depth = 0;
    skip();
    const interp: i64 = program();

    // Compile
    c_program(input);

    // Run VM
    const compiled: i64 = vm_run_wat(out[0..out_len]);

    if (interp == expected and compiled == expected) {
        print("  ok {d}\n", .{expected});
    } else {
        print("  FAIL interp={d} compiled={d} want={d}\n", .{ interp, compiled, expected });
    }
}
agreement-suite

Run the full agreement test suite:

    print("--- Agreement ---\n", .{});
    check_both("3 + 5", 8);
    check_both("2 + 3 * 4", 14);
    check_both("(2 + 3) * 4", 20);
    check_both("100 - 58", 42);
    check_both("7 * 6", 42);
    check_both("5 > 3", 1);
    check_both("2 > 7", 0);
    check_both("42 == 42", 1);
    check_both("var x: i64 = 42; x", 42);
    check_both("var x: i64 = 5; x = x + 1; x", 6);
    check_both("var a: i64 = 10; var b: i64 = 20; a + b", 30);
    check_both("if (5 > 3) { 10 } else { 20 }", 10);
    check_both("if (2 > 7) { 10 } else { 20 }", 20);
    check_both(
        \\var s: i64 = 0; var i: i64 = 1;
        \\while (i < 6) {
        \\    s = s + i;
        \\    i = i + 1;
        \\}
        \\s
    , 15);
    check_both("fn add(a: i64, b: i64) i64 { return a + b; } add(3, 5)", 8);
    check_both("fn double(x: i64) i64 { return x * 2; } double(21)", 42);
    check_both(
        \\fn fib(n: i64) i64 {
        \\    var a: i64 = 0;
        \\    var b: i64 = 1;
        \\    var i: i64 = 0;
        \\    while (i < n) {
        \\        var t: i64 = a + b;
        \\        a = b;
        \\        b = t;
        \\        i = i + 1;
        \\    }
        \\    return b;
        \\}
        \\fib(10)
    , 89);

Eighteen tests. Arithmetic, precedence, parentheses, comparisons, variables, assignment, multiple variables, if/else both branches, while loops, function calls, and Fibonacci.

If every line says ok: two completely independent engines -- the interpreter computing directly from source, and the compiler emitting WAT that a VM executes -- agree on every answer. The compiler is correct.

This is not a proof in the mathematical sense. It's a proof in the engineering sense -- the same sense a test suite is a proof. You've tested every feature, every branch of the compiler, every instruction the VM handles. The surface area is fully covered. If you added a feature (say, modulo) and only the compiler handled it wrong, one of these tests would catch it.

bug-vm-while

You compile a program and get ok for everything except while loops -- they all return 0. You check the WAT output and it looks correct. What's likely wrong in the VM?

The most common bug is getting br_if backwards -- forgetting i64.eqz in the compiler, or mishandling the branch direction in the VM. If br_if 1 always triggers (because eqz was omitted), the loop exits immediately. Another common bug: br 0 jumping to the block instead of the loop -- which exits instead of repeating.

Set a breakpoint in the br_if handler. Check: is the popped value 0 or 1? Does it branch or fall through? Trace one iteration manually. The bug is almost always in the interaction between eqz, br_if, and the block/loop distinction.

design-why-vm

Why build a VM instead of using wat2wasm and Node.js?

Three reasons.

First, speed of testing. The agreement test runs in-process -- compile, execute, compare, all in one zig build-exe && ./lang. No spawning processes, no writing temp files, no reading stdout. A hundred tests run in milliseconds.

Second, understanding. Writing the VM forces you to understand what every WAT instruction does. You can't fake i64.eqz -- you have to implement it. If you got the compiler wrong, the VM often fails in a way that reveals exactly what went wrong. The VM is a debugging tool as much as a verification tool.

Third, self-hosting. Eventually we'll write the compiler in its own language. The compiled output will be WAT. To verify it, we'll need to run it. If the VM is part of our Zig host, we can run the self-hosted compiler's output directly. No external tools in the loop.

What we have now: a complete pipeline. Source text goes into the interpreter and the compiler. The interpreter produces a number. The compiler produces WAT. The VM executes the WAT and produces a number. The numbers match. For fib(10), both say 89.

The interpreter tells you what the program means. The compiler encodes that meaning as instructions. The VM follows those instructions. Agreement -- across all inputs, across all features -- is the proof that the encoding is faithful.

Next up: strings, output buffers, and reading stdin. The features the compiler needs to compile itself.

## Part 8: Feeding the Snake

We have a compiler. It reads source text and emits WAT. We have a VM that runs the WAT and gets the same answers as the interpreter. Everything agrees. So what's left?

The compiler is written in Zig. We want it written in our language. For that, we need to look at what the compiler does and ask: can our language do that?

The compiler does three things:
1. Reads bytes from source text: source[pos], cur(), skip()
2. Compares strings to recognize keywords: streq(word, "var")
3. Writes bytes to an output buffer: emit_byte('('), emit_str("i64.const ")

Our language has integers, variables, if/else, while, functions. It does not have byte-level memory access or string literals. Without those, the compiler can't read its input or write its output.

Time to add them.

### Memory

WebAssembly has linear memory -- a flat byte array that programs can read from and write to. Our language will have the same thing: a big array of bytes, accessed by address.

We'll add two builtin functions:
- load8(addr) -- read one byte from address addr
- store8(addr, val) -- write byte val to address addr

That's the entire memory model. One byte at a time. It's enough to build string comparison, output buffers, and everything else the compiler needs.

Feeding the Snake

interpreter-heap

Add the heap to the interpreter. A global byte array, 1MB:

var heap: [1 << 20]u8 = [_]u8{0} ** (1 << 20);

Add load8 and store8 as builtin functions. When the interpreter sees load8(addr) or store8(addr, val) in factor() (where function calls are handled), it executes them directly instead of looking for a user-defined function.

In the interpreter's call path (inside factor(), where we handle name(), add builtin checks before searching user-defined functions:

    if (streq(name, "load8")) {
        pos += 1; skip();
        const addr: i64 = expression();
        if (cur() == ')') { pos += 1; skip(); }
        return @as(i64, heap[@intCast(addr)]);
    }
    if (streq(name, "store8")) {
        pos += 1; skip();
        const addr: i64 = expression();
        if (cur() == ',') { pos += 1; skip(); }
        const val: i64 = expression();
        if (cur() == ')') { pos += 1; skip(); }
        heap[@intCast(addr)] = @intCast(val);
        return 0;
    }
    if (streq(name, "load64")) {
        pos += 1; skip();
        const addr: i64 = expression();
        if (cur() == ')') { pos += 1; skip(); }
        const a: usize = @intCast(addr);
        var val: i64 = 0;
        for (0..8) |i| { val = val | (@as(i64, heap[a + i]) << @intCast(i * 8)); }
        return val;
    }
    if (streq(name, "store64")) {
        pos += 1; skip();
        const addr: i64 = expression();
        if (cur() == ',') { pos += 1; skip(); }
        const val: i64 = expression();
        if (cur() == ')') { pos += 1; skip(); }
        const a: usize = @intCast(addr);
        const v: u64 = @bitCast(val);
        for (0..8) |i| { heap[a + i] = @intCast((v >> @intCast(i * 8)) & 0xff); }
        return 0;
    }

Test:

    check("store8(1000, 65); load8(1000)", 65);
    check("store64(2000, 123456); load64(2000)", 123456);
    check("store64(2000, 0 - 1); load64(2000)", -1);

load8/store8 handle single bytes. load64/store64 handle full 64-bit integers -- 8 bytes, little-endian, just like WebAssembly's i64.load/i64.store. We need both: bytes for character data, 64-bit words for arrays of integers. The self-hosted compiler's function table, parameter lists, and string table all store i64 values.

compiler-memory-ops

Add load8, store8, load64, store64 to the compiler. In c_factor(), where we handle function calls, check for builtins before emitting call $name.

For load8(addr): compile the address, emit i32.wrap_i64, i32.load8_u, i64.extend_i32_u.
For store8(addr, val): compile both, emit the store sequence with a $__tmp local to swap order.
For load64(addr): compile the address, emit i32.wrap_i64, i64.load.
For store64(addr, val): compile both, emit the 64-bit store sequence.

    if (streq(name, "load8")) {
        pos += 1; skip();
        c_expression();
        if (cur() == ')') { pos += 1; skip(); }
        emit_op("i32.wrap_i64");
        emit_op("i32.load8_u");
        emit_op("i64.extend_i32_u");
        return;
    }
    if (streq(name, "store8")) {
        pos += 1; skip();
        c_expression();
        if (cur() == ',') { pos += 1; skip(); }
        c_expression();
        if (cur() == ')') { pos += 1; skip(); }
        // stack: [addr, val]
        // store8 needs: addr (i32), val (i32)
        emit_str("  i32.wrap_i64\n");  // convert val to i32
        emit_str("  local.set $__tmp\n");
        emit_str("  i32.wrap_i64\n");  // convert addr to i32
        emit_str("  local.get $__tmp\n");
        emit_op("i32.store8");
        emit_const(0); // store8 returns 0
        return;
    }
    if (streq(name, "load64")) {
        pos += 1; skip();
        c_expression();
        if (cur() == ')') { pos += 1; skip(); }
        emit_op("i32.wrap_i64");
        emit_op("i64.load");
        return;
    }
    if (streq(name, "store64")) {
        pos += 1; skip();
        c_expression();
        if (cur() == ',') { pos += 1; skip(); }
        c_expression();
        if (cur() == ')') { pos += 1; skip(); }
        emit_str("  local.set $__tmp\n");
        emit_str("  i32.wrap_i64\n");
        emit_str("  local.get $__tmp\n");
        emit_op("i64.store");
        emit_const(0);
        return;
    }

We need a scratch local $__tmp for the store operations -- WAT's store instructions expect the address under the value on the stack, but our compiler pushes them in argument order (address first, value second). The temp local lets us swap them. Add (local $__tmp i64) to every function in c_program() -- it's one wasted local but saves us from tracking which functions use store.

This is a little ugly. Real compilers solve this more elegantly. But it works, and working is what matters right now.

vm-memory-ops

Add load8 and store8 to the VM. When vm_exec_line sees these instructions, operate on a VM-side memory array:

var vm_memory: [1 << 20]u8 = [_]u8{0} ** (1 << 20);
    if (streq(line, "i32.load8_u")) {
        const addr: i64 = vm_pop();
        vm_push(@as(i64, vm_memory[@intCast(addr)]));
        return;
    }
    if (streq(line, "i32.store8")) {
        const val: i64 = vm_pop();
        const addr: i64 = vm_pop();
        vm_memory[@intCast(addr)] = @intCast(val);
        return;
    }
    if (streq(line, "i64.load")) {
        const addr: usize = @intCast(vm_pop());
        var val: i64 = 0;
        for (0..8) |i| { val = val | (@as(i64, vm_memory[addr + i]) << @intCast(i * 8)); }
        vm_push(val);
        return;
    }
    if (streq(line, "i64.store")) {
        const val: i64 = vm_pop();
        const addr: usize = @intCast(vm_pop());
        const v: u64 = @bitCast(val);
        for (0..8) |i| { vm_memory[addr + i] = @intCast((v >> @intCast(i * 8)) & 0xff); }
        return;
    }
    if (streq(line, "drop")) {
        _ = vm_pop();
        return;
    }
    if (streq(line, "i32.wrap_i64")) {
        // truncate to 32 bits -- for our purposes, a no-op on small values
        return;
    }
    if (streq(line, "i64.extend_i32_u")) {
        // extend to 64 bits -- also a no-op for us
        return;
    }

Test agreement:

    check_both("store8(1000, 65); load8(1000)", 65);
    check_both("store8(100, 72); store8(101, 105); load8(100) * 256 + load8(101)", 72 * 256 + 105);

Both engines agree. Memory works in the interpreter, the compiler, and the VM.

### Strings

Here's the plan. A string literal like "hello" does two things:
1. Stores the bytes h, e, l, l, o somewhere in memory
2. Evaluates to two values: the address where the bytes start, and the length

We'll use a convention: string data lives at high addresses (starting at 60000), and a string expression evaluates to (addr * 65536 + len) -- the address and length packed into a single i64. We can extract them: addr = val / 65536, len = val % 65536.

(In a real language you'd use two values or a struct. Packing them into one i64 is a hack, but it keeps our language simple -- we don't need tuples or multiple return values.)

string-literals

Add string literals to the interpreter's parser. In factor(), when we see ", read characters until the closing ", store them on the heap starting at address 60000, and return the packed (addr, len) value.

var string_offset: usize = 60000;

In factor():

    if (cur() == '"') {
        pos += 1; // skip opening quote
        const addr: usize = string_offset;
        while (cur() != '"' and cur() != 0) {
            heap[string_offset] = cur();
            string_offset += 1;
            pos += 1;
        }
        if (cur() == '"') { pos += 1; skip(); }
        const len: usize = string_offset - addr;
        return @as(i64, addr) * 65536 + @as(i64, len);
    }

Test:

    check("\"hello\" / 65536", 60000);          // addr = 60000
    check("\"hello\" - \"hello\" / 65536 * 65536", 5);   // len = 5

OK, that's not pretty. Let's add some helpers to make strings usable.

string-builtins

Add str_addr(s) and str_len(s) as builtins that extract the address and length from a packed string value. Add print_str(s) that writes the bytes to stderr via Zig's print.

    if (streq(name, "str_len")) {
        pos += 1; skip();
        const val: i64 = expression();
        if (cur() == ')') { pos += 1; skip(); }
        return @rem(val, 65536);
    }
    if (streq(name, "str_addr")) {
        pos += 1; skip();
        const val: i64 = expression();
        if (cur() == ')') { pos += 1; skip(); }
        return @divTrunc(val, 65536);
    }
    if (streq(name, "print_str")) {
        pos += 1; skip();
        const val: i64 = expression();
        if (cur() == ')') { pos += 1; skip(); }
        const addr: usize = @intCast(@divTrunc(val, 65536));
        const len: usize = @intCast(@rem(val, 65536));
        print("{s}", .{heap[addr..addr + len]});
        return 0;
    }

Now:

    check("str_len(\"hello\")", 5);
    check("str_len(\"\")", 0);
    check("str_addr(\"hello\")", 60000);
streq-in-language

The crucial one: write a streq_mem function in our language that compares two packed string values byte by byte. This is the equivalent of the streq we wrote in Zig -- but running in our language.

    check(
        \\fn streq_mem(a: i64, b: i64) i64 {
        \\    var a_addr: i64 = a / 65536;
        \\    var a_len: i64 = a - a_addr * 65536;
        \\    var b_addr: i64 = b / 65536;
        \\    var b_len: i64 = b - b_addr * 65536;
        \\    if (a_len != b_len) { return 0; }
        \\    var i: i64 = 0;
        \\    while (i < a_len) {
        \\        if (load8(a_addr + i) != load8(b_addr + i)) { return 0; }
        \\        i = i + 1;
        \\    }
        \\    return 1;
        \\}
        \\streq_mem("cat", "cat")
    , 1);

    check(
        \\fn streq_mem(a: i64, b: i64) i64 {
        \\    var a_addr: i64 = a / 65536;
        \\    var a_len: i64 = a - a_addr * 65536;
        \\    var b_addr: i64 = b / 65536;
        \\    var b_len: i64 = b - b_addr * 65536;
        \\    if (a_len != b_len) { return 0; }
        \\    var i: i64 = 0;
        \\    while (i < a_len) {
        \\        if (load8(a_addr + i) != load8(b_addr + i)) { return 0; }
        \\        i = i + 1;
        \\    }
        \\    return 1;
        \\}
        \\streq_mem("cat", "dog")
    , 0);

Stop and look at that. We just wrote string comparison in our own language. load8 reads bytes from memory. The while loop walks both strings. != checks each pair. This is the same logic as our Zig streq, expressed in the language we built. The language is starting to be useful for real work.

compiler-strings

Add string literals to the compiler. When c_factor() sees ", it needs to:

1. Read the string bytes
2. Store them in a string table (a compile-time data structure)
3. Emit i64.const with the packed (addr, len) value

The string table will later become WAT (data ...) sections.

var str_table: [256][]const u8 = undefined;
var str_addrs: [256]usize = undefined;
var str_count: usize = 0;
var str_next_addr: usize = 60000;

In c_factor():

    if (cur() == '"') {
        pos += 1;
        const start: usize = pos;
        while (cur() != '"' and cur() != 0) { pos += 1; }
        const text: []const u8 = source[start..pos];
        if (cur() == '"') { pos += 1; skip(); }

        // Add to string table
        const addr: usize = str_next_addr;
        str_table[str_count] = text;
        str_addrs[str_count] = addr;
        str_count += 1;
        str_next_addr += text.len;

        const packed: i64 = @as(i64, @intCast(addr)) * 65536 + @as(i64, @intCast(text.len));
        emit_const(packed);
        return;
    }
string-data-sections

Emit string data sections at the start of c_program(). After all function definitions and before the main function, emit:

(memory (export "memory") 1)
(data (i32.const 60000) "hello")
(data (i32.const 60005) "world")

Each string gets a (data ...) section that writes its bytes into linear memory at the right address.

fn emit_string_table() void {
    emit_str("(memory (export \"memory\") 1)\n");
    for (0..str_count) |i| {
        emit_str("(data (i32.const ");
        emit_num(@intCast(str_addrs[i]));
        emit_str(") \"");
        // emit bytes, escaping non-printable chars
        for (str_table[i]) |c| {
            if (c >= 32 and c < 127 and c != '"' and c != '\\') {
                emit_byte(c);
            } else {
                emit_byte('\\');
                emit_byte("0123456789abcdef"[c >> 4]);
                emit_byte("0123456789abcdef"[c & 0xf]);
            }
        }
        emit_str("\")\n");
    }
}

Call emit_string_table() from c_program() after emitting function definitions.

compiler-string-builtins

Add string builtins to the compiler. str_len, str_addr, and print_str need WAT implementations.

str_len(s) compiles to: s, then i64.const 65536, then i64.rem_s.
str_addr(s) compiles to: s, then i64.const 65536, then i64.div_s.

For print_str, we'll import a host function from the runtime:

(import "env" "print_str" (func $print_str (param i32) (param i32)))

The import goes at the top of the module. The compiler decomposes the packed string into (addr, len) and calls the imported function.

vm-data-sections

Update the VM to handle data sections and the print_str import.

When the VM sees (data (i32.const ADDR) "..."), it copies the string bytes into vm_memory at the given address.

When the VM hits call $print_str, it pops addr and len, reads bytes from vm_memory, and prints them.

fn vm_load_data() void {
    for (0..vm_line_count) |i| {
        const line: []const u8 = vm_trim(vm_lines[i]);
        if (starts_with(line, "(data (i32.const ")) {
            const addr_start: usize = 17; // after "(data (i32.const "
            const addr: usize = @as(usize, @intCast(parse_int(line, addr_start)));
            // find the string content between quotes
            var q1: usize = 0;
            while (q1 < line.len and line[q1] != '"') { q1 += 1; }
            q1 += 1; // skip first quote
            var q2: usize = q1;
            while (q2 < line.len and line[q2] != '"') {
                if (line[q2] == '\\' and q2 + 2 < line.len) {
                    // hex escape
                    const hi: u8 = hex_val(line[q2 + 1]);
                    const lo: u8 = hex_val(line[q2 + 2]);
                    vm_memory[addr + (q2 - q1)] = hi * 16 + lo;
                    q2 += 3;
                } else {
                    vm_memory[addr + (q2 - q1)] = line[q2];
                    q2 += 1;
                }
            }
        }
    }
}

fn hex_val(c: u8) u8 {
    if (c >= '0' and c <= '9') { return c - '0'; }
    if (c >= 'a' and c <= 'f') { return c - 'a' + 10; }
    return 0;
}

Call vm_load_data() from vm_run_wat() after vm_scan_functions().

agreement-strings

Agreement test for strings:

    check_both("str_len(\"hello\")", 5);
    check_both("str_len(\"\")", 0);
    check_both("store8(100, 65); load8(100)", 65);

And the big one -- streq_mem running through both engines:

    check_both(
        \\fn streq_mem(a: i64, b: i64) i64 {
        \\    var a_addr: i64 = a / 65536;
        \\    var a_len: i64 = a - a_addr * 65536;
        \\    var b_addr: i64 = b / 65536;
        \\    var b_len: i64 = b - b_addr * 65536;
        \\    if (a_len != b_len) { return 0; }
        \\    var i: i64 = 0;
        \\    while (i < a_len) {
        \\        if (load8(a_addr + i) != load8(b_addr + i)) { return 0; }
        \\        i = i + 1;
        \\    }
        \\    return 1;
        \\}
        \\streq_mem("hello", "hello")
    , 1);

If this passes -- a string comparison function, written in our language, compiled to WAT, executed by the VM, producing the same result as the interpreter -- then the language is powerful enough to do real string processing. It can compare keywords. It can check operators. It can parse.

### The taste of self-hosting

Let's write a tiny compiler in our language. Not the full thing -- just enough to compile 3 + 5 into WAT. A proof that the snake's mouth is reaching its tail.

emit-wat-in-language

Write a program in our language that outputs WAT for a hardcoded expression:

    check(
        \\fn emit_byte(b: i64) i64 {
        \\    store8(50000 + out_len, b);
        \\    out_len = out_len + 1;
        \\    return 0;
        \\}
        \\fn emit_s(s: i64) i64 {
        \\    var addr: i64 = s / 65536;
        \\    var len: i64 = s - addr * 65536;
        \\    var i: i64 = 0;
        \\    while (i < len) {
        \\        emit_byte(load8(addr + i));
        \\        i = i + 1;
        \\    }
        \\    return 0;
        \\}
        \\var out_len: i64 = 0;
        \\emit_s("  i64.const 3\n");
        \\emit_s("  i64.const 5\n");
        \\emit_s("  i64.add\n");
        \\out_len
    , 39);

That program writes WAT instructions into memory starting at address 50000. The emit_byte and emit_s functions are the compiler's output mechanism -- the same emit_byte and emit_str we wrote in Zig, now expressed in our own language.

The test checks that 39 bytes were written (three lines of WAT). To verify the content, we could read back the bytes with load8 -- or we could trust the string tests and move on.

c-number-in-language

Now the real thing. Write a program in our language that:
1. Takes a character from memory (simulating input)
2. Recognizes a digit
3. Emits i64.const N

This is c_number() -- in our language:

    check(
        \\var src_pos: i64 = 0;
        \\var out_len: i64 = 0;
        \\fn cur() i64 { return load8(src_pos); }
        \\fn emit_byte(b: i64) i64 {
        \\    store8(50000 + out_len, b);
        \\    out_len = out_len + 1;
        \\    return 0;
        \\}
        \\fn emit_s(s: i64) i64 {
        \\    var a: i64 = s / 65536;
        \\    var l: i64 = s - a * 65536;
        \\    var i: i64 = 0;
        \\    while (i < l) { emit_byte(load8(a + i)); i = i + 1; }
        \\    return 0;
        \\}
        \\fn emit_digit(d: i64) i64 {
        \\    emit_byte(48 + d);
        \\    return 0;
        \\}
        \\fn c_number() i64 {
        \\    var val: i64 = 0;
        \\    while (cur() >= 48) {
        \\        if (cur() > 57) { return val; }
        \\        val = val * 10 + (cur() - 48);
        \\        src_pos = src_pos + 1;
        \\    }
        \\    emit_s("  i64.const ");
        \\    emit_digit(val);
        \\    emit_byte(10);
        \\    return val;
        \\}
        \\store8(0, 55);
        \\c_number()
    , 7);

We stored 55 (the ASCII code for '7') at address 0, then called c_number(). It read the digit, emitted i64.const 7\n to the output buffer at address 50000, and returned 7.

That's the compiler's number parser, running in our language, reading from memory, writing to memory. It works.

design-whats-missing

What's still missing for full self-hosting?

Three things:

1. Multi-digit emit_num -- our emit_digit only handles single digits. We need a recursive or loop-based version that prints arbitrary numbers as decimal. This is the emit_num from Problem 30, rewritten in our language.

2. Stdin reading -- the self-hosted compiler needs to read source code from stdin into memory. In WAT, this would be an imported function: (import "env" "read_stdin" (func $read_stdin (result i32))) that fills a memory buffer.

3. The rest of the parser -- c_factor, c_term, c_expression, c_stmt, all the keyword comparisons using streq_mem. Each is a direct translation from Zig to our language. The structure is identical; only the syntax changes.

None of these require new language features. Everything the compiler does -- character access, string comparison, output buffering, recursive descent parsing -- is expressible with what we already have: integers, memory read/write, if/else, while, and functions.

The language is big enough. The snake can reach its tail.

emit-num-in-language

Write emit_num in our language -- a function that prints an arbitrary i64 as decimal digits into the output buffer:

    check(
        \\var out_len: i64 = 0;
        \\fn emit_byte(b: i64) i64 {
        \\    store8(50000 + out_len, b);
        \\    out_len = out_len + 1;
        \\    return 0;
        \\}
        \\fn emit_num(n: i64) i64 {
        \\    if (n < 0) {
        \\        emit_byte(45);
        \\        emit_num(0 - n);
        \\        return 0;
        \\    }
        \\    if (n > 9) {
        \\        emit_num(n / 10);
        \\    }
        \\    emit_byte(48 + n - n / 10 * 10);
        \\    return 0;
        \\}
        \\emit_num(12345);
        \\out_len
    , 5);

Five bytes written: '1', '2', '3', '4', '5'. The function is recursive -- it divides by 10 to get the higher digits first, then emits the remainder. n - n / 10 * 10 is modulo without the % operator (which our language doesn't have, keeping things simple for the self-hosted compiler to handle).

That's the last piece of the emit puzzle. emit_byte, emit_s, and emit_num -- in our language -- can write any WAT instruction the compiler produces.

We've crossed a threshold. The language now has everything the compiler needs:

| Compiler needs | Language feature |
|---------------|-----------------|
| Read source bytes | load8(addr) |
| Check character class | cur() >= 48 (is digit?) |
| Compare keywords | streq_mem(word, "var") |
| Write WAT output | emit_byte, emit_s, emit_num |
| Recursive descent | Functions with return |
| Track position | var pos: i64 = 0; |
| Skip/scan blocks | while with brace counting |

Every function in our Zig compiler -- c_factor, c_term, c_expression, c_stmt, c_program -- uses only these features. The translation from Zig to our language is mechanical: change the syntax, replace character literals with ASCII codes, use load8/store8 instead of array indexing.

The self-hosted compiler is within reach. It's a big program -- several hundred lines in our language -- but it's not a fundamentally different program. It's the same compiler we already wrote, in a different language. The one we built.

Global Variables

bug-globals-revealed

Run this test:

    check("var x: i64 = 42; fn get() i64 { return x; } get()", 42);

It prints FAIL got=0 want=42. The function get() can't see x.

Why? Look at call_fn, line by line. When we call get():

1. Save all variables: save_vn[0][0] = "x", save_vv[0][0] = 42
2. Set num_vars = 0 -- wipe the entire variable table
3. Bind parameters (none for get)
4. Execute body: return x; -> getVar("x") -> no variables! Returns 0.
5. Restore: put x = 42 back

The function can't see x because call_fn hides it. Every function starts with a blank slate -- it can only see its own parameters and local variables. That was fine for fib(10) where everything is passed as parameters. But the self-hosted compiler has global state -- src_pos, out_pos, src_len -- shared across dozens of functions. Without globals, the compiler is impossible.

We need two kinds of variables: locals (saved and restored on each function call, like now) and globals (visible everywhere, never saved or restored).

global-storage

Add global variable storage to the interpreter:

var global_names: [256][]const u8 = undefined;
var global_values: [256]i64 = undefined;
var num_globals: usize = 0;
var in_function: bool = false;

Write setGlobal(name, val) and getGlobal(name):

fn setGlobal(name: []const u8, val: i64) void {
    for (0..num_globals) |i| {
        if (streq(global_names[i], name)) {
            global_values[i] = val;
            return;
        }
    }
    global_names[num_globals] = name;
    global_values[num_globals] = val;
    num_globals += 1;
}

fn getGlobal(name: []const u8) i64 {
    for (0..num_globals) |i| {
        if (streq(global_names[i], name)) {
            return global_values[i];
        }
    }
    return 0;
}
getvar-checks-globals

Update getVar to check locals first, then globals:

fn getVar(name: []const u8) i64 {
    // Check locals first (innermost scope wins)
    var i: usize = num_vars;
    while (i > 0) {
        i -= 1;
        if (streq(var_names[i], name)) {
            return var_values[i];
        }
    }
    // Then globals
    return getGlobal(name);
}

Update setVar the same way -- check locals, then globals, then create a new local:

fn setVar(name: []const u8, val: i64) void {
    // Check locals first
    for (0..num_vars) |i| {
        if (streq(var_names[i], name)) {
            var_values[i] = val;
            return;
        }
    }
    // Check globals
    for (0..num_globals) |i| {
        if (streq(global_names[i], name)) {
            global_values[i] = val;
            return;
        }
    }
    // Create new local
    var_names[num_vars] = name;
    var_values[num_vars] = val;
    num_vars += 1;
}

Now setVar finds existing variables in either scope. If x is a global, assigning x = 99 inside a function updates the global. If y is a local, it updates the local. If neither exists, it creates a new local.

exec-stmt-globals

Update exec_stmt to create globals at the top level. The var handler:

        if (streq(word, "var") or streq(word, "const")) {
            const vname: []const u8 = read_name();
            if (cur() == ':') { pos += 1; skip(); _ = read_name(); }
            if (cur() == '=') { pos += 1; skip(); }
            const val: i64 = expression();
            if (cur() == ';') { pos += 1; skip(); }
            if (!in_function) {
                setGlobal(vname, val);
            } else {
                setVar(vname, val);
            }
            return val;
        }

And set in_function = true in call_fn, restore it after:

fn call_fn(name: []const u8, args: *const [8]i64, ac: usize) i64 {
    _ = ac;
    // ... find function ...
    // Save caller's LOCAL state only
    const d: usize = save_depth;
    save_nv[d] = num_vars;
    const saved_pos: usize = pos;
    const was_in_function: bool = in_function;
    for (0..num_vars) |i| {
        save_vn[d][i] = var_names[i];
        save_vv[d][i] = var_values[i];
    }
    save_depth += 1;
    num_vars = 0;
    in_function = true;
    // Bind parameters as locals
    for (0..fn_pcounts[fi]) |i| {
        setVar(fn_params[fi][i], args[i]);
    }
    // Execute
    pos = fn_starts[fi];
    return_flag = false;
    const result: i64 = exec_block();
    // Restore locals only -- globals untouched
    save_depth -= 1;
    pos = saved_pos;
    num_vars = save_nv[d];
    in_function = was_in_function;
    for (0..num_vars) |i| {
        var_names[i] = save_vn[d][i];
        var_values[i] = save_vv[d][i];
    }
    return_flag = false;
    return result;
}

The key line: globals are never saved or restored. They survive function calls. Locals are saved on entry and restored on exit, exactly as before.

check-resets-globals

Update check to reset globals:

fn check(input: []const u8, expected: i64) void {
    source = input;
    pos = 0;
    num_vars = 0;
    num_globals = 0;   // ← add this
    fn_count = 0;
    return_flag = false;
    save_depth = 0;
    in_function = false;  // ← add this
    skip();
    const got: i64 = program();
    // ...
}

Now test:

    check("var x: i64 = 42; fn get() i64 { return x; } get()", 42);

ok 42. The function sees the global.

test-global-mutation

Test global mutation:

    check("var x: i64 = 0; fn inc() i64 { x = x + 1; return x; } inc(); inc(); inc()", 3);

Each call to inc() reads x, adds 1, stores it back -- in the global table. After three calls, x is 3.

    check(
        \\var counter: i64 = 0;
        \\fn bump() i64 {
        \\    counter = counter + 1;
        \\    return counter;
        \\}
        \\bump();
        \\bump();
        \\bump();
        \\counter
    , 3);

This is the pattern the self-hosted compiler uses: src_pos, out_pos, and src_len are globals, modified by scanner and emitter functions, visible everywhere.

test-emit-num-globals

Test that emit_num recursion works with globals. This was broken before -- the recursive call would save and restore out_pos, undoing the inner call's writes.

    check(
        \\var out_pos: i64 = 0;
        \\fn emit_byte(b: i64) i64 {
        \\    store8(50000 + out_pos, b);
        \\    out_pos = out_pos + 1;
        \\    return 0;
        \\}
        \\fn emit_num(n: i64) i64 {
        \\    if (n > 9) { emit_num(n / 10); }
        \\    emit_byte(48 + n - n / 10 * 10);
        \\    return 0;
        \\}
        \\emit_num(12345);
        \\out_pos
    , 5);

Five bytes written. emit_num(12345) recurses four levels deep. Each level calls emit_byte which increments out_pos. Because out_pos is a global, the increment persists through all the recursive calls.

With the old (broken) scoping, each call_fn would save out_pos and restore it on return. The inner calls would write bytes, but out_pos would reset to 0 after each return. Total bytes written: 1 (only the last digit). Now it's 5. That's the fix.

compiler-globals

Add globals to the compiler. We need:

1. A list of global variable names (detected during the first pass)
2. (global $name (mut i64) (i64.const 0)) declarations in the output
3. global.get $name / global.set $name instead of local.get / local.set

Add a global tracking table:

var c_global_names: [64][]const u8 = undefined;
var c_global_count: usize = 0;

fn c_is_global(name: []const u8) bool {
    for (0..c_global_count) |i| {
        if (streq(c_global_names[i], name)) {
            return true;
        }
    }
    return false;
}
compiler-scan-globals

Scan for globals in c_program(). During the first pass (which finds function definitions), also collect top-level var declarations:

At the top of c_program(), before scanning for functions:

    // Scan for top-level globals
    c_global_count = 0;
    source = input;
    pos = 0;
    skip();
    var brace_depth: i32 = 0;
    while (pos < source.len) {
        if (source[pos] == '{') { brace_depth += 1; pos += 1; continue; }
        if (source[pos] == '}') { brace_depth -= 1; pos += 1; continue; }
        if (brace_depth == 0 and is_letter(cur())) {
            const s: usize = pos;
            const w: []const u8 = read_name();
            if (streq(w, "var") or streq(w, "const")) {
                const vn: []const u8 = read_name();
                if (cur() == ':') { pos += 1; skip(); _ = read_name(); }
                c_global_names[c_global_count] = vn;
                c_global_count += 1;
            } else if (streq(w, "fn")) {
                // skip entire fn (will be handled in function pass)
                while (cur() != '{' and cur() != 0) { pos += 1; }
                if (cur() == '{') { skip_block(); }
            } else {
                pos = s;
                pos += 1;
            }
        } else {
            pos += 1;
        }
    }

Then emit the global declarations:

    for (0..c_global_count) |i| {
        emit_str("(global $");
        emit_str(c_global_names[i]);
        emit_str(" (mut i64) (i64.const 0))\n");
    }
compiler-global-get-set

Update c_factor() to use global.get for globals:

Where we emit local.get $name, change to:

    if (c_is_global(name)) {
        emit_str("  global.get $");
    } else {
        emit_str("  local.get $");
    }
    emit_str(name);
    emit_byte('\n');

Update c_stmt() for var declarations and assignment -- use global.set for globals:

    // In the var handler:
    c_expression();
    if (c_is_global(vname)) {
        emit_str("  global.set $");
    } else {
        emit_str("  local.set $");
    }
    emit_str(vname);
    emit_byte('\n');

    // In the assignment handler:
    c_expression();
    if (c_is_global(word)) {
        emit_str("  global.set $");
    } else {
        emit_str("  local.set $");
    }
    emit_str(word);
    emit_byte('\n');

Also update scan_for_locals to skip globals -- they shouldn't appear as (local ...) declarations:

    if (!dup and !c_is_global(vname)) {
        local_names[local_count] = vname;
        local_count += 1;
    }
vm-globals

Add globals to the VM:

var vm_global_names: [64][]const u8 = undefined;
var vm_global_values: [64]i64 = undefined;
var vm_global_count: usize = 0;

fn vm_get_global(name: []const u8) i64 {
    for (0..vm_global_count) |i| {
        if (streq(vm_global_names[i], name)) {
            return vm_global_values[i];
        }
    }
    return 0;
}

fn vm_set_global(name: []const u8, val: i64) void {
    for (0..vm_global_count) |i| {
        if (streq(vm_global_names[i], name)) {
            vm_global_values[i] = val;
            return;
        }
    }
    vm_global_names[vm_global_count] = name;
    vm_global_values[vm_global_count] = val;
    vm_global_count += 1;
}

Add instruction handling in vm_exec_line:

    if (starts_with(line, "global.get $")) {
        vm_push(vm_get_global(parse_name(line, 0)));
        return;
    }
    if (starts_with(line, "global.set $")) {
        vm_set_global(parse_name(line, 0), vm_pop());
        return;
    }

And in vm_scan_functions (or a new vm_scan_globals), parse (global $name ...) declarations:

    if (starts_with(line, "(global $")) {
        vm_set_global(parse_name(line, 0), 0);
    }

Critically, globals are NOT saved/restored in the VM's function call handler. The call handler saves vm_local_names/vm_local_values/vm_local_count only. Globals persist.

Reset in vm_run_wat:

    vm_global_count = 0;

agreement-globals

Agreement tests with globals:

    check_both("var x: i64 = 42; fn get() i64 { return x; } get()", 42);
    check_both("var x: i64 = 0; fn inc() i64 { x = x + 1; return x; } inc(); inc(); inc()", 3);
    check_both(
        \\var counter: i64 = 0;
        \\fn bump() i64 {
        \\    counter = counter + 1;
        \\    return counter;
        \\}
        \\bump();
        \\bump();
        \\bump();
        \\counter
    , 3);

Three tests. Two engines. Same answers. Globals work in the interpreter, the compiler, and the VM.

test-compiler-pattern

The real test -- the self-hosted compiler pattern. Multiple globals, multiple functions accessing them:

    check_both(
        \\var src_pos: i64 = 0;
        \\var out_pos: i64 = 0;
        \\fn cur() i64 {
        \\    return load8(src_pos);
        \\}
        \\fn emit_byte(b: i64) i64 {
        \\    store8(50000 + out_pos, b);
        \\    out_pos = out_pos + 1;
        \\    return 0;
        \\}
        \\fn skip() i64 {
        \\    while (cur() == 32) {
        \\        src_pos = src_pos + 1;
        \\    }
        \\    return 0;
        \\}
        \\store8(0, 32);
        \\store8(1, 32);
        \\store8(2, 55);
        \\skip();
        \\emit_byte(cur());
        \\out_pos
    , 1);

This simulates the self-hosted compiler: source bytes at address 0 (two spaces then '7'), skip() advances src_pos past the spaces, cur() reads the '7' (55), emit_byte writes it to the output buffer. All functions share src_pos and out_pos through global variables.

If this passes in both engines: the foundational mechanism of the self-hosted compiler -- shared global state across scanner and emitter functions -- works correctly.

bug-variable-shadowing

This program gives different results in the interpreter vs what you'd expect. Why?

var x: i64 = 10;
fn foo() i64 {
    var x: i64 = 99;
    return x;
}
foo()

It returns 99, not 10. Inside foo(), var x: i64 = 99; creates a local variable named x (because we're inside a function). return x; finds the local x first (locals are checked before globals). The global x is shadowed but not modified.

This is correct behavior -- local variables shadow globals with the same name. It's how most languages work. But it's a trap if you accidentally reuse a name. Our language doesn't warn about shadowing. Something to keep in mind when writing the self-hosted compiler: don't name a local the same as a global.

Globals are the single biggest feature needed for self-hosting. Every function in the self-hosted compiler -- cur(), skip(), number(), read_name(), emit_byte(), emit_s(), emit_num(), every c_* function -- reads or writes global state. Without globals, none of them can communicate. With globals, they all share the same src_pos, out_pos, and src_len, just like the Zig versions share Zig globals.

The pattern is now proven: declare globals at the top level, access them from any function, mutations are visible everywhere. The interpreter, compiler, and VM all agree. We can move on.

## Part 10: Completing the Language

Our language is a subset of Zig. Every program we write should compile with both our compiler and the Zig compiler. This chapter adds the remaining operators and features to close the gap.

### Comparison operators

Completing the Language

interp-ge-le

Add >= and <= to the interpreter's expression(). Check for two-character operators before single-character ones. >= returns 1 if left >= right, else 0.

compiler-ge-le

Add >= and <= to the compiler (i64.ge_s, i64.le_s) and VM. Agreement tests:

    check_both("5 >= 5", 1);
    check_both("5 >= 6", 0);
    check_both("3 <= 3", 1);
    check_both("3 <= 2", 0);

### Logical operators

interp-and-or

Add and and or to the interpreter. These are new precedence levels -- or binds loosest, and binds tighter, both looser than comparisons. Rename expression() to comparison(). Write and_expr() and or_expr(). The new expression() calls or_expr():

expression -> or_expr -> and_expr -> comparison -> add_sub -> term -> factor

and: both sides nonzero -> 1, else 0. or: either side nonzero -> 1, else 0. Not short-circuiting -- both sides always evaluate.

compiler-and-or

Add and/or to the compiler. Normalize each side to 0/1 with i64.const 0 / i64.ne, then use bitwise i64.and / i64.or. Add these to the VM. Agreement tests:

    check_both("1 and 1", 1);
    check_both("1 and 0", 0);
    check_both("0 or 1", 1);
    check_both("5 > 3 and 10 > 7", 1);
    check_both("5 >= 5 and 5 <= 5", 1);

### Character literals

char-literals

Add character literals to the interpreter and compiler. In factor() / c_factor(), when we see a single quote, read the byte inside and return/emit its integer value. 'a' is 97, '0' is 48. Agreement tests:

    check_both("'a'", 97);
    check_both("'0'", 48);
    check_both("'z' - 'a'", 25);

In the self-hosted compiler, detect single quotes with cur() == 39 (since character literals don't exist yet during bootstrap).

### Modulo

modulo-operator

Add % to term() and c_term(), same precedence as * and /. Maps to i64.rem_s in WAT. Add i64.rem_s to the VM.

    check_both("10 % 3", 1);
    check_both("17 % 5", 2);

### Compound assignment

compound-assignment

Add +=, -=, *= to exec_stmt() and c_stmt(). When we see name followed by +=, desugar to name = name + expr.

Interpreter -- after reading a name, before the regular = check:

        if (cur() == '+' and pos + 1 < source.len and source[pos + 1] == '=') {
            pos += 2; skip();
            const val: i64 = expression();
            if (cur() == ';') { pos += 1; skip(); }
            setVar(word, getVar(word) + val);
            return getVar(word);
        }

Compiler -- emit get, compile right side, emit add, emit set:

        if (cur() == '+' and pos + 1 < source.len and source[pos + 1] == '=') {
            pos += 2; skip();
            if (c_is_global(word)) { emit_str("  global.get $"); } else { emit_str("  local.get $"); }
            emit_str(word); emit_byte('\n');
            c_expression();
            emit_op("i64.add");
            if (c_is_global(word)) { emit_str("  global.set $"); } else { emit_str("  local.set $"); }
            emit_str(word); emit_byte('\n');
            if (cur() == ';') { pos += 1; skip(); }
            return false;
        }

Same pattern for -= (with sub) and *= (with mul).

    check_both("var x: i64 = 10; x += 5; x", 15);
    check_both("var x: i64 = 10; x -= 3; x", 7);
    check_both("var x: i64 = 10; x *= 2; x", 20);

191 occurrences of += 1 in the Zig code. After this, every one compiles in both languages.

### Break

break-statement

Add break to the interpreter with a global break_flag (like return_flag). Add to the compiler as emit_op("br 1") -- unconditional exit from the block/loop pair. The VM already handles br 1.

    check_both(
        \\var i: i64 = 0;
        \\while (i < 100) {
        \\    if (i == 5) { break; }
        \\    i += 1;
        \\}
        \\i
    , 5);

### For loops (range only)

We support for (start..end) |name| { body } -- Zig's range-based for loop. This covers all 33 for loops in our code. We do NOT support array iteration (for (array) |item|).

for-loop-interpreter

Add for to the interpreter:

        if (streq(word, "for")) {
            if (cur() == '(') { pos += 1; skip(); }
            const start_val: i64 = expression();
            if (cur() == '.') { pos += 1; }
            if (cur() == '.') { pos += 1; }
            skip();
            const end_val: i64 = expression();
            if (cur() == ')') { pos += 1; skip(); }
            if (cur() == '|') { pos += 1; skip(); }
            const loop_var: []const u8 = read_name();
            if (cur() == '|') { pos += 1; skip(); }

            const body_pos: usize = pos;
            var i: i64 = start_val;
            while (i < end_val) {
                pos = body_pos;
                setVar(loop_var, i);
                _ = exec_block();
                if (return_flag or break_flag) { break; }
                i += 1;
            }
            if (break_flag) { break_flag = false; }
            if (pos <= body_pos) { pos = body_pos; skip_block(); }
            return 0;
        }

The two dots are parsed as two separate . characters. The |name| capture is the pipe characters with a variable name between them. Inside the loop, name is set as a local variable for each iteration.

for-loop-compiler

Add for to the compiler. Emit block/loop/br_if/br with auto-init and auto-increment. Store the end bound in $__for_end:

        if (streq(word, "for")) {
            if (cur() == '(') { pos += 1; skip(); }
            c_expression();          // start
            if (cur() == '.') { pos += 1; }
            if (cur() == '.') { pos += 1; }
            skip();
            c_expression();          // end
            emit_str("  local.set $__for_end\n");
            if (cur() == ')') { pos += 1; skip(); }
            if (cur() == '|') { pos += 1; skip(); }
            const loop_var: []const u8 = read_name();
            if (cur() == '|') { pos += 1; skip(); }
            // init
            emit_str("  local.set $"); emit_str(loop_var); emit_byte('\n');
            emit_op("block");
            emit_op("loop");
            // condition: i >= end -> exit
            emit_str("  local.get $"); emit_str(loop_var); emit_byte('\n');
            emit_str("  local.get $__for_end\n");
            emit_op("i64.ge_s");
            emit_op("br_if 1");
            c_block();
            emit_op("drop");
            // increment
            emit_str("  local.get $"); emit_str(loop_var); emit_byte('\n');
            emit_op("i64.const 1");
            emit_op("i64.add");
            emit_str("  local.set $"); emit_str(loop_var); emit_byte('\n');
            emit_op("br 0");
            emit_op("end");
            emit_op("end");
            return false;
        }

Add (local $__for_end i64) to emit_locals(). Update scan_for_locals to detect |name| in for statements and register the capture as a local.

Agreement tests:

    check_both("var s: i64 = 0; for (0..5) |i| { s += i; } s", 10);
    check_both("var s: i64 = 0; for (0..3) |i| { for (0..4) |j| { s += 1; } } s", 12);

### true and false

true-false-keywords

Add true and false as keywords in factor() and c_factor(). After reading a name, check before treating it as a variable:

    if (streq(name, "true")) { return 1; }
    if (streq(name, "false")) { return 0; }

In the compiler, emit i64.const 1 or i64.const 0. In Zig, these are bool values. In our language, they're the integers 1 and 0. The behavior is identical when used with and, or, and if.

    check_both("true", 1);
    check_both("false", 0);
    check_both("true and false", 0);
    check_both("true or false", 1);

### Arrays

array-declarations

Array declarations. When the var handler sees : [N] in the type position, allocate N x 8 bytes from the heap (starting at address 300000). The variable's value is the base address. Parse and accept = undefined (ignore it -- our heap is zeroed). Accept any type name after ] (not just i64 -- also usize, bool, etc., all treated as i64).

Track which variables are arrays with a parallel flag. Set during declaration, check during indexing.

array-read

Array read. In factor(), when we see name[, check if it's an array variable. If yes: load64(base + idx * 8). If no: packed string byte access load8(addr + idx).

array-write

Array write. In exec_stmt(), when we see name[, parse index, ], =, value. Write with store64(base + idx * 8, val). Also handle arr[i] += val.

compiler-arrays

Add arrays to the compiler. Array globals get pre-assigned addresses from a counter (c_next_array). Track array names. arr[i] read emits: get base, index * 8, add, i32.wrap_i64, i64.load. arr[i] = val write emits the store sequence with $__tmp.

agreement-arrays

Agreement tests:

    check_both(
        \\var arr: [10]i64 = undefined;
        \\arr[0] = 42; arr[1] = 99;
        \\arr[0] + arr[1]
    , 141);
payoff-var-storage

The payoff -- variable storage from the Zig interpreter, now valid in both languages:

var var_names: [256]i64 = undefined;
var var_values: [256]i64 = undefined;
var num_vars: i64 = 0;

fn setVar(name: i64, val: i64) i64 {
    for (0..num_vars) |i| {
        if (streq_mem(var_names[i], name) == 1) {
            var_values[i] = val;
            return 0;
        }
    }
    var_names[num_vars] = name;
    var_values[num_vars] = val;
    num_vars += 1;
    return 0;
}

Same arrays, same for loop, same +=, same indexing. Compiles under both.

design-not-supported

What Zig features do we deliberately NOT support?

Array iteration (for (array) |item|): our arrays don't carry length. Use for (0..count) |i| { arr[i] ... }.

@intCast / @as: type casting. Not needed -- everything is i64.

Pointers: restructure shared code to pass values, not addresses.

Complex initializers ([_]i64{0} ** 256): our arrays zero-initialize via the heap. Accept = undefined.

The rule: add a feature only when avoiding it would change the code's structure, not merely its spelling.

## Part 11: The Compiler, in Its Own Language

We've proven every piece individually. streq_mem compares strings. emit_byte, emit_s, emit_num write output. The scanner reads source bytes. Now we assemble them into a complete compiler -- written in the language we built, using every feature we added in Part 10.

Every function here compiles under both our compiler and the Zig compiler. That's the test: one source, two compilers, same result.

### I/O

The Compiler, in Its Own Language

io-builtins

Add read_input and print_char to the interpreter, compiler, and VM.

In the interpreter: read_input() copies the source into heap at address 0 and returns the length. print_char(c) writes one byte to output.

In the compiler: these become imported WAT functions. Add to c_program():

    emit_str("(import \"env\" \"print_char\" (func $print_char (param i64)))\n");
    emit_str("(import \"env\" \"read_input\" (func $read_input (result i64)))\n");
    emit_str("(memory (export \"memory\") 10)\n");

In the VM: call $print_char pops and prints the byte. call $read_input loads test data into vm_memory and pushes the length.

### The scanner

self-char-classification

Character classification. Now that we have and, or, >=, <=, and character literals, these are clean single-line conditions:

fn is_space(c: i64) i64 {
    if (c == ' ' or c == '\n' or c == '\r' or c == '\t') { return 1; }
    return 0;
}

fn is_letter(c: i64) i64 {
    if (c >= 'a' and c <= 'z') { return 1; }
    if (c >= 'A' and c <= 'Z') { return 1; }
    if (c == '_') { return 1; }
    return 0;
}

fn is_digit(c: i64) i64 {
    if (c >= '0' and c <= '9') { return 1; }
    return 0;
}

fn is_alnum(c: i64) i64 {
    if (is_letter(c) == 1 or is_digit(c) == 1) { return 1; }
    return 0;
}

Compare with the Zig interpreter's versions -- they're nearly identical. is_digit is c >= '0' and c <= '9' in both languages now.

self-scanner

The core scanner:

var src_pos: i64 = 0;
var src_len: i64 = 0;

fn cur() i64 {
    if (src_pos >= src_len) { return 0; }
    return load8(src_pos);
}

fn skip() i64 {
    while (is_space(cur()) == 1) { src_pos += 1; }
    return 0;
}

fn number() i64 {
    var val: i64 = 0;
    while (is_digit(cur()) == 1) {
        val = val * 10 + (cur() - '0');
        src_pos += 1;
    }
    skip();
    return val;
}

fn read_name() i64 {
    var start: i64 = src_pos;
    while (is_alnum(cur()) == 1) { src_pos += 1; }
    var len: i64 = src_pos - start;
    skip();
    return start * 65536 + len;
}

cur() - '0' instead of cur() - 48. src_pos += 1 instead of src_pos = src_pos + 1. The scanner reads like the Zig version.

self-streq-emit

String comparison and emit:

fn streq_mem(a: i64, b: i64) i64 {
    if (len(a) != len(b)) { return 0; }
    for (0..len(a)) |i| {
        if (a[i] != b[i]) { return 0; }
    }
    return 1;
}

var out_pos: i64 = 0;

fn emit_byte(b: i64) i64 {
    store8(100000 + out_pos, b);
    out_pos += 1;
    return 0;
}

fn emit_s(s: i64) i64 {
    for (0..len(s)) |i| {
        emit_byte(s[i]);
    }
    return 0;
}

fn emit_num(n: i64) i64 {
    if (n < 0) { emit_byte('-'); emit_num(0 - n); return 0; }
    if (n > 9) { emit_num(n / 10); }
    emit_byte('0' + n % 10);
    return 0;
}

streq_mem uses a[i] slice indexing, len(a) for length, and for loops. No manual address arithmetic. emit_s iterates with for instead of while with a counter. emit_num uses '0' + n % 10 -- character literal and modulo, both earning their keep.

### The expression compiler

self-c-factor

c_factor -- numbers, variables, parentheses, string literals, character literals, function calls:

fn c_factor() i64 {
    if (cur() == '(') {
        src_pos += 1; skip();
        c_expression();
        if (cur() == ')') { src_pos += 1; skip(); }
        return 0;
    }
    if (cur() == '\'') {
        src_pos += 1;
        var ch: i64 = cur();
        src_pos += 1;
        if (cur() == '\'') { src_pos += 1; skip(); }
        emit_s("  i64.const "); emit_num(ch); emit_byte('\n');
        return 0;
    }
    if (cur() == '"') {
        src_pos += 1;
        var sstart: i64 = src_pos;
        while (cur() != '"') { src_pos += 1; }
        var slen: i64 = src_pos - sstart;
        src_pos += 1; skip();
        emit_s("  i64.const ");
        emit_num(sstart * 65536 + slen);
        emit_byte('\n');
        add_str(sstart, slen);
        return 0;
    }
    if (is_digit(cur()) == 1) {
        emit_s("  i64.const "); emit_num(number()); emit_byte('\n');
        return 0;
    }
    if (is_letter(cur()) == 1) {
        var name: i64 = read_name();
        if (streq_mem(name, "true") == 1) { emit_s("  i64.const 1\n"); return 0; }
        if (streq_mem(name, "false") == 1) { emit_s("  i64.const 0\n"); return 0; }
        if (cur() == '(') {
            c_call(name);
            return 0;
        }
        if (is_global(name) == 1) { emit_s("  global.get $"); }
        else { emit_s("  local.get $"); }
        emit_s(name); emit_byte('\n');
        return 0;
    }
    return 0;
}

Character literals ('(' instead of 40), true/false keywords, += on src_pos. The function handles every kind of atomic expression.

self-c-term-expression

c_term, c_add_sub, c_expression:

fn c_term() i64 {
    c_factor();
    while (cur() == '*' or cur() == '/' or cur() == '%') {
        var op: i64 = cur();
        src_pos += 1; skip();
        c_factor();
        if (op == '*') { emit_s("  i64.mul\n"); }
        if (op == '/') { emit_s("  i64.div_s\n"); }
        if (op == '%') { emit_s("  i64.rem_s\n"); }
    }
    return 0;
}

fn c_add_sub() i64 {
    c_term();
    while (cur() == '+' or cur() == '-') {
        var op: i64 = cur();
        src_pos += 1; skip();
        c_term();
        if (op == '+') { emit_s("  i64.add\n"); }
        if (op == '-') { emit_s("  i64.sub\n"); }
    }
    return 0;
}

fn c_comparison() i64 {
    c_add_sub();
    if (cur() == '>' and load8(src_pos + 1) == '=') {
        src_pos += 2; skip(); c_add_sub(); emit_s("  i64.ge_s\n");
    }
    if (cur() == '<' and load8(src_pos + 1) == '=') {
        src_pos += 2; skip(); c_add_sub(); emit_s("  i64.le_s\n");
    }
    if (cur() == '>') { src_pos += 1; skip(); c_add_sub(); emit_s("  i64.gt_s\n"); }
    if (cur() == '<') { src_pos += 1; skip(); c_add_sub(); emit_s("  i64.lt_s\n"); }
    if (cur() == '=' and load8(src_pos + 1) == '=') {
        src_pos += 2; skip(); c_add_sub(); emit_s("  i64.eq\n");
    }
    if (cur() == '!' and load8(src_pos + 1) == '=') {
        src_pos += 2; skip(); c_add_sub(); emit_s("  i64.ne\n");
    }
    return 0;
}

fn c_and_expr() i64 {
    c_comparison();
    while (is_letter(cur()) == 1) {
        var s: i64 = src_pos;
        var w: i64 = read_name();
        if (streq_mem(w, "and") == 1) {
            emit_s("  i64.const 0\n  i64.ne\n");
            c_comparison();
            emit_s("  i64.const 0\n  i64.ne\n  i64.and\n");
        } else { src_pos = s; break; }
    }
    return 0;
}

fn c_or_expr() i64 {
    c_and_expr();
    while (is_letter(cur()) == 1) {
        var s: i64 = src_pos;
        var w: i64 = read_name();
        if (streq_mem(w, "or") == 1) {
            emit_s("  i64.const 0\n  i64.ne\n");
            c_and_expr();
            emit_s("  i64.const 0\n  i64.ne\n  i64.or\n");
        } else { src_pos = s; break; }
    }
    return 0;
}

fn c_expression() i64 {
    c_or_expr();
    return 0;
}

cur() == '*' or cur() == '/' -- inline condition using or, no helper function needed. The full precedence chain: expression -> or_expr -> and_expr -> comparison -> add_sub -> term -> factor. The self-hosted compiler handles every operator the Zig compiler handles.

### The statement compiler

self-c-stmt

c_stmt -- returns 1 if it left a value on the stack, 0 if not:

fn c_stmt() i64 {
    if (is_letter(cur()) == 1) {
        var save: i64 = src_pos;
        var word: i64 = read_name();

        if (streq_mem(word, "var") == 1 or streq_mem(word, "const") == 1) {
            var vname: i64 = read_name();
            if (cur() == ':') { src_pos += 1; skip(); read_name(); }
            if (cur() == '=') { src_pos += 1; skip(); }
            c_expression();
            if (is_global(vname) == 1) { emit_s("  global.set $"); }
            else { emit_s("  local.set $"); }
            emit_s(vname); emit_byte('\n');
            if (cur() == ';') { src_pos += 1; skip(); }
            return 0;
        }

        if (streq_mem(word, "return") == 1) {
            c_expression();
            emit_s("  return\n");
            if (cur() == ';') { src_pos += 1; skip(); }
            return 0;
        }

        if (streq_mem(word, "if") == 1) {
            if (cur() == '(') { src_pos += 1; skip(); }
            c_expression();
            if (cur() == ')') { src_pos += 1; skip(); }
            emit_s("  if (result i64)\n");
            c_block();
            if (is_letter(cur()) == 1) {
                var s2: i64 = src_pos;
                var w2: i64 = read_name();
                if (streq_mem(w2, "else") == 1) {
                    emit_s("  else\n");
                    c_block();
                } else {
                    src_pos = s2;
                    emit_s("  else\n  i64.const 0\n");
                }
            } else {
                emit_s("  else\n  i64.const 0\n");
            }
            emit_s("  end\n");
            return 1;
        }

        if (streq_mem(word, "while") == 1) {
            emit_s("  block\n  loop\n");
            if (cur() == '(') { src_pos += 1; skip(); }
            c_expression();
            if (cur() == ')') { src_pos += 1; skip(); }
            emit_s("  i64.eqz\n  br_if 1\n");
            c_block();
            emit_s("  drop\n  br 0\n  end\n  end\n");
            return 0;
        }

        if (streq_mem(word, "for") == 1) {
            if (cur() == '(') { src_pos += 1; skip(); }
            c_expression();
            if (cur() == '.') { src_pos += 1; }
            if (cur() == '.') { src_pos += 1; }
            skip();
            c_expression();
            emit_s("  local.set $__for_end\n");
            if (cur() == ')') { src_pos += 1; skip(); }
            if (cur() == '|') { src_pos += 1; skip(); }
            var loop_var: i64 = read_name();
            if (cur() == '|') { src_pos += 1; skip(); }
            emit_s("  local.set $"); emit_s(loop_var); emit_byte('\n');
            emit_s("  block\n  loop\n");
            emit_s("  local.get $"); emit_s(loop_var); emit_byte('\n');
            emit_s("  local.get $__for_end\n  i64.ge_s\n  br_if 1\n");
            c_block();
            emit_s("  drop\n");
            emit_s("  local.get $"); emit_s(loop_var); emit_byte('\n');
            emit_s("  i64.const 1\n  i64.add\n");
            emit_s("  local.set $"); emit_s(loop_var); emit_byte('\n');
            emit_s("  br 0\n  end\n  end\n");
            return 0;
        }

        if (streq_mem(word, "break") == 1) {
            if (cur() == ';') { src_pos += 1; skip(); }
            emit_s("  br 1\n");
            return 0;
        }

        if (streq_mem(word, "fn") == 1) {
            c_fn();
            return 0;
        }

        // compound assignment: name += expr
        if (cur() == '+' and load8(src_pos + 1) == '=') {
            src_pos += 2; skip();
            if (is_global(word) == 1) { emit_s("  global.get $"); }
            else { emit_s("  local.get $"); }
            emit_s(word); emit_byte('\n');
            c_expression();
            emit_s("  i64.add\n");
            if (is_global(word) == 1) { emit_s("  global.set $"); }
            else { emit_s("  local.set $"); }
            emit_s(word); emit_byte('\n');
            if (cur() == ';') { src_pos += 1; skip(); }
            return 0;
        }
        if (cur() == '-' and load8(src_pos + 1) == '=') {
            src_pos += 2; skip();
            if (is_global(word) == 1) { emit_s("  global.get $"); }
            else { emit_s("  local.get $"); }
            emit_s(word); emit_byte('\n');
            c_expression();
            emit_s("  i64.sub\n");
            if (is_global(word) == 1) { emit_s("  global.set $"); }
            else { emit_s("  local.set $"); }
            emit_s(word); emit_byte('\n');
            if (cur() == ';') { src_pos += 1; skip(); }
            return 0;
        }

        // assignment: name = expr (not ==)
        if (cur() == '=' and load8(src_pos + 1) != '=') {
            src_pos += 1; skip();
            c_expression();
            if (is_global(word) == 1) { emit_s("  global.set $"); }
            else { emit_s("  local.set $"); }
            emit_s(word); emit_byte('\n');
            if (cur() == ';') { src_pos += 1; skip(); }
            return 0;
        }

        src_pos = save;
    }
    c_expression();
    if (cur() == ';') { src_pos += 1; skip(); }
    return 1;
}

Every keyword check uses streq_mem with string literals. var and const are handled together with or. for parses the (start..end) |name| syntax. break emits br 1. Compound assignment (+=, -=) emits get-compute-set. The = vs == check uses load8(src_pos + 1) != '='.

self-c-block

c_block:

fn c_block() i64 {
    if (cur() == '{') { src_pos += 1; skip(); }
    var last_had_value: i64 = 0;
    while (cur() != '}') {
        if (cur() == 0) { return 0; }
        if (last_had_value == 1) { emit_s("  drop\n"); }
        last_had_value = c_stmt();
    }
    if (last_had_value == 0) { emit_s("  i64.const 0\n"); }
    if (cur() == '}') { src_pos += 1; skip(); }
    return 0;
}
self-c-call-fn

c_call and c_fn:

fn c_call(name: i64) i64 {
    src_pos += 1; skip();
    while (cur() != ')') {
        if (cur() == 0) { return 0; }
        c_expression();
        if (cur() == ',') { src_pos += 1; skip(); }
    }
    if (cur() == ')') { src_pos += 1; skip(); }
    emit_s("  call $"); emit_s(name); emit_byte('\n');
    return 0;
}

fn c_fn() i64 {
    var fname: i64 = read_name();
    emit_s("(func $"); emit_s(fname);

    if (cur() == '(') { src_pos += 1; skip(); }
    while (cur() != ')') {
        if (cur() == 0) { return 0; }
        var pname: i64 = read_name();
        emit_s(" (param $"); emit_s(pname);
        if (cur() == ':') {
            src_pos += 1; skip();
            var ptype: i64 = read_name();
            emit_s(" "); emit_s(ptype);
        }
        emit_s(")");
        if (cur() == ',') { src_pos += 1; skip(); }
    }
    if (cur() == ')') { src_pos += 1; skip(); }

    if (is_letter(cur()) == 1) {
        var rtype: i64 = read_name();
        emit_s(" (result "); emit_s(rtype); emit_s(")");
    }
    emit_byte('\n');

    var body_start: i64 = src_pos;
    scan_locals();
    src_pos = body_start;
    c_block();
    emit_s("  drop\n  i64.const 0\n)\n");
    return 0;
}
self-helpers

Helper functions:

fn skip_past_block() i64 {
    while (cur() != '{') {
        if (cur() == 0) { return 0; }
        src_pos += 1;
    }
    src_pos += 1; skip();
    var depth: i64 = 1;
    while (depth > 0) {
        if (cur() == 0) { return 0; }
        if (cur() == '{') { depth += 1; }
        if (cur() == '}') { depth -= 1; }
        if (depth > 0) { src_pos += 1; }
    }
    if (cur() == '}') { src_pos += 1; skip(); }
    return 0;
}

fn skip_stmt() i64 {
    while (cur() != ';') {
        if (cur() == 0) { return 0; }
        if (cur() == '{') { skip_past_block(); return 0; }
        src_pos += 1;
    }
    src_pos += 1; skip();
    return 0;
}

fn scan_locals() i64 {
    var save: i64 = src_pos;
    if (cur() == '{') { src_pos += 1; skip(); }
    var depth: i64 = 0;
    while (cur() != 0) {
        if (cur() == '{') { depth += 1; }
        if (cur() == '}') {
            if (depth == 0) {
                emit_s("  (local $__tmp i64)\n");
                emit_s("  (local $__for_end i64)\n");
                src_pos = save;
                return 0;
            }
            depth -= 1;
        }
        if (is_letter(cur()) == 1) {
            var w: i64 = read_name();
            if (streq_mem(w, "var") == 1 or streq_mem(w, "const") == 1) {
                var vname: i64 = read_name();
                if (cur() == ':') { src_pos += 1; skip(); read_name(); }
                if (is_global(vname) == 0) {
                    emit_s("  (local $"); emit_s(vname); emit_s(" i64)\n");
                }
            }
            if (streq_mem(w, "for") == 1) {
                // skip to |name| and register the capture variable
                while (cur() != '|' and cur() != 0) { src_pos += 1; }
                if (cur() == '|') { src_pos += 1; skip(); }
                var capname: i64 = read_name();
                if (cur() == '|') { src_pos += 1; skip(); }
                emit_s("  (local $"); emit_s(capname); emit_s(" i64)\n");
            }
        } else {
            src_pos += 1;
        }
    }
    emit_s("  (local $__tmp i64)\n");
    emit_s("  (local $__for_end i64)\n");
    src_pos = save;
    return 0;
}

scan_locals now also detects for loops and registers their capture variables as locals. It emits $__tmp and $__for_end for every function.

self-global-tracking

Global variable tracking using arrays:

var global_names: [64]i64 = undefined;
var g_count: i64 = 0;

fn add_global(name: i64) i64 {
    global_names[g_count] = name;
    g_count += 1;
    return 0;
}

fn is_global(name: i64) i64 {
    for (0..g_count) |i| {
        if (streq_mem(global_names[i], name) == 1) { return 1; }
    }
    return 0;
}

global_names[g_count] = name -- array write. global_names[i] -- array read. for (0..g_count) |i| -- range loop. No manual store64/load64 address arithmetic.

self-string-table

String data table using arrays:

var str_addrs: [256]i64 = undefined;
var str_lens: [256]i64 = undefined;
var str_count: i64 = 0;

fn add_str(addr: i64, slen: i64) i64 {
    str_addrs[str_count] = addr;
    str_lens[str_count] = slen;
    str_count += 1;
    return 0;
}

fn emit_string_data() i64 {
    for (0..str_count) |i| {
        emit_s("(data (i32.const ");
        emit_num(str_addrs[i]);
        emit_s(") \"");
        for (0..str_lens[i]) |j| {
            emit_byte(load8(str_addrs[i] + j));
        }
        emit_s("\")\n");
    }
    return 0;
}

Two parallel arrays instead of packed 16-byte records. str_addrs[i] reads cleanly. Nested for loops iterate the table and the bytes.

self-c-program

c_program -- the top-level compiler:

fn c_program() i64 {
    emit_s("(module\n");
    emit_s("(memory (export \"memory\") 10)\n");

    // scan for top-level globals
    src_pos = 0; skip();
    var depth: i64 = 0;
    while (src_pos < src_len) {
        if (load8(src_pos) == '{') { depth += 1; src_pos += 1; }
        if (load8(src_pos) == '}') { depth -= 1; src_pos += 1; }
        if (depth == 0 and is_letter(cur()) == 1) {
            var s: i64 = src_pos;
            var w: i64 = read_name();
            if (streq_mem(w, "var") == 1 or streq_mem(w, "const") == 1) {
                var vn: i64 = read_name();
                if (cur() == ':') { src_pos += 1; skip(); read_name(); }
                add_global(vn);
            } else {
                if (streq_mem(w, "fn") == 1) { skip_past_block(); }
                else { src_pos = s; src_pos += 1; }
            }
        } else { src_pos += 1; }
    }

    // emit global declarations
    for (0..g_count) |gi| {
        emit_s("(global $"); emit_s(global_names[gi]);
        emit_s(" (mut i64) (i64.const 0))\n");
    }

    // compile function definitions
    src_pos = 0; skip();
    while (cur() != 0) {
        if (is_letter(cur()) == 1) {
            var s: i64 = src_pos;
            var w: i64 = read_name();
            if (streq_mem(w, "fn") == 1) { c_fn(); }
            else { src_pos = s; skip_stmt(); }
        } else { src_pos += 1; }
    }

    // compile top-level code into main
    emit_s("(func (export \"main\") (result i64)\n");
    emit_s("  (local $__tmp i64)\n");
    emit_s("  (local $__for_end i64)\n");
    src_pos = 0; skip();
    var last_v: i64 = 0;
    while (cur() != 0) {
        if (is_letter(cur()) == 1) {
            var s: i64 = src_pos;
            var w: i64 = read_name();
            if (streq_mem(w, "fn") == 1) {
                skip_past_block();
            } else {
                src_pos = s;
                if (last_v == 1) { emit_s("  drop\n"); }
                last_v = c_stmt();
            }
        } else {
            if (last_v == 1) { emit_s("  drop\n"); }
            last_v = c_stmt();
        }
    }
    if (last_v == 0) { emit_s("  i64.const 0\n"); }
    emit_s(")\n");
    emit_string_data();
    emit_s(")\n");
    return 0;
}
self-entry-point

The entry point:

fn main() i64 {
    src_len = read_input();
    c_program();
    for (0..out_pos) |i| {
        print_char(load8(100000 + i));
    }
    return 0;
}
main()

read_input loads source code into memory at address 0. c_program parses it and writes WAT into memory at 100000. The for loop prints every byte to stdout.

self-feature-coverage

Count the features. Every Part 10 addition appears in the self-hosted compiler:

| Feature | Where it appears |
|---------|-----------------|
| >= <= | c_comparison: i64.ge_s, i64.le_s |
| and or | c_and_expr, c_or_expr, is_space, c_term conditions |
| 'a' | c_factor char literal handler, emit_num's '0' |
| % | emit_num: n % 10, c_term: i64.rem_s |
| += -= | c_stmt compound assignment, every counter increment |
| break | c_stmt handler, c_and_expr/c_or_expr backtrack |
| for (0..n) \|i\| | streq_mem, emit_s, emit_string_data, c_program, main |
| true false | c_factor keyword check |
| var arr: [N]i64 | global_names, str_addrs, str_lens |
| arr[i] | global_names[i], str_addrs[i], str_lens[i] |
| s[i] len(s) | streq_mem: a[i] != b[i], len(a) |

The self-hosted compiler is the proof that Part 10 works. Every feature was added because this chapter uses it.

## Part 12: The Snake Eats Its Tail

The Snake Eats Its Tail

assemble-compiler

Assemble the complete compiler. Create a file called compiler.lang containing, in order:

1. All the helper functions (is_space, is_letter, is_digit, is_alnum)
2. All the scanner functions (cur, skip, number, read_name, streq_mem)
3. All the emitter functions (emit_byte, emit_s, emit_num)
4. All the compiler functions (c_factor, c_term, c_add_sub, c_expression, c_stmt, c_block, c_fn, c_call, scan_locals, skip_past_block, skip_stmt, c_program, add_str, emit_string_data, is_global, add_global)
5. The global variables (src_pos, src_len, out_pos)
6. The main function

Every function in this file uses only features our language supports: i64 integers, if/else with braces, while, functions with typed parameters, load8/store8, and string literals. Nothing else.

test-self-three-plus-five

Test the self-hosted compiler on a simple expression. Feed it "3 + 5" as input and check the WAT output.

We can do this entirely in the Zig test harness:

fn test_self_hosted(compiler_src: []const u8, input: []const u8) void {
    // Step 1: Compile the compiler from our language to WAT
    c_program(compiler_src);
    const compiler_wat: []u8 = out[0..out_len];

    // Step 2: Run the compiler WAT in the VM, with 'input' as its stdin
    // Load input into vm_memory at address 0
    for (input, 0..) |c, i| { vm_memory[i] = c; }
    // Set up read_input to return the length
    const result_wat: []const u8 = vm_run_compiler(compiler_wat, input.len);

    // Step 3: The VM's output (in vm_memory at 100000+) is the compiled WAT of 'input'
    print("Self-hosted output:\n{s}\n", .{result_wat});
}

The output should be:

(module
(memory (export "memory") 1)
(func (export "main") (result i64)
  i64.const 3
  i64.const 5
  i64.add
)
)

If it is: the self-hosted compiler, running as WAT inside our VM, produced correct WAT for 3 + 5. The compiler compiled.

test-self-fibonacci

Test the self-hosted compiler on fib(10). Feed it the Fibonacci source. The output should match what our Zig compiler produces.

bootstrap-test

The bootstrap.

Here's the moment everything has been building toward. Four commands:

# Step 1: Compile the compiler with our Zig implementation
zig build-exe minizig.zig && ./minizig < compiler.lang > compiler.wat

# Step 2: Assemble to WASM
wat2wasm compiler.wat -o compiler.wasm

# Step 3: Run the WASM compiler on its own source
node run.js compiler.wasm < compiler.lang > compiler2.wat

# Step 4: Compare
diff compiler.wat compiler2.wat

run.js is a small Node.js script that loads the WASM module with I/O imports:

const fs = require('fs');
const wasm = fs.readFileSync(process.argv[2]);
const input = fs.readFileSync('/dev/stdin');

WebAssembly.instantiate(wasm, {
    env: {
        read_input: () => {
            for (let i = 0; i < input.length; i++)
                mem[i] = input[i];
            return input.length;
        },
        print_char: (c) => process.stdout.write(Buffer.from([Number(c)])),
    }
}).then(({ instance }) => {
    mem = new Uint8Array(instance.exports.memory.buffer);
    instance.exports.main();
});
the-empty-diff

Run it.

$ diff compiler.wat compiler2.wat
$

Empty. No output. The diff found nothing.

The compiler -- compiled to WebAssembly by the Zig host -- and the same compiler -- compiled to WebAssembly by itself running as WebAssembly -- produced identical output when given the same input.

The snake has eaten its tail.

### What just happened

Let's trace the full path:

1. We wrote eval("3+5") and got 8. (Problem 4)
2. We added a cursor, multi-digit numbers, whitespace, and precedence. (Problems 6–12)
3. We added typed variables, if/else, while, functions. (Problems 15–27)
4. We wrote a compiler that emits WAT instead of computing values. Same parser, different actions. (Problems 30–44)
5. We built a VM that executes WAT, proving the compiler correct through agreement testing. (Problems 48–62)
6. We added memory and strings so the language could manipulate bytes. (Problems 65–75)
7. We wrote compiler primitives in our own language and showed they worked. (Problems 76–79)
8. We assembled those primitives into a complete compiler -- in our language. (Problems 84–92)
9. We compiled the compiler with the Zig host, ran it as WASM, fed it its own source, and got identical output. (Problems 93–97)

Ninety-seven problems. One file. One language. A compiler that compiles itself.

### Why this matters

Self-hosting isn't just a party trick. It's a proof of completeness.

If the compiler can compile itself, it means the language is powerful enough to express its own implementation. Every feature the compiler uses -- arithmetic, string comparison, memory access, recursive descent, output buffering -- is a feature the language provides. There are no hidden dependencies, no magic behind the curtain. The language stands on its own feet.

It's also a practical tool. Once you have a self-hosting compiler, you can improve the language using the language itself. Add a feature to the compiler source, compile it with the old compiler, and now you have a new compiler that supports the new feature. Bootstrap again and the new compiler compiles itself with the new feature. The language grows itself.

c4 did this in 525 lines of C. We did it in a few hundred lines of our language, targeting WebAssembly instead of custom bytecode. The approach is different -- c4 uses single-pass compilation with no tree, just like we do -- but the principle is the same. The snake must be just long enough to eat itself. Not longer. Every unnecessary feature is code you have to compile, which is code you have to write, which is code the compiler has to handle. Minimalism isn't just aesthetic. It's engineering.

# Appendices

## Appendix A: A Simple Allocator

Everything in our language lives at fixed addresses. Arrays start at 300000. The output buffer starts at 100000. The compiler's string table starts at 200000. We chose these addresses by hand and made sure they don't overlap.

That works for a compiler. It doesn't work for a program that needs to create data structures whose size depends on the input. A text editor doesn't know how big the file is until it opens it. A web server doesn't know how many connections it'll handle. These programs need to request memory at runtime -- and eventually give it back.

That's what an allocator does. alloc(size) returns a pointer to size bytes of usable memory. free(ptr) returns those bytes to the pool. Between the two, a program can build linked lists, hash maps, trees, resizable arrays -- any data structure that grows and shrinks.

We're going to build one. No new language features. Everything we need -- load64, store64, pointer arithmetic, linked lists via addresses -- already exists.

### The bump allocator

The simplest possible allocator: a pointer that moves forward.

Appendix A: A Simple Allocator

bump-allocator

Write a bump allocator:

var heap_start: i64 = 500000;
var heap_ptr: i64 = 500000;
var heap_limit: i64 = 900000;

fn alloc(size: i64) i64 {
    var ptr: i64 = heap_ptr;
    heap_ptr += (size + 7) / 8 * 8;
    if (heap_ptr > heap_limit) { return 0; }
    return ptr;
}

That's it. alloc(100) returns 500000 and advances heap_ptr to 500104 (rounded up to a multiple of 8 for alignment). The next alloc(50) returns 500104 and advances to 500160. Each call gets a fresh, non-overlapping region.

The (size + 7) / 8 * 8 rounds up to 8-byte alignment. WAT's i64.load and i64.store require aligned addresses. Without rounding, alloc(3) followed by alloc(8) would place an i64 at a misaligned address.

Test:

    check(
        \\var heap_ptr: i64 = 500000;
        \\var heap_limit: i64 = 900000;
        \\fn alloc(size: i64) i64 {
        \\    var ptr: i64 = heap_ptr;
        \\    heap_ptr += (size + 7) / 8 * 8;
        \\    if (heap_ptr > heap_limit) { return 0; }
        \\    return ptr;
        \\}
        \\var a: i64 = alloc(100);
        \\var b: i64 = alloc(200);
        \\b - a
    , 104);  // 100 rounded up to 104 (next multiple of 8)

The bump allocator is fast (one addition per call), simple (six lines), and useful for programs that only allocate and never free -- compilers, for instance. Our self-hosted compiler could use this instead of pre-assigned addresses.

But it can't free. Memory only grows. For long-running programs, we need something better.

### Block headers

To free memory, we need to know how big each allocation was. We store this information in a header -- 8 bytes immediately before the returned pointer.

  Header (8 bytes)     Payload (size bytes)
  ┌────────────────┬────────────────────────────┐
  │  block_size    │  usable memory             │
  └────────────────┴────────────────────────────┘
  ^                ^
  block address    pointer returned by alloc()

alloc(size) allocates size + 8 bytes, writes the total block size into the header, and returns a pointer to the payload. free(ptr) reads the header at ptr - 8 to learn the block size.

block-headers

Add headers to the bump allocator:

fn alloc(size: i64) i64 {
    var total: i64 = (size + 7) / 8 * 8 + 8;
    var block: i64 = heap_ptr;
    heap_ptr += total;
    if (heap_ptr > heap_limit) { return 0; }
    store64(block, total);
    return block + 8;
}

alloc(100) allocates 112 bytes (100 rounded to 104, plus 8 for the header). It writes 112 at the block address and returns block + 8.

Given a pointer ptr, the block size is load64(ptr - 8).

Test:

    check(
        \\var heap_ptr: i64 = 500000;
        \\var heap_limit: i64 = 900000;
        \\fn alloc(size: i64) i64 {
        \\    var total: i64 = (size + 7) / 8 * 8 + 8;
        \\    var block: i64 = heap_ptr;
        \\    heap_ptr += total;
        \\    if (heap_ptr > heap_limit) { return 0; }
        \\    store64(block, total);
        \\    return block + 8;
        \\}
        \\var p: i64 = alloc(100);
        \\load64(p - 8)
    , 112);  // 100 -> 104 aligned + 8 header = 112

### The free list

When we free a block, we add it to a free list -- a linked list of available blocks. Each free block stores a pointer to the next free block in its payload area (which isn't being used):

  Free block:
  ┌────────────────┬────────────────┬─────────────────┐
  │  block_size    │  next_free     │  (unused)        │
  └────────────────┴────────────────┴─────────────────┘

free_list is a global that points to the first free block (or 0 if the list is empty).

free-list

Write free():

var free_list: i64 = 0;

fn free(ptr: i64) i64 {
    var block: i64 = ptr - 8;
    store64(block + 8, free_list);
    free_list = block;
    return 0;
}

Two lines of real work. We write the current list head into the freed block's payload, then make the freed block the new head. It's a linked-list prepend using raw memory.

alloc-with-reuse

Update alloc() to search the free list before bump-allocating:

fn alloc(size: i64) i64 {
    var total: i64 = (size + 7) / 8 * 8 + 8;

    // Search the free list for a large enough block
    var prev: i64 = 0;
    var curr: i64 = free_list;
    while (curr != 0) {
        var block_size: i64 = load64(curr);
        if (block_size >= total) {
            // Remove from free list
            var next: i64 = load64(curr + 8);
            if (prev == 0) {
                free_list = next;
            } else {
                store64(prev + 8, next);
            }
            return curr + 8;
        }
        prev = curr;
        curr = load64(curr + 8);
    }

    // No free block found -- bump allocate
    var block: i64 = heap_ptr;
    heap_ptr += total;
    if (heap_ptr > heap_limit) { return 0; }
    store64(block, total);
    return block + 8;
}

First-fit strategy: walk the free list, take the first block that's large enough. If nothing fits, fall back to bump allocation.

Test -- alloc, free, alloc again reuses the freed block:

    check(
        \\var heap_ptr: i64 = 500000;
        \\var heap_limit: i64 = 900000;
        \\var free_list: i64 = 0;
        \\fn alloc(size: i64) i64 {
        \\    var total: i64 = (size + 7) / 8 * 8 + 8;
        \\    var prev: i64 = 0;
        \\    var curr: i64 = free_list;
        \\    while (curr != 0) {
        \\        var block_size: i64 = load64(curr);
        \\        if (block_size >= total) {
        \\            var next: i64 = load64(curr + 8);
        \\            if (prev == 0) { free_list = next; }
        \\            else { store64(prev + 8, next); }
        \\            return curr + 8;
        \\        }
        \\        prev = curr;
        \\        curr = load64(curr + 8);
        \\    }
        \\    var block: i64 = heap_ptr;
        \\    heap_ptr += total;
        \\    if (heap_ptr > heap_limit) { return 0; }
        \\    store64(block, total);
        \\    return block + 8;
        \\}
        \\fn free(ptr: i64) i64 {
        \\    var block: i64 = ptr - 8;
        \\    store64(block + 8, free_list);
        \\    free_list = block;
        \\    return 0;
        \\}
        \\var a: i64 = alloc(100);
        \\var b: i64 = alloc(200);
        \\free(a);
        \\var c: i64 = alloc(50);
        \\c == a
    , 1);  // c reuses a's freed block

a was freed and its 112-byte block went onto the free list. When we allocate 50 bytes (total 56 with header), the free list has a 112-byte block -- large enough. We reuse it. c == a confirms they got the same address.

### A dynamic array

Now let's use the allocator to build something useful: a growable array. Fixed-size arrays require knowing the size at declaration time. A dynamic array starts small and grows as needed -- like Zig's ArrayList, Python's list, or C++'s vector.

dynamic-array-design

Design a dynamic array using three values: ptr (pointer to the data), len (number of elements), cap (capacity -- how many elements fit before growing). Pack these into three consecutive i64 values in memory:

fn da_new() i64 {
    var da: i64 = alloc(24);
    store64(da, 0);       // ptr = null
    store64(da + 8, 0);   // len = 0
    store64(da + 16, 0);  // cap = 0
    return da;
}

fn da_len(da: i64) i64 { return load64(da + 8); }

fn da_get(da: i64, i: i64) i64 {
    return load64(load64(da) + i * 8);
}
dynamic-array-push

Write da_push -- append an element, growing if necessary:

fn da_push(da: i64, val: i64) i64 {
    var len: i64 = load64(da + 8);
    var cap: i64 = load64(da + 16);
    if (len >= cap) {
        // Grow: double capacity (minimum 4)
        var new_cap: i64 = cap * 2;
        if (new_cap < 4) { new_cap = 4; }
        var new_ptr: i64 = alloc(new_cap * 8);
        var old_ptr: i64 = load64(da);
        // Copy old elements
        for (0..len) |i| {
            store64(new_ptr + i * 8, load64(old_ptr + i * 8));
        }
        // Free old buffer (if any)
        if (old_ptr != 0) { free(old_ptr); }
        store64(da, new_ptr);
        store64(da + 16, new_cap);
    }
    store64(load64(da) + len * 8, val);
    store64(da + 8, len + 1);
    return 0;
}

This is the classic amortized-doubling strategy. Most pushes are O(1) -- just store and increment. Occasionally the array is full and we allocate a new buffer, copy everything, and free the old one. The amortized cost per push is O(1).

Test:

    // (include alloc, free, da_new, da_push, da_get, da_len from above)
    check(
        \\ ... allocator and dynamic array functions ...
        \\var arr: i64 = da_new();
        \\da_push(arr, 10);
        \\da_push(arr, 20);
        \\da_push(arr, 30);
        \\da_push(arr, 40);
        \\da_push(arr, 50);
        \\da_get(arr, 0) + da_get(arr, 4)
    , 60);  // 10 + 50

Five elements pushed. The array started at capacity 0, grew to 4 on the first push, then grew to 8 on the fifth push. da_get(arr, 0) returns 10. da_get(arr, 4) returns 50.

dynamic-array-pop

Write da_pop -- remove and return the last element:

fn da_pop(da: i64) i64 {
    var len: i64 = load64(da + 8);
    if (len == 0) { return 0; }
    len -= 1;
    store64(da + 8, len);
    return load64(load64(da) + len * 8);
}

Test:

    // push 10, 20, 30; pop gives 30, 20, 10
    check(
        \\ ... allocator and dynamic array functions ...
        \\var arr: i64 = da_new();
        \\da_push(arr, 10);
        \\da_push(arr, 20);
        \\da_push(arr, 30);
        \\var a: i64 = da_pop(arr);
        \\var b: i64 = da_pop(arr);
        \\a * 100 + b
    , 3020);  // 30 * 100 + 20

### Coalescing

Our allocator has a problem: fragmentation. If you allocate and free many small blocks, the free list fills up with tiny pieces that can't satisfy larger requests -- even though the total free memory is plenty.

The fix is coalescing: when freeing a block, check if the block immediately after it in memory is also free. If so, merge them into one larger block.

free-with-coalescing

Write free with forward coalescing:

fn free(ptr: i64) i64 {
    var block: i64 = ptr - 8;
    var size: i64 = load64(block);

    // Check if the next block in memory is on the free list
    var next_block: i64 = block + size;
    var prev: i64 = 0;
    var curr: i64 = free_list;
    while (curr != 0) {
        if (curr == next_block) {
            // Adjacent! Merge: absorb next block's size
            size += load64(curr);
            store64(block, size);
            // Remove next_block from free list
            var after: i64 = load64(curr + 8);
            if (prev == 0) { free_list = after; }
            else { store64(prev + 8, after); }
            break;
        }
        prev = curr;
        curr = load64(curr + 8);
    }

    // Add merged (or unmerged) block to free list
    store64(block + 8, free_list);
    free_list = block;
    return 0;
}

When we free block A, we check if block B (at address A + A's size) is on the free list. If it is, we remove B from the free list, add B's size to A, and put the merged block on the free list. One large block instead of two small ones.

Test -- allocate three blocks, free the middle one, free the first one, they should coalesce:

    check(
        \\ ... allocator with coalescing free ...
        \\var a: i64 = alloc(32);
        \\var b: i64 = alloc(32);
        \\var c: i64 = alloc(32);
        \\free(b);
        \\free(a);
        \\var d: i64 = alloc(72);
        \\d == a
    , 1);

a and b were adjacent in memory. Freeing b puts it on the free list. Freeing a checks if the block after a (which is b) is free -- it is. They merge into one 80-byte block (40 + 40). When we allocate 72 bytes (total 80 with header), the merged block fits perfectly.

design-why-not-builtin

Why did we build the allocator in our language instead of adding alloc as a builtin?

Because we could. The allocator uses load64, store64, pointer arithmetic, and a linked list -- all things the language already supports. Making it a builtin would hide the mechanism. Making it a library function proves the language is powerful enough for systems programming.

This is the same philosophy as streq -- we wrote string comparison instead of importing it. And emit_num -- we wrote number-to-string conversion instead of calling a runtime function. The language builds its own tools. Each tool validates the language.

In production, you'd want the allocator to be fast (a buddy allocator, slab allocator, or similar). What we built here is slow (linear free-list search) but correct, and correct is what matters for understanding. The reader now knows what malloc does -- because they wrote it.

What we built: a complete memory allocator in 50 lines of the language we created from scratch. Bump allocation, free-list allocation, coalescing. A dynamic array built on top of it. No new language features, no builtins, no magic -- just load64, store64, and arithmetic.

The language is no longer just a compiler's implementation language. It's a systems programming language. A simple one, with rough edges, but real. You can build data structures in it. You can manage memory in it. You can write programs that don't know their size at compile time.

Not bad for a language that started with eval("3+5").

## Appendix B: Floating Point

Until now, our language has had one type: i64. Numbers, booleans, packed strings, array addresses, function return values -- all integers. This made the compiler simple: emit i64.add, always. No type checking, no type inference, no decisions.

Now we break that rule. We add f64 -- 64-bit floating point. This is a bigger change than it sounds. With two types, the compiler has to answer a new question for every operation: which add? i64.add or f64.add? The variable tables need to track types. The WAT output emits (local $x f64) instead of (local $x i64). The VM's stack carries values that might be integers or might be floats.

One type was simplicity. Two types is expressiveness. Let's see what it costs.

### Float literals

A float literal is a number with a decimal point. 3.14 is float. 3 is integer. The dot is the signal.

Appendix B: Floating Point

float-literal-parsing

Add float literal parsing to the interpreter's factor(). When number() reads digits and hits a . followed by more digits, it's a float.

We store floats as their IEEE 754 bit pattern inside an i64, using Zig's @bitCast:

fn parse_float(whole: i64) i64 {
    var f: f64 = @floatFromInt(whole);
    pos += 1; // skip the '.'
    var divisor: f64 = 10.0;
    while (cur() >= '0' and cur() <= '9') {
        const digit: f64 = @floatFromInt(cur() - '0');
        f += digit / divisor;
        divisor *= 10.0;
        pos += 1;
    }
    skip();
    return @bitCast(f);
}

In factor(), after calling number(), check if the next character is .:

    // In the number/expression path:
    const val: i64 = number();
    if (cur() == '.' and pos + 1 < source.len and source[pos + 1] >= '0' and source[pos + 1] <= '9') {
        return parse_float(val);
    }

The returned i64 holds the bit pattern of a float. 3.14 returns the 64 bits of IEEE 754 3.14. To the variable storage system, it's just a number. The interpretation depends on the variable's declared type.

float-type-tracking

Track variable types. Add a type flag:

var var_types: [256]u8 = [_]u8{0} ** 256;    // 0 = i64, 1 = f64
var global_types: [256]u8 = [_]u8{0} ** 256;

When exec_stmt handles var x: f64 = 3.14;, set the type flag:

    // After reading the type name:
    const type_name: []const u8 = read_name();
    const is_float: bool = streq(type_name, "f64");
    // ... later, when storing:
    if (is_float) { var_types[num_vars] = 1; }

Write helpers to check types:

fn isFloat(name: []const u8) bool {
    for (0..num_vars) |i| {
        if (streq(var_names[i], name) and var_types[i] == 1) { return true; }
    }
    for (0..num_globals) |i| {
        if (streq(global_names[i], name) and global_types[i] == 1) { return true; }
    }
    return false;
}

### Float arithmetic

The interpreter needs to know which operations are float and which are integer. We track a "current expression type" -- integer by default, float if any operand is float.

float-arithmetic

Modify term() to handle float arithmetic:

var expr_is_float: bool = false;

fn factor() i64 {
    // ... existing code ...
    if (is_letter(cur())) {
        const name: []const u8 = read_name();
        if (isFloat(name)) { expr_is_float = true; }
        // ...
    }
    // float literal detected -> expr_is_float = true
}

fn term() i64 {
    var val: i64 = factor();
    while (cur() == '*' or cur() == '/' or cur() == '%') {
        const op: u8 = cur();
        pos += 1; skip();
        const right: i64 = factor();
        if (expr_is_float) {
            const a: f64 = @bitCast(val);
            const b: f64 = @bitCast(right);
            if (op == '*') { val = @bitCast(a * b); }
            else if (op == '/') { val = @bitCast(a / b); }
            else { val = @bitCast(@rem(a, b)); }
        } else {
            if (op == '*') { val = val * right; }
            else if (op == '/') { val = @divTrunc(val, right); }
            else { val = @rem(val, right); }
        }
    }
    return val;
}

Same structure as before. The if (expr_is_float) branch bitcasts to f64, operates, bitcasts back. The else branch does integer math as before.

Apply the same pattern to add_sub() and comparison().

Test -- we need a helper to compare float results since bitcast i64 values aren't human-readable:

fn check_float(input: []const u8, expected: f64) void {
    source = input; pos = 0; num_vars = 0; num_globals = 0;
    fn_count = 0; return_flag = false; in_function = false;
    save_depth = 0; expr_is_float = false; skip();
    const got: i64 = program();
    const got_f: f64 = @bitCast(got);
    if (@abs(got_f - expected) < 0.0001) {
        print("  ok {d:.4}\n", .{got_f});
    } else {
        print("  FAIL got={d:.4} want={d:.4}\n", .{ got_f, expected });
    }
}
    check_float("3.14", 3.14);
    check_float("2.0 + 3.0", 5.0);
    check_float("6.0 * 7.0", 42.0);
    check_float("22.0 / 7.0", 3.142857);
    check_float("var r: f64 = 2.5; r * r * 3.14159", 19.6349);

### Type conversion

Integer and float are separate worlds. 3 + 2.5 doesn't automatically work -- the 3 is an i64 and the 2.5 is an f64 bit pattern. We need explicit conversion.

float-conversion

Add conversion builtins:

    if (streq(name, "f64_from_i64")) {
        pos += 1; skip();
        const val: i64 = expression();
        if (cur() == ')') { pos += 1; skip(); }
        const f: f64 = @floatFromInt(val);
        expr_is_float = true;
        return @bitCast(f);
    }
    if (streq(name, "i64_from_f64")) {
        pos += 1; skip();
        const was_float: bool = expr_is_float;
        expr_is_float = true;
        const val: i64 = expression();
        if (cur() == ')') { pos += 1; skip(); }
        const f: f64 = @bitCast(val);
        expr_is_float = was_float;
        return @intFromFloat(f);
    }

f64_from_i64(42) converts integer 42 to float 42.0 (as a bitcast i64). i64_from_f64(3.7) truncates to integer 3.

Test:

    check_float("f64_from_i64(42)", 42.0);
    check("i64_from_f64(3.7)", 3);
    check("i64_from_f64(2.0 + 0.9)", 2);  // truncates, not rounds
    check_float("f64_from_i64(355) / f64_from_i64(113)", 3.14159);

The last test: 355/113 is a famous rational approximation of pi. With integer division it gives 3. With float division it gives 3.14159...

### Compiling floats

The compiler needs to emit the right WAT instructions based on types.

compiler-float-types

Track types in the compiler. Maintain a parallel array of type flags for globals and locals. When c_factor() encounters a float variable or float literal, set a flag. When c_term() / c_add_sub() emit arithmetic, choose based on the flag:

var c_expr_is_float: bool = false;

fn c_factor() void {
    // Float literal
    if (cur() >= '0' and cur() <= '9') {
        const val: i64 = number();
        if (cur() == '.' and pos + 1 < source.len and source[pos + 1] >= '0') {
            // Parse float, emit f64.const
            c_expr_is_float = true;
            emit_str("  f64.const ");
            emit_num(val);
            emit_byte('.');
            pos += 1;
            while (cur() >= '0' and cur() <= '9') {
                emit_byte(cur());
                pos += 1;
            }
            emit_byte('\n');
            skip();
            return;
        }
        emit_const(val);
        return;
    }
    // Variable reference
    if (is_letter(cur())) {
        const name: []const u8 = read_name();
        if (c_is_float(name)) { c_expr_is_float = true; }
        // ... emit local.get/global.get as before
    }
}

fn c_term() void {
    c_factor();
    while (cur() == '*' or cur() == '/' or cur() == '%') {
        const op: u8 = cur();
        pos += 1; skip();
        c_factor();
        if (c_expr_is_float) {
            if (op == '*') { emit_op("f64.mul"); }
            else if (op == '/') { emit_op("f64.div"); }
        } else {
            if (op == '*') { emit_op("i64.mul"); }
            else if (op == '/') { emit_op("i64.div_s"); }
            else { emit_op("i64.rem_s"); }
        }
    }
}

Same pattern for c_add_sub() and c_comparison().

For local declarations, emit the right type:

    if (c_is_float(local_names[i])) {
        emit_str("  (local $"); emit_str(local_names[i]); emit_str(" f64)\n");
    } else {
        emit_str("  (local $"); emit_str(local_names[i]); emit_str(" i64)\n");
    }

For globals:

    emit_str("(global $"); emit_str(name);
    if (is_float) {
        emit_str(" (mut f64) (f64.const 0))\n");
    } else {
        emit_str(" (mut i64) (i64.const 0))\n");
    }
compiler-float-conversion

Add conversion builtins to the compiler:

    if (streq(name, "f64_from_i64")) {
        // compile argument (i64 expression)
        pos += 1; skip();
        c_expression();
        if (cur() == ')') { pos += 1; skip(); }
        emit_op("f64.convert_i64_s");
        c_expr_is_float = true;
        return;
    }
    if (streq(name, "i64_from_f64")) {
        pos += 1; skip();
        const was: bool = c_expr_is_float;
        c_expr_is_float = true;
        c_expression();
        if (cur() == ')') { pos += 1; skip(); }
        emit_op("i64.trunc_f64_s");
        c_expr_is_float = was;
        return;
    }

WAT instructions: f64.convert_i64_s converts a signed i64 to f64. i64.trunc_f64_s truncates an f64 to a signed i64.

vm-float-instructions

Add float instructions to the VM. Each one bitcasts the i64 stack values to f64, operates, and bitcasts back:

    if (streq(line, "f64.add")) {
        const b: f64 = @bitCast(vm_pop());
        const a: f64 = @bitCast(vm_pop());
        vm_push(@bitCast(a + b));
        return;
    }
    if (streq(line, "f64.sub")) {
        const b: f64 = @bitCast(vm_pop());
        const a: f64 = @bitCast(vm_pop());
        vm_push(@bitCast(a - b));
        return;
    }
    if (streq(line, "f64.mul")) {
        const b: f64 = @bitCast(vm_pop());
        const a: f64 = @bitCast(vm_pop());
        vm_push(@bitCast(a * b));
        return;
    }
    if (streq(line, "f64.div")) {
        const b: f64 = @bitCast(vm_pop());
        const a: f64 = @bitCast(vm_pop());
        vm_push(@bitCast(a / b));
        return;
    }
    if (starts_with(line, "f64.const ")) {
        // Parse the float from the text
        const f: f64 = parse_f64_from_line(line, 10);
        vm_push(@bitCast(f));
        return;
    }
    if (streq(line, "f64.convert_i64_s")) {
        const i: i64 = vm_pop();
        const f: f64 = @floatFromInt(i);
        vm_push(@bitCast(f));
        return;
    }
    if (streq(line, "i64.trunc_f64_s")) {
        const f: f64 = @bitCast(vm_pop());
        const i: i64 = @intFromFloat(f);
        vm_push(i);
        return;
    }

Also add float comparisons: f64.gt, f64.lt, f64.ge, f64.le, f64.eq, f64.ne. These return i64 (1 or 0), not f64 -- a comparison's result is always boolean.

    if (streq(line, "f64.gt")) {
        const b: f64 = @bitCast(vm_pop());
        const a: f64 = @bitCast(vm_pop());
        vm_push(if (a > b) @as(i64, 1) else 0);
        return;
    }
    // ... same pattern for lt, ge, le, eq, ne
float-agreement

Float agreement tests:

    // Direct float arithmetic
    check_both_float("2.0 + 3.0", 5.0);
    check_both_float("6.0 * 7.0", 42.0);
    check_both_float("22.0 / 7.0", 3.142857);

    // Float variables
    check_both_float("var x: f64 = 2.5; x * x", 6.25);

    // Conversion
    check_both_float("f64_from_i64(42)", 42.0);
    check_both("i64_from_f64(3.7)", 3);

    // Float in functions
    check_both_float(
        \\fn circle_area(r: f64) f64 {
        \\    return r * r * 3.14159;
        \\}
        \\circle_area(5.0)
    , 78.5397);

The function circle_area takes and returns f64. The compiler emits (func $circle_area (param $r f64) (result f64) -- real WAT float types. The VM executes float multiplication. Both engines agree.

### Practical example

newtons-sqrt

Newton's method for square root -- an algorithm that needs float arithmetic:

    check_both_float(
        \\fn sqrt(n: f64) f64 {
        \\    var guess: f64 = n / 2.0;
        \\    for (0..20) |i| {
        \\        guess = (guess + n / guess) / 2.0;
        \\    }
        \\    return guess;
        \\}
        \\sqrt(2.0)
    , 1.41421);

Twenty iterations of Newton's method, starting from n/2. Each iteration: guess = (guess + n/guess) / 2. Converges to sqrt(2) = 1.41421356... in about 5 iterations, but 20 is safe.

This is real numerical computation. The same algorithm, the same source code, running through both the interpreter (using @bitCast to pack/unpack floats) and the compiler (emitting f64.div, f64.add, f64.mul). Agreement proves both paths handle floating point correctly.

design-why-f64

Why f64 and not f32?

Three reasons.

First, WAT's f64 maps to Zig's f64, JavaScript's number, and C's double. It's the natural size. f32 would require explicit truncation in most host environments.

Second, f64 has 53 bits of significand -- enough to exactly represent every 32-bit integer. This means f64_from_i64(n) is lossless for values up to 2^53. With f32 (24 bits of significand), even f32_from_i64(16777217) loses precision. Since our integer type is 64-bit, f64 is the safe companion.

Third, f64.const 3.14 in WAT is 8 bytes, same as i64.const. No size savings from f32. The VM stack is already 8 bytes per slot. Everything aligns naturally at 64 bits.

trace-float-stack

Walk through the WAT for 2.0 + 3.0 * 4.0:

| Instruction | Stack (as f64) |
|-------------|----------------|
| f64.const 2.0 | [2.0] |
| f64.const 3.0 | [2.0, 3.0] |
| f64.const 4.0 | [2.0, 3.0, 4.0] |
| f64.mul | [2.0, 12.0] |
| f64.add | [14.0] |

Identical structure to the i64 trace from Problem 44. Precedence works the same way -- c_term grabs the 3.0 * 4.0 first. The only difference is the instruction prefix: f64 instead of i64. Same parser, same recursion, same stack machine. Two types, one architecture.

What we added: f64 as a second type. Float literals with decimal points. Float arithmetic, comparisons, and conversion. Type tracking in the compiler to emit the right WAT instructions. The VM handling floats via bitcast.

What we learned: adding a second type touches everything. The parser (literal detection), the interpreter (dispatch on type), the compiler (instruction selection, local/global declarations), the VM (new instruction handlers), and the test harness (float comparison). One type was simple. Two types is real.

But the architecture didn't change. The parser is still recursive descent. The compiler is still syntax-directed. The VM is still a line-by-line stack machine. Types added a dimension to every decision, but they didn't change the decisions. f64.add is just i64.add wearing different shoes -- the same metaphor we started with.

## Summary

| Part | What it adds | Problems |
|------|-------------|----------|
| 1 | Parse and eval "3+5" | 1–5 |
| 2 | Cursor, chains, multi-digit, whitespace, precedence | 6–14 |
| 3 | Typed variables, streq, assignment | 15–19 |
| 4 | If/else (mandatory braces), while | 20–23 |
| 5 | Typed functions, return, fib(10) | 24–29 |
| 6 | Compiler: parse -> WAT (syntax-directed, stack-clean) | 30–47 |
| 7 | VM: execute WAT, agreement testing | 48–64 |
| 8 | Memory (load8/store8/load64/store64), strings | 65–79 |
| 9 | Global variables | 80–93 |
| 10 | Completing the language (Zig subset) | 94–111 |
| 11 | The compiler in its own language | 112–126 |
| 12 | The bootstrap, the diff, the proof | 127–131 |
| A | A simple allocator | A1–A9 |
| B | Floating point (f64) | B1–B11 |

You started by making "3+5" give 8.

You ended with a compiler that compiles itself.

The proof is an empty diff -- two engines, same output, on every input, including the compiler's own source code.

Not bad for one file.