/Projects/

Dust lang

A Rust bytecode compiler for a minimal functional language

Skills: #rust#compilers#parsing#formatter

Dust is a functional, dynamically typed language with a minimal core AST. You can read more about its features in the repo or in the language documentation. Here, instead, I'm going to focus on the implementation details and the interesting problems behind it.

Dust is implemented using the same interpreter architecture described here, but using Rust instead.

TL;DR: a bytecode interpreter compiles the code to a pseudo-assembly representation, and then executes it via a virtual machine.

There are few reasons why Rust is a great fit for a similar task:

  • Expressive and pragmatic type system makes complex tasks comprehensible. Working with data structures such as ASTs wouldn't be feasible otherwise
  • The error handling is type-safe (since errors are just data) but Rust's constructs (like ?) make it very pragmatic to handle all the edge cases
  • No garbage collection means that you can have a low-level control if needed - and implement garbage collection yourself
  • Great macro-powered libraries to create a CLI
  • Rust allows to target the web browser using webassembly as compilation target
  • And of course, great speed. Not crucial for the compiler but crucial for the runtime

Garbage collection

Since Rust itself is not garbage-collected, a custom implementation was needed. A classic choice is the mark and sweep algorithm. However that's what's called a "stop the world" garbage collection algorithm: whenever the garbage collection phase begins, every other calculation is blocked. That can result in unpredictable runtime performances. Since Rust offers it as a safe abstraction, a natural choice is the reference counting algorithm - that prevents the problems above. A downside of this is that it can lead to memory leaks if there are cycles. Due to Dust's semantics, however, this cannot happen.

Formatting

Believe it or not, pretty-printing a program is a particularly difficult task. Bob Nystrom - one of the dart lang contributors - mentions it as the hardest program he's ever written. I implemented a non-recursive version of the algorithm proposed in Christian Lindig's "Strictly Pretty" paper, that defines an algebra to formalize the pretty-printing process. The paper is, in turn, a strict implementation of Philip Wadler "A prettier printer's" paper and its the approach used by Elixir's Inspect.Algebra module or from the well-known Javascript's de-facto formatter Prettier.

Parsing

Parsing is implemented via the Pratt parsing algorithm. You can read a very approachable explanation in this blog post (from one of the main Rust analyzer and the Rust intellij plugin contributors). Other beginner-friendly resources are Bob Nystrom's crafting interpreters and Thorsten Ball's Writing an interpreter in Go.

Tail call optimization

In a functional language whose the only looping mechanism is recursion, is crucial to be able to tail-recur without oveflowing the call stack. Due to the simplicity of the AST (stripped of the syntax sugar), it's easy to perform static analysis to check when a call is in tail position.

Here's the complete set of rules used by the static analysis algorithm
  • fn arg_1, ..., arg_n { body }

    • body is in tail position
  • f(x_1, ..., x_n)

    • Neither f nor x_i (for every i) are in tail position
    • if the node is in tail position, this is considered a tail call
  • x && y

    • x is not in tail position
    • y is in tail position when the node is in tail position
  • { a; b }

    • a is not in tail position
    • b is in tail position when the node is in tail position
    • subsequent ; expressions are just sugar nested ; expressions
  • { let name = value; body }

    • value is not in tail position
    • body is in tail position when the node is in tail position
    • subsequent let expressions are just sugar nested let expressions
  • if b { x } else { y }

    • b is not in tail position
    • x and y are in tail position when the node is in tail position

Conclusion

Implementing a programming language is a beatiful problem, filled of interesting sub-problems to solve. The language is still at a very early POC stage, and I hope to update this page as I implement new features.