When code is data
You may find yourself itching to write a new programming language, or at least learn why someone might want to write a new programming language. Perhaps it looks like JavaScript. Well, JavaScript is a complicated language. If I were to give you the following code:
for (let i = 1; i <= 100; i++) {
if (i % 15 === 0) {
console.log("FizzBuzz")
} else if (i % 3 === 0) {
console.log("Fizz")
} else if (i % 5 === 0) {
console.log("Buzz")
} else {
console.log(i)
}
}
How would you⌠run this? Youâd probably save it as
google-interview.js
and run node google-interview.js.
Fair. But what if you didnât have node
? What might a program look like that could interpret this code?
interpret("for (let i = 1; i <= 100 ...")
// 1, 2, Fizz, 4, Buzz, ...
Enter the wild world of interpreters. Perhaps we want to just eval this code and get an answer, or maybe we want to compile it into *waves hands* something that weâll run on another machine, or on the same machine but much more quickly. Letâs not get ahead of ourselves, and just talk about interpreting our code.
First we need to convince ourselves (and our
interpret
function, usually called eval
) what a for loop is. Itâs defined as having the following:
-
an initializer (a variable
i
set to1
) -
a test condition (
i <= 100
) -
an update to run on each iteration (
i++
) -
and finally, a âbodyâ of code to run on each iteration (the if statement and all the stuff inside of that)
You and I could sit here for an hour and work through some code that could interpret these! It might look like the following:
function runForLoop(obj) {
// Helper functions left as an incredibly boring and
// tedious exercise for the reader
let currVal = buildInit(obj.init);
let test = buildTest(obj.test);
while (test(currVal)) {
runBody(obj.body);
currVal = runUpdate(currVal, obj.update);
}
}
runForLoop({
init: {
type: "Assignment",
name: "i",
value: {
type: "Number",
value: 1,
},
},
test: {
type: "BinaryExpression",
op: "<=",
left: {
type: "Variable",
name: "i"
},
right: {
type: "Number",
value: 10,
},
},
...
})
// 1, 2, Fizz, 4, Buzz, ...
Itâs got everything we need! The init, the test, and⌠the other stuff. The details of which weâll gradually add color too, but for now recognize that weâre capable of turning some object that looks for-loop-y into some output. Youâre welcome to run through such an exercise if you so wish -
A frontend programmer's guide to languages.
But something looks different here. How did we get from
for (let i = 1; i <= 100 ...
to that well-organized (and written on the fly so please excuse any abnormalities and typos) JSON object? Welcome to the world of parsing, an art that humans have gotten pretty good at over the last 2,400 years.
Parsing is the act of taking an input string (our source code
for (let i = 1; ...) { ... }
) and turning it into a data structure ({ init: { ... }, test: { ... }, update: { ... }, ... }
- this is commonly referred to as an Abstract Syntax Tree) that we can easily iterate over to compute a result.
For JavaScript, we would use something like Acorn or @babel/parser. But what if we could skip it all? What if we could iterate directly over our source code, instead of needing to translate it into a tree? Allow me to put on my wizard hat and robe.
The Lisp family of programming languages uses a syntax that is full of parentheses and looks quite foreign to those familiar with C-style syntax (think: curly braces, semicolons on each line,
fn(1, 2)
function calls). Hereâs some code written in Racket.
(+ 5 10)
; => 15
(define (factorial n)
(cond ((= n 0) 1)
(else
(\* n (factorial (- n 1))))))
(factorial 10)
; => 3628800
But itâs not all that complicated. Just about everything is an S-expression (a list of lists) and the syntax can be roughly (very roughly) summarized to the following:
-
Literals like
1
and"hello"
look like youâd expect from any other language -
Function calls are written as
(fn a b c ...)
instead offn(a, b, c, ...)
-
Several âspecial formsâ exist like
(if test a b)
,(cond ((condition-1) ...) (else ...))
, and(define (fn arg1 arg2) ...)
that behave different from ordinary function calls -
Lists are written using
(list 1 2 3)
or via the act of âquotingâ where we put a single apostrophe in front:'(1 2 3)
Oh and they can be nested'(1 2 (3 4) 5)
Hereâs an arithmetic expression, followed by a quoted arithmetic expression:
(+ 5 (\* 3 4))
; => 17
'(+ 5 (\* 3 4))
; => '(+ 5 (\* 3 4))
Hereâs where it gets interesting.
(first '(+ 5 (\* 3 4)))
; => '+
(second '(+ 5 (\* 3 4)))
; => 5
(third '(+ 5 (\* 3 4)))
; => '(\* 3 4)
(first (third '(+ 5 (\* 3 4))))
; => '\*
Remember our for statement from above?
for (let i = 1; i < 100; ...) { ... }
. When we wanted to interpret it, we first needed to parse it into a fancy JSON object that listed out the various parts before we could get a result. Letâs try it for our little math expression here using AST Explorer.
5 + (3 \* 4)
{
"type": "ExpressionStatement",
"expression": {
"type": "BinaryExpression",
"left": {
"type": "Literal",
"value": 5,
},
"operator": "+",
"right": {
"type": "BinaryExpression",
"left": {
"type": "Literal",
"value": 3,
},
"operator": "\*",
"right": {
"type": "Literal",
"value": 4,
}
}
}
}
To get the values we need to start computing, we can poke around this as follows:
ast.expression.operator
// => '+'
ast.expression.left
// => { type: 'Literal', value: 5 }
ast.expression.right
// => {
// type: 'BinaryExpression',
// left: { type: 'Literal', value: 3 },
// operator: '\*',
// right: { type: 'Literal', value: 4 }
// }
ast.expression.right.operator
// => '\*'
For Lisp? Our source code is an S-expression.
'(+ 5 (\* 3 4))
And we can happily dance around it to get the fields we need to start computing.
(first (third '(+ 5 (\* 3 4))))
; => '\*
(second (third '(+ 5 (\* 3 4))))
; => 3
There is no parsing step. There is no AST Explorer because our AST is right in front of us.
This property of the Lisp family of languages is called âcode as dataâ or - if you want to impress your coworkers - Homoiconicity. It has many interesting consequences, from expressive macro systems and syntax extensions to self-modifying code, but for this purposes of this short essay weâll stick to interpreters.
Letâs say our arithmetic expression
(+ 5 (\* 3 4))
- which, is valid Lisp syntax - is part of a small language with the following:
-
Numbers are written literally, that is
'5
is just 5. -
We can add exactly two expressions with
(+ a b)
-
We can multiply exactly two expressions with
(\* a b)
Without much code-golfing we can achieve this in 8 lines of Racket code.
(define (eval math-exp)
(cond ((number? math-exp) math-exp)
((eq? '+ (first math-exp))
(+ (eval (second math-exp))
(eval (third math-exp))))
((eq? '\* (first math-exp))
(\* (eval (second math-exp))
(eval (third math-exp))))))
And for some tests:
(eval '42)
; => 42
(eval '(+ 4 5))
; => 9
(eval '(+ 5 (\* 3 4)))
; => 17
(+ 5 (\* 3 4))
; => 17
No parsing, no translating, just some quoted source code and a couple of
first
s and eq?
s. Why so simple? Because the code is just data.
One side effect of this is that extending our language is pretty simple, just some additions to
eval
. Letâs try this out by adding a fancy if>=
expression. We want it to be of the special form (if>= math-exp positive negative)
which evaluates math-exp
and runs positive
if the result is greater than or equal to 0, and negative
otherwise.
(define (eval math-exp)
(cond ((number? math-exp) math-exp)
((eq? '+ (first math-exp))
(+ (eval (second math-exp))
(eval (third math-exp))))
((eq? '\* (first math-exp))
(\* (eval (second math-exp))
(eval (third math-exp))))
; â¨
((eq? 'if>= (first math-exp))
(if (>= (eval (second math-exp)) 0)
(eval (third math-exp))
(eval (fourth math-exp))))))
And we see that with four extra lines to eval and nothing more, we get a cute
if>=
form that interacts nicely with everything else in the language.
(eval '(if>= (+ 3 4) 100 -9999))
; => 100
(eval '(if>= (+ -300 4) 100 -9999))
; => -9999
(eval '(+ 5 (if>= 0 100 -100)))
; => 105
To reiterate:
(if>= 0 100 -100)
may appear to be different from (\* 100 200)
or (+ 3 4)
but itâs still just an S-expression and weâve gotten pretty good at jumping around S-expressions to get what we need. We can continue tacking on new forms and operators onto eval
(starting with /
and continuing all the way to cond
). What might an eval
look like that can⌠eval
its own source code?
I hope these examples change the way you look at programming languages and the syntax they are built upon. There are endless paths to continue learning about âcode as dataâ and how it appears in different programming languages to different advantages.
Until then, enjoy learning more about programming languages, and be sure know to let me know what you find.