A frontend programmer's guide to languages
Today, weâre going to make a programming language.
No, it wonât take a PhD, or years of your time (or even days for that matter). You and I, together, are going to learn what a programming language is and actually build one that can evaluate programs.
By the end of this post weâll have written a JavaScript function,
evaluate
, which interprets a small programming language with strings and variables
. You wonât be going off and rewriting your stack in it (though youâre certainly welcome to try!), but I hope itâs a fun exercise.
I encourage you to copy the code examples in your editor of choice and actually run them. You should be able to make it through this post in a single listen of Lights - Little Machines.
đ Hello, world!
To kick things off, weâll write an interpreter for a new HelloWorld language, and use it write a Hello, world! program.
Hereâs our goal:
const program = { type: "HelloWorld" }
console.log(evaluate(program))
// => Hello, world!
Iâm sure you have questions. Real quick, letâs introduce some terms that weâll more clearly define as we go.
-
Our âprogramâ is represented by the JavaScript object
{ type: "HelloWorld" }
. Throughout this post, weâll be adding more and more things to this object. For now, thereâs not much to it. -
Our âevaluatorâ (or âinterpreterâ - itâs your call) is a single JavaScript function that accepts a âprogram.â
-
We receive a âvalueâ from our âevaluator,â which we send to the wonderful
console.log
function youâre all familiar with.
Now letâs build our evaluator.
function evaluate(node) {
switch (node.type) {
case "HelloWorld":
return "Hello, world!"
default:
throw `evaluate -- unknown node type ${node.type}`
}
}
This function is not terribly interesting, but Iâd like to call out a few details.
-
Our function accepts a parameter that we call
node
as in, a node of a tree. (dramatic foreshadowing) -
The crux of this function is a single
switch
statement that operates on thetype
field of our node. Our language is very simple, so we only have a single ânode typeâ (namely, âHelloWorldâ). -
For the âHelloWorldâ expression - we return the string âHello, world!â
-
If we see something we donât recognize - we throw an error. The programmer messed up!
At this point we have 8 lines of code that evaluate a simple HelloWorld language. Iâll emphasize the simple here, there are no variables, no loops, no modules, not even numbers - but itâs a language. Our language has a very small grammar, but itâs a language.
Letâs make things more interesting.
đ Strings
In this section weâd like to make a new
HelloStrings
language (marketing wants us to be able to print more than just âHello, world!â) that can produce strings and run two operations on them.
Hereâs our goal for this section:
console.log(
evaluate({
type: "String",
content: "Apple",
})
)
// => Apple
console.log(
evaluate({
type: "Excite",
expression: {
type: "String",
content: "Banana",
},
})
)
// => Banana!
console.log(
evaluate({
type: "Append",
first: {
type: "String",
content: "Apple",
},
second: {
type: "Excite",
expression: {
type: "String",
content: "Banana",
},
},
})
)
// => AppleBanana!
Some initial observations:
-
We ditch our âHelloWorldâ expression in favor of âString,â which has a âcontentâ field in addition to the âtypeâ field we used in the previous section.
-
We introduce âExciteâ which adds exclamation points to expressions.
-
We introduce âAppend,â which also contains expressions in its definition - two of âem in fact!
Letâs start off by creating an evaluator to operate on âStringâ expressions.
function evaluate(node) {
switch (node.type) {
case "String":
return node.content
default:
throw `evaluate -- unknown node type ${node.type}`
}
}
Great, so we replace âHelloWorldâ with âStringâ as the nodeâs type - and return its âcontentâ value instead of the string âHello, world!â
You just wrote a programming language with strings, letâs celebrate!
But if we continue with our desired goal above, we see the following:
Apple
evaluate -- unknown node type Excite
We can evaluate
{ type: "String", content: "Apple" }
no problem, but we donât know what to do with this âExciteâ thing just yet.
So how might we evaluate âExciteâ expressions? Based on how it produces âBanana!â in our desired output, we may be inclined to say that âExciteâ takes a string and adds an exclamation point to it. Simple enough, right? But letâs take a closer look.
{
type: "Excite",
expression: {
type: "String",
content: "Banana"
}
}
The âexpressionâ field of our Excite expression node isnât a string per se, but an expression in the HelloStrings language! Our expression contains an expression inside of it! (Are you beginning to see why our evaluator accepts a
node
as in ânodes of a treeâ?)
To illustrate this further, allow me to propose the following, valid program for the HelloStrings language.
console.log(
evaluate({
type: "Excite",
expression: {
type: "Excite",
expression: {
type: "String",
content: "Banana",
},
},
})
)
In the above example, weâll evaluate the string âBananaâ - excite it to get âBanana!â - than excite that to get âBanana!!â (how very exciting!)
Letâs begin adding Excite expressions to our language.
function evaluate(node) {
switch (node.type) {
case "String":
return node.content;
case "Excite":
// We have this node.expression thing, but what do
// we do with it?
???
default:
throw `evaluate -- unknown node type ${node.type}`;
}
}
Rather than just returning a âvalueâ like we did for String expressions (
return node.content
), weâll need to use node.expression
somehow. As the name suggests, node.expression
is an expression, and what do we do with expressions?
We evaluate them!
case "Excite":
return evaluate(node.expression) + "!";
evaluate
is now a recursive function which operates on String and Excite expressions. Hereâs the function in all its glory and an invitation to implement Append
yourself.
function evaluate(node) {
switch (node.type) {
case "String":
return node.content
case "Excite":
return evaluate(node.expression) + "!"
case "Append":
// Now it's your turn, how might we evaluate
// Append expressions?
default:
throw `evaluate -- unknown node type ${node.type}`
}
}
đ Variables
At this point we have a small language which can produce strings with âExciteâ and âAppendâ statements. Nice job if youâve made it this far!
Letâs use this momentum and expand our language to support a very popular language construct - variables.
By the end of this section, weâll have a language which can declare and retrieve the value of variables in HelloStrings expressions.
console.log(
evaluate(
{
type: "Let",
name: "x",
value: {
type: "String",
content: "Hello, world",
},
expression: {
type: "Excite",
expression: {
type: "Variable",
name: "x",
},
},
},
{}
)
)
// => Hello, world!
Thereâs a lot there, so letâs break it down.
-
evaluate
now takes a second argument (dramatic foreshadowing) -
We introduce a new âVariableâ expression, which will lookup a variable by its name.
-
We introduce a âLetâ expression, which makes a new variable with a ânameâ and a âvalueâ (the value being an expression in our language!), and uses that when evaluating an inner expression.
In our case, weâre setting âxâ to the evaluation of a âHello, worldâ String, then retrieving the value of âxâ to Excite it.
Letâs start by adding âVariableâ expressions to
evaluate
.
function evaluate(node) {
switch (node.type) {
case "Variable":
// We have `node.name`, but what to do with it?
...
}
}
Consider the following code:
evaluate({ type: "Variable", name: "x" })
Here we want to evaluate the variable
x
, but there doesnât seem to be much to work with. In a typical programming language, what we might we do with a variable?
We look it up!
function evaluate(node) {
switch (node.type) {
case "Variable":
return lookup(node.name);
...
}
}
In order for
lookup
to do anything meaningful, we need to provide it with some sort of environment - a collection of variable names and values. Weâll call this env
.
function evaluate(node) {
switch (node.type) {
case "Variable":
return lookup(env, node.name);
...
}
}
And where does
env
come from? Rather unfortunately we canât make it out of thin air (believe me, I tried for many hours), but we can pass it in to our evaluator.
function evaluate(node, env) {
switch (node.type) {
case "Variable":
return lookup(env, node.name)
case "String":
return node.content
case "Excite":
return evaluate(node.expression, env) + "!"
case "Append":
// Left as an exercise to the reader
default:
throw `evaluate -- unknown node type ${node.type}`
}
}
Remember to add
env
to our recursive calls in Excite and Append (you did implement Append, right?)
Letâs define
lookup
. For simplicityâs sake, weâll say an environment is a JavaScript object with name
keys and value
values.
function lookup(env, name) {
return env[name]
}
Simple, right?
console.log(evaluate({ type: "Variable", name: "x" }, { x: "Hello, world!" }))
// => Hello, world!
Congratulations you just added variables to your language!
Last but not least, weâll want an easy way to create new variables. Without this, weâll be evaluating Exciteâs and Appendâs with nothing more than a bunch of global variables - and the marketing department definitely doesnât want that.
Letâs start by listing what a âLetâ expression actually contains:
{
type: "Let",
name: "x",
value: {
type: "String",
content: "Hello, world"
},
expression: {
type: "Excite",
expression: {
type: "Variable",
name: "x"
}
}
}
-
A âLetâ type
-
A ânameâ field for naming our new variable
-
A âvalueâ field for giving our new variable a value
-
An âexpressionâ field to use our new variable in!
Excellent, now letâs get started on implementing the thing.
function evaluate(node, env) {
switch (node.type) {
case "Variable":
return lookup(env, node.name);
case "Let":
let newEnv = ???
return evaluate(node.expression, newEnv);
...
}
}
Weâll be adding a new variable to our environment, and using that to evaluate our expression. For simplicityâs sake, letâs give ourselves an
extendEnv
function to do this.
function evaluate(node, env) {
switch (node.type) {
case "Variable":
return lookup(env, node.name);
case "Let":
let name = ???
let value = ???
let newEnv = extendEnv(env, name, value)
return evaluate(node.expression, newEnv);
...
}
}
name
is simple, thatâs just node.name
. For value
, however, weâll be given a HelloStrings expression to evaluate.
{
type: "Let",
name: "x",
value: {
type: "String",
content: "Hello, world"
},
...
}
Still, itâs not much code :)
function evaluate(node, env) {
switch (node.type) {
case "Variable":
return lookup(env, node.name);
case "Let":
let name = node.name;
let value = evaluate(node.value, env);
let newEnv = extendEnv(env, name, value);
return evaluate(node.expression, newEnv);
...
}
}
Finally,
extendEnv
.
Our
env
is a simple JavaScript object that maps names
to values
. Weâll need to extend an env
with a new name and value pair. We can do so with Object.assign
:
function extendEnv(env, name, value) {
let envAddition = {}
envAddition[name] = value
return Object.assign({}, env, envAddition)
}
Or, using some of the latest and greatest JavaScript features:
function extendEnv(env, name, value) {
return {
...env,
[name]: value,
}
}
Putting it all together, the combination of
lookup
, extendEnv
, and evaluate
completes the language.
function lookup(env, name) {
return env[name]
}
function extendEnv(env, name, value) {
return {
...env,
[name]: value,
}
}
function evaluate(node, env) {
switch (node.type) {
case "String":
return node.content
case "Excite":
return evaluate(node.expression, env) + "!"
case "Append":
// Left as an exercise to the reader
case "Variable":
return lookup(env, node.name)
case "Let":
let inner = node.expression
let value = evaluate(node.value, env)
let newEnv = extendEnv(env, node.name, value)
return evaluate(node.expression, newEnv)
default:
throw `evaluate -- unknown node type ${node.type}`
}
}
đ Closing notes and things to ponder
If you made it to the end of this post, congratulations and thank you :)
We developed a language and an evaluator that processes Abstract Syntax Trees (or ASTs for short, but I just called âem nodes for simplicity). Our language has constants, some small expressions to operate on strings, and variables.
But this is only the beginning! You are ending this article with an
evaluate
function that can grow to support all sorts of common language features (numbers, comments, functions), the only limit is your imagination.
Iâd like to leave you with a few concrete exercises, should you wish to revisit your new language on a rainy day:
-
How might we add Numbers to our language? How about operations to Add and Multiply them? What about rational numbers (fractions with a numerator and a denominator)?
-
As we saw earlier, we can kickstart our
env
by populating it with global values. Could we add functions to this environment? How might we call them from our language? -
Can we take our abstract syntax tree and use it to generate JavaScript code? How about Ruby code?
-
Writing our programs in as JavaScript objects doesnât seem to scale well. How might we go about creating a syntax for our language and converting that to an AST?
Iâll also go ahead and plug some of my favorite resources on Programming Languages.
-
Dan Grossmanâs Programming Languages on Coursera is an entirely free, entirely amazing three-part course for learning the ins and outs of different types of programming languages. I canât recommend it enough.
-
This article is inspired in no short part by @jamiebuilds's super tiny compiler. I encourage you to check out the code to see some of the concepts introduced in this article in more detail.
-
Crafting Interpreters is an effort to build âa handbook for making programming languages.â An ambitious goal, but itâs completely free and chock full of different interpreter concepts.
Thatâs all for now. If you liked this article feel free to share it and follow me on Twitter while youâre at it: @jdan.