Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Simple steps to implementing a programming language (kjetilvalle.com)
112 points by kvalle on May 5, 2014 | hide | past | favorite | 37 comments


Inventing your own programming language is something I can highly recommend. If you've never done it before it will dramatically broaden your programming horizon. It's a journey.

About a year ago I decided to make a language mainly because I wanted to experiment with some programming paradigms. I had made little domain-specific "languages" many times before, but this was going to be different.

I'd suggest that you try to figure out most things on your own, however. It's way more rewarding to think about the problems you're trying to solve, go ahead and implement them. That way you're also forced to have eye-opening meta thoughts about programming in general. Only after you've gained some appreciation for the issues involved, look them up and see how other people solved them. This worked very well for me.

Lisps and Schemes are certainly easier to implement than most syntaxes, but consider doing something else so you can get a better understanding of the complexities of parsing a non-Lisp.

In my case, I did my prototype in Java, that's the lexer, parser, runtime and standard library. While I wouldn't necessarily want to do it that way again, it was a great learning experience and I was pleased with the result.

About two months ago I decided to give it another try, but this time in C, using the Lua codebase. So far this has been a very good choice for a second stab at the idea. And I was pleased that I could recognize a lot of the patterns in the Lua codebase where I had implemented equivalent parts in my earlier project. If I hadn't taken the hard route back then, I probably would have a harder time now.


While I agree with you that it's worth learning to parse something "non-Lisp", I think that starting with an s-expression type syntax is worthwhile for a first-time even if you want to eventually implement a non-Lisp, for the simple reason that parsing is "easy" and very well documented compared to code-generation.

Unless your language has a particularly hairy grammar (Ruby, I'm looking at you...), figuring out how to parse it into your desired structure afterwards is fairly straight-forward in comparison.

That's why I took the approach of bypassing the parser entirely with my compiler series ( http://www.hokstad.com/compiler ), where the first several parts involved going straight to code-generation from code simply abusing the Ruby Array notation. Then later adding a simple s-expression parser.

I didn't even decide to turn it into a Ruby compiler for quite some time (in retrospect, had I planned to do Ruby from the very beginning, there are a few things I'd likely do differently, but not that much; the biggest issues with writing the series have been learning a lot of new things about how writing about code influences the entire process)


The approach I'm taking for a current language project is to start with a minimalist syntax for the first iteration. This allows me to focus on the actual language implementation, which is the part I'm most interested in learning. It also allows me to rapidly iterate on language features without having to spend too much time fiddling with the parser.

Once I've got a semantics I like, I'll probably do a second pass to come up with a new syntax if I haven't decided I'm done with the project. By then I'll have written plenty of test cases using the first syntax, so I should have a much better idea of what I'd like to get out of a more complex one.


I kind of favor the idea that you should use Lisp syntax for a while. Learning parsers is a nice technique, but it's really a small detail in language design. Eliding it for a while can help you focus on the more interesting issues of semantics.


How far along as a programmer should one be before attempting building a language. I am a novice (very, very much so) Rubyist, and don't think I am anywhere near being able to attempt it, but I do know that, when I was still in school, the process most likely to make a complicated subject click with me was to understand it all the way to it's origin (the starkest example was actually the quadratic formula - I had tons of trouble understanding it and how to use it until I had a teacher who made us actually derive it, and all of a sudden I was one of the best in the class). Building a programming language seems like it might have the same kind of effect on me, and, being a firm believer of living outside your comfort zone, would like to know if it's possible for someone who is nowhere near being able to call himself a programmer to eventually build one.


This may not be a common opinion, but building a language is certainly within the reach of a novice programmer. The base ideas are very, very simple. You do have to be careful not to be overwhelmed by either the terminology ("recursive descent parsing" is a slightly more elaborate version of "reading a file") or some of the more elaborate things that you don't need for the basics but might want for a truly useful tool ("parsing" is a deep subject built mostly around making "reading a file" simpler, for arbitrarily complicated files).

Building languages is also one of the most thoroughly studied topics, which is both a boon and a bane. On one hand, there are lots of good tutorials and excellent tools. On the other hand, there is lots of stuff elaborating on the basics and much of it seems almost deliberately complicated.

Go for it! There are a lot of much worse ways of going into the rabbit hole.


Thanks! Any suggestions for resources to help me get started? I'm not afraid of complicated terminology as long as there is a way for me to easily figure out what it means (whether Wikipedia, a glossary, or simply a great example).


Start with untyped lambda calculus, augmented with numbers, and addition. That's what I did. Such a language is dead simple: you only have four kinds of expressions: numbers, additions, functions, and function calls.

Write a parser and a printer (a pretty printer would be best, but you don't have to). Use one to test the other. Then write an interpreter, who takes an expression as input, evaluates it, and spits out a number.

The whole thing should fit in a couple hundred lines, tops. The only real difficulty is to evaluate function calls. The rest is only a matter of catching runtime errors (like trying to add a number and a function). To properly implement your language, you may have to learn about closures and lexical scoping, though there are tricks to sidestep the issue.

To test your language, check out the Y and Z combinators here https://en.wikipedia.org/wiki/Fixed_point_combinator and use the Z combinator to implement the factorial function.

The next step is to flesh out your language a bit. I suggest you augment it with "let" bindings first: it's just syntax sugar over lambdas: you change your parser, but you don't have to touch the interpreter.


In addition to the article here and "Write Yourself a Scheme in 48 Hours" (the implementation language is Haskell, which may be a barrier)[1], and some others mentioned in the comments here, there's Peter Norvig's "(How to Write a (Lisp) Interpreter (in Python))"[2] and "(An ((Even Better) Lisp) Interpreter (in Python))"[3].

Every last Lisp book out has at least one implementation of a Lisp evaluator, although most don't go into all of the other stuff you have to do like parsing. On the other hand, there are a lot of books and articles that focus on parsing as if it were the most important part.

Someone has already mentioned Structure and Interpretation of Computer Programs[4], additionally, there is the Essentials of Programming Languages[5], Paul Wilson's An Introduction to Scheme and its Implementation[6], and some things I haven't read, like Simon Peyton Jones and David Lester's Implementing functional languages: a tutorial[7].

[1] http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_H...

[2] http://norvig.com/lispy.html

[3] http://norvig.com/lispy2.html

[4] http://mitpress.mit.edu/sicp/full-text/book/book.html

[5] http://www.eopl3.com/

[6] ftp://ftp.cs.utexas.edu/pub/garbage/cs345/schintro-v14/schintro_toc.html

[7] http://research.microsoft.com/en-us/um/people/simonpj/Papers...


I don't really have a good answer as to how much experience is needed, but I do know I wish I had attempted it earlier.

My suggestion: Just try and see how it goes. If you find it too overwhelming, don't worry, you can always revisit the concepts later.

I didn't write the tutorial with novice programmers in mind, so I can't promise everything will be explained as you need. But still, if you do try, I'd love to hear your experiences.


Are you the author? If you email me, if I get some free time and get going on this, I'll happily keep you informed. Also, looks like I'll be building a Lisp? That's pretty exciting, since I've wanted to try to learn one for a while now, but was always scared off by the syntax.


Yes, I'm the author. You'll find my contact info at the frontpage of the blog. Please, send my any feedback :)

The language is a simplified variant of Lisp. This is just a toy language, so a lot of things from a real Lisp will be missing, but I think there's enough to give you a sense of the core of the language. Actually, this version isn't too far off from the original Lisp written by John McCarthy over 55 years ago.


It's definitely a journey! And I agree that trying to figure out how the concepts for yourself is by far the most rewarding. I hope this tutorial can help people realize that it's not difficult to get started, and inspire them to get started exploring on their own.


Nice work. There's a similar article which uses Haskell as the base language instead of Python, which is also worth reading (this is actually how I learned both Haskell and Scheme..!)

https://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_...


There is also SICP Chapter 4: Metalinguistic Abstraction. More precisely, 4.1 The Metacircular Evaluator [0]. Chapter 5 is about compiling to a toy register machine, as well.

I'm a big fan of the approach taking in Programming Languages: Application and Interpretation (PLAI). The Brown course from 2012 has video lectures available [2]. This one uses what is very close to Racket as the implementation language. It's typed, and called plai-typed.

The course these days uses the later chapters of Programming and Programming Languages (PAPL). This uses a new language, Pyret [3]. It's a bit more clear, due to the implementation language and source language having obviously different syntax, compared to plai-typed vs. the s-expression based language implemented.

[0] https://mitpress.mit.edu/sicp/full-text/book/book-Z-H-26.htm...

[1] http://cs.brown.edu/courses/cs173/2012/book/

[2] http://cs.brown.edu/courses/cs173/2012/Videos/

[3] http://www.pyret.org/


Nice! I've actually just started learning me some Haskell (for great good). Will definitely have a look at that wikibook.


During my project I was once at the point where I actually implemented my own (interpreted) scripting language. I have never thought that I'd be one of the guys who creates something like this, but once you are so fed up with something over the years (and I was with ESQL just for being "closed source") you'd just may be starting to go for it and just try out an dig into it.

And it works. I am now saying things like "If I'd need a JIT compiler for it, I'd write one. I'll look at everything and evaluate if possible and how to be done."

Inventing wheels is great for learning and I am now back to Java (good enough ;)


Yet another LISP-like language parser tutorial. Just like Learning ??? Programming in ?? days. This is just the beginning, guys. LISP-like languages are not always great. (I'm not to look down this tutorial! Really!) I prefer imperative programming language parser tutorial. Every time I see such title, I will think of LISP.


If you want an imperative tutorial then the Crenshaw tutorial [1] or Nicklaus Wirth's book Compiler Construction[2] are great starting points.

[1] http://compilers.iecc.com/crenshaw/

[2] http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf


Thank you. I have read Crenshaw's tutorial before, and I think I should get some time to translate it with modern language with unicode support to make more fun. :)


I'm admittedly nitpicking here, but I think "(non-)imperative" is not a useful label for characterizing Lisp-ish languages. Some Lisps, like Clojure, are rather functional; others, like Common Lisp, are considered imperative.

For the distinction you're looking for, I'd like to suggest "C-like" or "Algol-family" instead for those languages which work with sequences, conditionals, loops and probably braces and semicolons.


My rule of thumb when doing a new language: Write the debugger first, you're going to need it anyway....


Bah! If you need a debugger, you're not thinking hard enough.


Debuggers are crutches for losers. I use an oscilloscope to debug.


This is great, but I want the other things that are important but forgotten:

" We will not have:

A proper type system Error handling Good performance And much, much more "

Good performance is something that have a lot of literature, so can be avoided in this tutorials.

I have been in this talk at lambda:

http://lambda-the-ultimate.org/node/4929#comment-79628

Where is discussed where to talk about implement compilers. I ask about the other things that bother me:

- Debugger/ Debugging experience - REPL - Tracing (ie: Dtrace or similar)

and other things that I wish to understand before just make another parser.

How build a "hacker news" for it? Exist some project I could use?


This is nice, but I was hoping to find guidelines about designing the syntax for a new programming language. As others are saying, s-expression syntax is simple to parse because it's basically already structured like the AST. But s-expression syntax isn't so easy to develop in, especially for non-developers.

Why non-developers? I'm interested in DSLs that are used to add plugin-type scripting capabilities, so that advanced non-developer users can extend behavior on their own. Think Lua, but more specific to the domain. As far as syntax, I'm thinking about something more like BASIC for its simplicity.


> This is nice, but I was hoping to find guidelines about designing the syntax for a new programming language. As others are saying, s-expression syntax is simple to parse because it's basically already structured like the AST. But s-expression syntax isn't so easy to develop in, especially for non-developers.

IME, the "especially for non-developers" part doesn't seem to be true. People who are exposed to s-expression syntax first don't seem to be any more likely to have problems with that syntax style than those introduced to programming with other syntaxes are with whatever syntax the are introduced to first; the people that are most put-off by s-expressions as a syntax of expressing programs are people who are attached to some other syntax first, and particularly people whose experience is with multiple languages all from the same syntax family, usually the ALGOL-style family.


Take a look at the Shunting Yard algorithm -- this will go from standard (infix) to Postfix notation. And Postfix (RPN) is actually easier to interpret than Prefix (the main loop is about a half dozen lines or so).


I would recommend the Racket tutorial, especially, the section on using Racket's macro system to implement a slight variation of Algol 60. Working through this tutorial would give you a foundation on building your own DSLs, as well as giving you a solid foundation on using Racket and its macro system itself.


Could you please link to this? I didn't find anything when searching for the relevant keywords, or see anything of the sort at http://docs.racket-lang.org/.


I found the starting point here "http://docs.racket-lang.org/algol60/index.html".


I found that as well, but it is not what inetsee described in his post.

I was interested specifically in "using Racket's macro system to implement a slight variation of Algol 60", which is not what that section or the Racket guide section on macros [0] does.

It's not clear what was meant by "the Racket tutorial".

[0] http://docs.racket-lang.org/guide/macros.html


Handy reference for those embarking on creating a new PL:

http://colinm.org/language_checklist.html


While this is still amusing, and poignant, I don't believe the goal of the original link is to encourage people to try and compete with modern languages.

For those of us (like me) who learn best by trying things out, a PL project is a great way to wrap your head around some of the more abstract concepts of modern programming languages (like closures, as mentioned in the post).

Still, +1 for amusement.


Fun and useful tutorial, great work, Kjetil!


Thanks!


... 8. Find someone to do the most boring part - IDE - for you (most OSS languages).




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: