Computational Recreations

Esoteric Languages

Being a computer hobbyist and not knowing how to program are facts that do not blend well together. A true hobbyist will know at least to make some programs in some computer language, be it Java, Basic, Pascal, C or any other. There's also some kind of a fashion thing among computer languages, which is logical since computers evolve.

However, it's not easy to get the picture of the immense variety of existing computer languages. Off the top of my head, I can remember right now COBOL, ALGOL, APL, BASIC, RPG, Forth, Fortran, Lisp, PL/I, Logo, C, Pascal, Perl, Python, TCL, Ada, Java, assembler... and I've deliberately omitted mentioning variants of most, like Visual Basic, RPG III, Scheme (a Lisp derivative), C++ or Delphi.

When it comes to assembler, the matter of variants becomes a madness. Every microprocessor model has its own assembler, and there are thousands of models: Z80, PIC-16, 8051, 8086, 68000, 6809, 6502... In some cases like that of Intel's 80x86, there are various, even a multitude of dialects: Intel's official dialect, GNU project official dialect (AT&T format), the free NASM assembler dialect, and so on.

Things become even worse when we consider that there are people inventing their own assembler language for a non-existing processor. The most remarkable case may be that of Donald E. Knuth, mathematician and writer of a worldwide famous computer-related series of books. There are simulators for both imaginary processors he's invented, MIX and MMIX.

But there's more. There are people that don't have a reason to invent a new assembly language; they just invent a new language. The reasons may vary; it's clear for example that Sun Microsystems had powerful reasons to design Java. But not everyone needs such powerful reasons; the object of this article is precisely to analyze some of the languages that have been invented just for fun. Some of you may wonder if there are people with so much spare time as to waste a part of it in this way. The answer is yes, and there are a number. Not less surprising is the fact that there exist interpreters for most of these languages, and in some cases even compilers.

One of the first esoteric languages written (at least about which I have knowledge), created in 1972, is INTERCAL. Its design goal was to create a computer language totally different from every other existing language, not forgetting the sense of humor. In this language we have to beg for the execution of some sentences using PLEASE; instead of using the famous GO TO for skipping to a different part of the program, we have to use COME FROM in the destination location, though this is actually an extension to the first specification. We can ask it to abstain from executing certain sentences, e.g. PLEASE ABSTAIN FROM CALCULATING. The chapter about operator precedence is almost empty, except for a sentence telling us that it's intentionally blank, and a footnote to recall that the aim in designing INTERCAL was to have no precedents. As a result, in order to perform two operations in sequence an equivalent to parentheses must be used. The full name of the language is, according to its authors, "Compiler Language With No Pronounceable Acronym", which is, "for obvious reasons", abbreviated INTERCAL. We can see in figure 1 what an INTERCAL program looks like. The purpose of this program is to read two integers, taking them as signed integers in twos complement format, and to write their absolute values.

Figure 1: INTERCAL sample program.
        DO (5) NEXT
    (5) DO FORGET #1
        PLEASE WRITE IN :1
        DO .1 <- 'V":1~'#32768$#0'"$#1'~#3
        DO (1) NEXT
        DO :1 <- "'V":1~'#65535$#0'"$#65535'
                ~'#0$#65535'"$"'V":1~'#0$#65535'"
                $#65535'~'#0$#65535'"
        DO :2 <- #1
        PLEASE DO (4) NEXT
    (4) DO FORGET #1
        DO .1 <- "'V":1~'#65535$#0'"$":2~'#65535
                $#0'"'~'#0$#65535'"$"'V":1~'#0
                $#65535'"$":2~'#65535$#0'"'~'#0$#65535'"
        DO (1) NEXT
        DO :2 <- ":2~'#0$#65535'"
                $"'":2~'#65535$#0'"$#0'~'#32767$#1'"
        DO (4) NEXT
    (2) DO RESUME .1
    (1) PLEASE DO (2) NEXT
        PLEASE FORGET #1
        DO READ OUT :1
        PLEASE DO .1 <- 'V"':1~:1'~#1"$#1'~#3
        DO (3) NEXT
        PLEASE DO (5) NEXT
    (3) DO (2) NEXT
        PLEASE GIVE UP

INTERCAL programs are obviously difficult to write. It's not, however, the esoteric language in which it's hardest to write a program. For instance, in the Piet language it's impossible to write a program: it must be drawn. Figure 2 shows what a Piet program looks like (zoomed).

Figure 2: Piet sample program, reproduced with permission.
Figure 2: Piet sample program

The program in figure 2 just prints the well-known "Hello, world!". We've admittedly abused, however, the term write in this context. Designing a Piet program is not such a hard task. Whoever looks for something that makes it really difficult to code may give Malbolge a try. The name comes from Dante's eight circle of hell; that should give an idea of how hard is to write a program in this language. We'll get a closer view knowing how difficult it was for Andrew Cooke to write a Malbolge program that simply writes "Hello, World" (he actually made it using trial-and-error like methods, and he even couldn't make it respect upper and lower case letters, not to mention punctuation signs: the program just wrote HEllO WORld.) Figure 3 shows the program.

Figure 3: Malbolge sample program.
(=<`$9]7<5YXz7wT.3,+O/o'K%$H"'~D|#z@b=`{^Lx8%$Xmrkpohm-kNi;g
sedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@?>=<;:9876543s+O<oLm

Don't worry if you don't understanding anything. That's just the intention of the inventor of the language. In fact, it looks cryptic just because it's encrypted; that's the reason why Cooke proposes a "normalized Malbolge", meaning what is left after the decryption phase. That does not mean that the language is much easier after that, however; it's possible (if at all) to write a program in normalized Malbolge and apply the final encryption phase later. Figure 4 shows the aspect of the hello program in normalized Malbolge.

Figure 4: Normalized Malbolge sample program.
jpp<jp<pop<<jo*<popp<o*p<pp<pop<pop<jijoj/o<vvjpopoopo<ojo/o
vooooooooooooooooooooooooooooooooooooooooooooooooooo*p<v*<*

Even if it looks more readable than before, it's still complicated to understand without a description of what each symbol means, and even with it. The distribution of Malbolge comes with a full description of the language. Furthermore, in normalized Malbolge the meaning of the data is altered so normalization isn't a panacea either.

Amongst all esoteric languages there is one that is remarkable because of its exquisite simplicity and power. It's called Brainfuck; the impolite name complicates the search for references by searching the WWW since it's written in many variations that tend to hide its offensive character: Brainf*ck, Brainf***, Brainfork, Brainfsck, BF, and so on. Its creator's intention was to construct a language for which a very short compiler could be written while still being possible to perform any programming task with it.

Despite the name, writing a Brainfuck program is not so hard as it looks. It's been proved that Brainfuck is Turing-complete, which means that it's possible to write any algorithm with it. There are just eight instructions, each associated to a symbol. There's a description of each in table 1. There is a program pointer pointing to the instruction that is about to be executed, like in other imperative languages (well, there are some exceptions), and a data pointer which is used to manipulate data in Brainfuck. The data pointer points initially to position zero of a memory area, initially set to all zeros. A program is not considered to be correct if it tries to decrease the data pointer when it is at position zero. The usual convention is that every memory position is an unsigned byte (an integer between 0 and 255.) When applying an increment operation to one of these bytes when its value was 255, it usually becomes 0 and when the operation is a decrement and its value was 0, it usually becomes 255. We're using the term "usually" because the actual behaviour is implementation dependent.

Table 1: Instructions in the Brainfuck language.
Instruction Description C Equivalent
- Decrement the contents of the cell the data pointer currently points to by one. --*p;
+ Increment the contents of the cell the data pointer currently points to by one. ++*p;
< Decrement the data pointer by one. It's considered an error to execute this instruction when the data pointer value is 0. --p;
> Increment the data pointer by one. ++p;
[ If the current cell pointed by the data pointer is zero, look for the corresponding "]" and place the instruction pointer right after it. "[" and "]" can be nested. If there's no balance between them the program is not considered to be correct. while(*p){
] Look for the previous corresponding "[" and set the program pointer to point to it. }
, Enter a character from standard input and put it in the cell the data pointer points to. *p=getchar();
. Write the ASCII code corresponding to the cell currently pointed to by the data pointer to standard output. putchar(*p);

Those who know C can use the approximate equivalents in table 1 to better understand the actions performed by each instruction. With such a simple structure, it's hardly surprising that writing a Brainfuck interpreter, or even a compiler, is a fairly simple coding exercise. However, before going into details with the interpreter we'll make an overall analysis of the language and some common constructs.

For instance, the sequence of instructions [-] is often used in Brainfuck programs; its purpose is to set the current memory cell to zero: if a is the current memory cell, the sequence is equivalent to the C sentence "while(a){a--;}" that obviously sets a to zero.

If we want to move the current value one position to the right (we use the convention that "left" means decreasing the data pointer and "right" increasing it, i.e. that the memory positions increase from left to right), we'll use the following fragment: [>+<-]. This code increases the byte to the right of the current position as many times as the value at the current position is decreased, until it becomes zero. As a result, the cell to the right of the current position is added the initial contents of the current cell, which becomes zero afterwards. Copying a value is a harder task and requires two basic operations. One of them is a modified version of the movement code; the modification is to make it move the result to two cells simultaneously. The code to do this is [>+>+<<-]. The last operation consists of moving the rightmost byte to the original position, but before doing so we have to move to the rightmost position using the following: >>. Now we're ready to move the byte with [<<+>>-]; if the final position of the data pointer is not important then it's all done, otherwise we should move to the required position now.

Putting it all together, if we have three consecutive cells with values a, b and c and the data pointer points to a, when we execute the following code:

  [>+>+<<-]>>[<<+>>-]

we'll end up having a+c, a+b and 0 in their places. If we want to ensure that b=c=0 before starting, we can always use >[-]>[-]<< at the start of the code.

Multiplying a number by a constant number is easy: it's enough to use the move code fragment, but instead of incrementing the rightmost cell just once we increment it as many times as the constant. For example, to multiply the current data element by 7 we can use this code: [>+++++++<-]; this leaves the result to the right of the current data element, which in turn becomes zero.

Multiiplying two variable numbers together requires somewhat more effort. The basic idea is to add the second one to the one on its right as many times as the first one specifies. Let's see how to do this. First we note that the copy operation can also be used for addition since when the data cell to the right of the current one is nonzero, it will be added the content of the current cell. So if we call c the copy operation, this code will solve the problem: [>c<-], that is, go to the right, add the value in there to the one on its right, then go to the left, decrease by one and repeat if non-zero. If the first operand is not to be lost then the first thing that has to be done is copying it to a safe place.

After expanding the c, the whole code will look like this:

  [>
    [>+>+<<-]
    >>
    [<<+>>-]
    <<
  <-]

Here's a simple example that prints the answer to Life, the Universe, and Everithing (according to Douglas Adams, a British comic writer). That answer turns out to be 42. The following program calculates the ASCII code of number 4 and prints it, then the ASCII code of number 2 and prints it: +++++++[>+++++++<-]>+++.--.. The ASCII code of the digit 4 is 52 = 7×7+3, so we first multiply 7×7, then we add 3 to the result and print it. Then we subtract 2 from that since the ASCII code of digit 2 is 50 = 52-2. We print it and we're done.

Readers who are interested in learning more about the Brainfuck language can consult the links section. We will focus now in how to implement an interpreter for it. This is a really simple task as we will see; the only part slightly complicated is keeping track of the correspondence between [ and ]. We need two byte arrays (we'll focus on the 8-bit version from now on): the program memory array which we'll call pm and the data array which we'll call dm. We'll also need two variables called p and d which will be the program and data pointer respectively.

Here's an scheme of how an algorithm for the interpreter could look like:

  Read the program into program memory, pm, storing the final position
    into a variable pmend which will indicate end-of-program.
  Set all elements in the data memory to zeros.
  Let p = 0, d = 0.
  Start of loop (1):
    If p = pmend, we reached end-of-program; go to label (4).
    If the instruction mp[p] is «-», let md[d] = md[d] - 1.
    If the instruction mp[p] is «+», let md[d] = md[d] + 1.
    If the instruction mp[p] is «<», let d = d - 1.
    If the instruction mp[p] is «>», let d = d + 1.
    If the instruction mp[p] is «,», read a character and store it in md[d].
    If the instruction mp[p] is «.», write the character in md[d].
    If the instruction mp[p] is «[» and md[d] is zero, do the following:
      Let level = 0.
      Start of loop (2):
        If mp[p] is «[», let level = level + 1.
        If mp[p] is «]», let level = level - 1.
        Let p = p + 1.
        If level = 0, go to start of loop (1).
      Go to start of loop (2).
    If the instruction mp[p] is «]», do the following:
      Let level = 1.
      Start of loop (3):
        Let p = p - 1.
        If mp[p] is «]», let level = level + 1.
        If mp[p] is «[», let level = level - 1.
        If level = 0, go to start of loop (1).
      Go to start of loop (3).
    Let p = p + 1.
  Go to start of loop (1).
  Label (4): end of program.

The interpreted described above assumes that the programs are correct, that is, that they have a correct balance between «[» and «]» and that the program never uses the instruction «<» when d is zero nor instruction «>» when d is at the last defined address. It may be also necessary to deal with integer wrapping on «+» and «-» operations; that depends on the language used for the implementation.

During development of a Brainfuck program it's easy to make mistakes so that it doesn't meet these requirements (improper [ ] balancing, etc.) so it's recommended to place the corresponding checks in the program: check that d is not zero before trying to decrement it and that it's not in the upper limit before trying to increment it; in the loops that look for matching «[» and «]», check that p does not reach zero before applying the decrement, or pmend - 1 before applying the increment. With these cautions the interpreter should be safe against any malformed Brainfuck program.

Here's a puzzle: what's the shortest Brainfuck program that can store in position 0 the value 111 (decimal) leaving the rest of the data memory with zeros? Please send your programs to parigalo@formauri.es. There are some solutions which are already published in the next article.

Links

About esoteric languages in general

The place in which it's easiest to find references about esoteric languages is no doubt the Esoteric Languages Webring Index:
http://b.webring.com/hub?ring=esolang

The experimental languages page par excellence, maintained by Chris Pressey, is Cat's Eye Technologies' page:
http://www.catseye.mb.ca/index.html
The most remarkable section is the one dedicated to esoteric topics in computer programming:
http://www.catseye.mb.ca/esoteric/

Eric S. Raymond is the author of an INTERCAL-to-C compiler called C-INTERCAL. In his homepage there's a very interesting section called The Retrocomputing Museum containing a selection of pearls of esoteric programming. Brainfuck has eight instructions but what's the minimum number of instructions a language must have to be Turing-complete? The answer is one! There are two examples in this page pushing the RISC concept one step beyond: OISC (One Instruction Set Computing) and URISC (Ultimate RISC).
Another pearl in that museum is kvikkalkul, a language pretendedly used by the Swedish army in submarine weapons guidance programs in the 50's but most likely a joke. This language is most remarkable for its definition of a number: a signed fixed-point quantity with no integer part (numbers range from -0.999... to 0.999...).
There we can also find MIXAL, Knuth's MIX assembly language mentioned in the article's body, and a Klingon programming language called var'aq. There are many other interesting things in these pages that the reader is invited to discover.
http://www.tuxedo.org/~esr/retro/

The Stupid Languages Encyclopedia contains a quite extensive list of esoteric languages, their characteristics, their creator's name and their homepage or reference page. Some of the links in here have been obtained through this page.
http://www.kraml.at/stupid/

Usenet newsgroups and mailing lists related to esoteric languages:
alt.lang.intercal (News)
comp.lang.misc (News)
Esoteric Languages mailing list
Esoteric Languages mailing list archives

Some esoteric languages

The Whenever language, whose instructions don't execute in any established order but random:
http://www.dangermouse.net/esoteric/whenever.html

The instructions in the Wierd language are changes in angle of strings of symbols. Here's the language specification:
http://www.catseye.mb.ca/esoteric/wierd/doc/wierdspec.txt

The Piet language mentioned in the article, whose instructions are colors:
http://www.dangermouse.net/esoteric/piet.html

A description of the Sartre nihilist language, another one with a good dose of humor sense. It's quite remarkable the definition of the IF conditional instruction.
http://www.catseye.mb.ca/esoteric/sartre/index.html

About Brainfuck

Here's the C source for an interpreter and debugger of Brainfuck written by the author of this article. The code should be very portable; it compiles perfectly with gcc under either Windows or Linux. The interpreter makes first an optimization pass over the code to run it faster. The debugger contains advanced functions that I've missed in other similar programs.
brfd101.zip (11,077 bytes)

A complete guide about the building blocks with which to construct Brainfuck programs:
http://home.planet.nl/~faase009/Ha_bf_intro.html

The formal proof for the Turing-completeness of Brainfuck:
http://home.planet.nl/~faase009/Ha_bf_Turing.html

A Brainfuck interpreter available online, written in JavaScript:
http://home.planet.nl/~faase009/Ha_bf_online.html

Brainfuck normalization proposals:
http://www.muppetlabs.com/~breadbox/bf/standards.html
and a funny parody on standards here:
http://esoteric.sange.fi/ENSI/brainfuck-1.0.txt

There are a number of programs written in Brainfuck around; we remark a program to determine whether a number is prime or not, several interpreters and compilers of Brainfuck written in Brainfuck and a program to calculate PI though this last one requires an extension of the language that is able to deal with 16-bit data cells instad of the more frequent 8-bit ones.

A site devoted to Brainfuck:
http://www.bf-hacks.org/

More programs written in Brainfuck. We remark the program sort.bf which reads from standard input, sorts the incoming text character by character using the bubblesort method and prints the result.
http://esoteric.sange.fi/brainfuck/bf-source/prog/

Other links indirectly related to esoteric languages

A collection of code snippets which print Hello, World! in a great variety of languages, including many esoteric ones (Brainfuck and INTERCAL are among them):
http://www.latech.edu/~acm/HelloWorld.shtml

There's a similar page dedicated to programs which print the 99 Bottles of Beer song lyrics by means of a countdown from 99 to zero. This is a programming exercise obviously more complicated than that of Hello, World! and serves to show how to perform simple loops and calculations in each language. Here's a page dedicated to the 99 bottles of beer lyrics printing programs with more than 600 examples in different languages:
http://www.99-bottles-of-beer.net/

A quine is a program which is able to self-reproduce. It has nothing to do with viruses, though: unlike viruses, the requisite that a quine has to meet is that the program's output be its own source code, but without accessing it somehow (e.g. a BASIC program which uses the LIST command or a C program which opens the file with the source code are considered cheating). There are INTERCAL and Brainfuck quines, among many others.
http://www.nyx.net/~gthompso/quine.htm

Extensive list of programming languages, old and new (very long):
http://ftp.wustl.edu/doc/misc/lang-list.txt

The Gallery of CSS Descramblers is a page created to defend that a computer program can be considered free speech, since apparently an U.S. judge declared illegal a program written in C that descrambled the encrypted sectors from DVD discs, which use a cryptographic system called CSS. There are examples in many other languages (of course, Brainfuck is among them) and many other ways to express the original program that can be considered free speech: an MP3 file with a voice reading the program, detailed algorithm descriptions that can be used to redo it, and many other funny expressions of the DeCSS. Some of the versions are very twisted, like the one which speculates about the existence of illegal prime numbers.
http://www.cs.cmu.edu/~dst/DeCSS/Gallery/

Revision history

Copyright © 2002, 2003, 2005 Pedro Gimeno Fortea.

Special thanks must go to María Dolores Cárdenas for her help and support every time. The aesthetics in this page is due to her, as well as so many other details.

This page aims to be valid XHTML