Being a computer hobbyist and not knowing how to program are facts that do not blend well together. A true hobbyist will know at least to make some programs in some computer language, be it Java, Basic, Pascal, C or any other. There's also some kind of a fashion thing among computer languages, which is logical since computers evolve.
However, it's not easy to get the picture of the immense variety of existing computer languages. Off the top of my head, I can remember right now COBOL, ALGOL, APL, BASIC, RPG, Forth, Fortran, Lisp, PL/I, Logo, C, Pascal, Perl, Python, TCL, Ada, Java, assembler... and I've deliberately omitted mentioning variants of most, like Visual Basic, RPG III, Scheme (a Lisp derivative), C++ or Delphi.
When it comes to assembler, the matter of variants becomes a madness. Every microprocessor model has its own assembler, and there are thousands of models: Z80, PIC-16, 8051, 8086, 68000, 6809, 6502... In some cases like that of Intel's 80x86, there are various, even a multitude of dialects: Intel's official dialect, GNU project official dialect (AT&T format), the free NASM assembler dialect, and so on.
Things become even worse when we consider that there are people inventing their own assembler language for a non-existing processor. The most remarkable case may be that of Donald E. Knuth, mathematician and writer of a worldwide famous computer-related series of books. There are simulators for both imaginary processors he's invented, MIX and MMIX.
But there's more. There are people that don't have a reason to invent a new assembly language; they just invent a new language. The reasons may vary; it's clear for example that Sun Microsystems had powerful reasons to design Java. But not everyone needs such powerful reasons; the object of this article is precisely to analyze some of the languages that have been invented just for fun. Some of you may wonder if there are people with so much spare time as to waste a part of it in this way. The answer is yes, and there are a number. Not less surprising is the fact that there exist interpreters for most of these languages, and in some cases even compilers.
One of the first esoteric languages written (at least about
which I have knowledge), created in 1972, is INTERCAL. Its design goal was to create a computer
language totally different from every other existing language, not
forgetting the sense of humor. In this language we have to beg
for the execution of some sentences using PLEASE;
instead of using the famous GO TO
for skipping to a different part of the program, we have to use
COME FROM in the destination location, though this is
actually an extension to the first specification. We can ask it to
abstain from executing certain sentences, e.g. PLEASE ABSTAIN
FROM CALCULATING. The chapter about operator precedence
is almost empty, except for a sentence telling us that it's intentionally
blank, and a footnote to recall that the aim in designing INTERCAL
was to have no precedents. As a result, in order to
perform two operations in sequence an equivalent to parentheses
must be used. The full name of the language is, according to its
authors, "Compiler Language With No Pronounceable Acronym",
which is, "for obvious reasons", abbreviated
INTERCAL. We can see in
figure 1 what an INTERCAL program looks like. The purpose of this program
is to read two integers, taking them as signed integers in twos
complement format, and to write their absolute values.
DO (5) NEXT
(5) DO FORGET #1
PLEASE WRITE IN :1
DO .1 <- 'V":1~'#32768$#0'"$#1'~#3
DO (1) NEXT
DO :1 <- "'V":1~'#65535$#0'"$#65535'
~'#0$#65535'"$"'V":1~'#0$#65535'"
$#65535'~'#0$#65535'"
DO :2 <- #1
PLEASE DO (4) NEXT
(4) DO FORGET #1
DO .1 <- "'V":1~'#65535$#0'"$":2~'#65535
$#0'"'~'#0$#65535'"$"'V":1~'#0
$#65535'"$":2~'#65535$#0'"'~'#0$#65535'"
DO (1) NEXT
DO :2 <- ":2~'#0$#65535'"
$"'":2~'#65535$#0'"$#0'~'#32767$#1'"
DO (4) NEXT
(2) DO RESUME .1
(1) PLEASE DO (2) NEXT
PLEASE FORGET #1
DO READ OUT :1
PLEASE DO .1 <- 'V"':1~:1'~#1"$#1'~#3
DO (3) NEXT
PLEASE DO (5) NEXT
(3) DO (2) NEXT
PLEASE GIVE UP
|
INTERCAL programs are obviously difficult to write. It's not, however, the esoteric language in which it's hardest to write a program. For instance, in the Piet language it's impossible to write a program: it must be drawn. Figure 2 shows what a Piet program looks like (zoomed).
The program in figure 2 just prints the well-known "Hello, world!". We've
admittedly abused, however, the term write in this
context. Designing a Piet program is not such a hard task. Whoever
looks for something that makes it really difficult to code may give
Malbolge a try. The name comes from Dante's eight circle of hell; that
should give an idea of how hard is to write a program in this
language. We'll get a closer view knowing how difficult it was
for Andrew Cooke to write a Malbolge program that simply writes "Hello,
World" (he actually made it using trial-and-error like methods,
and he even couldn't make it respect upper and lower case letters,
not to mention punctuation signs: the program just wrote HEllO
WORld.) Figure 3 shows
the program.
(=<`$9]7<5YXz7wT.3,+O/o'K%$H"'~D|#z@b=`{^Lx8%$Xmrkpohm-kNi;g
sedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@?>=<;:9876543s+O<oLm |
Don't worry if you don't understanding anything. That's just the intention of the inventor of the language. In fact, it looks cryptic just because it's encrypted; that's the reason why Cooke proposes a "normalized Malbolge", meaning what is left after the decryption phase. That does not mean that the language is much easier after that, however; it's possible (if at all) to write a program in normalized Malbolge and apply the final encryption phase later. Figure 4 shows the aspect of the hello program in normalized Malbolge.
jpp<jp<pop<<jo*<popp<o*p<pp<pop<pop<jijoj/o<vvjpopoopo<ojo/o vooooooooooooooooooooooooooooooooooooooooooooooooooo*p<v*<* |
Even if it looks more readable than before, it's still complicated to understand without a description of what each symbol means, and even with it. The distribution of Malbolge comes with a full description of the language. Furthermore, in normalized Malbolge the meaning of the data is altered so normalization isn't a panacea either.
Amongst all esoteric languages there is one that is remarkable because of its exquisite simplicity and power. It's called Brainfuck; the impolite name complicates the search for references by searching the WWW since it's written in many variations that tend to hide its offensive character: Brainf*ck, Brainf***, Brainfork, Brainfsck, BF, and so on. Its creator's intention was to construct a language for which a very short compiler could be written while still being possible to perform any programming task with it.
Despite the name, writing a Brainfuck program is not so hard as it looks. It's been proved that Brainfuck is Turing-complete, which means that it's possible to write any algorithm with it. There are just eight instructions, each associated to a symbol. There's a description of each in table 1. There is a program pointer pointing to the instruction that is about to be executed, like in other imperative languages (well, there are some exceptions), and a data pointer which is used to manipulate data in Brainfuck. The data pointer points initially to position zero of a memory area, initially set to all zeros. A program is not considered to be correct if it tries to decrease the data pointer when it is at position zero. The usual convention is that every memory position is an unsigned byte (an integer between 0 and 255.) When applying an increment operation to one of these bytes when its value was 255, it usually becomes 0 and when the operation is a decrement and its value was 0, it usually becomes 255. We're using the term "usually" because the actual behaviour is implementation dependent.
| Instruction | Description | C Equivalent |
|---|---|---|
- |
Decrement the contents of the cell the data pointer currently points to by one. | --*p; |
+ |
Increment the contents of the cell the data pointer currently points to by one. | ++*p; |
< |
Decrement the data pointer by one. It's considered an error to execute this instruction when the data pointer value is 0. | --p; |
> |
Increment the data pointer by one. | ++p; |
[ |
If the current cell pointed by the data pointer
is zero, look for the corresponding "]" and
place the instruction pointer right after it.
"[" and "]" can be nested.
If there's no balance between them the program is not considered to be
correct. |
while(*p){ |
] |
Look for the previous corresponding
"[" and set the program pointer to point to it. |
} |
, |
Enter a character from standard input and put it in the cell the data pointer points to. | *p=getchar(); |
. |
Write the ASCII code corresponding to the cell currently pointed to by the data pointer to standard output. | putchar(*p); |
Those who know C can use the approximate equivalents in table 1 to better understand the actions performed by each instruction. With such a simple structure, it's hardly surprising that writing a Brainfuck interpreter, or even a compiler, is a fairly simple coding exercise. However, before going into details with the interpreter we'll make an overall analysis of the language and some common constructs.
For instance, the sequence of instructions [-] is often
used in Brainfuck programs; its purpose is to set the current memory cell
to zero: if a is the current memory cell, the sequence is
equivalent to the C sentence "while(a){a--;}" that
obviously sets a to zero.
If we want to move the current value one position to the right
(we use the convention that "left" means decreasing the
data pointer and "right" increasing it, i.e. that the
memory positions increase from left to right), we'll use the
following fragment: [>+<-]. This code
increases the byte to the right of the current position as many
times as the value at the current position is decreased, until it
becomes zero. As a result, the cell to the right of the current
position is added the initial contents of the current cell, which
becomes zero afterwards. Copying a value is a harder task and
requires two basic operations. One of them is a modified version
of the movement code; the modification is to make it move the
result to two cells simultaneously. The code to do this is
[>+>+<<-]. The last operation consists
of moving the rightmost byte to the original position, but before
doing so we have to move to the rightmost position using the
following: >>. Now we're ready to move the
byte with [<<+>>-]; if the final position
of the data pointer is not important then it's all done, otherwise
we should move to the required position now.
Putting it all together, if we have three consecutive cells with
values a, b and c and the
data pointer points to a, when we execute the
following code:
[>+>+<<-]>>[<<+>>-]
we'll end up having a+c, a+b and
0 in their places. If we want to ensure that
b=c=0 before starting, we can always use
>[-]>[-]<< at the start of the code.
Multiplying a number by a constant number is easy: it's enough to
use the move code fragment, but instead of incrementing the rightmost
cell just once we increment it as many times as the constant. For
example, to multiply the current data element by 7 we can use this code:
[>+++++++<-]; this leaves the result to the right
of the current data element, which in turn becomes zero.
Multiiplying two variable numbers together requires somewhat more
effort. The basic idea is to add the second one to the one on its
right as many times as the first one specifies. Let's see how to
do this. First we note that the copy operation can also be used
for addition since when the data cell to the right of the current
one is nonzero, it will be added the content of the current cell.
So if we call c the copy operation, this code will
solve the problem: [>c<-], that is, go to the
right, add the value in there to the one on its right, then go to
the left, decrease by one and repeat if non-zero. If the first
operand is not to be lost then the first thing that has to be
done is copying it to a safe place.
After expanding the c, the whole code will look
like this:
[>
[>+>+<<-]
>>
[<<+>>-]
<<
<-]
Here's a simple example that prints the answer to Life, the Universe, and Everithing (according to Douglas Adams, a British comic writer). That answer turns out to be 42. The
following program calculates the ASCII code of number 4 and prints
it, then the ASCII code of number 2 and prints it:
+++++++[>+++++++<-]>+++.--.. The ASCII code of the
digit 4 is 52 = 7×7+3, so we first multiply 7×7, then
we add 3 to the result and print it. Then we subtract 2 from that
since the ASCII code of digit 2 is 50 = 52-2. We print it and we're
done.
Readers who are interested in learning more about the Brainfuck
language can consult the links section. We will focus now in how to implement an interpreter for it.
This is a really simple task as we will see; the only part slightly
complicated is keeping track of the correspondence between
[ and ]. We need two byte arrays (we'll
focus on the 8-bit version from now on): the program memory array
which we'll call pm and the data array which we'll call
dm. We'll also need two variables called p
and d which will be the program and data pointer
respectively.
Here's an scheme of how an algorithm for the interpreter could look like:
Read the program into program memory, pm, storing the final position
into a variable pmend which will indicate end-of-program.
Set all elements in the data memory to zeros.
Let p = 0, d = 0.
Start of loop (1):
If p = pmend, we reached end-of-program; go to label (4).
If the instruction mp[p] is «-», let md[d] = md[d] - 1.
If the instruction mp[p] is «+», let md[d] = md[d] + 1.
If the instruction mp[p] is «<», let d = d - 1.
If the instruction mp[p] is «>», let d = d + 1.
If the instruction mp[p] is «,», read a character and store it in md[d].
If the instruction mp[p] is «.», write the character in md[d].
If the instruction mp[p] is «[» and md[d] is zero, do the following:
Let level = 0.
Start of loop (2):
If mp[p] is «[», let level = level + 1.
If mp[p] is «]», let level = level - 1.
Let p = p + 1.
If level = 0, go to start of loop (1).
Go to start of loop (2).
If the instruction mp[p] is «]», do the following:
Let level = 1.
Start of loop (3):
Let p = p - 1.
If mp[p] is «]», let level = level + 1.
If mp[p] is «[», let level = level - 1.
If level = 0, go to start of loop (1).
Go to start of loop (3).
Let p = p + 1.
Go to start of loop (1).
Label (4): end of program.
The interpreted described above assumes that the programs are correct,
that is, that they have a correct balance between
«[» and «]» and that
the program never uses the instruction «<»
when d is zero nor instruction «>»
when d is at the last defined address. It may be also
necessary to deal with integer wrapping on «+»
and «-» operations; that depends on the language
used for the implementation.
During development of a Brainfuck program it's easy to make mistakes
so that it doesn't meet these requirements (improper [ ]
balancing, etc.) so it's recommended to place the corresponding checks
in the program: check that d is not zero before trying
to decrement it and that it's not in the upper limit before trying to
increment it; in the loops that look for matching
«[» and «]», check
that p does not reach zero before applying the decrement,
or pmend - 1 before applying the increment. With these
cautions the interpreter should be safe against any malformed Brainfuck
program.
Here's a puzzle: what's the shortest Brainfuck program that can store in position 0 the value 111 (decimal) leaving the rest of the data memory with zeros? Please send your programs to parigalo@formauri.es. There are some solutions which are already published in the next article.
The place in which it's easiest to find references about esoteric
languages is no doubt the Esoteric Languages Webring Index:
http://b.webring.com/hub?ring=esolang
The experimental languages page par excellence, maintained by
Chris Pressey, is Cat's Eye Technologies' page:
http://www.catseye.mb.ca/index.html
The most remarkable section is the one dedicated to esoteric
topics in computer programming:
http://www.catseye.mb.ca/esoteric/
Eric S. Raymond is the author of an INTERCAL-to-C compiler called C-INTERCAL. In his
homepage there's a very interesting section called The Retrocomputing
Museum containing a selection of pearls of esoteric programming.
Brainfuck has eight instructions but what's the minimum number
of instructions a language must have to be Turing-complete? The answer
is one! There are two examples in this page pushing the RISC concept one
step beyond: OISC (One Instruction Set Computing) and URISC (Ultimate
RISC).
Another pearl in that museum is kvikkalkul, a language pretendedly used by the Swedish army
in submarine weapons guidance programs in the 50's but most likely a
joke. This language is most remarkable for its definition of a number:
a signed fixed-point quantity with no integer part (numbers range from
-0.999... to 0.999...).
There we can also find MIXAL, Knuth's MIX assembly language
mentioned in the article's body, and a Klingon programming language
called var'aq. There are many other interesting things in
these pages that the reader is invited to discover.
http://www.tuxedo.org/~esr/retro/
The Stupid Languages Encyclopedia contains a quite extensive list
of esoteric languages, their characteristics, their creator's name
and their homepage or reference page. Some of the links in here
have been obtained through this page.
http://www.kraml.at/stupid/
Usenet newsgroups and mailing lists related to esoteric languages:
alt.lang.intercal (News)
comp.lang.misc (News)
Esoteric Languages mailing list
Esoteric Languages mailing list archives
The Whenever language, whose instructions don't
execute in any established order but random:
http://www.dangermouse.net/esoteric/whenever.html
The instructions in the Wierd language are changes in
angle of strings of symbols. Here's the language specification:
http://www.catseye.mb.ca/esoteric/wierd/doc/wierdspec.txt
The Piet language mentioned in the article, whose
instructions are colors:
http://www.dangermouse.net/esoteric/piet.html
A description of the Sartre nihilist language, another
one with a good dose of humor sense. It's quite remarkable the
definition of the IF conditional instruction.
http://www.catseye.mb.ca/esoteric/sartre/index.html
Here's the C source for an interpreter and debugger of Brainfuck
written by the author of this article. The code should be very
portable; it compiles perfectly with gcc under either Windows or
Linux. The interpreter makes first an optimization pass over the
code to run it faster. The debugger contains advanced functions
that I've missed in other similar programs.
brfd101.zip (11,077 bytes)
A complete guide about the building blocks with which to
construct Brainfuck programs:
http://home.planet.nl/~faase009/Ha_bf_intro.html
The formal proof for the Turing-completeness of Brainfuck:
http://home.planet.nl/~faase009/Ha_bf_Turing.html
A Brainfuck interpreter available online, written in JavaScript:
http://home.planet.nl/~faase009/Ha_bf_online.html
Brainfuck normalization proposals:
http://www.muppetlabs.com/~breadbox/bf/standards.html
and a funny parody on standards here:
http://esoteric.sange.fi/ENSI/brainfuck-1.0.txt
There are a number of programs written in Brainfuck around; we remark a program to determine whether a number is prime or not, several interpreters and compilers of Brainfuck written in Brainfuck and a program to calculate PI though this last one requires an extension of the language that is able to deal with 16-bit data cells instad of the more frequent 8-bit ones.
A site devoted to Brainfuck:
http://www.bf-hacks.org/
More programs written in Brainfuck. We remark the program
sort.bf which reads from standard input, sorts the
incoming text character by character using the bubblesort
method and prints the result.
http://esoteric.sange.fi/brainfuck/bf-source/prog/
A collection of code snippets which print Hello, World!
in a great variety of languages, including many esoteric ones
(Brainfuck and INTERCAL are among them):
http://www.latech.edu/~acm/HelloWorld.shtml
There's a similar page dedicated to programs which print
the 99 Bottles of Beer song lyrics by means of a
countdown from 99 to zero. This is a programming exercise
obviously more complicated than that of Hello, World!
and serves to show how to perform simple loops and calculations
in each language. Here's a page dedicated to the 99 bottles
of beer lyrics printing programs with more than 600
examples in different languages:
http://www.99-bottles-of-beer.net/
A quine is a program which is able to self-reproduce.
It has nothing to do with viruses, though: unlike viruses, the
requisite that a quine has to meet is that the program's output
be its own source code, but without accessing it somehow (e.g.
a BASIC program which uses the LIST command or a C program which
opens the file with the source code are considered cheating).
There are INTERCAL and Brainfuck quines, among many others.
http://www.nyx.net/~gthompso/quine.htm
Extensive list of programming languages, old and new (very long):
http://ftp.wustl.edu/doc/misc/lang-list.txt
The Gallery of CSS Descramblers is a page created to defend
that a computer program can be considered free speech, since
apparently an U.S. judge declared illegal a program written in
C that descrambled the encrypted sectors from DVD discs, which
use a cryptographic system called CSS. There are examples in
many other languages (of course, Brainfuck is among them) and
many other ways to express the original program that can be
considered free speech: an MP3 file with a voice reading the
program, detailed algorithm descriptions that can be used to
redo it, and many other funny expressions of the DeCSS. Some
of the versions are very twisted, like the one which
speculates about the existence of illegal prime numbers.
http://www.cs.cmu.edu/~dst/DeCSS/Gallery/
Special thanks must go to María Dolores Cárdenas for her help and support every time. The aesthetics in this page is due to her, as well as so many other details.