Brainfuck


Brainfuck is an esoteric programming language created in 1993 by Urban Müller.
Notable for its extreme minimalism, the language consists of only eight simple commands and an instruction pointer. While it is fully Turing complete, it is not intended for practical use, but to challenge and amuse programmers. Brainfuck simply requires one to break commands into microscopic steps.
The language's name is a reference to the slang term :wikt:brainfuck|brainfuck, which refers to things so complicated or unusual that they exceed the limits of one's understanding.

History

In 1992, Urban Müller, a Swiss physics student, took over a small online archive for Amiga software. The archive grew more popular, and was soon mirrored around the world. Today, it is the world's largest Amiga archive, known as Aminet.
Müller designed Brainfuck with the goal of implementing it with the smallest possible compiler, inspired by the 1024-byte compiler for the FALSE programming language. Müller's original compiler was implemented in machine language and compiled to a binary with a size of 296 bytes. He uploaded the first Brainfuck compiler to Aminet in 1993. The program came with a "Readme" file, which briefly described the language, and challenged the reader "Who can program anything useful with it? :)". Müller also included an interpreter and some quite elaborate examples. A second version of the compiler used only 240 bytes.
As Aminet grew, the compiler became popular among the Amiga community, and in time it was implemented for other platforms. Several brainfuck compilers have been made smaller than 200 bytes -one is only 100 bytes.

P′′: Brainfuck's formal "parent language"

Except for its two I/O commands, Brainfuck is a minor variation of the formal programming language P′′ created by Corrado Böhm in 1964, which in turn is explicitly based on the Turing machine. In fact, using six symbols equivalent to the respective Brainfuck commands +, -, <, >, , Böhm provided an explicit program for each of the basic functions that together serve to compute any computable function. So the first "Brainfuck" programs appear in Böhm's 1964 paper - and they were programs sufficient to prove Turing completeness.

The Infinite Abacus: Brainfuck's "grand-parent" language

A version with explicit memory addressing rather without stack and a conditional jump was introduced by Joachim Lambek in 1961 under the name of the Infinite Abacus, consisting of an infinite number of cells and two instructions:
He proves the Infinite Abacus can compute any computable recursive function by programming Kleene set of basic μ-recursive function.
His machine was simulated by Melzac's machine modeling computation via arithmetic rather than logic mimicking a human operator moving pebbles on an abacus, hence the requirement that all numbers must be positive. Melzac, whose one instruction set computer is equivalent to an Infinite Abacus, gives programs for multiplication, gcd, nt prime number, representation in base b, sorting by magnitude, and shows how to simulate an arbitrary Turing machine.

Language design

The language consists of eight commands, listed below. A brainfuck program is a sequence of these commands, possibly interspersed with other characters. The commands are executed sequentially, with some exceptions: an instruction pointer begins at the first command, and each command it points to is executed, after which it normally moves forward to the next command. The program terminates when the instruction pointer moves past the last command.
The brainfuck language uses a simple machine model consisting of the program and instruction pointer, as well as an array of at least 30,000 byte cells initialized to zero; a movable data pointer ; and two streams of bytes for input and output.

Commands

The eight language commands each consist of a single character:
CharacterMeaning
>increment the data pointer.
<decrement the data pointer.
+increment the byte at the data pointer.
-decrement the byte at the data pointer.
.output the byte at the data pointer.
,accept one byte of input, storing its value in the byte at the data pointer.
command.
]if the byte at the data pointer is nonzero, then instead of moving the instruction pointer forward to the next command, jump it back to the command after the matching match as parentheses usually do: each and vice versa, the between the two.
Brainfuck programs can be translated into C using the following substitutions, assuming ptr is of type char* and has been initialized to point to an array of zeroed bytes:
brainfuck commandC equivalent
>
<
+
-
.
,

Examples

Adding two values

As a first, simple example, the following code snippet will add the current cell's value to the next cell: Each time the loop is executed, the current cell is decremented, the data pointer moves to [the right, that next cell is incremented, and the data pointer moves left again. This sequence is repeated until the starting cell is 0.


This can be incorporated into a simple addition program as follows:

++ Cell c0 = 2
> +++++ Cell c1 = 5
End your loops with the cell pointer on the loop counter
At this point our program has added 5 to 2 leaving 7 in c0 and 0 in c1
but we cannot output this value to the terminal since it is not ASCII encoded!
To display the ASCII character "7" we must add 48 to the value 7
48 = 6 * 8 so let's use another loop to help us!
++++ ++++ c1 = 8 and this will be our loop counter again
< +++ +++ Add 6 to c0
> - Subtract 1 from c1
<. Print out c0 which has the value 55 which translates to "7"!

Hello World!

The following program prints "Hello World!" and a newline to the screen:

This loop is an "initial comment loop", a simple way of adding a comment
to a BF program such that you don't have to worry about any command
characters. Any ".", ",", "+", "-", "<" and ">" characters are simply
ignored, the "" characters just have to be balanced. This
loop and the commands it contains are ignored because the current cell
defaults to a value of 0; the 0 value causes this loop to be skipped.
++++++++ Set Cell #0 to 8
>++++ Add 4 to Cell #1; this will always set Cell #1 to 4
Loop till Cell #1 is zero; number of iterations is 4
>+ Add 1 to Cell #2
>+ Add 1 to Cell #3
>- Subtract 1 from Cell #4
>>+ Add 1 to Cell #6
Move back to the first zero cell you find; this will
be Cell #1 which was cleared by the previous loop
<- Decrement the loop Counter in Cell #0
] Loop till Cell #0 is zero; number of iterations is 8
The result of this is:
Cell No : 0 1 2 3 4 5 6
Contents: 0 0 72 104 88 32 8
Pointer : ^
>>. Cell #2 has value 72 which is 'H'
>---. Subtract 3 from Cell #3 to get 101 which is 'e'
+++++++..+++. Likewise for 'llo' from Cell #3
>>. Cell #5 is 32 for the space
<-. Subtract 1 from Cell #4 for 87 to give a 'W'
<. Cell #3 was set to 'o' from the end of 'Hello'
+++.------.--------. Cell #3 for 'rl' and 'd'
>>+. Add 1 to Cell #5 gives us an exclamation point
>++. And finally a newline from Cell #6

For "readability", this code has been spread across many lines, and blanks and comments have been added. Brainfuck ignores all characters except the eight commands +-<>,. so no special syntax for comments is needed. The code could just as well have been written as:
++++++++>+>+>->>+<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.

ROT13

This program enciphers its input with the ROT13 cipher. To do this, it must map characters A-M to N-Z, and vice versa. Also it must map a-m to n-z and vice versa. It must map all other characters to themselves; it reads characters one at a time and outputs their enciphered equivalents until it reads an EOF, at which point the program terminates.
The basic approach used is as follows. Calling the input character x, divide x-1 by 32, keeping quotient and remainder. Unless the quotient is 2 or 3, just output x, having kept a copy of it during the division. If the quotient is 2 or 3, divide the remainder by 13; if the quotient here is 0, output x+13; if 1, output x-13; if 2, output x.
Regarding the division algorithm, when dividing y by z to get a quotient q and remainder r, there is an outer loop which sets q and r first to the quotient and remainder of 1/z, then to those of 2/z, and so on; after it has executed y times, this outer loop terminates, leaving q and r set to the quotient and remainder of y/z. Within the loop, there is code to increment r and decrement y, which is usually sufficient; however, every zth time through the outer loop, it is necessary to zero r and increment q. This is done with a diminishing counter set to the divisor z; each time through the outer loop, this counter is decremented, and when it reaches zero, it is refilled by moving the value from r back into it.

-,+ Set up divisor for division loop

<+<- Increase copy and remainder / reduce divisor / Normal case: skip forward
<>+<-]>>+>] Special case: move remainder back to divisor and increase quotient
<<<<<- Decrement dividend
] End division loop
]>>>+ End skip loop; zero former divisor and reuse space for a flag
>--< Reduce divisor; Normal case: increase remainder
>>+>>] Special case: increase remainder / move it back to divisor / increase quotient
<<<<<- Decrease dividend
] End division loop
>> Add remainder back to divisor to get a useful 13
>>> Zero quotient and divisor if quotient was 2
]<<>> Zero divisor and subtract 13 from copy if quotient was 1
]<< Zero divisor and add 13 to copy if quotient was 0
] End outer skip loop was not 2 or 3)
< Clear remainder from first division if second division was skipped
<. Output ROT13ed character from copy and clear it
<-,+ Read next character
] End character reading loop

Portability issues

Partly because Urban Müller did not write a thorough language specification, the many subsequent brainfuck interpreters and compilers have come to use slightly different dialects of brainfuck.

Cell size

In the classic distribution, the cells are of 8-bit size, and this is still the most common size. However, to read non-textual data, a brainfuck program may need to distinguish an end-of-file condition from any possible byte value; thus 16-bit cells have also been used. Some implementations have used 32-bit cells, 64-bit cells, or bignum cells with practically unlimited range, but programs that use this extra range are likely to be slow, since storing the value into a cell requires time as a cell's value may only be changed by incrementing and decrementing.
In all these variants, the , and . commands still read and write data in bytes. In most of them, the cells wrap around, i.e. incrementing a cell which holds its maximal value will bring it to its minimal value and vice versa. The exceptions are implementations which are distant from the underlying hardware, implementations that use bignums, and implementations that try to enforce portability.
It is usually easy to write brainfuck programs that do not ever cause integer wraparound or overflow, and therefore don't depend on cell size. Generally this means avoiding increment of +255, or avoiding overstepping the boundaries of . For more details on integer wraparound, see the Integer overflow article.

Array size

In the classic distribution, the array has 30,000 cells, and the pointer begins at the leftmost cell. Even more cells are needed to store things like the millionth Fibonacci number, and the easiest way to make the language Turing complete is to make the array unlimited on the right.
A few implementations extend the array to the left as well; this is an uncommon feature, and therefore portable brainfuck programs do not depend on it.
When the pointer moves outside the bounds of the array, some implementations will give an error message, some will try to extend the array dynamically, some will not notice and will produce undefined behavior, and a few will move the pointer to the opposite end of the array. Some tradeoffs are involved: expanding the array dynamically to the right is the most user-friendly approach and is good for memory-hungry programs, but it carries a speed penalty. If a fixed-size array is used it is helpful to make it very large, or better yet let the user set the size. Giving an error message for bounds violations is very useful for debugging but even that carries a speed penalty unless it can be handled by the operating system's memory protections.

End-of-line code

Different operating systems use subtly different versions of ASCII. The most important difference is in the code used for the end of a line of text. MS-DOS and Microsoft Windows use a CRLF, i.e. a 13 followed by a 10, in most contexts. UNIX and its descendants and Amigas use just 10, and older Macs use just 13. It would be difficult if brainfuck programs had to be rewritten for different operating systems. However, a unified standard was easy to create. Urban Müller's compiler and his example programs use 10, on both input and output; so do a large majority of existing brainfuck programs; and 10 is also more convenient to use than CRLF. Thus, brainfuck implementations should make sure that brainfuck programs that assume newline = 10 will run properly; many do so, but some do not.
This assumption is also consistent with most of the world's sample code for C and other languages, in that they use '\n', or 10, for their newlines. On systems that use CRLF line endings, the C standard library transparently remaps "\n" to "\r\n" on output and "\r\n" to "\n" on input for streams not opened in binary mode.

End-of-file behavior

The behavior of the , command when an end-of-file condition has been encountered varies. Some implementations set the cell at the pointer to 0, some set it to the C constant EOF, some leave the cell's value unchanged. There is no real consensus; arguments for the three behaviors are as follows.
Setting the cell to 0 avoids the use of negative numbers, and makes it marginally more concise to write a loop that reads characters until EOF occurs. This is a language extension devised by Panu Kalliokoski.
Setting the cell to -1 allows EOF to be distinguished from any byte value, which is necessary for reading non-textual data; also, it is the behavior of the C translation of , given in Müller's readme file. However, it is not obvious that those C translations are to be taken as normative.
Leaving the cell's value unchanged is the behavior of Urban Müller's brainfuck compiler. This behavior can easily coexist with either of the others; for instance, a program that assumes EOF = 0 can set the cell to 0 before each , command, and will then work correctly on implementations that do either EOF = 0 or EOF = "no change". It is so easy to accommodate the "no change" behavior that any brainfuck programmer interested in portability should do so.

Frameworks

Tyler Holewinski developed a C#.NET Framework, , which by default runs brainfuck, but can also be used to derive various forms of the language, as well as add new commands, or modify the behavior of existing ones. BrainF.NET thereby allows for development of programs such as an IDE.

Derivatives

Many people have created brainfuck equivalents or brainfuck derivatives.
Some examples:
  • Pi, which maps brainfuck into errors in individual digits of Pi.
  • VerboseFuck, which looks like a traditional programming language, only what appears as parameters or expressions are actually parts of longer commands that cannot be altered.
  • DerpPlusPlus, in which the commands are replaced with words such as 'HERP', 'DERP', 'GIGITY', etc.
  • Ook!, which maps brainfuck's eight commands to two-word combinations of "Ook.", "Ook?", and "Ook!", jokingly designed to be "writable and readable by orang-utans" according to its creator, a reference to the orang-utan Librarian in the novels of Terry Pratchett.
  • Ternary, similar in concept to Ook! but instead consisting of permutations of the ASCII characters 0, 1, and 2.
  • BodyFuck, a BrainFuck implementation based on a gesture-controlled system so that programmer's movements are captured by a video camera and converted into the 8 possible characters.
  • OooWee, commands are variations of OooWee. Inspired from the Rick and Morty character Mr. PoopyButthole.
OWIKI.org. Text is available under the Creative Commons Attribution-ShareAlike License.