Bootstrapping (compilers)


In computer science, bootstrapping is the technique for producing a self-compiling compiler — that is, compiler written in the source programming language that it intends to compile. An initial core version of the compiler is generated in a different language ; successive expanded versions of the compiler are developed using this minimal subset of the language. The problem of compiling a self-compiling compiler has been called the chicken-or-egg problem in compiler design, and bootstrapping is a solution to this problem.
Many compilers for many programming languages are bootstrapped, including compilers for BASIC, ALGOL, C, C#, D, Pascal, PL/I, Factor, Haskell, Modula-2, Oberon, OCaml, Common Lisp, Scheme, Go, Java, Elixir, Rust, Python, Scala, Nim, Eiffel, and more.

Process

A typical bootstrap process work in three or four stages:
The full compiler is built twice in order to compare the outputs of the two stages. If they are different, either the bootstrap or the full compiler contains a bug.

Advantages

Bootstrapping a compiler has the following advantages:
Note that some of these points assume that the language runtime is also written in the same language.

Methods

If one needs to compile a compiler for language X written in language X, there is the issue of how the first compiler can be compiled. The different methods that are used in practice include:
Methods for distributing compilers in source code include providing a portable bytecode version of the compiler, so as to bootstrap the process of compiling the compiler with itself. The T-diagram is a notation used to explain these compiler bootstrap techniques. In some cases, the most convenient way to get a complicated compiler running on a system that has little or no software on it involves a series of ever more sophisticated assemblers and compilers.

History

Assemblers were the first language tools to bootstrap themselves.
The first high-level language to provide such a bootstrap was NELIAC in 1958. The first widely used languages to do so were Burroughs B5000 Algol in 1961 and LISP in 1962.
Hart and Levin wrote a LISP compiler in LISP at MIT in 1962, testing it inside an existing LISP interpreter. Once they had improved the compiler to the point where it could compile its own source code, it was self-hosting.
This technique is only possible when an interpreter already exists for the very same language that is to be compiled. It borrows directly from the notion of running a program on itself as input, which is also used in various proofs in theoretical computer science, such as the proof that the halting problem is undecidable.

Current efforts

Due to security concerns regarding the Trusting Trust Attack and various attacks against binary trustworthiness, multiple projects are working to reduce the effort for not only bootstrapping from source but also allowing everyone to verify that source and executable correspond. These include the Bootstrappable builds project and the Reproducible builds project.

List of languages having self-hosting compilers

The following programming languages have self-hosting compilers: