Self-hosting (compilers)


In computer programming, self-hosting is the use of a program as part of the toolchain or operating system that produces new versions of that same program; for example, a compiler that can compile its own source code. Self-hosting software is commonplace on personal computers and larger systems. Other programs that are typically self-hosting include kernels, assemblers, command-line interpreters and revision control software.
If a system is so new that no software has been written for it, then software is developed on another self-hosting system, often using a cross compiler, and placed on a storage device that the new system can read. Development continues this way until the new system can reliably host its own development. Writing new software development tools without using another host system is rare.
In the context of website management and online publishing, the term "self hosting" is used to describe the practice of running and maintaining a website using a private web server.

History

The first self-hosting compiler was written for Lisp by Hart and Levin at MIT in 1962. They wrote a Lisp compiler in Lisp, testing it inside an existing Lisp interpreter. Once they had improved the compiler to the point where it could compile its own source code, it was self-hosting.
This technique is usually practicable only when an interpreter already exists for the very same language that is to be compiled; it is possible, though extremely uncommon, for a person to compile a compiler with itself by hand. The concept is an instance of the broader notion of running a program on itself as input, which is also used in various proofs in theoretical computer science, such as the proof that the halting problem is undecidable.
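The interpreter-based route described above can be sketched in a few lines of Python. This is only a toy illustration, not a reconstruction of the Lisp compiler: the "compiler" merely strips comments and blank lines, so its output is still Python that the host interpreter can run. The names `COMPILER_SOURCE`, `compile_source` and `load_compiler` are hypothetical and chosen for this example.

```python
# Toy sketch: a "compiler" written in the language it compiles, first run
# inside an existing interpreter and then applied to its own source.
# Assumption: "compiling" here just means stripping comments and blank lines.
COMPILER_SOURCE = r'''
# A toy "compiler" written in the language it compiles.
def compile_source(src):
    # Drop blank lines and full-line comments; keep everything else.
    out = []
    for line in src.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        out.append(line)
    return "\n".join(out) + "\n"
'''


def load_compiler(code):
    """Run code in the host interpreter and return its compile_source function."""
    namespace = {}
    exec(code, namespace)
    return namespace["compile_source"]


# Stage 0: the compiler source runs directly inside the existing interpreter.
stage0 = load_compiler(COMPILER_SOURCE)

# Stage 1: the interpreted compiler compiles its own source.
stage1_code = stage0(COMPILER_SOURCE)
stage1 = load_compiler(stage1_code)

# Stage 2: the compiled compiler compiles the same source again.
stage2_code = stage1(COMPILER_SOURCE)

# A stable self-hosting compiler reproduces its own output.
assert stage1_code == stage2_code
```

Comparing the output of successive stages is a common sanity check when bootstrapping: if the compiler built by itself produces different output from the compiler that built it, something in the chain is wrong.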

Examples

Ken Thompson started development on Unix in 1968 by writing and compiling programs on the GE-635 and carrying them over to the PDP-7 for testing. After the initial Unix kernel, a command interpreter, an editor, an assembler, and a few utilities were completed, the Unix operating system was self-hosting: programs could be written and tested on the PDP-7 itself.
Douglas McIlroy wrote TMG in TMG on a piece of paper and "decided to give his piece of paper to his piece of paper," doing the computation himself, thus compiling a TMG compiler into assembly, which he typed up and assembled on Ken Thompson's PDP-7.
Development of the GNU system relies largely on GCC and GNU Emacs, making possible the self-contained, maintained and sustained development of free software for the GNU Project.
Many programming languages have self-hosted implementations: compilers that are written in the same language they compile. Such languages include Ada, BASIC, C, C++, C#, ClojureScript, CoffeeScript, Crystal, Dylan, F#, FASM, Forth, Gambas, Go, Haskell, HolyC, Java, Lisp, Modula-2, OCaml, Oberon, Pascal, Python, Rust, Scala, Smalltalk, TypeScript, Vala, and Visual Basic.
In some of these cases, the initial implementation was not self-hosted but was written in another language; in other cases, the initial implementation was developed using bootstrapping.