Hutter Prize


The Hutter Prize is a cash prize funded by Marcus Hutter which rewards data compression improvements on a specific 1 GB English text file. Specifically, the prize awards 5000 euros for each one percent improvement in the compressed size of the file enwik9, which is the larger of two files used in the Large Text Compression Benchmark; enwik9 is the first 1,000,000,000 characters of a specific version of English Wikipedia. The ongoing competition is organized by Hutter, Matt Mahoney, and Jim Bowery.

Goals

The goal of the Hutter Prize is to encourage research in artificial intelligence. The organizers believe that text compression and AI are equivalent problems. Hutter proved that the optimal behavior of a goal seeking agent in an unknown but computable environment is to guess at each step that the environment is probably controlled by one of the shortest programs consistent with all interaction so far. However, there is no general solution because Kolmogorov complexity is not computable. Hutter proved that in the restricted case where the environment is restricted to time t and space l, a solution can be computed in time O, which is still intractable.
The organizers further believe that compressing natural language text is a hard AI problem, equivalent to passing the Turing test. Thus, progress toward one goal represents progress toward the other. They argue that predicting which characters are most likely to occur next in a text sequence requires vast real-world knowledge. A text compressor must solve the same problem in order to assign the shortest codes to the most likely text sequences.

Rules

The contest is open-ended. It is open to everyone. To enter, a competitor must submit a compression program and a decompressor that decompresses to the file enwik9. It is also possible to submit a compressed file instead of the compression program. The total size of the compressed file and decompressor must not be larger than 99% of the previous prize winning entry. For each one percent improvement, the competitor wins 5,000 euros. The decompression program must also meet execution time and memory constraints, currently 100 hours on 1 core of a 3 GHz CPU with 10 GB memory. These constraints may be relaxed in the future.
Submissions must be published in order to allow independent verification. There is a 30-day waiting period for public comment before awarding a prize. The rules do not require the release of source code, unless such release is required by the code's license.

History

The prize was announced on August 6, 2006. On February 21, 2020 it was expanded by a factor of 10. The original prize baseline was 18,324,887 bytes, achieved by PAQ8F. The expanded prize baseline was 116MB.
On August 16, Rudi Cilibrasi submitted a modified version of PAQ8F called RAQ8G that added parenthesis modeling. However it failed to meet the 1% threshold.
On the same day, but a few hours later Dmitry Shkarin submitted a modified version of his DURILCA compressor called DURILCA 0.5h, which improved compression by 1.5%. However it was disqualified for using 1.75 GB of memory. The decision to disqualify was controversial because the memory limits were not clearly specified in the rules at the time.
On August 20, Alexander Ratushnyak submitted PAQ8HKCC, a modified version of PAQ8H, which improved compression by 2.6% over PAQ8F. He continued to improve the compression to 3.0% with PAQ8HP1 on August 21, 4% with PAQ8HP2 on August 28, 4.9% with PAQ8HP3 on September 3, 5.9% with PAQ8HP4 on September 10, and 5.9% with PAQ8HP5 on September 25. At that point he was declared the first winner of the Hutter prize, awarded 3416 euros, and the new baseline was set to 17,073,018 bytes.
Ratushnyak has since broken his record multiple times, becoming the second, third, and fourth winner of the Hutter prize.