MuZero is a computer program developed by artificial intelligence research company DeepMind to master games without knowing their rules. Its first release in 2019 included benchmarks of its performance in Go, chess, shogi, and a standard suite of Atari games. The algorithm uses an approach similar to AlphaZero. It matched AlphaZero's performance in chess and shogi, improved on its performance in Go, and improved on the state of the art in mastering a suite of 57 Atari games, a visually complex domain. MuZero was trained via self-play, with no access to rules, opening books, or endgame tablebases. The trained algorithm used the same convolutional and residual architecture as AlphaZero, but with 20% fewer computation steps per node in the search tree.
History
On November 19, 2019, the DeepMind team released a preprint introducing MuZero.
Derivation from AlphaZero
MuZero (MZ) is a combination of the high-performance planning of the AlphaZero (AZ) algorithm with approaches to model-free reinforcement learning. The combination allows for more efficient training in classical planning regimes, such as Go, while also handling domains with much more complex inputs at each stage, such as visual video games. MuZero was derived directly from AZ code and shares its rules for setting search hyperparameters. Differences between the approaches include:
AZ's planning process uses a simulator and a neural network: perfect knowledge of the game rules is used to model state transitions in the search tree, the actions available at each node, and the termination of a branch of the tree. MZ does not have access to a perfect ruleset and replaces these components with a single learned model, which is updated continually.
AZ has a single model of the state of the game; MZ has separate models for the representation of the current state, the dynamics of state transitions, and the prediction of the policy and value of a position (a sketch illustrating this split follows the list).
MZ's hidden (learned) model may be complex, and it may turn out to cache computation within it; exploring the details of the hidden model in a successfully trained instance of MZ is an avenue for future work.
MZ does not assume a two-player, winner-takes-all game. It works with standard reinforcement-learning scenarios, including single-agent environments with continuous intermediate rewards, possibly of arbitrary magnitude and with discounting over time. AZ was designed exclusively for two-player games that could be won, drawn, or lost.
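The split into representation, dynamics, and prediction functions can be made concrete with a short sketch. The following Python stub is illustrative only: the names, sizes, and random stand-in weights are invented for this example, and a greedy rollout stands in for the Monte Carlo tree search used in the actual system; it is not DeepMind's released pseudocode. It shows how an observation is encoded once, after which planning unrolls the learned dynamics entirely in latent space, accumulating discounted intermediate rewards and bootstrapping from the prediction function's value estimate.

# Minimal sketch of MuZero's three learned functions and how planning unrolls
# them in latent space. Names, shapes, and random stand-in weights are
# illustrative; the real system learns deep networks end-to-end and plans with
# Monte Carlo tree search rather than the greedy rollout shown here.
import numpy as np

rng = np.random.default_rng(0)
LATENT = 8      # size of the hidden (latent) state
ACTIONS = 4     # number of actions, assumed fixed for this sketch

def representation(observation):
    # h: encode a raw observation into an initial hidden state.
    W = rng.standard_normal((LATENT, observation.size))
    return np.tanh(W @ observation)

def dynamics(hidden_state, action):
    # g: predict the next hidden state and the immediate reward of an action.
    one_hot = np.eye(ACTIONS)[action]
    W = rng.standard_normal((LATENT, LATENT + ACTIONS))
    next_state = np.tanh(W @ np.concatenate([hidden_state, one_hot]))
    reward = float(next_state.mean())          # stand-in reward head
    return next_state, reward

def prediction(hidden_state):
    # f: predict a policy over actions and a value for a hidden state.
    logits = rng.standard_normal(ACTIONS) + hidden_state.sum()
    policy = np.exp(logits) / np.exp(logits).sum()
    value = float(hidden_state.mean())         # stand-in value head
    return policy, value

def plan(observation, depth=3, discount=0.99):
    # Unroll the learned model from the encoded observation, summing
    # discounted intermediate rewards and bootstrapping with a value estimate.
    state = representation(observation)
    total, scale = 0.0, 1.0
    for _ in range(depth):
        policy, _ = prediction(state)
        action = int(np.argmax(policy))        # greedy; MCTS would explore
        state, reward = dynamics(state, action)
        total += scale * reward
        scale *= discount
    _, bootstrap = prediction(state)
    return total + scale * bootstrap

print(plan(rng.standard_normal(16)))

Because the dynamics function operates on hidden states rather than on game positions, no game rules are consulted during planning; their effects are only approximated by what the learned model has absorbed from experience.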
Comparison with R2D2
The previous state-of-the-art technique for learning to play the suite of Atari games was R2D2, the Recurrent Replay Distributed DQN. MuZero surpassed both R2D2's mean and median performance across the suite of games, though it did not do better in every individual game.
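Mean and median are reported together because Atari suite results are typically aggregated as human-normalized scores, and a few games with very large scores can dominate the mean while leaving the median almost unchanged. The following sketch uses made-up scores, not any published results, to show how the two aggregates are computed:

# Typical aggregation of Atari suite results: each game's score is normalized
# against random-play and human baselines, then summarized by both the mean
# (sensitive to a few very high-scoring games) and the median (robust to them).
# All numbers below are invented for illustration.
import statistics

def human_normalized(agent, random_play, human):
    # Percentage of the random-to-human score gap recovered by the agent.
    return 100.0 * (agent - random_play) / (human - random_play)

# game: (agent score, random-play score, human score)
games = {
    "game_a": (750.0, 2.0, 30.0),      # far beyond human level
    "game_b": (21.0, -21.0, 15.0),     # somewhat above human level
    "game_c": (0.0, 0.0, 4750.0),      # no progress at all
}
scores = [human_normalized(*triple) for triple in games.values()]
print("mean:  ", statistics.mean(scores))    # pulled up by the outlier game
print("median:", statistics.median(scores))  # barely affected by it

Surpassing both aggregates, as MuZero did, indicates an improvement that is not driven solely by a handful of outlier games.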
Training and results
MuZero used 16 third-generation tensor processing units (TPUs) for training and 1000 TPUs for self-play on the board games, and 8 TPUs for training and 32 TPUs for self-play on the Atari games. AlphaZero used 64 second-generation TPUs for training and 5000 first-generation TPUs for self-play. Because TPU design has improved from generation to generation, these are fairly comparable training setups. R2D2 was trained for 5 days through 2 million training steps.
Preliminary results
MuZero matched AlphaZero's performance in chess and shogi after roughly 1 million training steps. It matched AZ's performance in Go after 500 thousand training steps and surpassed it by 1 million steps. It matched R2D2's mean and median performance across the Atari game suite after 500 thousand training steps and surpassed it by 1 million steps, though it never performed well on six games in the suite.
Reactions and related work
MuZero was viewed as a significant advancement over AlphaZero and a generalizable step forward in unsupervised learning techniques. The work was seen as advancing understanding of how to compose systems from smaller components, a systems-level development more than a pure machine-learning development. While the development team released only pseudocode, Werner Duvaud produced an open-source implementation based on it. MuZero has been used as a reference implementation in other work, for instance as a way to generate model-based behavior.