AIXI


AIXI is a theoretical mathematical formalism for artificial general intelligence.
It combines Solomonoff induction with sequential decision theory.
AIXI was first proposed by Marcus Hutter in 2000 and several results regarding AIXI are proved in Hutter's 2005 book Universal Artificial Intelligence.
AIXI is a reinforcement learning agent. It maximizes the expected total reward received from the environment. Intuitively, it simultaneously considers every computable hypothesis. In each time step, it looks at every possible program and evaluates how much reward that program would generate depending on the next action taken. The promised rewards are then weighted by the subjective belief that this program constitutes the true environment. This belief is computed from the length of the program: longer programs are considered less likely, in line with Occam's razor. AIXI then selects the action that has the highest expected total reward in the weighted sum over all these programs.
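As a concrete illustration of this weighting, here is a minimal Python sketch over a toy, finite set of candidate programs. The names and the consistency test are invented for the example; real AIXI sums over all programs of a universal Turing machine, which is incomputable.

    # A minimal sketch of AIXI's complexity weighting over a toy, finite
    # hypothesis class. `programs` and `consistent_with_history` are
    # invented for the example; real AIXI sums over all programs of a
    # universal Turing machine, which is incomputable.

    def prior_weight(program_bits: str) -> float:
        # Occam's razor: a program of length l gets prior weight 2^-l,
        # so shorter (simpler) hypotheses dominate the mixture.
        return 2.0 ** -len(program_bits)

    def belief_mixture(programs, consistent_with_history):
        # Keep only hypotheses that reproduce the observed history,
        # then renormalise their complexity weights into beliefs.
        survivors = [p for p in programs if consistent_with_history(p)]
        total = sum(prior_weight(p) for p in survivors)
        return {p: prior_weight(p) / total for p in survivors}

    # Three candidate "environment programs" encoded as bit strings:
    beliefs = belief_mixture(["01", "0110", "010011"], lambda p: True)
    print(beliefs)  # the 2-bit program receives the largest belief

After conditioning on the observed history, the shortest surviving program dominates the mixture, which is exactly the Occam bias described above.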

Definition

AIXI is a reinforcement learning agent that interacts with some stochastic and unknown but computable environment $\mu$. The interaction proceeds in time steps, from $t = 1$ to $t = m$, where $m$ is the lifespan of the AIXI agent. At time step $t$, the agent chooses an action $a_t$ and executes it in the environment, and the environment responds with a "percept" $e_t$, which consists of an "observation" $o_t$ and a reward $r_t$, distributed according to the conditional probability $\mu(o_t r_t \mid a_1 o_1 r_1 \ldots a_{t-1} o_{t-1} r_{t-1} a_t)$, where $a_1 o_1 r_1 \ldots a_{t-1} o_{t-1} r_{t-1} a_t$ is the "history" of actions, observations and rewards. The environment $\mu$ is thus mathematically represented as a probability distribution over "percepts" which depend on the full history, so there is no Markov assumption. Note again that this probability distribution is unknown to the AIXI agent. Furthermore, note again that $\mu$ is computable, that is, the observations and rewards received by the agent from the environment $\mu$ can be computed by some program (which runs on a Turing machine), given the past actions of the AIXI agent.
The only goal of the AIXI agent is to maximise $\sum_{t=1}^{m} r_t$, that is, the sum of rewards from time step 1 to $m$.
The AIXI agent is associated with a stochastic policy $\pi : (\mathcal{A} \times \mathcal{E})^* \to \mathcal{A}$, which is the function it uses to choose actions at every time step, where $\mathcal{A}$ is the space of all possible actions that AIXI can take and $\mathcal{E}$ is the space of all possible "percepts" that can be produced by the environment. The environment can also be thought of as a stochastic policy $\mu : (\mathcal{A} \times \mathcal{E})^* \times \mathcal{A} \to \mathcal{E}$, where $*$ is the Kleene star operation.
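The interaction loop implied by these two signatures can be sketched in Python as follows. The type aliases and function names are illustrative, not part of Hutter's formalism, and stochastic policies are represented here as functions that may sample internally.

    # Illustrative types for the agent/environment interaction loop.
    # Action, Percept and History stand in for the abstract spaces; a
    # stochastic policy is represented as a function that may sample
    # internally. None of these names come from Hutter's formalism.
    from typing import Callable, Union

    Action = int
    Percept = tuple[int, float]                    # (observation, reward)
    History = tuple[Union[Action, Percept], ...]   # a_1 e_1 a_2 e_2 ...

    Policy = Callable[[History], Action]                 # pi : (A x E)* -> A
    Environment = Callable[[History, Action], Percept]   # mu : (A x E)* x A -> E

    def interact(policy: Policy, env: Environment, m: int) -> float:
        """Run the agent for m time steps and return the total reward."""
        history: History = ()
        total_reward = 0.0
        for _ in range(m):
            action = policy(history)
            observation, reward = env(history, action)
            history += (action, (observation, reward))
            total_reward += reward
        return total_reward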
In general, at time step $t$ (which ranges from 1 to $m$), AIXI, having previously executed actions $a_1 a_2 \ldots a_{t-1}$ and having observed the history of percepts $o_1 r_1 \ldots o_{t-1} r_{t-1}$, chooses and executes in the environment the action $a_t$, defined as follows

$$a_t := \arg\max_{a_t} \sum_{o_t r_t} \ldots \max_{a_m} \sum_{o_m r_m} [r_t + \ldots + r_m] \sum_{q :\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\text{length}(q)}$$
or, using parentheses to disambiguate the precedences

$$a_t := \arg\max_{a_t} \left( \sum_{o_t r_t} \ldots \left( \max_{a_m} \sum_{o_m r_m} [r_t + \ldots + r_m] \left( \sum_{q :\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\text{length}(q)} \right) \right) \right)$$
Intuitively, in the definition above, AIXI considers the sum of the total reward over all possible "futures" up to $m - t$ time steps ahead, weighs each of them by the complexity of programs $q$ consistent with the agent's past that can generate that future, and then picks the action that maximises expected future rewards.
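For concreteness, when the horizon extends only one step past the current one (that is, $m = t + 1$), the expression unfolds to

$$a_t := \arg\max_{a_t} \sum_{o_t r_t} \max_{a_{t+1}} \sum_{o_{t+1} r_{t+1}} [r_t + r_{t+1}] \sum_{q :\, U(q, a_1 \ldots a_{t+1}) = o_1 r_1 \ldots o_{t+1} r_{t+1}} 2^{-\text{length}(q)}$$

that is: try each current action, branch over the percepts that might follow, pick the best final action in each branch, and weigh every complete future by the total prior mass of the programs that would generate it.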
Let us break this definition down in order to attempt to fully understand it.
is the "percept" received by the AIXI agent at time step from the environment. Similarly, is the percept received by AIXI at time step .
$r_t + \ldots + r_m$ is the sum of rewards from time step $t$ to time step $m$, so AIXI needs to look into the future to choose its action at time step $t$.
$U$ denotes a monotone universal Turing machine, and $q$ ranges over all (deterministic) programs on the universal machine $U$, which receives as input the program $q$ and the sequence of actions $a_1 \ldots a_m$, and produces the sequence of percepts $o_1 r_1 \ldots o_m r_m$. The universal Turing machine $U$ is thus used to "simulate" or compute the environment responses or percepts, given the program $q$ and all actions of the AIXI agent: in this sense, the environment is "computable". Note that, in general, the program $q$ which "models" the current and actual environment is unknown because the current environment is also unknown.
$\text{length}(q)$ is the length of the program $q$ (measured in bits). Note that $2^{-\text{length}(q)} \in [0, 1]$. Hence, in the definition above, $\sum_{q :\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\text{length}(q)}$ should be interpreted as a mixture over all computable environments (consistent with the agent's past), each weighted by its complexity $2^{-\text{length}(q)}$. Note that $a_1 \ldots a_m$ can also be written as $a_1 \ldots a_{t-1} a_t \ldots a_m$, and $a_1 \ldots a_{t-1}$ is the sequence of actions already executed in the environment by the AIXI agent. Similarly, $o_1 r_1 \ldots o_m r_m$ can also be written as $o_1 r_1 \ldots o_{t-1} r_{t-1} o_t r_t \ldots o_m r_m$, and $o_1 r_1 \ldots o_{t-1} r_{t-1}$ is the sequence of percepts produced by the environment so far.
Let us now put all these components together in order to understand this equation or definition.
At time step $t$, AIXI chooses the action $a_t$ at which the function $\sum_{o_t r_t} \ldots \max_{a_m} \sum_{o_m r_m} [r_t + \ldots + r_m] \sum_{q :\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\text{length}(q)}$ attains its maximum.
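This expectimax recursion can be transcribed directly, though only for a toy, finite program class. The following Python sketch is illustrative (the bit strings, percept set and programs are invented for the example); the real definition ranges over all programs of a universal Turing machine and is impossible to compute.

    # A direct (and deliberately inefficient) transcription of the
    # expectimax expression above for a toy, finite program class.

    ACTIONS = [0, 1]
    PERCEPTS = [(0, 0.0), (0, 1.0), (1, 0.0), (1, 1.0)]  # (observation, reward)

    # Toy "programs": a bit string (whose length sets the prior weight
    # 2^-length) paired with a deterministic map from the action history
    # to the percept at step k.
    PROGRAMS = [
        ("0",   lambda acts, k: (acts[k], 1.0 if acts[k] == 1 else 0.0)),
        ("10",  lambda acts, k: (acts[k], 1.0 if acts[k] == 0 else 0.0)),
        ("110", lambda acts, k: (0, 0.0)),
    ]

    def value(actions, percepts, m, t0):
        if len(percepts) == m:                    # leaf: one complete future
            weight = sum(2.0 ** -len(bits) for bits, prog in PROGRAMS
                         if all(prog(actions, k) == percepts[k] for k in range(m)))
            return weight * sum(r for (_obs, r) in percepts[t0:])
        if len(actions) == len(percepts):         # max node: pick next action
            return max(value(actions + [a], percepts, m, t0) for a in ACTIONS)
        return sum(value(actions, percepts + [e], m, t0) for e in PERCEPTS)

    def aixi_action(past_actions, past_percepts, m):
        t0 = len(past_percepts)                   # current step, 0-indexed
        return max(ACTIONS,
                   key=lambda a: value(past_actions + [a], past_percepts, m, t0))

    print(aixi_action([], [], m=2))  # the simplest program favours action 1

The action maximisation and the percept summation alternate exactly as in the equation, and each complete future is weighted at the leaf by the total $2^{-\text{length}(q)}$ mass of the programs that reproduce it.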

Parameters

The parameters to AIXI are the universal Turing machine $U$ and the agent's lifetime $m$, which need to be chosen. The latter parameter can be removed by the use of discounting.
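For example, geometric discounting (shown here as a minimal sketch; Hutter also considers more general discount sequences) replaces the finite-horizon objective with

$$\sum_{k=t}^{\infty} \gamma^{k-t} r_k, \qquad 0 < \gamma < 1,$$

which converges for bounded rewards, so no fixed lifetime $m$ has to be specified in advance.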

The meaning of the word AIXI

According to Hutter, the word "AIXI" can have several interpretations. AIXI can stand for AI based on Solomonoff's distribution, denoted by $\xi$ (the Greek letter xi), or e.g. it can stand for AI "crossed" (X) with induction (I). There are other interpretations.

Optimality

AIXI's performance is measured by the expected total reward it receives.
AIXI has been proven to be optimal in the following ways. It is Pareto optimal: there is no other agent that performs at least as well as AIXI in all environments while performing strictly better in at least one environment. It is balanced Pareto optimal: like Pareto optimality, but considering a weighted sum of environments. And it is self-optimizing: for environment classes in which self-optimizing policies exist at all, AIXI's average reward converges to the best achievable value as its lifetime grows.
It was later shown by Hutter and Jan Leike that balanced Pareto optimality is subjective and that any policy can be considered Pareto optimal, which they describe as undermining all previous optimality claims for AIXI.
However, AIXI does have limitations. It is restricted to maximizing rewards based on percepts as opposed to external states. It also assumes it interacts with the environment solely through action and percept channels, preventing it from considering the possibility of being damaged or modified. Colloquially, this means that it doesn't consider itself to be contained by the environment it interacts with. It also assumes the environment is computable. Since AIXI is incomputable, it assigns zero probability to its own existence.

Computational aspects

Like Solomonoff induction, AIXI is incomputable. However, there are computable approximations of it. One such approximation is AIXItl, which performs at least as well as the provably best time $t$ and space $l$ limited agent. Another approximation to AIXI, with a restricted environment class, is MC-AIXI (FAC-CTW) (which stands for Monte Carlo AIXI FAC-Context-Tree Weighting), which has had some success playing simple games such as partially observable Pac-Man.
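The Monte Carlo idea behind such approximations can be sketched as follows. This is a hedged illustration, not the actual MC-AIXI (FAC-CTW) algorithm, which learns its environment model with context-tree weighting and plans with a UCT-style tree search; the model object and its sample_percept method are assumed interfaces invented for the sketch.

    # A hedged sketch of the Monte Carlo idea behind approximations such
    # as MC-AIXI: instead of enumerating programs, sample futures from a
    # learned generative model and average the returns. `model` and its
    # `sample_percept` method are assumed interfaces invented here.
    import random

    def estimate_action_value(model, history, action, horizon, n_rollouts=100):
        total = 0.0
        for _ in range(n_rollouts):
            h = list(history) + [action]
            ret = 0.0
            for _ in range(horizon):
                observation, reward = model.sample_percept(h)  # assumed API
                ret += reward
                h += [(observation, reward), random.choice(model.actions)]
            total += ret
        return total / n_rollouts

    def mc_aixi_action(model, history, horizon):
        return max(model.actions,
                   key=lambda a: estimate_action_value(model, history, a, horizon))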