ESP game


The ESP game is a human-based computation game developed to address the problem of creating difficult metadata. The idea behind the game is to use the computational power of humans to perform a task that computers cannot by packaging the task as a game. It was originally conceived by Luis von Ahn of Carnegie Mellon University. Google bought a licence to create its own version of the game in 2006 in order to return better search results for its online images. The licence of the data acquired by Ahn's ESP game, or the Google version, is not clear. Google's version was shut down on September 16, 2011 as part of the Google Labs closure in September 2011.

Concept

is a task that is difficult for computers to perform independently. Humans are perfectly capable of it, but are not necessarily willing. By making the recognition task a "game", people are more likely to participate. When questioned about how much they enjoyed playing the game, collected data from users was extremely positive.
The applications and uses of having so many labeled images are significant; for example, more accurate image searching and accessibility for visually impaired users, by reading out an image's labels. Partnering two people to label images makes it more likely that entered words will be accurate. Since the only thing the two partners have in common is that they both see the same image, they must enter reasonable labels to have any chance of agreeing on one.
The ESP Game as it is currently implemented encourages players to assign “obvious” labels, which are most likely to lead to an agreement with the partner. But these labels can often be deduced from the labels already present using an appropriate language model and such labels therefore add only little information to the system. A Microsoft research project assigns probabilities to the next label to be added. This model is then used in a program, which plays the ESP game without looking at the image.
ESP-game authors presented evidence that the labels produced using the game were indeed useful descriptions of the images. The results of searching for randomly chosen keywords were presented and show that the proportion of appropriate images when searching using the labels generated by the game is extremely high. Further evaluation was achieved by comparing the labels generated using the game to labels generated by participants that were asked to describe the images.

Rules of the game

Once logged in, a user is automatically matched with a random partner. The partners do not know each other's identity and they cannot communicate. Once matched, they will both be shown the same image. Their task is to agree on a word that would be an appropriate label for the image. They both enter possible words, and once a word is entered by both partners, that word is agreed upon, and that word becomes a label for the image. Once they agree on a word, they are shown another image. They have two and a half minutes to label 15 images.
Both partners have the option to pass; that is, give up on an image. Once one partner passes, the other partner is shown a message that their partner wishes to pass. Both partners must pass for a new image to be shown.
Some images have “taboo” words; that is, words that cannot be entered as possible labels. These words are usually related to the image and make the game harder as they prevent common words to be used to label the image. Taboo words are obtained from the game itself. The first time an image is used in the game, it will have no taboo words. If the image is ever used again, it will have one taboo word: the word that resulted from the previous agreement. The next time the image is used, it will have two taboo words, and so on. “Taboo” words is done automatically by the system: once an image has been labeled enough times with the same word, that word becomes taboo so that the image will get a variety of different words as labels.
Occasionally, the game will be played solo, without a human partner, with the ESP Game itself acting as the opponent and delivering a series of pre-determined labels to the single human player. This is necessary if there are an odd number of people playing the game.
In late 2008, the game was rebranded as GWAP, with a new user interface. Some other games that were also created by Luis von Ahn, such as “Peekaboom” and “Phetch”, were discontinued at that point.

Cheating

Ahn has described countermeasures which prevent players from "cheating" the game, and introducing false data into the system. By giving players occasional test images for which common labels are known, it is possible to check that players are answering honestly, and a player's guesses are only stored if they successfully label the test images.
Furthermore, a label is only stored after a certain number of players have agreed on it. At this point, all of the taboo lists for the images are deleted and the image is returned to the game pool as if it were a fresh image. If X is the probability of a label being incorrect despite a player having successfully labelled test images, then after N repetitions the probability of corruption is, assuming that end repetitions are independent of each other.

Image selection

The choice of images used by the ESP game makes a difference in the player's experience. The game would be less entertaining if all the images were chosen from a single site and were all extremely similar.
The first run of the ESP game used a collection of 350,000 images chosen by the developers. Later versions selected images at random from the web, using a small amount of filtering. Such images are reintroduced into the game several times until they are fully labeled. The random images were chosen using "Random Bounce Me", a website that selects a page at random from the Google database. “Random Bounce Me” was queried repeatedly, each time collecting all JPEG and GIF images in the random page, except for images that did not fit the criteria: blank images, images that consist of a single color, images that are smaller than 20 pixels on either dimension, and images with an aspect ratio greater than 4.5 or smaller than 1/4.5. This process was repeated until 350,000 images were collected. The images were then rescaled to fit the game's display. Fifteen different images from the 350,000 are chosen for each session of the game.