Winograd Schema Challenge

The Winograd Schema Challenge is a test of machine intelligence proposed by Hector Levesque, a computer scientist at the University of Toronto. Designed to be an improvement on the Turing test, it is a multiple-choice test that employs questions of a very specific structure: they are instances of what are called Winograd Schemas, named after Terry Winograd, professor of computer science at Stanford University.
On the surface, Winograd Schema questions simply require the resolution of anaphora: the machine must identify the antecedent of an ambiguous pronoun in a statement. This makes it a task of natural language processing, but Levesque argues that for Winograd Schemas, the task requires the use of knowledge and commonsense reasoning.
Nuance Communications announced in July 2014 that it would sponsor an annual WSC competition, with a prize of $25,000 for the best system that could match human performance. However, the prize is no longer offered.

Background

The Winograd Schema Challenge was proposed in the spirit of the Turing test. Introduced by Alan Turing in 1950, the Turing test plays a central role in the philosophy of artificial intelligence. Turing suggested that instead of debating what intelligence is, the science of AI should be concerned with demonstrating intelligent behavior, which can be tested. But the exact nature of the test Turing proposed has come under scrutiny, especially since an AI chatbot named Eugene Goostman was claimed to have passed it in 2014. The Winograd Schema Challenge was proposed in part to address the problems that came to light with the nature of the programs that performed well on the test.
Turing's original proposal was what he called the Imitation Game, which involves free-flowing, unrestricted conversations in English between human judges and computer programs over a text-only channel. In general, the machine passes the test if interrogators are not able to tell the difference between it and a human in a five-minute conversation.

Eugene Goostman

On June 7, 2014, a computer program named Eugene Goostman was declared to be the first AI to have passed the Turing test in a competition held by the University of Reading in England. In the competition, Eugene was able to convince 33% of judges that they were talking with a 13-year-old Ukrainian boy. The supposed victory of a thinking machine stirred controversy about the Turing test. Critics claimed that Eugene passed the test simply by fooling the judges and taking advantage of its purported identity. For example, it could easily skip some key questions by joking around and changing the subject, and judges would forgive its mistakes because Eugene presented itself as a teenager who spoke English as a second language.

Weaknesses of the Turing test

The performance of Eugene Goostman exhibited some of the Turing test's problems. Levesque identifies several major issues, summarized as follows:

Deception: The machine is forced to construct a false identity, which is not part of intelligence.
Conversation: A lot of interaction may qualify as "legitimate conversation"—jokes, clever asides, points of order—without requiring intelligent reasoning.
Evaluation: Humans make mistakes, and judges often disagree on the results.

Winograd schemas

The key factor in the WSC is the special format of its questions, which are derived from Winograd Schemas. Questions of this form may be tailored to require knowledge and commonsense reasoning in a variety of domains. They must also be carefully written not to betray their answers by selectional restrictions or statistical information about the words in the sentence.

Origin

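The first cited example of a Winograd Schema is due to Terry Winograd:
The city councilmen refused the demonstrators a permit because they [feared/advocated] violence.
The choices of "feared" and "advocated" turn the schema into its two instances:
The city councilmen refused the demonstrators a permit because they feared violence.
The city councilmen refused the demonstrators a permit because they advocated violence.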
The question is whether the pronoun "they" refers to the city councilmen or the demonstrators, and switching between the two instances of the schema changes the answer. The answer is immediate for a human reader, but proves difficult to emulate in machines. Levesque argues that knowledge plays a central role in these problems: the answer to this schema has to do with our understanding of the typical relationships between and behavior of councilmen and demonstrators.
Since the original proposal of the Winograd Schema Challenge, Ernest Davis, a professor at New York University, has compiled a list of over 140 Winograd Schemas from various sources as examples of the kinds of questions that should appear on the Winograd Schema Challenge.

Formal description

A Winograd Schema Challenge question consists of three parts:

A sentence or brief discourse that contains the following:
* Two noun phrases of the same semantic class,
* An ambiguous pronoun that may refer to either of the above noun phrases, and
* A special word and alternate word, such that if the special word is replaced with the alternate word, the natural resolution of the pronoun changes.
A question asking for the referent of the ambiguous pronoun, and
Two answer choices corresponding to the noun phrases in question.

A machine will be given the problem in a standardized form which includes the answer choices, thus making it a binary decision problem.
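The structure described above can be sketched as a small data type. This is only an illustration, not a standardized format used by the challenge: the class name `WinogradSchema`, its field names, the `{word}` placeholder, and the `accuracy` helper are all assumptions introduced here, and the example sentence is Winograd's councilmen schema. The sketch also shows why the paired design matters for a binary decision problem: a solver that always picks the same answer choice scores exactly 50% on a schema whose two instances have different answers.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class WinogradSchema:
    """Hypothetical container for one schema (two instances)."""
    template: str                 # sentence with a "{word}" slot for the special word
    candidates: Tuple[str, str]   # the two noun phrases offered as answer choices
    special_word: str
    alternate_word: str
    answer_for_special: int       # index into candidates
    answer_for_alternate: int     # index into candidates

    def instances(self) -> List[Tuple[str, str]]:
        """Return (sentence, correct answer) for both instances of the schema."""
        pairs = [
            (self.special_word, self.answer_for_special),
            (self.alternate_word, self.answer_for_alternate),
        ]
        return [
            (self.template.format(word=word), self.candidates[idx])
            for word, idx in pairs
        ]


# A solver sees the sentence and the two answer choices and returns one of them.
Solver = Callable[[str, Tuple[str, str]], str]


def accuracy(solver: Solver, schemas: List[WinogradSchema]) -> float:
    """Fraction of instances answered correctly (assumes a non-empty list)."""
    results = [
        solver(sentence, schema.candidates) == correct
        for schema in schemas
        for sentence, correct in schema.instances()
    ]
    return sum(results) / len(results)


COUNCILMEN = WinogradSchema(
    template="The city councilmen refused the demonstrators a permit "
             "because they {word} violence.",
    candidates=("the city councilmen", "the demonstrators"),
    special_word="feared",
    alternate_word="advocated",
    answer_for_special=0,
    answer_for_alternate=1,
)


def always_first(sentence: str, candidates: Tuple[str, str]) -> str:
    """Trivial baseline: ignore the sentence and pick the first choice."""
    return candidates[0]


if __name__ == "__main__":
    for sentence, answer in COUNCILMEN.instances():
        print(sentence, "->", answer)
    print(accuracy(always_first, [COUNCILMEN]))
```

Because swapping the special word flips the correct answer, any strategy that ignores the sentence is right on one instance of each pair and wrong on the other.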

Advantages

The Winograd Schema Challenge has the following purported advantages:

Knowledge and commonsense reasoning are required to solve them.
Winograd Schemas of varying difficulty may be designed, involving anything from simple cause-and-effect relationships to complex narratives of events.
They may be constructed to test reasoning ability in specific domains.
There is no need for human judges.

Pitfalls

One difficulty with the Winograd Schema Challenge is the development of the questions. They need to be carefully tailored to ensure that they require commonsense reasoning to solve. For example, Levesque gives the following example of a so-called Winograd Schema that is "too easy":
The women stopped taking pills because they were [pregnant/carcinogenic]. Which individuals were [pregnant/carcinogenic]?
The answer to this question can be determined on the basis of selectional restrictions: in any situation, pills do not get pregnant, women do; women cannot be carcinogenic, but pills can. Thus this answer could be derived without the use of reasoning, or any understanding of the sentences' meaning—all that is necessary is data on the selectional restrictions of pregnant and carcinogenic.
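The shortcut described above can be sketched as a simple table lookup. This is a toy illustration under stated assumptions, not a method from Levesque's paper: the table `SELECTIONAL_RESTRICTIONS` and the function `resolve_by_restrictions` are hypothetical, and the entries are written by hand to mirror the examples in this article. The resolver returns an answer only when exactly one candidate is compatible with the predicate, which is the case for the "too easy" schema but not for a well-constructed one.

```python
from typing import Optional, Tuple

# Hypothetical, hand-written table: which candidate nouns each word can
# sensibly apply to. A real system would need such data from elsewhere.
SELECTIONAL_RESTRICTIONS = {
    "pregnant": {"women"},
    "carcinogenic": {"pills"},
    # Both councilmen and demonstrators can fear or advocate something,
    # so these entries do not separate the two candidates.
    "feared": {"councilmen", "demonstrators"},
    "advocated": {"councilmen", "demonstrators"},
}


def resolve_by_restrictions(word: str, candidates: Tuple[str, str]) -> Optional[str]:
    """Pick the only candidate compatible with `word`, or None if undecided."""
    allowed = SELECTIONAL_RESTRICTIONS.get(word, set())
    compatible = [c for c in candidates if c in allowed]
    return compatible[0] if len(compatible) == 1 else None


if __name__ == "__main__":
    print(resolve_by_restrictions("pregnant", ("women", "pills")))
    print(resolve_by_restrictions("carcinogenic", ("women", "pills")))
    print(resolve_by_restrictions("feared", ("councilmen", "demonstrators")))
```

The lookup settles both instances of the pills schema without reading the sentence at all, while it stays undecided on the councilmen schema, where both candidates satisfy the restrictions and the answer depends on background knowledge.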

Activity

Nuance Communications sponsored a competition in 2016, offering a grand prize of $25,000 for a top score above 90%. A second competition planned for 2018 was cancelled, and the prize is no longer offered.
The Twelfth International Symposium on the Logical Formalizations of Commonsense Reasoning was held on March 23–25, 2015 at the AAAI Spring Symposium Series at Stanford University, with a special focus on the Winograd Schema Challenge. The organizing committee included Leora Morgenstern, Theodore Patkos, and Robert Sloan.
The 2016 Winograd Schema Challenge was run on July 11, 2016 at IJCAI-16. There were four contestants. The first round of the contest consisted of pronoun disambiguation problems (PDPs), which were adapted from literary sources rather than constructed as pairs of sentences. The highest score achieved was 58% correct, by Quan Liu et al. of the University of Science and Technology of China. Hence, by the rules of that challenge, no prizes were awarded, and the challenge did not proceed to the second round. The organizing committee in 2016 was Leora Morgenstern, Ernest Davis, and Charles Ortiz.
An accuracy of 70% was achieved by 2017. The top performance by 2019, with scores over 90%, was attained by the "gimmick" of adding appropriate WSC-like training data to the BERT language model rather than by attempting to implement commonsense reasoning. In 2020, the general-purpose language model GPT-3 nearly matched that performance without fine-tuning.
A version of the Winograd Schema Challenge is one part of the GLUE benchmark collection of challenges in automated natural language understanding.
