ABX test

An ABX test is a method of comparing two choices of sensory stimuli to identify detectable differences between them. A subject is presented with two known samples followed by one unknown sample that is randomly selected from either A or B. The subject is then required to identify X as either A or B. If X cannot be identified reliably with a low p-value in a predetermined number of trials, then the null hypothesis cannot be rejected and it cannot be proven that there is a perceptible difference between A and B.
ABX tests can easily be performed as double-blind trials, eliminating any possible unconscious influence from the researcher or the test supervisor. Because samples A and B are provided just prior to sample X, the difference does not have to be discerned from assumption based on long-term memory or past experience. Thus, the ABX test answers whether or not, under ideal circumstances, a perceptual difference can be found.
ABX tests are commonly used in evaluations of digital audio data compression methods; sample A is typically an uncompressed sample, and sample B is a compressed version of A. Audible compression artifacts that indicate a shortcoming in the compression algorithm can be identified with subsequent testing. ABX tests can also be used to compare the different degrees of fidelity loss between two different audio formats at a given bitrate.
ABX tests can be used to audition input, processing, and output components as well as cabling: virtually any audio product or prototype design.

History

The history of ABX testing and naming dates back to 1950 in a paper published by two Bell Labs researchers, W. A. Munson and Mark B. Gardner, titled Standardizing Auditory Tests.

The purpose of the present paper is to describe a test procedure which has shown promise in this direction and to give descriptions of equipment which have been found helpful in minimizing the variability of the test results. The procedure, which we have called the “ABX” test, is a modification of the method of paired comparisons. An observer is presented with a time sequence of three signals for each judgment he is asked to make. During the first time interval he hears signal A, during the second, signal B, and finally signal X. His task is to indicate whether the sound heard during the X interval was more like that during the A interval or more like that during the B interval. For a threshold test, the A interval is quiet, the B interval is signal, and the X interval is either quiet or signal.

The test has evolved to other variations such as subject control over duration and sequence of testing. One such example was the hardware ABX comparator in 1977, built by the ABX company in Troy, Michigan, and documented by one of its founders, David Clark.

Refinements to the A/B test
The author's first experience with double-blind audibility testing was as a member of the SMWTMS Audio Club in early 1977. A button was provided which would select at random component A or B. Identifying one of these, the X component was greatly hampered by not having the known A and B available for reference.
This was corrected by using three interlocked pushbuttons, A, B, and X. Once an X was selected, it would remain that particular A or B until it was decided to move on to another random selection.
However, another problem quickly became obvious. There was always an audible relay transition time delay when switching from A to B. When switching from A to X, however, the time delay would be missing if X was really A and present if X was really B. This extraneous cue was removed by inserting a fixed length dropout time when any change was made. The dropout time was selected to be 50 ms which produces a slight consistent click while allowing subjectively instant comparison.

The ABX company is now defunct and hardware comparators in general as commercial offerings extinct. Myriad of software tools exist such as Foobar ABX plug-in for performing file comparisons. But hardware equipment testing requires building custom implementations.

Hardware tests

ABX test equipment utilizing relays to switch between two different hardware paths can help determine if there are perceptual differences in cables and components. Video, audio and digital transmission paths can be compared. If the switching is microprocessor controlled, double-blind tests are possible.
Loudspeaker level and line level audio comparisons could be performed on an ABX test device offered for sale as the ABX Comparator by QSC Audio Products from 1998 to 2004. Other hardware solutions have been fabricated privately by individuals or organizations for internal testing.

Confidence

If only one ABX trial were performed, random guessing would incur a 50% chance of choosing the correct answer, the same as flipping a coin. In order to make a statement having some degree of confidence, many trials must be performed. By increasing the number of trials, the likelihood of statistically asserting a person's ability to distinguish A and B is enhanced for a given confidence level. A 95% confidence level is commonly considered statistically significant. The company QSC, in the ABX Comparator user manual, recommended a minimum of ten listening trials in each round of tests.

Number of trials	10	11	12	13	14	15	16	17	18	19	20	21	22	23	24	25
Minimum number correct	9	9	10	10	11	12	12	13	13	14	15	15	16	16	17	18

QSC recommended that no more than 25 trials be performed, as subject fatigue can set in, making the test less sensitive. However, a more sensitive test can be obtained by pooling the results from a number of such tests using separate individuals or tests from the same subject conducted in between rest breaks. For a large number of total trials N, a significant result can be claimed if the number of correct responses exceeds. Important decisions are normally based on a higher level of confidence, since an erroneous "significant result" would be claimed in one of 20 such tests simply by chance.

Software tests

The foobar2000 and the Amarok audio players support software-based ABX testing, the latter using a third-party script. Lacinato ABX is a cross-platform audio testing tool for Linux, Windows, and 64-bit Mac. Lacinato WebABX is a web-based cross-browser audio ABX tool. Open source aveX was mainly developed for Linux which also provides test-monitoring from a remote computer. ABX patcher is an ABX implementation for Max/MSP. More ABX software can be found at the archived PCABX website.

Codec listening tests

A codec listening test is a scientific study designed to compare two or more lossy audio codecs, usually with respect to perceived fidelity or compression efficiency.

Potential flaws

ABX is a type of forced choice testing. A subject's choices can be on merit, i.e. the subject indeed honestly tried to identify whether X seemed closer to A or B. But uninterested or tired subjects might choose randomly without even trying. If not caught, this may dilute the results of other subjects who intently took the test and subject the outcome to Simpson's paradox, resulting in false summary results. Simply looking at the outcome totals of the test cannot reveal occurrences of this problem.
This problem becomes more acute if the differences are small. The user may get frustrated and simply aim to finish the test by voting randomly. In this regard, forced choice tests such as ABX tend to favor negative outcomes when differences are small if proper protocols are not used to guard against this problem.
Best practices call for both the inclusion of controls and the screening of subjects:

A major consideration is the inclusion of appropriate control conditions. Typically, control conditions include the presentation of unimpaired audio materials, introduced in ways that are unpredictable to the subjects. It is the differences between judgement of these control stimuli and the potentially impaired ones that allows one to conclude that the grades are actual assessments of the impairments.

3.2.2 Post-screening of subjects
Post-screening methods can be roughly separated into at least two classes; one is based on inconsistencies compared with the mean result and another relies on the ability of the subject to make correct identifications. The first class is never justifiable. Whenever a subjective listening test is performed with the test method recommended here, the required information for the second class of post-screening is automatically available. A suggested statistical method for doing this is described in Attachment 1.'
The methods are primarily used to eliminate subjects who cannot make the appropriate discriminations. The application of a post-screening method may clarify the tendencies in a test result. However, bearing in mind the variability of subjects’ sensitivities to different artefacts, caution should be exercised.

Other flaws include lack of subject training and familiarization with the test and content selected:

4.1 Familiarization or training phase
Prior to formal grading, subjects must be allowed to become thoroughly familiar with the test facilities, the test environment, the grading process, the grading scales and the methods of their use. Subjects should also become thoroughly familiar with the artefacts under study. For the most sensitive tests they should be exposed to all the material they will be grading later in the formal grading sessions. During familiarization or training, subjects should be preferably together in groups, so that they can interact freely and discuss the artefacts they detect with each other.

Other problems might arise from the ABX equipment itself, as outlined by Clark, where the equipment provides a tell, allowing the subject to identify the source. Lack of transparency of the ABX fixture creates similar problems.
Since auditory tests and many other sensory tests rely on short-term memory, which only lasts a few seconds, it is critical that the test fixture allows the subject to identify short segments that can be compared quickly. Pops and glitches in switching apparatus likewise must be eliminated, as they may dominate or otherwise interfere with the stimuli being tested in what is stored in the subject's short-term memory.

Alternatives

Algorithmic Audio Compression Evaluation

Since ABX testing requires human beings for evaluation of lossy audio codecs, it is time-consuming and costly. Therefore, cheaper approaches have been developed, e.g. PEAQ, which is an implementation of the ODG.

MUSHRA

In MUSHRA, the subject is presented with the reference, a certain number of test samples, a hidden version of the reference and one or more anchors. A 0-100 RATING scale makes it possible to rate very small differences.

Discrimination testing

Alternative general methods are used in discrimination testing, such as paired comparison, duo–trio, and triangle testing. Of these, duo–trio and triangle testing are particularly close to ABX testing. Schematically:
;Duo–trio: AXY – one known, two unknown, test is which unknown is the known: X = A, or Y = A.
;Triangle: XXY – three unknowns, test which is the odd one out: Y = 1, Y = 2, or Y = 3.
In this context, ABX testing is also known as "duo–trio" in "balanced reference" mode – both knowns are presented as references, rather than one alone.

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...