Set (abstract data type)

In computer science, a set is an abstract data type that can store unique values, without any particular order. It is a computer implementation of the mathematical concept of a finite set. Unlike most other collection types, rather than retrieving a specific element from a set, one typically tests a value for membership in a set.
Some set data structures are designed for static or frozen sets that do not change after they are constructed. Static sets allow only query operations on their elements, such as checking whether a given value is in the set or enumerating the values in some arbitrary order. Other variants, called dynamic or mutable sets, also allow the insertion and deletion of elements from the set.
A multiset is a special kind of set in which an element can appear several times.

Type theory

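In type theory, sets are generally identified with their indicator function (characteristic function): accordingly, a set of values of type $A$ may be denoted by $2^{A}$ or $\mathcal{P}(A)$. The characteristic function $F$ of a set $S$ is defined as:

$$F(x) = \begin{cases} 1, & \text{if } x \in S \\ 0, & \text{if } x \notin S \end{cases}$$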
In theory, many other abstract data structures can be viewed as set structures with additional operations and/or additional axioms imposed on the standard operations. For example, an abstract heap can be viewed as a set structure with a min operation that returns the element of smallest value.
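
As a minimal sketch of the characteristic-function view, the Python below represents a set by a function that returns 1 for members and 0 otherwise, mirroring F. The helper name `characteristic` is hypothetical. The second example shows that a predicate can describe a set, such as all even integers, that could never be stored element by element but still supports membership tests.

```python
from typing import Callable, Hashable, Iterable


def characteristic(elements: Iterable[Hashable]) -> Callable[[Hashable], int]:
    """Hypothetical helper: build F with F(x) = 1 if x is in S, else 0."""
    members = frozenset(elements)  # snapshot, so later changes to the input don't leak in
    return lambda x: 1 if x in members else 0


F = characteristic([2, 3, 5, 7])
print(F(3), F(4))  # 1 0

# A set given only by its characteristic function: all even integers.
evens: Callable[[int], int] = lambda x: 1 if x % 2 == 0 else 0
print(evens(10), evens(-3))  # 1 0
```
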

Operations

Core set-theoretical operations

One may define the operations of the algebra of sets:

union(S,T): returns the union of sets S and T.
intersection(S,T): returns the intersection of sets S and T.
difference(S,T): returns the difference of sets S and T.
subset(S,T): a predicate that tests whether the set S is a subset of set T.
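
Many languages expose these four operations directly. In Python's built-in `set` type, for example, they correspond to the `|`, `&`, `-` and `<=` operators (or the `union`, `intersection`, `difference` and `issubset` methods). The sample values below are arbitrary.

```python
S = {1, 2, 3, 4}
T = {3, 4, 5}

union_st = S | T            # union(S, T):        {1, 2, 3, 4, 5}
intersection_st = S & T     # intersection(S, T): {3, 4}
difference_st = S - T       # difference(S, T):   {1, 2}

# subset(S, T): a predicate; <= allows equality, < tests a proper subset.
print({3, 4} <= S)          # True
print(S <= T)               # False

# The method forms accept any iterable, not only sets.
print(S.union([5]) == union_st)  # True
```
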
Static sets

Typical operations that may be provided by a static set structure S are:

is_element_of(x,S): checks whether the value x is in the set S.
is_empty(S): checks whether the set S is empty.
size(S) or cardinality(S): returns the number of elements in S.
iterate(S): returns a function that returns one more value of S at each call, in some arbitrary order.
enumerate(S): returns a list containing the elements of S in some arbitrary order.
build(x₁,x₂,…,x_n): creates a set structure with values x₁,x₂,…,x_n.
create_from(collection): creates a new set structure containing all the elements of the given collection or all the elements returned by the given iterator.
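
Python's `frozenset` is one example of a static set: once built, it offers only queries. The sketch below maps several of the operations above onto it. The mapping is illustrative, and because iteration order is arbitrary, only order-independent results are shown.

```python
# build / create_from: construct once from any iterable; duplicates collapse.
S = frozenset([3, 1, 4, 1, 5])

print(4 in S)          # is_element_of(4, S): True
print(len(S) == 0)     # is_empty(S): False
print(len(S))          # size(S): 4

it = iter(S)           # iterate(S): each next() yields one more value
first = next(it)

listed = list(S)       # enumerate(S): a list in arbitrary order
print(sorted(listed))  # [1, 3, 4, 5]

# Static: there is no add/remove, but the set is hashable and can be nested.
print(hasattr(S, "add"))   # False
nested = {S, frozenset()}  # a set whose elements are sets
```
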
Dynamic sets

Dynamic set structures typically add:

create(): creates a new, initially empty set structure.
create_with_capacity(n): creates a new set structure, initially empty but capable of holding up to n elements.
add(S,x): adds the element x to S, if it is not present already.
remove(S,x): removes the element x from S, if it is present.
capacity(S): returns the maximum number of values that S can hold.
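
Python's built-in `set` covers create (`set()`), add and remove, but it has no fixed capacity. The hypothetical `BoundedSet` class below wraps it to illustrate create_with_capacity and capacity as well. Raising `OverflowError` when the set is full is an assumption, since the operation list does not say what should happen then.

```python
class BoundedSet:
    """Hypothetical fixed-capacity set built on Python's set (illustration only)."""

    def __init__(self, capacity: int) -> None:  # create_with_capacity(n)
        self._items: set = set()                # starts empty, like create()
        self._capacity = capacity

    def capacity(self) -> int:                  # capacity(S)
        return self._capacity

    def add(self, x) -> None:                   # add(S, x)
        if x in self._items:
            return                              # already present: nothing to do
        if len(self._items) >= self._capacity:
            raise OverflowError("set is full")  # assumed policy when full
        self._items.add(x)

    def remove(self, x) -> None:                # remove(S, x)
        # set.discard ignores absent elements; set.remove would raise KeyError.
        self._items.discard(x)

    def __contains__(self, x) -> bool:
        return x in self._items

    def __len__(self) -> int:
        return len(self._items)


s = BoundedSet(2)
s.add("a")
s.add("a")      # duplicate: ignored
s.add("b")
s.remove("z")   # absent: ignored
print(len(s), s.capacity())  # 2 2
```
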

Some set structures may allow only some of these operations. The cost of each operation will depend on the implementation, and possibly also on the particular values stored in the set, and the order in which they are inserted.

Additional operations

There are many other operations that can be defined in terms of the above, such as:

pop(S): returns an arbitrary element of S, deleting it from S.
pick(S): returns an arbitrary element of S. Functionally, the mutator pop can be interpreted as the pair of selectors (pick, rest), where rest returns the set consisting of all elements except for the arbitrary element. It can be interpreted in terms of iterate.
map(F,S): returns the set of distinct values resulting from applying function F to each element of S.
filter(P,S): returns the subset containing all elements of S that satisfy a given predicate P.
fold(A₀,F,S): returns the value $A_{|S|}$ after applying $A_{i+1} := F(A_i, e)$ for each element $e$ of $S$, for some binary operation F. F must be associative and commutative for this to be well-defined.
clear(S): deletes all elements of S.
equal(S₁,S₂): checks whether the two given sets are equal.
hash(S): returns a hash value for the static set S such that if equal(S₁,S₂) then hash(S₁) = hash(S₂).
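
A few of these derived operations, sketched with Python's built-in `set` and `functools.reduce`. The variable names (`x`, `rest`, and so on) are illustrative. The fold uses addition, which is associative and commutative, so its result does not depend on the arbitrary iteration order.

```python
import operator
from functools import reduce

S = {1, 2, 3, 4}

# pick(S): an arbitrary element, left in place.
x = next(iter(S))
# rest(S): everything except the picked element; (pick, rest) models pop.
rest = S - {x}

# pop(S): Python's set.pop removes and returns an arbitrary element.
U = set(S)
y = U.pop()

# map(F, S): results that coincide are kept only once.
squares_mod_3 = {v * v % 3 for v in S}     # {0, 1}

# filter(P, S)
evens = {v for v in S if v % 2 == 0}       # {2, 4}

# fold(A0, F, S) with A0 = 0 and F = addition.
total = reduce(operator.add, S, 0)         # 10

# clear(S)
T = set(S)
T.clear()                                  # T is now empty

# equal and hash: equal frozensets must hash equally.
A, B = frozenset({1, 2}), frozenset({2, 1})
print(A == B, hash(A) == hash(B))          # True True
```
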

Other operations can be defined for sets with elements of a special type:

sum(S): returns the sum of all elements of S for some definition of "sum". For example, over integers or reals, it may be defined as fold(0, add, S).
collapse(S): given a set of sets, returns the union. For example, collapse({{1}, {2, 3}}) == {1, 2, 3}. May be considered a kind of sum.
flatten(S): given a set consisting of sets and atomic elements, returns a set whose elements are the atomic elements of the original top-level set or elements of the sets it contains. In other words, it removes a level of nesting, like collapse but allowing atoms. This can be done a single time, or recursively flattening to obtain a set of only atomic elements. For example, flatten({1, {2, 3}}) == {1, 2, 3}.
nearest(S,x): returns the element of S that is closest in value to x (by some metric).
min(S), max(S): returns the minimum/maximum element of S.
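
For sets of numbers or of sets, these operations can be sketched in Python as below. The `nearest` and `flatten` helpers are hypothetical. Inner sets are written as `frozenset` because Python set elements must be hashable; `nearest` uses a linear scan with ties broken arbitrarily, and `flatten` removes only one level of nesting.

```python
S = {3, 8, 15, 21}

total = sum(S)              # sum(S) over integers: 47
lo, hi = min(S), max(S)     # min(S), max(S): 3, 21


def nearest(s, x):
    """Hypothetical helper: element of s closest to x (linear scan, O(n))."""
    return min(s, key=lambda e: abs(e - x))


# collapse: the union of a set of sets.
nested = {frozenset({1}), frozenset({2, 3})}
collapsed = set().union(*nested)           # {1, 2, 3}


def flatten(s):
    """Hypothetical helper: splice in members of inner sets, keep atoms (one level)."""
    out = set()
    for e in s:
        if isinstance(e, frozenset):
            out |= e
        else:
            out.add(e)
    return out


print(nearest(S, 10))                                  # 8
print(flatten({1, frozenset({2, 3})}) == {1, 2, 3})    # True
```

A sorted structure such as a balanced search tree can answer nearest in O(log n) rather than with the linear scan used here.
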
Implementations

Sets can be implemented using various data structures, which provide different time and space trade-offs for various operations. Some implementations are designed to improve the efficiency of very specialized operations, such as nearest or union. Implementations described as "general use" typically strive to optimize the is_element_of, add, and remove operations. A simple implementation is to use a list, ignoring the order of the elements and taking care to avoid repeated values. This is simple but inefficient, as operations like set membership or element deletion are O(n) because they require scanning the entire list. Sets are often instead implemented using more efficient data structures, particularly various flavors of trees, tries, or hash tables.
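
The list-based approach can be sketched as follows. `ListSet` is a hypothetical class for illustration; every operation scans the underlying list, which is where the O(n) cost comes from.

```python
class ListSet:
    """Hypothetical list-backed set: simple, but every operation is O(n)."""

    def __init__(self) -> None:
        self._items = []

    def contains(self, x) -> bool:
        for item in self._items:       # linear scan over all stored values
            if item == x:
                return True
        return False

    def add(self, x) -> None:
        if not self.contains(x):       # O(n) check keeps values unique
            self._items.append(x)

    def remove(self, x) -> None:
        try:
            self._items.remove(x)      # list.remove is also a linear scan
        except ValueError:
            pass                       # not present: nothing to remove

    def __len__(self) -> int:
        return len(self._items)


s = ListSet()
for v in [3, 1, 3, 2]:
    s.add(v)
s.remove(1)
print(len(s), s.contains(3), s.contains(1))   # 2 True False
```
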
As sets can be interpreted as a kind of map, sets are commonly implemented in the same way as maps, in which case the value of each key-value pair has the unit type or a sentinel value. Namely, a self-balancing binary search tree is used for sorted sets, which gives O(log n) for most operations, or a hash table for unsorted sets, which gives O(1) average-case but O(n) worst-case for most operations. A sorted linear hash table may be used to provide deterministically ordered sets.
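
To show where the O(1) average and O(n) worst case come from, here is a minimal separate-chaining hash set in Python. The class name, the initial 8 buckets, the 0.75 load-factor threshold and doubling on resize are assumptions for the sketch, not details of any particular library. A lookup scans a single bucket, which stays short on average; if many elements hash to the same bucket, that scan degrades to O(n).

```python
class HashSet:
    """Minimal separate-chaining hash set (sizes and thresholds are assumptions)."""

    def __init__(self, buckets: int = 8) -> None:
        self._buckets = [[] for _ in range(buckets)]
        self._size = 0

    def _bucket(self, x) -> list:
        # hash() gives an int; modulo maps it to a bucket index.
        return self._buckets[hash(x) % len(self._buckets)]

    def __contains__(self, x) -> bool:
        return x in self._bucket(x)        # scans only one chain

    def add(self, x) -> None:
        chain = self._bucket(x)
        if x in chain:
            return                         # already present
        chain.append(x)
        self._size += 1
        if self._size > 0.75 * len(self._buckets):
            self._resize(2 * len(self._buckets))   # keep chains short on average

    def discard(self, x) -> None:
        chain = self._bucket(x)
        if x in chain:
            chain.remove(x)
            self._size -= 1

    def _resize(self, n: int) -> None:
        old = [x for chain in self._buckets for x in chain]
        self._buckets = [[] for _ in range(n)]
        for x in old:
            self._bucket(x).append(x)      # re-hash into the larger table

    def __len__(self) -> int:
        return self._size


h = HashSet()
for i in range(100):
    h.add(i)
h.add(5)             # duplicate: ignored
h.discard(42)
print(len(h), 42 in h, 7 in h)   # 99 False True
```
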
Further, in languages that support maps but not sets, sets can be implemented in terms of maps. For example, a common programming idiom in Perl that converts an array to a hash whose values are the sentinel value 1, for use as a set, is:

my %elements = map { $_ => 1 } @elements;
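
The same idiom carries over to other languages with associative arrays. In Python, for instance, the keys of a dict can serve as the set's members, with a sentinel value that is never read. This is only an illustration, since Python also has a built-in set type.

```python
elements = ["apple", "pear", "apple", "fig"]

# Keys are the members; the value 1 is a sentinel that is never read,
# as in the Perl idiom.
as_set = {e: 1 for e in elements}

# dict.fromkeys does the same, using None as the sentinel.
as_set2 = dict.fromkeys(elements)

print("pear" in as_set)   # True: membership is a key lookup
print(len(as_set))        # 3: the duplicate "apple" collapsed
```
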

Other popular methods include arrays. In particular, a subset of the integers 1..n can be implemented efficiently as an n-bit bit array, which also supports very efficient union and intersection operations. A Bloom filter implements a set probabilistically, using a very compact representation but risking a small chance of false positives on queries.
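
A bit-array set over 1..n can be sketched with a Python integer used as the bit array, where bit i - 1 records whether i is present. `BitSet` is a hypothetical class; in languages without arbitrary-precision integers, a fixed-size array of machine words would be the usual choice. Union and intersection then reduce to a single bitwise OR and AND.

```python
class BitSet:
    """Hypothetical subset of {1, ..., n}; bit i - 1 of an int marks value i."""

    def __init__(self, n: int, values=()) -> None:
        self.n = n
        self.bits = 0
        for v in values:
            self.add(v)

    def add(self, v: int) -> None:
        if not 1 <= v <= self.n:
            raise ValueError("value out of range 1..n")
        self.bits |= 1 << (v - 1)          # set the bit for v

    def __contains__(self, v: int) -> bool:
        return 1 <= v <= self.n and ((self.bits >> (v - 1)) & 1) == 1

    def union(self, other: "BitSet") -> "BitSet":
        result = BitSet(max(self.n, other.n))
        result.bits = self.bits | other.bits   # one OR over all bits
        return result

    def intersection(self, other: "BitSet") -> "BitSet":
        result = BitSet(min(self.n, other.n))
        result.bits = self.bits & other.bits   # one AND over all bits
        return result

    def to_set(self) -> set:
        return {i + 1 for i in range(self.n) if (self.bits >> i) & 1}


a = BitSet(8, [1, 3, 5])
b = BitSet(8, [3, 4, 5])
print(sorted(a.union(b).to_set()))         # [1, 3, 4, 5]
print(sorted(a.intersection(b).to_set()))  # [3, 5]
```
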
The Boolean set operations can be implemented in terms of more elementary operations, but specialized algorithms may yield lower asymptotic time bounds. If sets are implemented as sorted lists, for example, the naive algorithm for union will take time proportional to the length m of S times the length n of T; whereas a variant of the list merging algorithm will do the job in time proportional to m+n. Moreover, there are specialized set data structures that are optimized for one or more of these operations, at the expense of others.
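
The merge-based union can be sketched as below, assuming each input is a sorted Python list without duplicates (the function name `sorted_union` is hypothetical). Each loop iteration advances at least one index, so the loop runs at most m + n times.

```python
def sorted_union(s: list, t: list) -> list:
    """Union of two sorted, duplicate-free lists in O(m + n) by merging."""
    i, j, out = 0, 0, []
    while i < len(s) and j < len(t):
        if s[i] < t[j]:
            out.append(s[i])
            i += 1
        elif t[j] < s[i]:
            out.append(t[j])
            j += 1
        else:                   # equal values: emit once, advance both
            out.append(s[i])
            i += 1
            j += 1
    out.extend(s[i:])           # at most one of these tails is non-empty
    out.extend(t[j:])
    return out


print(sorted_union([1, 3, 5, 7], [2, 3, 6]))  # [1, 2, 3, 5, 6, 7]
```

Intersection and difference follow the same pattern, differing only in which elements are emitted at each step.
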

Language support

One of the earliest languages to support sets was Pascal; many languages now include them, whether in the core language or in a standard library.

In C++, the Standard Template Library provides the set template class, which is typically implemented using a binary search tree; SGI's STL also provides the hash_set template class, which implements a set using a hash table. C++11 added the unordered_set template class, which is implemented using a hash table. In sets, the elements themselves are the keys, in contrast to sequenced containers, where elements are accessed using their position. Elements of a set must be comparable under a strict weak ordering.
Java offers the Set interface to support sets, and the SortedSet sub-interface to support sorted sets.
Apple's Foundation framework provides the Objective-C classes NSSet, NSMutableSet, NSCountedSet, NSOrderedSet, and NSMutableOrderedSet. The CoreFoundation APIs provide the CFSet and CFMutableSet types for use in C.
Python has had built-in set and frozenset types since 2.4, and since Python 3.0 and 2.7 it supports non-empty set literals using a curly-bracket syntax, e.g.: {x, y, z}; empty sets must be created using set(), because Python uses {} to represent the empty dictionary.
The .NET Framework provides the generic HashSet and SortedSet classes that implement the generic ISet interface.
Smalltalk's class library includes Set and IdentitySet, using equality and identity for inclusion test respectively. Many dialects provide variations for compressed storage, for ordering or for weak references.
Ruby's standard library includes a set module which contains Set and SortedSet classes that implement sets using hash tables, the latter allowing iteration in sorted order.
OCaml's standard library contains a Set module, which implements a functional set data structure using binary search trees.
The GHC implementation of Haskell provides a Data.Set module, which implements immutable sets using binary search trees.
The Tcl Tcllib package provides a set module which implements a set data structure based upon Tcl lists.
The Swift standard library contains a Set type, since Swift 1.2.
JavaScript introduced Set as a standard built-in object with the ECMAScript 2015 standard.
Erlang's standard library has a sets module.
Clojure has literal syntax for hashed sets, and also implements sorted sets.
LabVIEW has native support for sets, from version 2019.
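
The Python detail in the list, that `{}` denotes an empty dictionary rather than an empty set, can be checked directly:

```python
s = {1, 2, 3}     # non-empty set literal
empty = set()     # an empty set must be created with set()
d = {}            # {} is an empty dict, not a set
print(type(s).__name__, type(empty).__name__, type(d).__name__)  # set set dict
```
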

As noted in the previous section, in languages which do not directly support sets but do support associative arrays, sets can be emulated using associative arrays, by using the elements as keys, and using a dummy value as the values, which are ignored.

Multiset

A generalization of the notion of a set is that of a multiset or bag, which is similar to a set but allows repeated values. This is used in two distinct senses: either equal values are considered identical, and are simply counted, or equal values are considered equivalent, and are stored as distinct items. For example, given a list of people and ages, one could construct a multiset of ages, which simply counts the number of people of a given age. Alternatively, one can construct a multiset of people, where two people are considered equivalent if their ages are the same, in which case each pair must be stored, and selecting on a given age gives all the people of a given age.
Formally, it is possible for objects in computer science to be considered "equal" under some equivalence relation but still distinct under another relation. Some types of multiset implementations will store distinct equal objects as separate items in the data structure, while others will collapse them down to one version and keep a positive integer count of the multiplicity of the element.
As with sets, multisets can naturally be implemented using hash tables or trees, which yield different performance characteristics.
The set of all bags over type T is given by the expression bag T. If by multiset one considers equal items identical and simply counts them, then a multiset can be interpreted as a function from the input domain to the non-negative integers, generalizing the identification of a set with its indicator function. In some cases a multiset in this counting sense may be generalized to allow negative values, as in Python.
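
Both senses of multiset from the ages example can be sketched in Python. `collections.Counter` implements the counting sense (value to multiplicity, with `subtract` allowing counts to drop to zero or below), while a dict of lists keeps equivalent items as distinct entries. The ages and names are made-up sample data.

```python
from collections import Counter, defaultdict

# Counting sense: equal ages are identical and only counted.
ages = [31, 25, 31, 40, 25, 31]
age_bag = Counter(ages)
print(age_bag[31])      # 3
print(age_bag[99])      # 0: absent values have multiplicity 0

# Counter.subtract permits zero and negative counts.
age_bag.subtract({40: 2})
print(age_bag[40])      # -1

# Equivalence sense: people with equal ages are equivalent but stored separately.
people = [("Ann", 31), ("Bob", 25), ("Cy", 31)]
by_age = defaultdict(list)
for name, age in people:
    by_age[age].append(name)
print(by_age[31])       # ['Ann', 'Cy']
```
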

C++'s Standard Template Library implements both sorted and unsorted multisets. It provides the multiset class for the sorted multiset, as a kind of associative container, which implements this multiset using a self-balancing binary search tree. It provides the unordered_multiset class for the unsorted multiset, as a kind of unordered associative container, which implements this multiset using a hash table. The unsorted multiset is standard as of C++11; previously SGI's STL provided the hash_multiset class, which was copied and eventually standardized.
For Java, third-party libraries provide multiset functionality:
* Apache Commons Collections provides the Bag and SortedBag interfaces, with implementing classes like HashBag and TreeBag.
* Google Guava provides the Multiset interface, with implementing classes like HashMultiset and TreeMultiset.
Apple provides the NSCountedSet class as part of Cocoa, and the CFBag and CFMutableBag types as part of CoreFoundation.
Python's standard library includes collections.Counter, which is similar to a multiset.
Smalltalk includes the Bag class, which can be instantiated to use either identity or equality as predicate for inclusion test.

Where a multiset data structure is not available, a workaround is to use a regular set, but override the equality predicate of its items to always return "not equal" on distinct objects or use an associative array mapping the values to their integer multiplicities.
Typical operations on bags:

contains(B,x): checks whether the element x is present (at least once) in the bag B.
is_sub_bag(B₁,B₂): checks whether each element in the bag B₁ occurs in B₁ no more often than it occurs in the bag B₂; sometimes denoted as B₁ ⊑ B₂.
count(B,x): returns the number of times that the element x occurs in the bag B; sometimes denoted as B # x.
scaled_by(B,n): given a natural number n, returns a bag which contains the same elements as the bag B, except that every element that occurs m times in B occurs n * m times in the resulting bag; sometimes denoted as n ⊗ B.
union(B₁,B₂): returns a bag containing just those values that occur in either the bag B₁ or the bag B₂, except that the number of times a value x occurs in the resulting bag is equal to (B₁ # x) + (B₂ # x); sometimes denoted as B₁ ⊎ B₂.
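
Treating a bag as a mapping from values to counts, the operations above can be sketched on top of `collections.Counter`. The function names follow the list and are hypothetical helpers, not part of Counter's API. Counter's `+` adds counts (matching ⊎) and drops entries whose resulting count is not positive, whereas its `|` takes the maximum of the counts, which is a different operation.

```python
from collections import Counter


def contains(bag: Counter, x) -> bool:
    return bag[x] > 0


def count(bag: Counter, x) -> int:                  # B # x
    return bag[x]


def is_sub_bag(b1: Counter, b2: Counter) -> bool:   # B1 ⊑ B2
    return all(n <= b2[x] for x, n in b1.items())


def scaled_by(bag: Counter, n: int) -> Counter:     # n ⊗ B
    return Counter({x: n * m for x, m in bag.items()})


def union(b1: Counter, b2: Counter) -> Counter:     # B1 ⊎ B2: counts add
    return b1 + b2


B1 = Counter({"a": 2, "b": 1})
B2 = Counter({"a": 3, "b": 1, "c": 4})
print(union(B1, B2) == Counter({"a": 5, "b": 2, "c": 4}))  # True
print(is_sub_bag(B1, B2))                                  # True
```
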
Multisets in SQL

In relational databases, a table can be a set or a multiset, depending on the presence of uniqueness constraints on some columns.
SQL allows the selection of rows from a relational table: this operation will in general yield a multiset, unless the keyword DISTINCT is used to force the rows to be all different, or the selection includes the primary key.
In ANSI SQL the MULTISET keyword can be used to transform a subquery into a collection expression:
SELECT expression1, expression2... FROM table_name...
is a general select that can be used as a subquery expression of another more general query, while
MULTISET(SELECT expression1, expression2... FROM table_name...)
transforms the subquery into a collection expression that can be used in another query, or in assignment to a column of appropriate collection type.
