Heapsort

In computer science, heapsort is a comparison-based sorting algorithm. Heapsort can be thought of as an improved selection sort: like selection sort, heapsort divides its input into a sorted and an unsorted region, and it iteratively shrinks the unsorted region by extracting the largest element from it and inserting it into the sorted region. Unlike selection sort, heapsort does not waste time with a linear-time scan of the unsorted region; rather, heapsort maintains the unsorted region in a heap data structure to find the largest element more quickly in each step.
Although somewhat slower in practice on most machines than a well-implemented quicksort, it has the advantage of a more favorable worst-case O(n log n) runtime. Heapsort is an in-place algorithm, but it is not a stable sort.
Heapsort was invented by J. W. J. Williams in 1964. This was also the birth of the heap, presented already by Williams as a useful data structure in its own right. In the same year, R. W. Floyd published an improved version that could sort an array in-place, continuing his earlier research into the treesort algorithm.

Overview

The heapsort algorithm can be divided into two parts.
In the first step, a heap is built out of the data. The heap is often placed in an array with the layout of a complete binary tree. The complete binary tree maps the binary tree structure into the array indices; each array index represents a node; the index of the node's parent, left child branch, or right child branch are simple expressions. For a zero-based array, the root node is stored at index 0; if i is the index of the current node, then


 iParent(i)     = floor((i-1) / 2), where the floor function maps a real number to the largest integer less than or equal to it.
 iLeftChild(i)  = 2*i + 1
 iRightChild(i) = 2*i + 2
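
The index arithmetic above can be sketched in Python as follows. The function names are illustrative translations of iParent, iLeftChild and iRightChild, not part of any library; Python's `//` operator performs the floor division.

```python
def i_parent(i: int) -> int:
    """Index of the parent of node i (for i > 0) in a zero-based array heap."""
    return (i - 1) // 2  # floor division


def i_left_child(i: int) -> int:
    """Index of the left child of node i."""
    return 2 * i + 1


def i_right_child(i: int) -> int:
    """Index of the right child of node i."""
    return 2 * i + 2


# Both children of a node map back to that node.
for node in range(5):
    assert i_parent(i_left_child(node)) == node
    assert i_parent(i_right_child(node)) == node
```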

In the second step, a sorted array is created by repeatedly removing the largest element from the heap, and inserting it into the array. The heap is updated after each removal to maintain the heap property. Once all objects have been removed from the heap, the result is a sorted array.
Heapsort can be performed in place. The array can be split into two parts, the sorted array and the heap. The heap's invariant is preserved after each extraction, so the only cost is that of extraction.

Algorithm

The Heapsort algorithm involves preparing the list by first turning it into a max heap. The algorithm then repeatedly swaps the first value of the list with the last value, decreasing the range of values considered in the heap operation by one, and sifting the new first value into its position in the heap. This repeats until the range of considered values is one value in length.
The steps are:

1. Call the buildMaxHeap function on the list. Also referred to as heapify, this builds a heap from a list in O(n) operations.
2. Swap the first element of the list with the final element. Decrease the considered range of the list by one.
3. Call the siftDown function on the list to sift the new first element to its appropriate index in the heap.
4. Go to step 2 unless the considered range of the list is one element.

The buildMaxHeap operation is run once, and is O(n) in performance. The siftDown function is O(log n), and is called n times. Therefore, the performance of this algorithm is O(n + n log n) = O(n log n).

Pseudocode

The following is a simple way to implement the algorithm in pseudocode. Arrays are zero-based and swap is used to exchange two elements of the array. Movement 'down' means from the root towards the leaves, or from lower indices to higher. Note that during the sort, the largest element is at the root of the heap at a[0], while at the end of the sort, the largest element is in a[end].
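procedure heapsort(a, count) is
    input: an unordered array a of length count

    (build the heap in array a so that the largest value is at the root)
    heapify(a, count)

    (invariant: a[0..end] is a heap and every element beyond end is in its final sorted position)
    end ← count - 1
    while end > 0 do
        (a[0] is the root and largest value; the swap moves it in front of the sorted elements)
        swap(a[end], a[0])
        (the heap size is reduced by one)
        end ← end - 1
        (the swap ruined the heap property, so restore it)
        siftDown(a, 0, end)
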
The sorting routine uses two subroutines, heapify and siftDown. The former is the common in-place heap construction routine, while the latter is a common subroutine for implementing heapify.
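
procedure heapify(a, count) is
    (start is the index of the last parent node, that is, the parent of the last element at count - 1)
    start ← iParent(count - 1)

    while start ≥ 0 do
        (sift down the node at index start so that all nodes below it are in heap order)
        siftDown(a, start, count - 1)
        (go to the next parent node)
        start ← start - 1
    (after sifting down the root, all elements are in heap order)
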
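procedure siftDown(a, start, end) is
    root ← start
    while iLeftChild(root) ≤ end do          (while the root has at least one child)
        child ← iLeftChild(root)             (left child of root)
        swap ← root                          (keeps track of the child to swap with)
        if a[swap] < a[child] then
            swap ← child
        (if there is a right child and that child is greater)
        if child+1 ≤ end and a[swap] < a[child+1] then
            swap ← child + 1
        if swap = root then
            (the root holds the largest element; the heaps rooted at the children are assumed valid, so we are done)
            return
        else
            swap(a[root], a[swap])
            root ← swap                      (continue sifting down from the child)
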
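
As a rough illustration, the three procedures above might be translated to Python as shown here. This is a minimal sketch rather than a tuned implementation; the names `sift_down`, `heapify` and `heapsort` are direct renderings of the pseudocode names, and the list is sorted in place.

```python
def sift_down(a, start, end):
    """Repair the max-heap rooted at index start, considering only a[0..end]."""
    root = start
    while 2 * root + 1 <= end:          # while the root has at least one child
        child = 2 * root + 1            # left child
        swap = root                     # index of the largest value seen so far
        if a[swap] < a[child]:
            swap = child
        if child + 1 <= end and a[swap] < a[child + 1]:
            swap = child + 1            # right child exists and is larger
        if swap == root:
            return                      # heap property already holds here
        a[root], a[swap] = a[swap], a[root]
        root = swap                     # continue sifting down from the child


def heapify(a):
    """Build a max-heap in place, starting from the last parent node."""
    count = len(a)
    start = (count - 2) // 2            # parent of the last element
    while start >= 0:
        sift_down(a, start, count - 1)
        start -= 1


def heapsort(a):
    """Sort the list a in place in ascending order."""
    heapify(a)
    end = len(a) - 1
    while end > 0:
        a[end], a[0] = a[0], a[end]     # move the current maximum to its final place
        end -= 1                        # shrink the heap by one
        sift_down(a, 0, end)            # restore the heap property


data = [6, 5, 3, 1, 8, 7, 2, 4]
heapsort(data)
print(data)
```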
The heapify procedure can be thought of as building a heap from the bottom up by successively sifting downward to establish the heap property. An alternative version that builds the heap top-down and sifts upward may be simpler to understand. This siftUp version can be visualized as starting with an empty heap and successively inserting elements, whereas the siftDown version given above treats the entire input array as a full but "broken" heap and "repairs" it starting from the last non-trivial sub-heap.
Also, the siftDown version of heapify has O(n) time complexity, while the siftUp version given below has O(n log n) time complexity due to its equivalence with inserting each element, one at a time, into an empty heap.
This may seem counter-intuitive since, at a glance, it is apparent that the former only makes half as many calls to its logarithmic-time sifting function as the latter; i.e., they seem to differ only by a constant factor, which never affects asymptotic analysis.
To grasp the intuition behind this difference in complexity, note that the number of swaps that may occur during any one siftUp call increases with the depth of the node on which the call is made. The crux is that there are many more "deep" nodes than there are "shallow" nodes in a heap, so that siftUp may have its full logarithmic running-time on the approximately linear number of calls made on the nodes at or near the "bottom" of the heap. On the other hand, the number of swaps that may occur during any one siftDown call decreases as the depth of the node on which the call is made increases. Thus, when the siftDown heapify begins and is calling siftDown on the bottom and most numerous node-layers, each sifting call will incur, at most, a number of swaps equal to the "height" of the node on which the sifting call is made. In other words, about half the calls to siftDown will have at most only one swap, then about a quarter of the calls will have at most two swaps, etc.
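
The counting argument above can be checked numerically. The sketch below sums, over every node of an n-element array heap, the maximum number of swaps a single call could make: the node's height for siftDown and the node's depth for siftUp. The helper names are hypothetical and exist only for this illustration.

```python
def node_height(i, n):
    """Edges on the longest downward path from node i in an n-node array heap.

    In a complete binary tree the leftmost path is always a longest one.
    """
    height = 0
    while 2 * i + 1 < n:
        i = 2 * i + 1
        height += 1
    return height


def node_depth(i):
    """Edges between node i and the root (index 0)."""
    return (i + 1).bit_length() - 1


def sift_down_swap_bound(n):
    """Worst-case total swaps for the siftDown heapify: the sum of node heights."""
    return sum(node_height(i, n) for i in range(n))


def sift_up_swap_bound(n):
    """Worst-case total swaps for the siftUp heapify: the sum of node depths."""
    return sum(node_depth(i) for i in range(n))


for n in (7, 1023, 100_000):
    print(n, sift_down_swap_bound(n), sift_up_swap_bound(n))
```

For these inputs the first total stays below n, while the second grows roughly like n log n, which matches the difference in complexity described above.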
The heapsort algorithm itself has O(n log n) time complexity using either version of heapify.
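procedure heapify(a, count) is
    (end is assigned the index of the first (left) child of the root)
    end := 1

    while end < count
        (sift up the node at index end so that all nodes above it are in heap order)
        siftUp(a, 0, end)
        end := end + 1
    (after sifting up the last node, all nodes are in heap order)

procedure siftUp(a, start, end) is
    input: start represents the limit of how far up the heap to sift.
           end is the node to sift up.
    child := end
    while child > start
        parent := iParent(child)
        if a[parent] < a[child] then          (out of max-heap order)
            swap(a[parent], a[child])
            child := parent                   (continue sifting up from the parent)
        else
            return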
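
A possible Python rendering of this siftUp variant is sketched below. The function names are illustrative, and the input list is the one used in the Example section of this article, so the resulting heap can be compared with the first table there.

```python
def sift_up(a, start, end):
    """Move a[end] up towards index start until its parent is not smaller."""
    child = end
    while child > start:
        parent = (child - 1) // 2
        if a[parent] < a[child]:        # out of max-heap order
            a[parent], a[child] = a[child], a[parent]
            child = parent              # continue from the parent
        else:
            return


def heapify_sift_up(a):
    """Build a max-heap by inserting a[1], a[2], ... one at a time."""
    for end in range(1, len(a)):
        sift_up(a, 0, end)


data = [6, 5, 3, 1, 8, 7, 2, 4]
heapify_sift_up(data)
print(data)  # [8, 6, 7, 4, 5, 3, 2, 1]
```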

Variations

Floyd's heap construction

The most important variation to the basic algorithm, which is included in all practical implementations, is a heap-construction algorithm by Floyd which runs in O(n) time and uses siftdown rather than siftup, avoiding the need to implement siftup at all.
Rather than starting with a trivial heap and repeatedly adding leaves, Floyd's algorithm starts with the leaves, observing that they are trivial but valid heaps by themselves, and then adds parents. Starting with the last internal node (index ⌊n/2⌋ − 1 in a zero-based array) and working backwards, each internal node is made the root of a valid heap by sifting down. The last step is sifting down the first element, after which the entire array obeys the heap property.
The worst-case number of comparisons during Floyd's heap-construction phase of heapsort is known to be equal to 2n − 2s2(n) − e2(n), where s2(n) is the number of 1 bits in the binary representation of n and e2(n) is the number of trailing 0 bits.
The standard implementation of Floyd's heap-construction algorithm causes a large number of cache misses once the size of the data exceeds that of the CPU cache. Much better performance on large data sets can be obtained by merging in depth-first order, combining subheaps as soon as possible, rather than combining all subheaps on one level before proceeding to the one above.

Bottom-up heapsort

Bottom-up heapsort is a variant which reduces the number of comparisons required by a significant factor. While ordinary heapsort requires 2n log2 n + O(n) comparisons worst-case and on average, the bottom-up variant requires n log2 n + O(1) comparisons on average, and 1.5n log2 n + O(n) in the worst case.
If comparisons are cheap then the difference is unimportant, as top-down heapsort compares values that have already been loaded from memory. If, however, comparisons require a function call or other complex logic, then bottom-up heapsort is advantageous.
This is accomplished by improving the siftDown procedure. The change improves the linear-time heap-building phase somewhat, but is more significant in the second phase. Like ordinary heapsort, each iteration of the second phase extracts the top of the heap, a[0], and fills the gap it leaves with a[end], then sifts this latter element down the heap. But this element comes from the lowest level of the heap, meaning it is one of the smallest elements in the heap, so the sift-down will likely take many steps to move it back down. In ordinary heapsort, each step of the sift-down requires two comparisons, to find the maximum of three elements: the new node and its two children.
Bottom-up heapsort instead finds the path of largest children to the leaf level of the tree using only one comparison per level. Put another way, it finds a leaf which has the property that it and all of its ancestors are greater than or equal to their siblings. Then, from this leaf, it searches upward for the correct position in that path to insert. This is the same location as ordinary heapsort finds, and requires the same number of exchanges to perform the insert, but fewer comparisons are required to find that location.
Because it goes all the way to the bottom and then comes back up, it is called heapsort with bounce by some authors.
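function leafSearch(a, i, end) is
    j ← i
    while iRightChild(j) ≤ end do
        (determine which of j's two children is the greater)
        if a[iRightChild(j)] > a[iLeftChild(j)] then
            j ← iRightChild(j)
        else
            j ← iLeftChild(j)
    (at the last level, there might be only one child)
    if iLeftChild(j) ≤ end then
        j ← iLeftChild(j)
    return j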
The return value of the leafSearch is used in the modified siftDown routine:
procedure siftDown(a, i, end) is
    j ← leafSearch(a, i, end)
    while a[i] > a[j] do
        j ← iParent(j)
    x ← a[j]
    a[j] ← a[i]
    while j > i do
        swap x, a[iParent(j)]
        j ← iParent(j)
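
For illustration, the leaf search and the modified siftDown might be written in Python as below, together with a sorting loop that uses them. This is a sketch under the same zero-based layout as the rest of the article; the function names are assumptions made for this example.

```python
def leaf_search(a, i, end):
    """Follow the path of larger children from node i down to a leaf in a[0..end]."""
    j = i
    while 2 * j + 2 <= end:                 # while j has two children
        if a[2 * j + 2] > a[2 * j + 1]:
            j = 2 * j + 2
        else:
            j = 2 * j + 1
    if 2 * j + 1 <= end:                    # last level may hold a single child
        j = 2 * j + 1
    return j


def sift_down_bottom_up(a, i, end):
    """Sift a[i] down by first descending to a leaf, then climbing back up."""
    j = leaf_search(a, i, end)
    while a[i] > a[j]:                      # climb to the position where a[i] belongs
        j = (j - 1) // 2
    x = a[j]
    a[j] = a[i]
    while j > i:                            # shift the displaced values up the path
        j = (j - 1) // 2
        x, a[j] = a[j], x


def bottom_up_heapsort(a):
    """Sort the list a in place using the bottom-up sift-down."""
    n = len(a)
    for start in range((n - 2) // 2, -1, -1):
        sift_down_bottom_up(a, start, n - 1)
    for end in range(n - 1, 0, -1):
        a[end], a[0] = a[0], a[end]
        sift_down_bottom_up(a, 0, end - 1)


data = [6, 5, 3, 1, 8, 7, 2, 4]
bottom_up_heapsort(data)
print(data)
```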
Bottom-up heapsort was announced as beating quicksort on arrays of size ≥16000.
A 2008 re-evaluation of this algorithm showed it to be no faster than ordinary heapsort for integer keys, presumably because modern branch prediction nullifies the cost of the predictable comparisons which bottom-up heapsort manages to avoid.
A further refinement does a binary search in the path to the selected leaf, and sorts in a worst case of (n+1)(log2(n+1) + log2 log2(n+1) + 1.82) + O(log2 n) comparisons, approaching the information-theoretic lower bound of n log2 n − 1.4427n comparisons.
A variant which uses two extra bits per internal node to cache information about which child is greater uses less than n log2 n + 1.1n compares.

Other variations

Ternary heapsort uses a ternary heap instead of a binary heap; that is, each element in the heap has three children. It is more complicated to program, but does a constant number of times fewer swap and comparison operations. This is because each sift-down step in a ternary heap requires three comparisons and one swap, whereas in a binary heap two comparisons and one swap are required. Two levels in a ternary heap cover 3² = 9 elements, doing more work with the same number of comparisons as three levels in the binary heap, which only cover 2³ = 8. This is primarily of academic interest, as the additional complexity is not worth the minor savings, and bottom-up heapsort beats both.
The smoothsort algorithm is a variation of heapsort developed by Edsger Dijkstra in 1981. Like heapsort, smoothsort's upper bound is O(n log n). The advantage of smoothsort is that it comes closer to O(n) time if the input is already sorted to some degree, whereas heapsort averages O(n log n) regardless of the initial sorted state. Due to its complexity, smoothsort is rarely used.
Levcopoulos and Petersson describe a variation of heapsort based on a heap of Cartesian trees. First, a Cartesian tree is built from the input in O(n) time, and its root is placed in a 1-element binary heap. Then we repeatedly extract the minimum from the binary heap, output the tree's root element, and add its left and right children (if any), which are themselves Cartesian trees, to the binary heap. As they show, if the input is already nearly sorted, the Cartesian trees will be very unbalanced, with few nodes having both left and right children, resulting in the binary heap remaining small, and allowing the algorithm to sort more quickly than O(n log n) for inputs that are already nearly sorted.
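
The Cartesian-tree idea can be sketched in Python using the standard `heapq` module as the binary heap. This is an illustrative reading of the description above, not the authors' own formulation; the function names and the returned peak heap size are assumptions added to make the behaviour on nearly sorted input visible.

```python
import heapq


def build_cartesian_tree(a):
    """Build a min-rooted Cartesian tree of a in O(n) time with a stack.

    Returns (root, left, right), where left[i] and right[i] are child
    indices, or -1 when the child is absent.
    """
    n = len(a)
    left, right = [-1] * n, [-1] * n
    stack = []                              # indices on the rightmost path
    for i in range(n):
        last = -1
        while stack and a[stack[-1]] > a[i]:
            last = stack.pop()
        left[i] = last                      # popped subtree becomes the left child
        if stack:
            right[stack[-1]] = i
        stack.append(i)
    root = stack[0] if stack else -1
    return root, left, right


def cartesian_tree_sort(a):
    """Return (sorted copy of a, largest size the binary heap reached)."""
    if not a:
        return [], 0
    root, left, right = build_cartesian_tree(a)
    heap = [(a[root], root)]
    out, peak = [], 1
    while heap:
        value, i = heapq.heappop(heap)      # smallest remaining subtree root
        out.append(value)
        for child in (left[i], right[i]):
            if child != -1:
                heapq.heappush(heap, (a[child], child))
        peak = max(peak, len(heap))
    return out, peak


print(cartesian_tree_sort([6, 5, 3, 1, 8, 7, 2, 4]))
print(cartesian_tree_sort([1, 2, 3, 4, 5, 6, 7, 8]))
```

On the already sorted input the tree degenerates into a single chain, so the heap never holds more than one entry.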
Several variants such as weak heapsort require n log2 n + O(1) comparisons in the worst case, close to the theoretical minimum, using one extra bit of state per node. While this extra bit makes the algorithms not truly in-place, if space for it can be found inside the element, these algorithms are simple and efficient, but still slower than binary heaps if key comparisons are cheap enough that a constant factor does not matter.
Katajainen's "ultimate heapsort" requires no extra storage, performs n log2 n + O(1) comparisons, and a similar number of element moves. It is, however, even more complex and not justified unless comparisons are very expensive.

Comparison with other sorts

Heapsort primarily competes with quicksort, another very efficient general purpose nearly-in-place comparison-based sort algorithm.
Quicksort is typically somewhat faster in practice, but the worst-case running time for quicksort is O(n²), which is unacceptable for large data sets and can be deliberately triggered given enough knowledge of the implementation, creating a security risk. See quicksort for a detailed discussion of this problem and possible solutions.
Thus, because of the O(n log n) upper bound on heapsort's running time and the constant upper bound on its auxiliary storage, embedded systems with real-time constraints and systems concerned with security, such as the Linux kernel, often use heapsort.
Heapsort also competes with merge sort, which has the same time bounds. Merge sort requires Ω(n) auxiliary space, but heapsort requires only a constant amount. Heapsort typically runs faster in practice on machines with small or slow data caches, and does not require as much external memory. On the other hand, merge sort has several advantages over heapsort:

Merge sort on arrays has considerably better data cache performance, often outperforming heapsort on modern desktop computers because merge sort frequently accesses contiguous memory locations; heapsort references are spread throughout the heap.
Heapsort is not a stable sort; merge sort is stable.
Merge sort parallelizes well and can achieve close to linear speedup with a trivial implementation; heapsort is not an obvious candidate for a parallel algorithm.
Merge sort can be adapted to operate on singly linked lists with O(1) extra space. Heapsort can be adapted to operate on doubly linked lists with only O(1) extra space overhead.
Merge sort is used in external sorting; heapsort is not. Locality of reference is the issue.
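
The stability difference noted in the list above can be seen with a very small input. The sketch below sorts records by a key with a plain heapsort and compares the result with Python's built-in `sorted`, which is documented to be stable. The function `heapsort_by_key` is a hypothetical helper written for this illustration.

```python
def heapsort_by_key(records, key):
    """Return a new list of records sorted by key(record) using heapsort."""
    a = list(records)

    def sift_down(start, end):
        root = start
        while 2 * root + 1 <= end:
            child = 2 * root + 1
            swap = root
            if key(a[swap]) < key(a[child]):
                swap = child
            if child + 1 <= end and key(a[swap]) < key(a[child + 1]):
                swap = child + 1
            if swap == root:
                return
            a[root], a[swap] = a[swap], a[root]
            root = swap

    # Build the heap, then repeatedly move the maximum to the end.
    for start in range((len(a) - 2) // 2, -1, -1):
        sift_down(start, len(a) - 1)
    for end in range(len(a) - 1, 0, -1):
        a[end], a[0] = a[0], a[end]
        sift_down(0, end - 1)
    return a


records = [(1, "a"), (1, "b")]              # two records with equal keys
print(heapsort_by_key(records, key=lambda r: r[0]))   # [(1, 'b'), (1, 'a')]
print(sorted(records, key=lambda r: r[0]))            # [(1, 'a'), (1, 'b')]
```

The two records compare equal on the key, yet heapsort's final swap of the root with the last element reverses their original order.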

Introsort is an alternative to heapsort that combines quicksort and heapsort to retain advantages of both: worst case speed of heapsort and average speed of quicksort.

Example

Let { 6, 5, 3, 1, 8, 7, 2, 4 } be the list that we want to sort from the smallest to the largest.

Heap	newly added element	swap elements
null	6
6	5
6, 5	3
6, 5, 3	1
6, 5, 3, 1	8
6, 5, 3, 1, 8		5, 8
6, 8, 3, 1, 5		6, 8
8, 6, 3, 1, 5	7
8, 6, 3, 1, 5, 7		3, 7
8, 6, 7, 1, 5, 3	2
8, 6, 7, 1, 5, 3, 2	4
8, 6, 7, 1, 5, 3, 2, 4		1, 4
8, 6, 7, 4, 5, 3, 2, 1

Heap	swap elements	delete element	sorted array	details
8, 6, 7, 4, 5, 3, 2, 1	8, 1			swap 8 and 1 in order to delete 8 from heap
1, 6, 7, 4, 5, 3, 2, 8		8		delete 8 from heap and add to sorted array
1, 6, 7, 4, 5, 3, 2	1, 7		8	swap 1 and 7 as they are not in order in the heap
7, 6, 1, 4, 5, 3, 2	1, 3		8	swap 1 and 3 as they are not in order in the heap
7, 6, 3, 4, 5, 1, 2	7, 2		8	swap 7 and 2 in order to delete 7 from heap
2, 6, 3, 4, 5, 1, 7		7	8	delete 7 from heap and add to sorted array
2, 6, 3, 4, 5, 1	2, 6		7, 8	swap 2 and 6 as they are not in order in the heap
6, 2, 3, 4, 5, 1	2, 5		7, 8	swap 2 and 5 as they are not in order in the heap
6, 5, 3, 4, 2, 1	6, 1		7, 8	swap 6 and 1 in order to delete 6 from heap
1, 5, 3, 4, 2, 6		6	7, 8	delete 6 from heap and add to sorted array
1, 5, 3, 4, 2	1, 5		6, 7, 8	swap 1 and 5 as they are not in order in the heap
5, 1, 3, 4, 2	1, 4		6, 7, 8	swap 1 and 4 as they are not in order in the heap
5, 4, 3, 1, 2	5, 2		6, 7, 8	swap 5 and 2 in order to delete 5 from heap
2, 4, 3, 1, 5		5	6, 7, 8	delete 5 from heap and add to sorted array
2, 4, 3, 1	2, 4		5, 6, 7, 8	swap 2 and 4 as they are not in order in the heap
4, 2, 3, 1	4, 1		5, 6, 7, 8	swap 4 and 1 in order to delete 4 from heap
1, 2, 3, 4		4	5, 6, 7, 8	delete 4 from heap and add to sorted array
1, 2, 3	1, 3		4, 5, 6, 7, 8	swap 1 and 3 as they are not in order in the heap
3, 2, 1	3, 1		4, 5, 6, 7, 8	swap 3 and 1 in order to delete 3 from heap
1, 2, 3		3	4, 5, 6, 7, 8	delete 3 from heap and add to sorted array
1, 2	1, 2		3, 4, 5, 6, 7, 8	swap 1 and 2 as they are not in order in the heap
2, 1	2, 1		3, 4, 5, 6, 7, 8	swap 2 and 1 in order to delete 2 from heap
1, 2		2	3, 4, 5, 6, 7, 8	delete 2 from heap and add to sorted array
1		1	2, 3, 4, 5, 6, 7, 8	delete 1 from heap and add to sorted array
			1, 2, 3, 4, 5, 6, 7, 8	completed
