Zipper (data structure)
A zipper is a technique of representing an aggregate data structure so that it is convenient for writing programs that traverse the structure arbitrarily and update its contents, especially in purely functional programming languages. The zipper was described by Gérard Huet in 1997. It includes and generalizes the gap buffer technique sometimes used with arrays.
The zipper technique is general in the sense that it can be adapted to lists, trees, and other recursively defined data structures.
Such modified data structures are usually referred to as "a tree with zipper" or "a list with zipper" to emphasize that the structure is conceptually a tree or list, while the zipper is a detail of the implementation.
A layman's explanation for a tree with zipper would be an ordinary computer filesystem with operations to go to parent, and the possibility to go downwards. The zipper is the pointer to the current path. Behind the scenes the zippers are efficient when making changes to a data structure, where a new, slightly changed, data structure is returned from an edit operation.
Example: Bidirectional list traversal
Many common data structures in computer science can be expressed as the structure generated by a few primitive constructor operations or observer operations. These include the structure of finite lists, which can be generated by two operations:-
Empty
constructs an empty list, -
Cons
constructs a list by prepending or concatenating valuex
in front of listL
.
is therefore the declaration Cons
. It is possible to describe the location in such a list as the number of steps from the front of the list to the target location. More formally, a location in the list is the number of Cons
operations required to reconstruct the whole list from that particular location. For example, in Cons)
a Cons
and a Cons
operation would be required to reconstruct the list relative to position X otherwise known as Cons
. This recording together with the location is called a zipped representation of the list or a list-zipper.To be clear, a location in the list is not just the number of
Cons
operations, but also all of the other information about those Cons
; in this case, the values that must be reconnected. Here, these values may be conveniently represented in a separate list in the order of application from the target location. Specifically, from the context of "3" in the list
, a recording could be represented as
where Cons
is applied followed by
to reconstitute the original list starting from
.A list-zipper always represents the entire data structure. However, this information is from the perspective of a specific location within that data structure. Consequently, a list-zipper is a pair consisting of both the location as a context or starting point, and a recording or path that permits reconstruction from that starting location. In particular, the list-zipper of
at the location of "3" may be represented as
. Now, if "3" is changed to "10", then the list-zipper becomes
. The list may then be efficiently reconstructed:
or other locations traversed to. With the list represented this way, it is easy to define relatively efficient operations on immutable data structures such as Lists and Trees at arbitrary locations. In particular, applying the zipper transform to a tree makes it is easy to insert or remove values at any particular location in the tree.
Contexts and differentiation
The type of a zipper's contexts can be constructed via a transformation of the original type that is closely related to the derivative of calculus through decategorification. The recursive types that zippers are formed from can be viewed as the least fixed point of a unary type constructor of kind. For example, with a higher-order type constructor that constructs the least fixed point of its argument, an unlabeled binary tree can be represented as and an unlabeled list may take the form. Here, the notation of exponentiation, multiplication, and addition correspond to function types, product types, and sum types respectively, whilst the natural numbers label the finite types; in this way, the type constructors resemble polynomial functions in.The derivative of a type constructor can therefore be formed through this syntactic analogy: for the example of an unlabeled ternary tree, the derivative of its type constructor would be equivalent to, analogously to the use of the sum and power rules in differential calculus. The type of the contexts of a zipper over an original type is equivalent to the derivative of the type constructor applied to the original type,.
For illustration, consider the recursive data structure of a binary tree with nodes that are either sentinel nodes of type or contain a value of type :
The derivative of the type constructor can be computed to be
Thus, the type of the zipper's contexts is
As such, we find that the context of each non-sentinel child node in the labelled binary tree is a triple consisting of
- a boolean value of type, expressing whether the current node is the left or right child of its parent node;
- a value of type, the label of the current node's parent; and
- the node's sibling of type, the subtree contained by the other branch of the current node's parent.
Uses
The zipper is often used where there is some concept of focus or of moving around in some set of data, since its semantics reflect that of moving around but in a functional non-destructive manner.The zipper has been used in
- Xmonad, to manage focus and placement of windows
- Huet's papers cover a structural editor based on zippers and a theorem prover
- A filesystem written in Haskell offering "...transactional semantics; undo of any file and directory operation; snapshots; statically guaranteed the strongest, repeatable read, isolation mode for clients; pervasive copy-on-write for files and directories; built-in traversal facility; and just the right behavior for cyclic directory references."
- Clojure has extensive support for zippers.
Alternatives and extensions
Direct modification
In a non-purely-functional programming language, it may be more convenient to simply traverse the original data structure and modify it directly.Generic zipper
The Generic Zipper is a technique to achieve the same goal as the conventional zipper by capturing the state of the traversal in a continuation while visiting each node.However, the Generic Zipper involves inversion of control, so some uses of it require a state machine to keep track of what to do next.