Rabin fingerprint


The Rabin fingerprinting scheme is a method for implementing fingerprints using polynomials over a finite field. It was proposed by Michael O. Rabin.

Scheme

Given an n-bit message m0,...,mn-1, we view it as a polynomial of degree n-1 over the finite field GF.
We then pick a random irreducible polynomial of degree k over GF, and we define the fingerprint of the message m to be the remainder after division of by over GF which can be viewed as a polynomial of degree k-1 or as a k-bit number.

Applications

Many implementations of the Rabin–Karp algorithm internally use Rabin fingerprints.
The Low Bandwidth Network Filesystem from MIT uses Rabin fingerprints to implement variable size shift-resistant blocks.
The basic idea is that the filesystem computes the cryptographic hash of each block in a file. To save on transfers between the client and server,
they compare their checksums and only transfer blocks whose checksums differ. But one problem with this scheme is that a single insertion at the beginning of the file will cause every checksum to change if fixed-sized blocks are used. So the idea is to select blocks not based on a specific offset but rather by some property of the block contents. LBFS does this by sliding a 48 byte window over the file and computing the Rabin fingerprint of each window. When the low 13 bits of the fingerprint are zero LBFS calls those 48 bytes a breakpoint and ends the current block and begins a new one. Since the output of Rabin fingerprints are pseudo-random the probability of any given 48 bytes being a breakpoint is . This has the effect of shift-resistant variable size blocks. Any hash function could be used to divide a long file into blocks : but the Rabin fingerprint is an efficient rolling hash, since the computation of the Rabin fingerprint of region B can reuse some of the computation of the Rabin fingerprint of region A when regions A and B overlap.
Note that this is a problem similar to that faced by rsync.