Discrete Universal Denoiser


In information theory and signal processing, the Discrete Universal Denoiser is a denoising scheme for recovering sequences over a finite alphabet, which have been corrupted by a discrete
memoryless channel. The DUDE was proposed in 2005 by Tsachy Weissman, Erik Ordentlich, Gadiel Seroussi, Sergio Verdú and Marcelo J. Weinberger.

Overview

The Discrete Universal Denoiser is a denoising scheme that estimates an
unknown signal over a finite
alphabet from a noisy version.
While most denoising schemes in the signal processing
and statistics literature deal with signals over
an infinite alphabet, the DUDE addresses the
finite alphabet case. The noisy version is assumed to be generated by transmitting
through a known discrete
memoryless channel.
For a fixed context length parameter, the DUDE counts of the occurrences of all the strings of length appearing in. The estimated value is determined based the two-sided length- context of, taking into account all the other tokens in with the same context, as well as the known channel matrix and the loss function being used.
The idea underlying the DUDE is best illustrated when is a
realization of a random vector. If the conditional distribution
, namely
the distribution of the noiseless symbol conditional on its noisy context was available, the optimal
estimator would be the Bayes Response to
Fortunately, when
the channel matrix is known and non-degenerate, this conditional distribution
can be expressed in terms of the conditional distribution
, namely
the distribution of the noisy symbol conditional on its noisy
context. This conditional distribution, in turn, can be estimated from an
individual observed noisy signal by virtue of the Law of Large Numbers,
provided is “large enough”.
Applying the DUDE scheme with a context length to a sequence of
length over a finite alphabet requires
operations and space.
Under certain assumptions, the DUDE is a universal scheme in the sense of asymptotically performing as well as an optimal denoiser, which has oracle access to the unknown sequence. More specifically, assume that the denoising performance is measured using a given single-character fidelity criterion, and consider the regime where the sequence length tends to infinity and the context length tends to infinity “not too fast”. In the stochastic setting, where a doubly infinite sequence noiseless sequence is a realization of a stationary process, the DUDE asymptotically performs, in expectation, as well as the best denoiser, which has oracle access to the source distribution. In the single-sequence, or “semi-stochastic” setting with a fixed doubly infinite sequence, the DUDE asymptotically performs as well as the best “sliding window” denoiser, namely any denoiser that determines from the window, which has oracle access to.

The discrete denoising problem

Let be the finite alphabet of a fixed but unknown original “noiseless” sequence. The sequence is fed into a discrete
memoryless channel. The DMC operates on each symbol independently, producing a corresponding random symbol in a finite alphabet. The DMC is known and given as a -by- Markov matrix, whose entries are. It is convenient to write for the -column of. The DMC produces a random noisy sequence. A specific realization of this random vector will be denoted by.
A denoiser is a function that attempts to recover the noiseless sequence from a distorted version. A specific denoised sequence is denoted by.
The problem of choosing the denoiser is known as signal
estimation, filtering or smoothing. To compare candidate denoisers, we choose a single-symbol fidelity criterion and define the per-symbol loss of the denoiser at by
Ordering the elements of the alphabet by, the fidelity criterion can be given by a -by- matrix, with columns of the form

The DUDE scheme

Step 1: Calculating the empirical distribution in each context

The DUDE corrects symbols according to their context. The context length used is a tuning parameter of the scheme. For, define the left context of the -th symbol in by and the corresponding right context as. A two-sided context is a combination of a left and a right context.
The first step of the DUDE scheme is to calculate the empirical distribution of symbols in each possible two-sided context along the noisy sequence. Formally, a given two-sided context that appears once or more along determines an empirical probability distribution over, whose value at the symbol is
Thus, the first step of the DUDE scheme with context length is to scan the input noisy sequence once, and store the length- empirical distribution vector for each two-sided context found along. Since there are at most possible two-sided contexts along, this step requires operations and storage.

Step 2: Calculating the Bayes response to each context

Denote the column of single-symbol fidelity criterion, corresponding to the symbol, by. We define the Bayes Response to any vector of length with non-negative entries as
This definition is motivated in the [|background] below.
The second step of the DUDE scheme is to calculate, for each two-sided context observed in the previous step along, and for each symbol observed in each context the Bayes response to the vector, namely
Note that the sequence and the context length are implicit. Here, is the -column of and for vectors and, denotes their Schur product, defined by. Matrix multiplication is evaluated before the Schur product, so that stands for.
This formula assumed that the channel matrix is square and invertible. When and is not invertible, under the reasonable assumption that it has full row rank, we replace above with its Moore-Penrose pseudo-inverse and calculate instead
By caching the inverse or pseudo-inverse, and the values for the relevant pairs, this step requires operations and storage.

Step 3: Estimating each symbol by the Bayes response to its context

The third and final step of the DUDE scheme is to scan again and compute the actual denoised sequence. The denoised symbol chosen to replace is the Bayes response to the two-sided context of the symbol, namely
This step requires operations and used the data structure constructed in the previous step.
In summary, the entire DUDE requires operations and storage.

Asymptotic optimality properties

The DUDE is designed to be universally optimal, namely optimal regardless of the original sequence.
Let denote a sequence of DUDE schemes, as described above, where uses a context length that is implicit in the notation. We only require that and that.

For a stationary source

Denote by the set of all -block denoisers, namely all maps.
Let be an unknown stationary source and be the distribution of the corresponding noisy sequence. Then
and both limits exist. If, in addition the source is ergodic, then

For an individual sequence

Denote by the set of all -block -th order sliding window denoisers, namely all maps of the form with arbitrary.
Let be an unknown noiseless sequence stationary source and be the distribution of the corresponding noisy sequence. Then

Non-asymptotic performance

Let denote the DUDE on with context length defined on -blocks. Then there exist explicit constants and that depend on alone, such that for any and any we have
where is the noisy sequence corresponding to
In fact holds with the same constants as above for any
-block denoiser. The lower bound proof requires that the channel matrix be square and the pair satisfies a certain technical condition.

Background

To motivate the particular definition of the DUDE using the Bayes response to a particular vector, we now find the optimal denoiser in the non-universal case, where the unknown sequence is a realization of a random vector, whose distribution is known.
Consider first the case. Since the joint distribution of is known, given the observed noisy symbol, the unknown symbol is distributed according to the known distribution. By ordering the elements of, we can describe this conditional distribution on using a probability vector, indexed by, whose -entry is. Clearly the expected loss for the choice of estimated symbol is.
Define the Bayes Envelope of a probability vector, describing a probability distribution on, as the minimal expected loss, and the Bayes Response to as the prediction that achieves this minimum,. Observe that the Bayes response is scale invariant in the sense that for .
For the case, then, the optimal denoiser is. This optimal denoiser can be expressed using the marginal distribution of alone, as follows. When the channel matrix is invertible, we have where is the -th column of. This implies that the optimal denoiser is given equivalently by. When and is not invertible, under the reasonable assumption that it has full row rank, we can replace with its Moore-Penrose pseudo-inverse and obtain
Turning now to arbitrary, the optimal denoiser is therefore given by the Bayes response to
where is a vector indexed by, whose -entry is. The conditional probability vector is hard to compute. A derivation analogous to the case above shows that the optimal denoiser admits an alternative representation, namely, where is a given vector and is the probability vector indexed by whose -entry is Again, is replaced by a pseudo-inverse if is not square or not invertible.
When the distribution of is
not available, the DUDE replaces the unknown vector
with an empirical estimate
obtained along the noisy sequence itself, namely with
. This leads to the
above definition of the DUDE.
While the convergence arguments behind the optimality properties above are more
subtle, we note that the above, combined with the
Birkhoff Ergodic Theorem, is enough to prove that for a stationary ergodic source, the DUDE with context-length is asymptotically optimal all -th order sliding window denoisers.

Extensions

The basic DUDE as described here assumes a signal with a one-dimensional index
set over a finite alphabet, a known memoryless
channel and a context length that is fixed in advance. Relaxations of each of these
assumptions have been considered in turn. Specifically:

Application to image denoising

A DUDE-based framework for grayscale image denoising achieves state-of-the-art denoising for impulse-type noise channels, and good performance on the Gaussian channel. A different DUDE variant applicable to grayscale images is presented in.

Application to channel decoding of uncompressed sources

The DUDE has led to universal algorithms for channel decoding of uncompressed sources.