Self-similar process
Self-similar processes are types of stochastic processes that exhibit the phenomenon of self-similarity. A self-similar phenomenon behaves the same when viewed at different degrees of magnification, or different scales on a dimension. Self-similar processes can sometimes be described using heavy-tailed distributions, also known as long-tailed distributions. Examples of such processes include traffic processes, such as packet inter-arrival times and burst lengths. Self-similar processes can exhibit long-range dependency.
Overview
The design of robust and reliable networks and network services has become an increasingly challenging task in today's Internet world. To achieve this goal,understanding the characteristics of Internet traffic plays a more and more critical
role. Empirical studies of measured traffic traces have led to the wide recognition of
self-similarity in network traffic.
Self-similar Ethernet traffic exhibits dependencies over a long range of time scales. This is to be contrasted with telephone traffic which is Poisson in its arrival and departure process.
In traditional Poisson traffic, the short-term fluctuations would average out, and a graph covering a large amount of time would approach a constant value.
Heavy-tailed distributions have been observed in many natural phenomena including both physical and sociological phenomena. Mandelbrot established the use of heavy-tailed distributions to model real-world fractal phenomena, e.g. Stock markets, earthquakes, climate, and the weather.
Ethernet, WWW, SS7, TCP, FTP, TELNET and VBR video traffic is self-similar.
Self-similarity in packetised data networks can be caused by the distribution of file sizes, human interactions and/ or Ethernet dynamics. Self-similar and long-range dependent characteristics in computer networks present a fundamentally different set of problems to people doing analysis and/or design of networks, and many of the previous assumptions upon which systems have been built are no longer valid in the presence of self-similarity.
The Poisson distribution
Before the heavy-tailed distribution is introduced mathematically, the Poisson process with a memoryless waiting-time distribution, used to model traditional telephony networks, is briefly reviewed below.Assuming pure-chance arrivals and pure-chance terminations leads to the following:
- The number of call arrivals in a given time has a Poisson distribution, i.e.:
- The number of call departures in a given time, also has a Poisson distribution, i.e.:
- The intervals, T, between call arrivals and departures are intervals between independent, identically distributed random events. It can be shown that these intervals have a negative exponential distribution, i.e.:
The heavy-tail distribution
A distribution is said to have a heavy tail ifOne simple example of a heavy-tailed distribution is the Pareto distribution.
Modelling self-similar traffic
Since packetised traffic exhibits self-similar or fractal characteristics, conventional traffic models do not apply to networks which carry self-similar traffic.With the convergence of voice and data, the future multi-service network will be based on packetised traffic, and models which accurately reflect the nature of self-similar traffic will be required to develop, design and dimension future multi-service networks.
Previous analytic work done in Internet studies adopted assumptions such as exponentially-distributed packet inter-arrivals, and conclusions reached under such assumptions may be misleading or incorrect in the presence of heavy-tailed distributions.
Deriving mathematical models which accurately represent long-range dependent traffic is a fertile area of research.
Self-similar stochastic processes modeled by [Tweedie distributions]
Leland et al have provided a mathematical formalism to describe self-similar stochastic processes. For the sequence of numberswith mean
deviations
variance
and autocorrelation function
with lag k, if the autocorrelation of this sequence has the long range behavior
as k and where L is a slowly varying function at large values of k, this sequence is called a self-similar process.
The method of expanding bins can be used to analyze self-similar processes. Consider a set of equal-sized non-overlapping bins that divides the original sequence of N elements into groups of m equal-sized segments so that new reproductive sequences, based on the mean values, can be defined:
The variance determined from this sequence will scale as the bin size changes such that
if and only if the autocorrelation has the limiting form
One can also construct a set of corresponding additive sequences
based on the expanding bins,
Provided the autocorrelation function exhibits the same behavior, the additive sequences will obey the relationship
Since and are constants this relationship constitutes a variance-to-mean power law, with p=2-d.
Tweedie distributions are a special case of exponential dispersion models, a class of models used to describe error distributions for the generalized linear model.
These Tweedie distributions are characterized by an inherent scale invariance and thus for any random variable Y that obeys a Tweedie distribution, the variance var relates to the mean E by the power law,
where a and p are positive constants. The exponent p for the variance to mean power law associated with certain self-similar stochastic processes ranges between 1 and 2 and thus may be modeled in part by a Tweedie compound Poisson–gamma distribution.
The additive form of the Tweedie compound Poisson-gamma model has the cumulant generating function,
where
is the cumulant function, α is the Tweedie exponent
s is the generating function variable, θ is the canonical parameter and λ is the index parameter.
The first and second derivatives of the CGF, with s=0, yields the mean and variance, respectively. One can thus confirm that for the additive models the variance relates to the mean by the power law,
Whereas this Tweedie compound Poisson-gamma CGF will represent the probability density function for certain self-similar stochastic processes, it does not return information regarding the long range correlations inherent to the sequence Y.
Nonetheless the Tweedie distributions provide a means understand the possible origins of self-similar stochastic processes for reason of their role as foci for a central limit-like convergence effect known as the Tweedie convergence theorem. In nontechnical terms this theorem tells us that any exponential dispersion model that asymptotically manifests a variance-to-mean power law is required to have a variance function that comes within the domain of attraction of a Tweedie model.
The Tweedie convergence theorem can be used to explain the origin of the variance to mean power law, 1/f noise and multifractality, features associated with self-similar processes.
Network performance
Network performance degrades gradually with increasing self-similarity. The more self-similar the traffic, the longer the queue size. The queue length distribution of self-similar traffic decays more slowly than with Poisson sources.However, long-range dependence implies nothing about its short-term correlations which affect performance in small buffers. Additionally, aggregating streams of self-similar traffic typically intensifies the self-similarity rather than smoothing it, compounding the problem.
Self-similar traffic exhibits the persistence of clustering which has a negative impact on network performance.
- With Poisson traffic, clustering occurs in the short term but smooths out over the long term.
- With self-similar traffic, the bursty behaviour may itself be bursty, which exacerbates the clustering phenomena, and degrades network performance.
- Cell/packet loss and queue overflow
- Violation of delay bounds e.g. in video
- Worst cases in statistical multiplexing