Saturday, January 22, 2011

Snowflakes, fractals and performance impact on telco applications


At the end of the last century, when VoIP was taking its first steps into the world, engineers ran into a problem: classic methods gave overly optimistic performance results. A system designed and tested in the lab under heavy load would fail at low and medium load in a real network. This phenomenon triggered deep research, and subsequent studies have shown the presence of long-range dependence, or even fractal (self-similar) behaviour, in teletraffic, which cannot be described by traditional Markovian models such as the Poisson process.

But what are fractals and self-similarity, and why do they kill servers? The mathematics behind fractals goes back to the 17th century and the first ideas of recursive self-similarity, was developed rigorously by Weierstrass in the 19th century, and finally, in 1975, Mandelbrot coined the word fractal for objects whose Hausdorff dimension is greater than their topological dimension.

Instead of digging into the complicated details of the theory, let's consider an example: the snowflake. Snowflakes are amazing creations of nature. They seem to have intricate detail no matter how closely you look at them. One way to model a snowflake is with a fractal, that is, a mathematical object showing "self-similarity" at all levels, known as the Koch snowflake.

The Koch snowflake is constructed as follows. Start with a line segment. Divide it into 3 equal parts. Erase the middle part and replace it with the two upper sides of an equilateral triangle built on it. Now repeat this procedure for each of the 4 segments of this second stage (see Figure 1). If you keep repeating this procedure, the curve never self-intersects, and in the limit you get the shape known as the Koch curve.
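To make the procedure concrete, here is a minimal sketch in Python (the helper name koch_step and the iteration count are my own, purely for illustration): each segment is split into thirds and the middle third is replaced by two sides of an equilateral triangle.

import math

def koch_step(points):
    """One refinement step: replace every segment by 4 shorter segments."""
    new_points = [points[0]]
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        dx, dy = (x2 - x1) / 3.0, (y2 - y1) / 3.0
        a = (x1 + dx, y1 + dy)            # 1/3 point of the segment
        b = (x1 + 2 * dx, y1 + 2 * dy)    # 2/3 point of the segment
        # apex of the equilateral triangle erected on the erased middle third
        angle = math.atan2(dy, dx) + math.pi / 3
        side = math.hypot(dx, dy)
        peak = (a[0] + side * math.cos(angle), a[1] + side * math.sin(angle))
        new_points.extend([a, peak, b, (x2, y2)])
    return new_points

# start with a single segment and refine it a few times
curve = [(0.0, 0.0), (1.0, 0.0)]
for _ in range(4):
    curve = koch_step(curve)

# each step multiplies the total length by 4/3, so it grows without bound
length = sum(math.dist(p, q) for p, q in zip(curve, curve[1:]))
print(len(curve), "points, length =", round(length, 4))   # about (4/3)**4 = 3.16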


Amazingly, this is a curve of infinite length: every step replaces each segment with 4 segments one third as long, so the total length is multiplied by 4/3 at every iteration and grows without bound. And if you start with an equilateral triangle and apply this procedure to each side, you get a snowflake that has finite area (8/5 of the original triangle), though an infinite boundary!

Let's leave the question of why self-similarity appears unanswered for the moment, because it is not a simple one, and instead try to understand why fractals are so dangerous for telco applications using the dry theory. What we know is that the distribution differs from the normal one and that it varies. Now imagine that at some point the system runs into a "problem", where the problem is caused by an unlucky combination of many parameters (the number of messages arriving, the time between them, an unexpected logical relation between messages, and so on). Self-similarity means this problem will occur over and over again, just at different scales. So if the problem can happen even once, it will return again and again and again... This explains why performance measured in the lab is always higher than the real one, and the error can be a factor of 100 or effectively unbounded.

Of course, it would be interesting to understand the physics of this process: why does self-similarity appear? This question bothers many people, and since the pioneering work on self-similarity of network traffic by Leland, many studies have attempted to determine the cause of the phenomenon. Initial efforts focused on application factors. For example, Crovella and Bestavros investigated the cause of self-similarity by focusing on the variability in the size of the documents transferred and in the inter-request time. They proposed that the heavy-tailed distribution of file sizes and of "user think time" might be the cause of the self-similarity found in Web traffic.
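As a rough illustration of that idea (the parameter values below are made up for the example, not taken from Crovella and Bestavros), the following Python sketch aggregates ON/OFF sources whose ON periods are heavy-tailed (Pareto) and shows that the resulting traffic stays bursty even after heavy averaging, whereas memoryless (Poisson-like) traffic would smooth out like 1/sqrt(m).

import random

def pareto(alpha, xm=1.0):
    """Heavy-tailed Pareto sample (infinite variance when alpha < 2)."""
    return xm / ((1.0 - random.random()) ** (1.0 / alpha))

def on_off_source(slots, alpha=1.4):
    """One source alternating heavy-tailed ON bursts with OFF silences."""
    load, t, on = [0] * slots, 0, True
    while t < slots:
        duration = int(pareto(alpha)) + 1
        if on:
            for i in range(t, min(t + duration, slots)):
                load[i] = 1
        t, on = t + duration, not on
    return load

def aggregate(sources, slots):
    """Sum the per-slot load of many independent ON/OFF sources."""
    total = [0] * slots
    for _ in range(sources):
        for i, v in enumerate(on_off_source(slots)):
            total[i] += v
    return total

def burstiness(series, block):
    """Coefficient of variation of the series averaged over blocks of a given size."""
    blocks = [sum(series[i:i + block]) / block
              for i in range(0, len(series), block)]
    mean = sum(blocks) / len(blocks)
    var = sum((b - mean) ** 2 for b in blocks) / len(blocks)
    return (var ** 0.5) / mean

traffic = aggregate(sources=50, slots=100_000)
for m in (1, 10, 100, 1000):
    # Poisson traffic would fall off like 1/sqrt(m); heavy-tailed sources decay much slower
    print(f"aggregation level {m:5d}: burstiness = {burstiness(traffic, m):.3f}")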

Alternatively, a few studies have considered the possibility that underlying network protocols such as TCP could cause or exacerbate the phenomenon. In particular, Peha first showed that simple ARQ mechanisms could cause the appearance of self-similarity in congestible networks, but he did not examine the ARQ mechanism in TCP. Veres later showed that TCP could sometimes create self-similarity in an individual TCP stream. Interestingly, in some circumstances the aggregate traffic through a bottleneck tends toward Poisson while individual streams remain self-similar, presumably because congestion control mechanisms keep the aggregate throughput close to capacity whenever the load exceeds it. However, that work was based on the assumption of infinite (heavy) load, which is obviously not sustained in real networks.

In particular, when load is low and loss is rare, traffic looks Poisson. When load is high and the network is overloaded, TCP congestion control can smooth out the burstiness of the aggregate stream, so traffic at the bottleneck again tends toward Poisson. However, when load is intermediate and the network is prone to occasional bouts of congestion, as is typical of many networks, traffic can become self-similar. Moreover, factors such as the round-trip time and the number of streams passing through the bottleneck can cause the network to become congested at different loads, and consequently affect the range of load over which self-similarity can be observed.

High-level signalling protocols are also affected by self-similarity (this is the same "packet-switched" traffic, just at a bigger scale). Circuit-switched telephony looked more or less stable in this zoo until recent studies detected self-similarity in the global SS7 network, where, again, signalling messages transmitted over data links become packets in a packet-switched network.

Summing up everything said above, it is worth adding that "self-similarity" can be measured. The theory defines the Hurst parameter H, which varies in the range [0, 1]. The value H = 0.5 corresponds to a pure Markovian (memoryless) source, H < 0.5 means the process is not self-similar (it is anti-persistent), and H > 0.5 indicates that the process has self-similar behaviour.
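For completeness, here is a rough sketch of how H can be estimated with rescaled-range (R/S) analysis (my own simplified implementation, for illustration only, not a production estimator): split the series into blocks of several sizes, compute the R/S statistic for each size, and take the slope of log(R/S) versus log(block size).

import math
import random

def rescaled_range(block):
    """R/S statistic of one block: range of cumulative deviations over the std deviation."""
    mean = sum(block) / len(block)
    dev, cum = 0.0, []
    for x in block:
        dev += x - mean
        cum.append(dev)
    r = max(cum) - min(cum)
    s = math.sqrt(sum((x - mean) ** 2 for x in block) / len(block))
    return r / s if s > 0 else 0.0

def hurst(series, sizes=(16, 32, 64, 128, 256, 512)):
    """Hurst estimate: slope of log(R/S) versus log(block size)."""
    xs, ys = [], []
    for n in sizes:
        rs_values = [rescaled_range(series[i:i + n])
                     for i in range(0, len(series) - n + 1, n)]
        xs.append(math.log(n))
        ys.append(math.log(sum(rs_values) / len(rs_values)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

# sanity check: memoryless inter-arrival times should give H around 0.5,
# while genuinely self-similar traffic would push the estimate well above 0.5
noise = [random.expovariate(1.0) for _ in range(10_000)]
print("H for memoryless traffic:", round(hurst(noise), 2))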

