Wednesday, November 30, 2011

The new release of Mobicents media server

The new version of Mobicents media server has been released. This is a minor release for known bug fixes like RTP event conversation and connection state machine. Documentation (finaly) was updated however still far from ideal. Here valuable feedback on this subject will help to improve.

For donwload the binary use the following link: Binary release

This is a last released version based only on Global EDF scheduler, for next releases we are investigating two options:

- a hybrid of Global EDF and partiotioned EDF with First-Fit heuristic. G-EDF has better result for non-preemtive tasks then partitioning version but its capacity is limited by shared priority queue. Thus the combination of global and partitioning may solve both problems: preemtion and capacity
- Eearly Gap Eearly deadline first (EG-EDF). The another extension of multiprocessor EDF which (on fingers) tries first to find best gap for the task and if can not find uses EDF. In general, it packs tasks more compact (as non preemptive) what allows to take some spare for WCET in case of non-deterministic behaviour.

The expectations from this method sounds very good but what will happen in real life we will know in March, 2012 when next release is scheduled. However if someone want to be in touch with results earlie or want to help us - welcome.

Media server source code: svn checkout mobicents-read-only

Wednesday, June 22, 2011

Mobicents Media Server 2.1.0.B2 under the hood

The development of Media server on top of standard Java was started 3 years ago and all that time we were trying to avoid implementation of real time scheduler and reach suitable results using Java threads model only. There was several reasons to stick to this way. In addition, for our justification we can say we don't need hard real time system and can agree on a system with small miss rate value and jitter. So it was subject for try... At the early stage when server was running not big set of mostly equivalent tasks the round robin thread pool model was working more or less but with each release server became more and more complex and gain the problem of scheduler more and more as well.

Version 2.1.0.BETA1 was a final and agonized try to make this system working. The development of next version started from the deep analysis of scheduling algorithms and now ended with full server refactoring.

Generally, media server is a system where time critical processes must be executed under timeline limitations which is usually decribed as term deadline. Task must be completed or its execution must start before the deadline. Thus we need first of all to implement task scheduler which will allow us execute time critical tasks.

The global task of real time scheduling is quite important for computer industry so it is well studied in many works. As result, different algorithms of scheduling were designed as well but unfortunately none of such algorithms is available in standard Java by default. Of cource there is an extension called RTSJ (JSR-1) but it is not available for all OSes and it is expensive. Defintely, all known scheduling algorithms are optimal if preemtive what cause high level of CPU control and thus dependency from OS capabilities. So not even any OS can be suitable for implementing real time scheduler. In this sense, implementation of crossplatform such scheduler using Java does not seems real at all. No need to try it even. Instead, let's look is it possible to adopt one of the known algorithm for the media processing task exactly and implement it on Java.

The media processing supposes loses. The encoded and then decoded signal never matches to the original signal but senses of humans are not able to recognize the difference. It means that we can make deadline softer and try to minimize the mistake. This is assumption quite extends the limits.

From the other side, the Java virtual machine actually deattaches us from the real CPUs what allowes us to consider it mostly like uniprocessor system.

These facts bring us to the idea to choose the scheduler with EDF policy as a starting point. We will assign priority to the task and put tasks into the priority queue while JVM will execute one task after another without preemption.

The EDF scheduling is feasible if

where is worst case execution time and is period of i-task. At this stage let's note that media server runs big number of tasks of same type: IO, transcoding, relay, etc. This makes the total analysis more simple and allows to find the strategy of feasible scheduling for all types of media tasks. However since we can not implement preemtion it is impossible to guarantee that such scheduling will be feasible all the time. And it is obvious that error in terms of deadline will be proportional to the following value

where c - is task execution time and N - number of independant threads (or CPUs behind jvm). It is easy to see that the error is reduced with execution time and with increasing number of processors. Or in numbers for 1 millisecond worst case execution time we will have jitter about

Number of processorsjitter,ms

As we can see 4 CPUs gives us already perfect number even without preemption and this estimation perfectly matches to our test results. Awesome! However it is not the end of the story yet. There is still underwater stone - dispersion.

Let's run same tasks a lot of times and check the execution time. At the end of test we will have set of different values grouped around some middle value. It is possible to draw the distribution of the execution time as following diagram

Where is probalistic mean for execution time and dispersion is .

The dispersion is explained by the several facts:
- jvm activity like JIT compiling
- CPU sharing
- Garbage collection
- Object resizing

Let's ignore what is a reason of dispersion and try to understad how is it dangerous for us? Since execution time is random value with some distribution then each member of the equation (1) will be random value as well with same distribution amd then following to Tchebyshev's Central Limit Theorem (CLT) the sum of n such varibales will tend to normal distribution with probabilistic mean n* and dispersion . So dispersion growth like square! (beeeeeeeeeeeeep).

The meaning of above theorem can be demostrated on the following example. Imagine that 10 guys shooting to own target. To be simple lets count that all have aproximately same result: 3 bullets into the midle and other distributed in round with 5" radius. Now the same guys shooting to one taget. The theorem predicts as the 30 bulltes will reach center and remaining will be distributed in the round with 25" already (aproximately). But what if target has radius 10" only? In first case 100% of bullets were in target while in second case 50% of bullets did not fit into target at all.

In case of media server the equation (1) gives the "size" of the target. Or in normal language it means that the probability to exceed the available CPU utilization and hit to not feasible scheduler caused by dispersion growth very quickly and we have to pay extreme attention to reasons why it happens and find ways how to minimize the dispersion.

As mentioned above there are several reasons of dispersion of execution time for any task. Some of them are completely out of our management capabilities and some can be and must be fixed. For others we can propose recommendations like run media server on a dedicated machine or use another way to guarantee the enough amount computitional resources.

For the current release most of the methods were refactored and tested during many hours individualy on subject of dispersion caused by Java itself and GC and for worst case execution time as well. New memory model was introduced to reduce the impact from GC, new text utility classes which alllowes now to improve protocol message handling in handred of thouthands times... It was a long time job but with positive result: Mobicents media server demonstartes strong and stable transmission statistics even compared with native implementations.

In this post I was trying to explain the "global" idea of scheduling used by Mobicents Media server, roadblocks and ways of solving problems. In the upcoming post I will subsequently show test results, stream statistic compared to well known open source products, explain memory model, etc. So stay tuned.

Saturday, January 22, 2011

Snowflakes, fractals and performance impact on telco applications

At the end of last century when VoIP just did its first steps into the world, engineers hit on the problem that classic methods gave too optimistic performance results. The system designed and tested in lab at heavy load failed at low and mid load in real network. This phenomenon caused deep research and recent studies have shown the presence of long-range dependence or even fractals (or self-similarity) in teletraffic wich can not be described by traditional Markov's model such as Poisson process.

But what is the fractal and self-simality and why it kills servers? The mathematics behind the fractals began in 17th century with researches of recursive self-similarity by Weierstrass and finally in 1975 Mandelbrot used the word fractal to identify objects whose Hausdorff dimension is greater then its topological dimension.

Instead of digging into the complicated details of the theory let's consider the example - snowflake. Snowflakes are amazing creations of nature. They seem to have intricate detail no matter how closely you look at them. One way to model a snowflake is to use a fractal which is any mathematical object showing "self-similarity" at all levels known as Koch snowflake.

The Koch snowflake is constructed as follows. Start with a line segment. Divide it into 3 equal parts. Erase the middle part and substitute it by the top part of an equilateral triangle. Now, repeat this procedure for each of the 4 segments of this second stage. See Figure 1. If you continue repeating this procedure, the curve will never self-intersect, and in the limit you get a shape known as the Koch snowflake.

Amazingly, the Koch snowflake is a curve of infinite length! And, if you start with an equilateral triangle and do this procedure to each side, you will get a snowflake, which has finite area, though infinite boundary!

Let's leave the question why self-simlarity appears without answer at this moment because it is not simple question and try to understand why fractals are so dangerous for telco applications using the dry theory. So what we know is that the distribution differs from normal and it varies. Now let's imagine that at some point system meets with "problem" where problem is caused by unsuccessful combination of many parameters (number of messages arrived, the time distance between them, unexpected logical relation between messages, etc). Self-similarity means that this problem will be occured infinite number of times just with different scales. So if problem can happen only once it will return again and again and again... It explains why performance in lab always greater the real one, and mistake can be like 100 times or even infinity.

Of cource would be inerested to understand the physics of this process. Why self similarity appears? This questions bothers many peoples and since the pioneering work on self-similarity of network traffic by Leland, many studies have attempted to determine the cause of this phenomenon. Initial efforts focused on application factors. For example, Crovella and Bestavros investigated the cause of self-similarity by focusing on the variability in the size of the documents transferred and the inter-request time. They proposed that the heavy-tailed distribution of file size and “user think time” might potentially be the cause of self-similarity found in Web traffic.

Alternatively, a few studies have considered the possibility that underlying network protocols such as TCP could cause or exacerbate the phenomenon. In particular, Peha first showed that simple ARQ mechanisms could cause the appearance of self-similarity in congestible networks, but he did not examine the ARQ mechanism in TCP. Veres later showed that TCP could sometimes create self-similarity in an individual TCP stream. Interestingly, in some circumstances, aggregate traffic through bottleneck tends toward Poisson while individual streams remain self-similar, presumably because congestion control mechanisms tend to keep the aggregate throughput close to the capacity whenever load exceeds the capacity. However, the work was based on the assumption that load is infinite (heavy load), which is obviously not sustainable in real networks.

In particular, when load is low and loss is rare, traffic looks Poisson. When load is high and the network is overloaded, TCP congestion control can smooth out the burstiness of the aggregate stream so that traffic at the bottleneck tends to Poisson. However, when load is intermediate and the network is prone to occasional bouts of congestion, as is typical of many networks, traffic can become self-similar. Moreover, factors such as round trip time and number of streams passing through the bottleneck can cause the network to become congested at different loads, and consequently affect the range of load over which self-similarity can be observed.

The high level signalling protocols are also affected by self-similarity (this is the same "packet" switched traffic just in bigger scale). Circuit switched telephony was looking more or less stable in this zoo till resent studies detected self-simaliry in global SS7 network where again signalling messages transmitted over data links becomes packets in packet switched network.

Resuming everything said above would be nice to add that "self-similarity" can be measured. The theory defines Hurst parameter wich varies in range [0-1]. The value H = 0,5 relates to pure Markov's source, H less then 0,5 means that process is not self-similar and H greater then 0.5 indicates that process has self-silmilar behaivor.

Friday, January 21, 2011

Real time scheduling for multimedia server using standard Java

Media Server is the systems in which the operational correctness depends not only on the results of the computation but also on the time at which these results are produced. Time critical processes must be executed under timeliness limitation, which is usually described as a term deadline. Tasks must either be completed or its execution must start by the deadline.

The scheduler is the component of the Media Server that selects which process to run next. The scheduler (or process scheduler, as it is sometimes called) can be viewed as the code that divides the finite resource of processor time between the runnable processes on a server.

Many papers have been published in the field of real-time scheduling. Unfortunately much of the published work consists of scheduling algorithms that are defined for systems with preemptive multitasking like Real Time Operating Systems. In preemptive multitasking, the scheduler decides when a process is to cease running and a new process is to resume running. The act of involuntarily suspending a running process is called preemption. The time a process runs before it is preempted is predetermined, and is calle
d the timeslice of the process. Primitive thread management capabilities of standard Java are making unavailable direct implementation of known algorithms. However it is possible to extract from this published material some general purpose techniques.

Thread model and task queue.

There is no way to distrubute CPU time between threads using the Standard Java API. Normaly it means that we anyway can use published materials but with assumption that number of CPUs equal to 1 and construct scheduler around uniprocessor model. Let's now consider CPUs as a single processor with total power equal to sum of all available CPUs. If all CPUs are equivalent, what we have exactly, then it allows to use the model with dynamic task priorities and EDF policy as base. The following diagram depicts the architecture of scheduler.

Each time when new task arrives Acceptor analysis parameters of the task, determines its priority and delivers task to the "ready queue" where position of the task relates to its priority. The task with high priority resides closer to the head of queue and task with lower priority resides closer to tail.

CPUs executes tasks and calculate feedback as difference between estimates and actual result. Finally Scheduler updates the miss rate counter used for congestion control and updates task priority.

I/O CPU Burst Cycle

Saying about media server it is possible to classify inner processes as "IO bound" or "CPU bound". The server make heavy use of I/O endpoints and spend much time waiting for I/O operations to complete; the latter are number-crunching transcoding tasks that require a lot of CPU time. The execution of a process consists of an alternation of CPU bursts and I/O bursts.

The frequency of I/O operations usage inside Media server is very high, espacially for legacy DS0 channels, what means that I/O actions must fill the whole space between CPU bound tasks.

Test results

The actual implementation is still under strong test but even the first tests shows high stability and very good CPU utilization. Bellow is the diagram of average miss rate as dependency of scheduled periodic tasks with period - 20ms, worst execution time 1ms and max jitter 3ms. Congestion control disabled

The archived miss rate is comparebale with bestes real time schedulers constructed on preemtive OS. This test was running on 4-core CPU and thus 80 tasks scheduled relates to 100% of CPU utilzation. Very good result compared with RMA theoretical maxim of 69,3%,

Monday, October 18, 2010

It happened finally! September 25 I took off from Volgograd airport at 4.00 and after 2.5h landed in the air port of Antalya. One hour later I found Mobicents team in hotel Alva Donna near Belek! It was my first meeting with my team so I was really happy to meet finally everyone.

Big thanks Eduardo for organizing this event when we can mix work and fun, or fun and work. In best traditions the our team work started from big smoke break - we went to rafting

Next day work started. It was a great and productive meetings. We discussed on many interested and important topics. The last year was very important for media server project. We found way how to archive real time requirements for media processing, and implemented SS7 gateway capabilties and finished in front of video.

Presentation was splitted into three parts: Bartek talked about SS7 and Amit focused on general server architecture. My talk was mostly targeted to the physical aspects of media processing and evolution of core engine from version 1.x to 2.x Later I decided to modify slides and make it more Java centric. The modified version can be found here

Outside of the presentation room, durring the chill out time, we also had non stop discussions and since it was my first meeting this time was most interested for me:

Thursday, July 30, 2009

Development Interactive Voice Response service using Mobicents and jBPM

This is my first blog message:)

At the final end I am going to blog about Mobicents Media server and other components of Mobicents platform. How to use it for development of the telco applications. Let's start from one of the widely used telco application

IVR, or Interactive Voice
Response, is one of the most used types of telco application. It automates the interaction (retrieval and input of data) with a database, typically through the use of a touch-tone (DTMF) telephone. Most of us use some type of IVR application every day. Whether it's to check bank balances or transfer funds, manage credit cards, order prescription medicine, verify flight information or check for store hours or locations.

Media server which is responsible for voice and DTMF tone processing is one of the manadatory element of network infrastructure. The modern network suppose that the Media server is a dump thing and is under control from intelligent call controller application which is placed on the signaling way. In other words we can say that Media Server is doing a "black job" for call controller :) But media server still has to provide usefull interfaces for call controller.

All media server work always will be joined with call controller application. Fortunately now the process of writting telco-grade applications is more simple then several years ago. But anyway the telco was traditionaly closed area and is new now for most of the mainstream developers. I think would be interested to start blogging about Mobicents Media server from paper which explains how to write telco application.

So, it is started but not completed yet. Stay turned.