I have recently been working on improving the stability of uCasterisk, a port of Asterisk to uClinux. This required some research into memory management for multi-threaded apps on uClinux. I didn’t find any one resource that had everything I needed to know so I thought I would collate some of the information I found here as a resource for others. Thanks to all those (especially on the Blackfin forums) who helped answer my questions.
I am using the Blackfin flavour of uClinux and the uCasterisk application as an example, but this information should apply equally to other uClinux systems/applications.
MMU versus no-MMU
Asterisk is a pretty big application for uClinux, the executable is about 2.5M and when running several calls can consume 32M of system memory. The big difference between uCasterisk and other Asterisk implementations is the lack of MMU. A MMU is handy when working with large, multi-threaded apps. For example when a thread is kicked off you can allocate a virtual stack of say 2M, but physical memory will only be allocated as and when it is actually required (say due to a write to a previously unused part of the stack). If your thread never uses all of the stack, then the physical memory is available for other users.
On a MMU-less system you need to work out the maximum stack your thread may need, and allocate that. If you get it wrong, your application (and possibly the whole system) will bomb. This generally means you are wasting memory compared to the MMU case, as you always need to allocate the worst case amount of memory required.
One possible advantage of MMU-less systems is no nasty surprises - any memory allocated really does exist, and no over-commitment is possible. On a MMU-based system physical memory isn’t actually allocated until you write to it, and it may be paged to disk just when you need it (although I understand there are options to control this behaviour).
Stacks for Threads
When you start an app, you get allocated a stack for the application. This is actually a stack for the main thread of the application. When you start a new thread (say with pthread_create()) the thread gets allocated a new stack from the system heap. The two stacks are completely unrelated. The size of each stack is independent, you control the size in different ways (see below).
Tips for Porting to uClinux
Don’t enable stack checking. This feature is very useful for single-threaded apps; it causes the operating system to kill the app when it uses all of it’s stack space. Very useful, as it tells you straight away to increase the stack size. Unfortunately at present this feature hasn’t been extended to multi-thread applications; using it with multi-threaded apps (at least on the Blackfin) causes problems as pointed out in the 2005R4 RC2 release notes and discussed here.
You control the application (main thread) stack with the -s option, on my Blackfin system the command line is:
bfin-uclinux-gcc -Wl,-elf2flt='-s 1000000' \
-o thread thread.c -pthread
In this example the stack is set to 1000000 bytes.
You control the size of the stack for each thread you create using pthread_attr_setstacksize(), for example (from the Asterisk utils.c file):
pthread_attr_init(&attr);
pthread_attr_setstacksize(&attr, 0×1000000);
pthread_create(&thread, &attr, thread_func, NULL);
Monitoring Memory Usage
cat /proc/meminfo can be very useful, here is the output from my Blackfin STAMP BF533 board, taken while uCasterisk was running with several SIP calls in progress:
root:/var/log/asterisk> cat /proc/meminfo
MemTotal: 59784 kB
MemFree: 11084 kB
Buffers: 100 kB
Cached: 4172 kB
SwapCached: 0 kB
Active: 3828 kB
Inactive: 444 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 59784 kB
LowFree: 11084 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 4 kB
Writeback: 0 kB
Mapped: 0 kB
Slab: 43744 kB
CommitLimit: 29892 kB
Committed_AS: 0 kB
PageTables: 0 kB
VmallocTotal: 0 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
The most important fields are MemFree (total system memory free) and Slab (system wide heap in use).
In earlier versions of Linux the CommitLimit field indicated the maximum Slab was allowed to reach before processes were killed (with Out-Of-Memory errors). However on my distro I discovered by experiment that you can actually increase the Slab well beyond this limit, as indicated above. Looking at the kernel source file uClinux-dist/linux-2.6.x/mm/nommu.c, __vm_enough_memory() function it appears that the memory allocator uses the OVERCOMMIT_GUESS method, which ignores the CommitLimit and allows up to 97% of memory to be allocated.
It is interesting to observe MemFree as you perform different operations. For example on uCasterisk when a new SIP call starts, a thread is created, which requires stack and heap space. I also noticed MemFree decreasing when I copied files on a ram file system - this caught me for a while as uCasterisk was chewing through available system memory writing Call Data Records to the ram disk and eventually causing Out of Memory errors.