So I’ve been playing with Python and Arduino’s idea of C++ for a few years now, and this year I’ve started easing my way into pure C, mostly because I want to work with microcontrollers. It’ll probably be a while before I’m handling memory management issues, but I’ve got a couple of questions from the Stack vs Heap memory section of the tutorial I’m working through. Here goes:
So I understand that all global variables in a C program live in heap memory? Is this as true on microcontrollers as it is on a PC, even though I’ve never declared a specific pointer for those variables?
Is this also true of Static variables?
The heap and stack are both in RAM, right? Are they divided up into fixed sizes within that amount of RAM, or is it all available to both until memory is allocated to a given variable?
Would Heap memory be the (usually) better place to store larger variables such as arrays, given that they can be resized? I guess I’m thinking mostly in the context of receiving data from a source that might be arbitrary in length, such as a message entered from a keyboard.
Ok, that’s all I can think of for now, I’m sure I’ll have more later on…
A few answers from my current understanding (beware, I am not a C god):
Globals are not in the heap - they’re in what’s called BSS (or, if they’re initialized, the data segment). BSS is usually placed in the same region of memory where the heap is also allocated, but BSS is not the heap per se. The heap is what’s left over from that region, available for dynamic allocation later on during program execution.
Static variables are allocated exactly as globals, but their resolution scope is limited.
Depending on which system you are on, heap and stack may be in RAM or in Flash. Stack is almost always RAM, but some systems don’t have any RAM per se at all and have only Flash with an access accelerator of some kind. This is set up by the linker script and any special designations on the regions themselves which might affect their assignment.
This one has two main components:
1. The main reason you’d choose heap versus stack is how big the allocation is, not whether it can be resized. The stack usually has a very limited size on embedded systems.
2. The second main reason you’d choose heap versus stack is the lifetime of the memory’s validity. Stack memory is invalid as soon as you go out of scope (e.g. leave your function call level, switch tasks, etc.); heap memory stays valid until you explicitly free it. This is because the stack pointer constantly changes as you go through the various function calls and context switches, but the heap has no such pointer - it’s just a blank region of memory that you can organize however you like (usually via sequential requests to malloc, or some local variant thereof, and returning memory via free() for later re-allocation, if you have that feature). Changing the size of something via realloc does not guarantee you will get the same original pointer back, and if you don’t, the original pointer is of course no longer valid for your use.

If you’ll be passing a pointer outside your function scope (e.g. setting a global pointer variable to it, or retaining it for later processing in a queue or some other deferment mechanism), you have no choice but to allocate it on the heap or have it preallocated as a global or static variable (NOT a global or static pointer! That only allocates enough space for the pointer itself, not what it points to!).
It would be a good idea to read up on how the stack and stack pointer work, and what exactly happens during a context switch (e.g. an ISR call is a good example). Your processor programming reference will have this information in general. This is common to all languages, but C makes heap allocation quite explicit (you have to call malloc or a local equivalent), whereas stack allocation is implicit: sometype foo; inside a function is sufficient.
Do you know how to print the address of a variable in C?
printf("Address of a: %p\n", (void *)&a);
Something like that ought to do it. You can mess around with the different address regions memory is allocated in. On my current system, a simple little program outputs the following:
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    int a;
    char b;
    char c;
    int d;
    void *f;
    void *g;

    f = malloc(0x1000);
    g = malloc(0x1000);

    /* cast through uintptr_t so the conversion is well-defined on
       both 32- and 64-bit targets */
    printf("Address of a: %016" PRIx64 "\n", (uint64_t)(uintptr_t)&a);
    printf("Address of b: %016" PRIx64 "\n", (uint64_t)(uintptr_t)&b);
    printf("Address of c: %016" PRIx64 "\n", (uint64_t)(uintptr_t)&c);
    printf("Address of d: %016" PRIx64 "\n", (uint64_t)(uintptr_t)&d);
    printf("Address of f: %016" PRIx64 "\n", (uint64_t)(uintptr_t)f);
    printf("Address of g: %016" PRIx64 "\n", (uint64_t)(uintptr_t)g);

    free(f);
    free(g);
    return 0;
}
Outputs:
Address of a: 0000fffff96c6100
Address of b: 0000fffff96c60fe
Address of c: 0000fffff96c60ff
Address of d: 0000fffff96c6104
Address of f: 0000aaaade5112a0
Address of g: 0000aaaade5122b0
Or, on a different platform:
Address of a: 00007ffcef0f8450
Address of b: 00007ffcef0f844e
Address of c: 00007ffcef0f844f
Address of d: 00007ffcef0f8454
Address of f: 000055ea6a5f3260
Address of g: 000055ea6a5f4270
I’m simply printing the addresses of the variables and allocations. a, b, c, and d are stack allocations, f and g are heap allocations, and e is busy making superhero outerwear.
Vertiginous is correct here - globals live in a separate space from heap memory.
Static is a badly overloaded keyword in C. It does different things inside and outside of a function scope. Outside a function scope, it means that the variable/function is only visible inside the current file - not externally. This is nice for containment.
Inside a function, though, it effectively means a variable is a global - not a stack variable. And as it’s determined at compile time, it would be in the BSS or data segment.
#include <stdio.h>

void func(void)
{
    static int a = 0;  /* initialized once, persists across calls */
    printf("A = %d\n", a);
    a++;
}

int main(void)
{
    for (int i = 0; i < 10; i++)
    {
        func();
    }
    return 0;
}
A = 0
A = 1
A = 2
A = 3
A = 4
A = 5
A = 6
A = 7
A = 8
A = 9
Normally, yes, they’re both in RAM, unless you’re on a particularly weird platform. On a microcontroller, you can typically assume that you’re accessing physical memory and that the limits of physical memory constrain you. On a platform with virtual memory, this isn’t true: your virtual address space within the program can exceed the physical RAM by some margin, depending on how you do things. You can map files into your memory space and access them that way, there’s swap… it’s complex.

But for embedded, typically, heap grows up, stack grows down, and when the two collide all sorts of nasty things happen that usually bite long after you’ve stomped something. Play with address printing on an Arduino, and you’ll see it happen - I don’t happen to have one lying around the house at the moment or I’d write a quick demo. Exercise for the reader.
Yes, heap is the right place for that. However, on an embedded platform, be very careful with dynamic memory allocation. It’s very easy to fragment memory - if you have holes, you might not be able to actually pack memory, and so you can “run out of memory” (crash your heap and stack) even though your allocations don’t exceed physical memory.
I’ll suggest that for embedded programming, if you’re doing runtime allocations (instead of static allocation and init-time allocation), you’re well on your way to having a bad time of things. It’s possible, but you have to be very careful.
Awesome, thanks guys, that’s a bunch I’ll do more reading about. Just finished the ‘Memory layout of C program’, I should probably read it again in the morning. I’ll start playing with printing addresses with the Arduino too, probably after the ST webinar tomorrow.
It is tempting to write “universal” or “super flexible” code which can handle any size data, especially when coming from other programming contexts. However, with embedded systems it is better to have well-defined limitations in your code than it is to hit opaque hardware/language-runtime limitations which will cause unpredictable-hard-to-debug crashes.
I think I’ve run into that testing some of my LoRa radio projects. The library I use had its receiver example set up to just append incoming data to a String object. I had problems with hangups and reboots when I didn’t check packet lengths and recipient addresses. Instead of a String, I think I made things a little better by building an array instead:
void receiving(int packetSize)
{
  if (packetSize == 0) return;

  int i = 0;
  int recipient = LoRa.read();
  if (recipient != localAddress) return; //message isn't for us, quit parsing

  byte sender = LoRa.read();
  //Serial.print("sender: "); Serial.println(sender);
  byte incomingMsgId = LoRa.read();
  byte incomingLength = LoRa.read();

  char incomingArray[incomingLength];
  while (LoRa.available() && i < incomingLength) //bound i so we can't overrun the array
  {
    incomingArray[i] = (char)LoRa.read();
    i++;
  }
}
At this point all my projects have been receiving data of known length (usually a few chars or an array of booleans corresponding to relay states); but if I were going to receive data of arbitrary length (like a text message, where everything would have to be stored at once and displayed), I suppose I should move incomingArray to the heap? Then I’d know the space after the first element is free, up to whatever maximum message length I specify, and I could truncate the rest.
Even then, chances are that you aren’t receiving data “of arbitrary length” in any case - just data that you haven’t defined an upper bound for yet. If you don’t define one, there will be some implicit limit anyway, but the handling of hitting that limit will probably be much less graceful.
I’d go back to the problem you’re trying to solve and see what the maximum you reasonably might need to hold in memory at any one time is.
Maybe you’re forwarding the data on via some sort of stream-based interface Serial/TCP/etc - can you pass the data directly on in chunks without buffering it all into memory?
Maybe you’re displaying data on a screen - how many screenfuls of data at what density does the application really need to hold in memory?
Maybe you’re computing a digest of the data to pass on to whatever the next step is - can you break the computation down and perform it incrementally as the data is processed in chunks?
That would depend on the lifecycle of the message and what the code that uses it does. Suppose you allocated your message object on the heap - for the sake of the example let’s suppose your message object is 1/2 the total heap space. Then suppose the code passes the message pointer to some logic to print it. If that logic allocated a tiny object on the heap which was longer lived - perhaps a node in a dynamic data structure like a linked list, then suddenly there’s an object left over in the middle of the heap memory. If the next message is 3/4 the size of the total heap space, there wouldn’t be space for it - even though the first message pointer was already freed - since there isn’t a contiguous space in the heap large enough.
The above is an extreme case that fails in just two message processing iterations, but the same thing can also happen more slowly with smaller heterogeneous object allocations on the heap - as long as enough of the heap allocations live long enough to break up the memory into unusably small pieces.
That’s a great way to fragment heap. Strings are very useful on “big” platforms because they just realloc and do the right thing, but if you’ve got 2kb of SRAM to work in, that’s pretty much the wrong thing, every time. It works often enough that people get away with it, but it’s trivially easy to create patterns that fragment the heap.
Gross. I wasn’t sure if that actually ended up on the stack or in the heap, so I wrote a test program. At least on Linux, it looks stack-ish…
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    char a;
    int foo = (argc == 2) ? atoi(argv[1]) : 500;

    printf("Address of a: %lx\n", (unsigned long)&a);
    printf("Address of foo: %lx\n", (unsigned long)&foo);

    char bar[foo];   /* variable-length array */
    char *baz = malloc(400);
    printf("Address of bar: %lx\n", (unsigned long)bar);
    printf("Address of baz: %lx\n", (unsigned long)baz);

    free(baz);
    return 0;
}
Address of a: 7ffc986b3abb
Address of foo: 7ffc986b3abc
Address of bar: 7ffc986b38a0
Address of baz: 5652913cd6b0
Still gross…
The simple answer is, “You don’t do that in embedded programming.” You would “receive up to 64 bytes of text message.” And then carefully ignore the extra - no buffer overflows, those end poorly too.
You should probably sanity check incomingLength, though it’s limited to 255 by the nature of a byte.
Runtime is another potential limitation that can be more pleasant to debug when it’s explicit rather than implicit. When processing data of arbitrary length, it’s even harder to make strong guarantees about how long something will take, and that tends to matter when you’re interacting with hardware.