christop wrote:
I suspect the kernel uses the same mechanism to expand the stack that it uses to detect when a process touches a page that is not swapped in: page faults. When a process touches an unmapped page just below the stack, the kernel expands the stack, allocates real memory for those pages, and maps the new pages into the process. A similar thing happens with the heap (where the upper limit is defined by brk()).
The setrlimit(2) man page even mentions automatic stack expansion and how some of the limits can constrain that expansion:
setrlimit is a POSIX thing, not necessarily a Linux thing. Linux isn't actually 100% POSIX compliant.
I like how you reject pthreads saying it was designed to support various implementations, then turn around and reference a POSIX man page. You realize pthreads is POSIX threads, right? Same standard.
Quote:
A similar thing happens with the heap (where the upper limit is defined by brk()).
Uh, no. You have to ask for heap space. You can't just start using some random memory address and have the kernel just give it to you. Because it won't.
Quote:
With setrlimit(RLIMIT_STACK, ...), a process can make its maximum stack size smaller or larger.
Yes, but far more importantly, if you *ever* write code this dependent on how a specific kernel implements a stack and how you can manipulate it, you are an insanely bad programmer.
Moreover, you still haven't explained why you would ever attempt to do this. It certainly isn't for performance reasons, because setrlimit isn't exactly a fast call - malloc is going to be far more optimized.
Not to mention you are talking about putting more than a page's worth of data on the stack - which causes a page fault, making it *no faster than using malloc*.
Quote:
Pthreads on Linux probably set the maximum stack size of threads using setrlimit(RLIMIT_STACK, ...).
Nope. A pthread stack-size attribute only affects the thread created with that attr; pthreads doesn't go through setrlimit for it.
Quote:
You may have a valid point here. However, if a program is large and/or complex enough to warrant using multiple threads, then memory allocation will have more subtle issues than deciding whether to store a large block of memory on the stack or on the heap. Most programs in Unix/Linux are single-threaded, though, so this doesn't become a factor for most programs.
Most programs in Unix/Linux are single-threaded? Bwahahahaha, no, no they aren't. The vast majority are multithreaded. Sure, the command line utils aren't, but there aren't that many of them.
Threading is *insanely* critical in modern systems, even for programs that don't appear threaded (as in, they don't use much CPU).
You *cannot* ignore threading, it is hugely important and continues to become more and more important as time goes on.
Quote:
You may have a point here too. Then again, gcc has the option -fstack-check which will cause a SIGSEGV upon stack overflow (which you can catch in a signal handler by using sigaltstack()).
You still have no way of gracefully handling that situation. You can either crash hard or crash with a nice error message, but you're pretty much hosed if you run out of stack, whereas in a surprising number of cases you can gracefully recover from not enough heap.
Quote:
That may be so (and I was aware that CPU caches are small), but some allocation sizes and memory access patterns may be faster by putting it on the heap, and some other configurations may be faster by putting it on the stack. If the process is accessing at least one byte in each page of a >=256K block of memory, it doesn't matter much whether it's on the stack or on the heap. The current stack page (the one that the stack pointer points to) is probably going to be flushed from the cache either way.
Ah, but you are missing something very important - the CPU knows about the stack. It knows where the stack pointer currently points. It is very easy for the CPU to just not flush the page with the stack from its cache. So yes, it very much does matter where you allocate memory in terms of the cache. And L1 cache is the important one here. L2 is like 3x slower than L1.
Quote:
After all is said and done, I would personally put up to a few kilobytes on the stack in each stack frame, but I don't worry if I occasionally put more than that.
And we come full circle - if you are putting that much data on your stack, your design is fundamentally wrong.
In the case of this thread, it absolutely seems like a case of "a little knowledge is a dangerous thing". In particular, you seem convinced that the stack is faster than the heap - which stops being true as soon as you start using the stack as a heap. Go do some actual testing, get some real numbers.