christop wrote:
You can usually allocate objects like this up to a few kilobytes or more without using "new"


You should really never allocate more than 1kb on the stack. If you are using that much on the stack, chances are you are doing something wrong anyway.

comic: Split from Help Loading from a file in C++
Kllrnohj wrote:
christop wrote:
You can usually allocate objects like this up to a few kilobytes or more without using "new"


You should really never allocate more than 1kb on the stack. If you are using that much on the stack, chances are you are doing something wrong anyway.

In the past I've done "char x[500000000L];" in a function with no ill side-effects. I'm not saying it was a good idea, but there's nothing technically wrong with doing it. Smile

I actually did it only to see how Linux would handle such a large stack allocation. Linux handled it with no problem, and since my code didn't touch any of the array's contents, I'm pretty sure Linux didn't allocate real memory for the whole array (which means I could have called the function recursively until the process's address space was exhausted, without consuming memory for other processes to use). I didn't have swap enabled at the time as this was done using a live Linux CD, and the computer had only 256MB or so. Had I written to or read from the whole array (or at least one byte from each page), Linux would have killed the program.
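
For illustration, a minimal reconstruction of that experiment might look like the following sketch (not the original test program, which isn't shown in the thread; build without optimization or stack-probing options so the unused array isn't elided or probed):

Code:
// reserve ~500 MB of automatic storage but never touch it, so Linux
// never has to fault in physical pages for the array
void probe()
{
    char x[500000000L];   // ~500 MB on the stack
    (void)x;              // never read or written
}

int main()
{
    probe();              // returns normally if the pages are never touched
    return 0;
}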

EDIT: oops, double post. Rolling Eyes
christop wrote:
In the past I've done "char x[500000000L];" in a function with no ill side-effects. I'm not saying it was a good idea, but there's nothing technically wrong with doing it. Smile

I actually did it only to see how Linux would handle such a large stack allocation. Linux handled it with no problem, and since my code didn't touch any of the array's contents, I'm pretty sure Linux didn't allocate real memory for the whole array (which means I could have called the function recursively until the process's address space was exhausted, without consuming memory for other processes to use).


No, in that case there absolutely is something wrong with it. The stack is a fixed size. It is a finite block of memory. In the case of "char x[500000000L];", it would fail miserably if you actually tried to use it, or the compiler would compensate and you'd end up with a *huge* stack or exe.

There is *NOTHING* automatic about the stack. You can *easily* stomp all over it and wreck yourself. In fact, that's exactly how buffer overflow exploits work.
Kllrnohj wrote:
christop wrote:
In the past I've done "char x[500000000L];" in a function with no ill side-effects. I'm not saying it was a good idea, but there's nothing technically wrong with doing it. Smile

I actually did it only to see how Linux would handle such a large stack allocation. Linux handled it with no problem, and since my code didn't touch any of the array's contents, I'm pretty sure Linux didn't allocate real memory for the whole array (which means I could have called the function recursively until the process's address space was exhausted, without consuming memory for other processes to use).


No, in that case there absolutely is something wrong with it. The stack is a fixed size. It is a finite block of memory. In the case of "char x[500000000L];", it would fail miserably if you actually tried to use it, or the compiler would compensate and you'd end up with a *huge* stack or exe.

It depends on the runtime environment. For example, in Linux (as I mentioned) it works fine. I just tested a program that writes to the array, either forward, backward, or randomly. The size of the stack is limited to the value of "ulimit -s", which was initially 8192 (8MB). I raised it to 2GB and then could make my array almost up to 2GB in size with no problem. I verified that the compiler wasn't doing anything funny behind my back like allocating the array in the bss segment. It was only doing a plain "subl $8000016, %esp" inside my function, as it should.
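
The forward-writing variant of that test could be as simple as the following sketch (reconstructed for illustration; run after raising the limit with "ulimit -s", and note the 8000000 matches the "subl $8000016, %esp" above):

Code:
#include <cstddef>

const std::size_t N = 8000000;    // ~8 MB of automatic storage

void touch_forward()
{
    char x[N];
    for (std::size_t i = 0; i < N; i += 4096)
        x[i] = 1;                 // touch one byte per page, faulting each in
}

int main()
{
    touch_forward();
    return 0;
}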

On the other hand, on systems with no MMU (eg, embedded uClinux systems, or a TI-83+) the stack has a fixed size. On uClinux, a process's stack cannot grow once it starts, but it can be changed in the executable file itself (the stack_size field in the bFLT header), which changes the stack size for the next time the program runs.
Kllrnohj wrote:
There is *NOTHING* automatic about the stack. You can *easily* stomp all over it and wreck yourself. In fact, that's exactly how buffer overflow exploits work.

That's not how a buffer overflow works. A buffer overflow happens when a program writes data outside a buffer. What I'm doing is allocating a buffer and writing within its bounds. The worst that happens in my case is a segmentation violation if the OS refuses to allocate real memory for the stack (and thus the buffer that is on the stack). This is therefore a stack overflow. There is no smashing of return addresses or anything like that which happens in a buffer overflow.
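
The distinction in code (two illustrative snippets, not from the original posts):

Code:
// a buffer overflow: writing OUTSIDE the buffer's bounds is undefined
// behavior and can smash a return address
void buffer_overflow()
{
    char buf[16];
    buf[100] = 1;        // out of bounds
}

// a large allocation used WITHIN bounds: the worst case is a clean
// SIGSEGV (stack overflow) if the OS refuses to grow the stack
void big_but_in_bounds()
{
    char buf[100000];
    buf[99999] = 1;      // in bounds
}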
christop wrote:
It depends on the runtime environment. For example, in Linux (as I mentioned) it works fine. I just tested a program that writes to the array, either forward, backward, or randomly. The size of the stack is limited to the value of "ulimit -s", which was initially 8192 (8MB). I raised it to 2GB and then could make my array almost up to 2GB in size with no problem. I verified that the compiler wasn't doing anything funny behind my back like allocating the array in the bss segment. It was only doing a plain "subl $8000016, %esp" inside my function, as it should.


Linux does not dynamically grow/shrink the stack as needed. Mainly because it has *no freaking clue* what you are doing with the stack. There *isn't* a runtime environment. The stack is a *fixed size* and it is allocated *up front*.

You are being incredibly stupid with the stack to avoid using the heap - which in this case is *much* cheaper/faster as the OS can actually be intelligent with how it handles that.

christop wrote:
That's not how a buffer overflow works. A buffer overflow happens when a program writes data outside a buffer. What I'm doing is allocating a buffer and writing within its bounds. The worst that happens in my case is a segmentation violation if the OS refuses to allocate real memory for the stack (and thus the buffer that is on the stack). This is therefore a stack overflow. There is no smashing of return addresses or anything like that which happens in a buffer overflow.


No, what will happen is you will run off the end of the stack and start wreaking havoc in random memory. Either the kernel gives you a ridiculously huge stack, in which case you are being wasteful and slow, or it doesn't give you a ridiculously huge stack, and you run right off the end of it.

In short, stop being willfully retarded.
I'd expect that you'd much sooner run into pages that aren't marked writeable/readable and hit a juicy segfault, personally.
Kllrnohj wrote:
*snip*

Where do I begin? (I know a snipped quote is pointless, but it's fun Smile)

Here is how Linux works. I'm sure you know most if not all of this already, but I'm listing it all here for completeness:


  • Every process gets its own private virtual address space (2^32 bytes on 32-bit systems, ignoring PAE etc).
  • Linux allocates real memory for a process only when the process accesses the memory (this happens per-page, but that detail's not really important at this conceptual level).
  • A process can access any memory in its address space as long as it falls within the limits defined by its data segment size and rlimits (gettable/settable via the ulimit shell command, or the getrlimit()/setrlimit() syscalls).
  • If a process reads from or writes to memory outside of its limits, or writes to read-only memory (eg, text segment), Linux cleanly segfaults that process.


What this all boils down to is that Linux essentially gives every process a ridiculously huge stack (almost the full address space, in fact) but allows admins to place a limit on how much of that stack a process can use. With virtual memory, a huge stack is neither wasteful nor slow (there is a slight performance penalty as the kernel has to handle page faults, but that's the price of virtual memory). With a stack size limit in place, if a process "run[s] off the end of the stack", Linux will segfault it. It won't ever let it "wreak havoc in random memory".
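
A minimal sketch of how a process can inspect the limit in question, using the standard getrlimit() call:

Code:
#include <cstdio>
#include <sys/resource.h>

int main()
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) == 0) {
        if (rl.rlim_cur == RLIM_INFINITY)
            std::printf("soft stack limit: unlimited\n");
        else
            std::printf("soft stack limit: %llu bytes\n",
                        (unsigned long long)rl.rlim_cur);
    }
    return 0;
}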

The Linux kernel is a lot more intelligent than you give it credit for.
christop wrote:
Where do I begin?


By admitting how wrong you are for putting *anything* of substantial size on the stack.

Quote:
Every process gets its own private virtual address space (2^32 bytes on 32-bit systems, ignoring PAE etc).


Not quite, but irrelevant (in the case of the Linux kernel you actually only get 3GB of private virtual address space on a 32-bit system. In Windows it is normally 2GB)

Quote:
Linux allocates real memory for a process only when the process accesses the memory (this happens per-page, but that detail's not really important at this conceptual level).


Kind of true, but not important.

Quote:
A process can access any memory in its address space as long as it falls within the limits defined by its data segment size and rlimits (gettable/settable via the ulimit shell command, or the getrlimit()/setrlimit() syscalls).


Not so much.

Quote:
If a process reads from or writes to memory outside of its limits, or writes to read-only memory (eg, text segment), Linux cleanly segfaults that process.


Kind of, but also irrelevant.

Quote:
What this all boils down to is that Linux essentially gives every process a ridiculously huge stack (almost the full address space, in fact)


Completely false. Linux has a fixed stack size of 8mb by default. Stacks are *always* a fixed size.

Quote:
With virtual memory, a huge stack is neither wasteful nor slow (there is a slight performance penalty as the kernel has to handle page faults, but that's the price of virtual memory). With a stack size limit in place, if a process "run[s] off the end of the stack", Linux will segfault it. It won't ever let it "wreak havoc in random memory".


A huge stack has overhead because it has to be swapped out every time a context switch happens (which, fyi, happens thousands of times a second). What you seem to be forgetting is that stacks are per *THREAD*, not per process. They have management costs that the heap does not. You are also banking on *where* the stack is in relation to everything else if you run off the end. If you happen to own the random memory off the end of the stack, you absolutely will wreak havoc in it.

You are also forgetting things like the CPU cache and paging. The bigger your stack gets, the more likely it will be swapped to disk. Allocate 1GB on the stack, and suddenly returning from that function takes tens of milliseconds as it has to fetch the stack from disk. And if you can't keep the relevant section of the stack in the CPU cache, say goodbye to any hope of decent performance. Now your function calls are slower, locals are slower, even just returning is slower.

Moreover, it is simply a retarded thing to do. There is *zero* reason to use the stack in this way. Use the stack for its intended purpose, and use the heap for its intended purpose. Otherwise you are fighting against the system just to entertain your own insanity.

Quote:
The Linux kernel is a lot more intelligent than you give it credit for.


No, I know very well how intelligent the kernel is. You don't seem to, however, as you seem to think the kernel treats the stack the same as it does the heap. FYI, it assumes people use the stack as a stack, and optimizes for that.
Kllrnohj wrote:


Quote:
A process can access any memory in its address space as long as it falls within the limits defined by its data segment size and rlimits (gettable/settable via the ulimit shell command, or the getrlimit()/setrlimit() syscalls).


Not so much.

You failed to show how or why this is not true. My empirical research so far shows that it is true. (I do admit, however, that I left out the distinction between kernel stack and user stack, which affects the proper interpretation of "address space").

Quote:
Completely false. Linux has a fixed stack size of 8mb by default. Stacks are *always* a fixed size.

Nope, this is false. The stack is not a fixed size in Linux. The stack starts at perhaps one or two pages upon process execution (via execve()), and it expands up to a limit, either the address space limit or a resource limit. Where did you pull the 8MB figure from, anyway?

Quote:
A huge stack has overhead because it has to be swapped out every time a context switch happens (which, fyi, happens thousands of times a second).

All memory that belongs to a process and is swapped in must be swapped out (that is, the MMU's page tables modified to point to the incoming process's memory pages) at each context switch. And I believe the scheduling quantum is on the order of 100 times per second, not 1000's of times per second (though it's possible to have more frequent context switches in some scenarios).

Quote:
What you seem to be forgetting is that stacks are per *THREAD*, not per process. They have management costs that the heap does not.

In Linux, each thread is a process that just happens to share its address space and other resources with some other processes.

What management costs does the stack have that other pages belonging to a process do not? What does Linux have to keep track of for each additional page of stack that a process uses which it would not have to keep track of for the heap or other memory segments?

Quote:
You are also banking on *where* the stack is in relation to everything else if you run off the end. If you happen to own the random memory off the end of the stack, you absolutely will wreak havoc in it.

Of course it will wreak havoc on its own memory. The stack is placed at the top of the process's address space. It grows downward while the heap grows upward. If the process needs enough memory on the stack for this to be a problem, it would need just as much on the heap instead, where it would also be a problem for the process.
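
A toy check of that layout (illustrative only; exact addresses vary with the system and address-space randomization):

Code:
#include <cstdio>
#include <cstdlib>

int main()
{
    int local;                          // lives on the stack (high addresses)
    void* heap = std::malloc(16);       // lives on the heap (low addresses)
    std::printf("stack: %p\nheap:  %p\n", (void*)&local, heap);
    std::free(heap);
    return 0;
}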

Quote:
You are also forgetting things like the CPU cache and paging. The bigger your stack gets, the more likely it will be swapped to disk. Allocate 1GB on the stack, and suddenly returning from that function takes tens of milliseconds as it has to fetch the stack from disk. And if you can't keep the relevant section of the stack in the CPU cache, say goodbye to any hope of decent performance. Now your function calls are slower, locals are slower, even just returning is slower.

I did not forget about those. You are forgetting about LRU, or whatever specific page-replacement algorithm that Linux uses. Sure, some pages of the stack may be swapped out, but only the pages that haven't been used recently. The pages that are actively being used are the very least likely to be swapped out (they may be swapped out if the system is thrashing, but that's the same whether this process uses the stack heavily or not).

Regarding the CPU cache, it's the same issue too. Assuming the access patterns of a memory block are similar whether it's stored on the stack or on the heap, the CPU cache should behave practically identically. The CPU cache doesn't (afaik) make any distinction between the stack and the heap. It's all just memory pages to it. (I'm referring to the data cache here; the instruction cache isn't affected, since instructions are fetched from the same pages either way.)

Quote:
Otherwise you are fighting against the system just to entertain your own insanity.

I never claimed to be sane, but I'm not really fighting against the system.
christop wrote:
Nope, this is false. The stack is not a fixed size in Linux. The stack starts at perhaps one or two pages upon process execution (via execve()), and it expands up to a limit, either the address space limit or a resource limit. Where did you pull the 8MB figure from, anyway?


Uh... what? Why do you think the stack can grow? More importantly, *HOW* do you think the stack can grow? I assure you, it's a fixed size. When your process is loaded, it is given a stack of size X - that size *never* changes throughout the life of that process. It doesn't grow and it doesn't shrink.

You can pretty easily figure out the default stack size using pthreads: https://computing.llnl.gov/tutorials/pthreads/#Stack

Quote:
All memory that belongs to a process and is swapped in must be swapped out (that is, the MMU's page tables modified to point to the incoming process's memory pages) at each context switch. And I believe the scheduling quantum is on the order of 100 times per second, not 1000's of times per second (though it's possible to have more frequent context switches in some scenarios).


Context switches are not limited to process switches (threads do exist, you know).

Quote:
In Linux, each thread is a process that just happens to share its address space and other resources with some other processes.


True-ish, but that also changes exactly nothing of what I said. Processes that share their address space with other processes but not their stacks are, *gasp*, THREADS! Rolling Eyes

Quote:
Of course it will wreak havoc on its own memory. The stack is placed at the top of the process's address space. It grows downward while the heap grows upward. If the process needs enough memory on the stack for this to be a problem, it would need just as much on the heap instead, where it would also be a problem for the process.


The key difference being you know when you run out of heap, you have no freaking clue when you run out of stack.

Quote:
I did not forget about those. You are forgetting about LRU, or whatever specific page-replacement algorithm that Linux uses. Sure, some pages of the stack may be swapped out, but only the pages that haven't been used recently. The pages that are actively being used are the very least likely to be swapped out (they may be swapped out if the system is thrashing, but that's the same whether this process uses the stack heavily or not).


In the case where you are allocating a gig or more on the stack, the page with the return address *IS* going to be the least recently used.

Quote:
Regarding the CPU cache, it's the same issue too. Assuming the access patterns of a memory block are similar whether it's stored on the stack or on the heap, the CPU cache should behave practically identically. The CPU cache doesn't (afaik) make any distinction between the stack and the heap. It's all just memory pages to it. (I'm referring to the data cache here; the instruction cache isn't affected, since instructions are fetched from the same pages either way.)


Yes, but the CPU cache is tiny. If you are putting data in your stack, you just pushed the useful page right out of the cache. Sandy Bridge, Intel's latest and greatest, has an L1 cache of 32kb and an L2 cache of 256kb. Teeny tiny.
Kllrnohj wrote:
christop wrote:
Nope, this is false. The stack is not a fixed size in Linux. The stack starts at perhaps one or two pages upon process execution (via execve()), and it expands up to a limit, either the address space limit or a resource limit. Where did you pull the 8MB figure from, anyway?


Uh... what? Why do you think the stack can grow? More importantly, *HOW* do you think the stack can grow? I assure you, it's a fixed size. When your process is loaded, it is given a stack of size X - that size *never* changes throughout the life of that process. It doesn't grow and it doesn't shrink.

I suspect the kernel uses the same mechanism to expand the stack that it uses to detect when a process touches a page that is not swapped in—with page faults. When a process touches a page below the stack that hasn't been mapped to the process, the kernel expands the stack, allocates real memory for those page(s), and maps those new pages to the process. A similar thing happens with the heap (where the upper limit is defined by brk()).
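
A Linux-specific sketch (added for illustration, not code from the thread) that makes this growth visible: print the [stack] mapping from /proc/self/maps, consume a few megabytes of stack, then print it again.

Code:
#include <cstdio>
#include <cstring>

static void show_stack_mapping()
{
    std::FILE* f = std::fopen("/proc/self/maps", "r");
    if (!f) return;
    char line[256];
    while (std::fgets(line, sizeof line, f))
        if (std::strstr(line, "[stack]"))
            std::fputs(line, stdout);
    std::fclose(f);
}

static void burn(int depth)
{
    char page[4096];
    page[0] = (char)depth;        // touch the page so it must be mapped
    if (depth > 0)
        burn(depth - 1);
    if (page[0] == 127)           // keep the frame live across the call
        std::puts("");
}

int main()
{
    show_stack_mapping();
    burn(1000);                   // ~4 MB of extra stack
    show_stack_mapping();         // the [stack] range should now be larger
    return 0;
}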

The setrlimit(2) man page even mentions automatic stack expansion and how some of the resource limits can cap this expansion:
setrlimit(2) wrote:
RLIMIT_AS
The maximum size of the process's virtual memory (address space) in bytes. This limit affects calls to brk(2), mmap(2) and mremap(2), which fail with the error ENOMEM upon exceeding this limit. Also *automatic stack expansion* will fail (and generate a SIGSEGV that kills the process if no alternate stack has been made available via sigaltstack(2)). Since the value is a long, on machines with a 32-bit long either this limit is at most 2 GiB, or this resource is unlimited.
...
RLIMIT_STACK
The maximum size of the process stack, in bytes. Upon reaching this limit, a SIGSEGV signal is generated. To handle this signal, a process must employ an alternate signal stack (sigaltstack(2)).

(Emphasis mine.) So there you go. The stack can expand automatically up to the maximum size defined by RLIMIT_STACK and RLIMIT_AS, whichever limit is reached first.

With setrlimit(RLIMIT_STACK, ...), a process can make its maximum stack size smaller or larger.

Quote:
You can pretty easily figure out the default stack size using pthreads: https://computing.llnl.gov/tutorials/pthreads/#Stack

POSIX Threads were designed to support various implementations, some of which support only fixed stack sizes. Linux is not one of those implementations. Pthreads on Linux probably set the maximum stack size of threads using setrlimit(RLIMIT_STACK, ...).
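
Whatever glibc does internally, the portable per-thread knob looks like this (a sketch; the worker function is hypothetical, and the program is built with -pthread):

Code:
#include <pthread.h>

static void* worker(void*)
{
    char buf[1024 * 1024];        // a 1 MB frame, fine on a 16 MB stack
    buf[0] = 1;
    return 0;
}

int main()
{
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 16 * 1024 * 1024);   // 16 MB stack

    pthread_t t;
    if (pthread_create(&t, &attr, worker, 0) == 0)
        pthread_join(t, 0);
    pthread_attr_destroy(&attr);
    return 0;
}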

Quote:
Quote:
All memory that belongs to a process and is swapped in must be swapped out (that is, the MMU's page tables modified to point to the incoming process's memory pages) at each context switch. And I believe the scheduling quantum is on the order of 100 times per second, not 1000's of times per second (though it's possible to have more frequent context switches in some scenarios).


Context switches are not limited to process switches (threads do exist, you know).

Quote:
In Linux, each thread is a process that just happens to share its address space and other resources with some other processes.


True-ish, but that also changes exactly nothing of what I said. Processes that share their address space with other processes but not their stacks are, *gasp*, THREADS! Rolling Eyes

You may have a valid point here. However, if a program is large and/or complex enough to warrant using multiple threads, then memory allocation will have more subtle issues than deciding whether to store a large block of memory on the stack or on the heap. Most programs in Unix/Linux are single-threaded, though, so this doesn't become a factor for most programs.

Quote:
Quote:
Of course it will wreak havoc on its own memory. The stack is placed at the top of the process's address space. It grows downward while the heap grows upward. If the process needs enough memory on the stack for this to be a problem, it would need just as much on the heap instead, where it would also be a problem for the process.


The key difference being you know when you run out of heap, you have no freaking clue when you run out of stack.

You may have a point here too. Then again, gcc has the option -fstack-check which will cause a SIGSEGV upon stack overflow (which you can catch in a signal handler by using sigaltstack()).
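
A sketch of that sigaltstack() arrangement (illustrative; build with -O0 so the recursion isn't optimized into a loop):

Code:
#include <csignal>
#include <unistd.h>

static char altstack[64 * 1024];      // must be at least SIGSTKSZ

static void on_segv(int)
{
    const char msg[] = "stack overflow caught\n";
    write(2, msg, sizeof msg - 1);    // async-signal-safe, unlike printf
    _exit(1);
}

static void recurse(int depth)
{
    volatile char pad[65536];
    pad[0] = (char)depth;
    recurse(depth + 1);               // overflows the stack on purpose
}

int main()
{
    stack_t ss = {};
    ss.ss_sp = altstack;
    ss.ss_size = sizeof altstack;
    sigaltstack(&ss, 0);              // give the handler somewhere to run

    struct sigaction sa = {};
    sa.sa_handler = on_segv;
    sa.sa_flags = SA_ONSTACK;         // deliver SIGSEGV on the alternate stack
    sigaction(SIGSEGV, &sa, 0);

    recurse(0);
    return 0;
}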

Quote:
Quote:
I did not forget about those. You are forgetting about LRU, or whatever specific page-replacement algorithm that Linux uses. Sure, some pages of the stack may be swapped out, but only the pages that haven't been used recently. The pages that are actively being used are the very least likely to be swapped out (they may be swapped out if the system is thrashing, but that's the same whether this process uses the stack heavily or not).


In the case where you are allocating a gig or more on the stack, the page with the return address *IS* going to be the least recently used.

Quote:
Regarding the CPU cache, it's the same issue too. Assuming the access patterns of a memory block are similar whether it's stored on the stack or on the heap, the CPU cache should behave practically identically. The CPU cache doesn't (afaik) make any distinction between the stack and the heap. It's all just memory pages to it. (I'm referring to the data cache here; the instruction cache isn't affected, since instructions are fetched from the same pages either way.)


Yes, but the CPU cache is tiny. If you are putting data in your stack, you just pushed the useful page right out of the cache. Sandy Bridge, Intel's latest and greatest, has an L1 cache of 32kb and an L2 cache of 256kb. Teeny tiny.

That may be so (and I was aware that CPU caches are small), but some combinations of allocation size and memory access pattern may be faster with the block on the heap, and others may be faster with it on the stack. If the process is accessing at least one byte in each page of a >=256K block of memory, it doesn't matter much whether it's on the stack or on the heap. The current stack page (the one that the stack pointer points to) is probably going to be flushed from the cache either way.

After all is said and done, I would personally put up to a few kilobytes on the stack in each stack frame, but I don't worry if I occasionally put more than that. With multi-threaded programs I'd be more careful about stack allocations, just as I am when using the kernel stack in my OS (the kernel stack is only 1KB, which should be plenty for 5 or more nested function calls with modest stack frame sizes plus the stack frames of all interrupt handlers simultaneously).
christop wrote:
I suspect the kernel uses the same mechanism to expand the stack that it uses to detect when a process touches a page that is not swapped in—with page faults. When a process touches a page below the stack that hasn't been mapped to the process, the kernel expands the stack, allocates real memory for those page(s), and maps those new pages to the process. A similar thing happens with the heap (where the upper limit is defined by brk()).

The setrlimit(2) man page even mentions automatic stack expansion and how some of the resource limits can cap this expansion:


setrlimit is a POSIX thing, not necessarily a Linux thing. Linux isn't actually 100% POSIX compatible.

I like how you reject pthreads saying it was designed to support various implementations, then turn around and reference a POSIX man page. You realize pthreads is POSIX threads, right? Same standard.

Quote:
A similar thing happens with the heap (where the upper limit is defined by brk()).


Uh, no. You have to ask for heap space. You can't just start using some random memory address and have the kernel just give it to you. Because it won't.

Quote:
With setrlimit(RLIMIT_STACK, ...), a process can make its maximum stack size smaller or larger.


Yes, but far more importantly, if you *ever* write code this dependent on how a specific kernel implements a stack and how you can manipulate it, you are an insanely bad programmer.

Moreover, you still haven't explained why you would ever attempt to do this. It certainly isn't for performance reasons, because setrlimit isn't exactly a fast call - malloc is going to be far more optimized.

Not to mention you are talking about easily exceeding a page's worth of data on the stack - which causes a page fault, making it *no faster than using malloc*.

Quote:
Pthreads on Linux probably set the maximum stack size of threads using setrlimit(RLIMIT_STACK, ...).


Nope. Pthreads only affects the thread created with that attr you set.

Quote:
You may have a valid point here. However, if a program is large and/or complex enough to warrant using multiple threads, then memory allocation will have more subtle issues than deciding whether to store a large block of memory on the stack or on the heap. Most programs in Unix/Linux are single-threaded, though, so this doesn't become a factor for most programs.


Most programs in Unix/Linux are single-threaded? Bwahahahaha, no, no they aren't. Most programs are multi-threaded. The vast majority are multithreaded. Sure, those command line utils aren't, but there aren't that many of them.

Threading is *insanely* critical in modern systems, even for programs that don't appear threaded (as in, they don't use much CPU).

You *cannot* ignore threading, it is hugely important and continues to become more and more important as time goes on.

Quote:
You may have a point here too. Then again, gcc has the option -fstack-check which will cause a SIGSEGV upon stack overflow (which you can catch in a signal handler by using sigaltstack()).


You still have no way of gracefully handling that situation. You can either crash hard or crash with a nice error message, but you're pretty much hosed if you run out of stack, whereas in a surprising number of cases you can gracefully recover from not enough heap.

Quote:
That may be so (and I was aware that CPU caches are small), but some combinations of allocation size and memory access pattern may be faster with the block on the heap, and others may be faster with it on the stack. If the process is accessing at least one byte in each page of a >=256K block of memory, it doesn't matter much whether it's on the stack or on the heap. The current stack page (the one that the stack pointer points to) is probably going to be flushed from the cache either way.


Ah, but you are missing something very important - the CPU knows about the stack. It knows where the stack pointer currently points. It is very easy for the CPU to just not flush the page with the stack from its cache. So yes, it very much does matter where you allocate memory in terms of the cache. And L1 cache is the important one here. L2 is like 3x slower than L1.

Quote:
After all is said and done, I would personally put up to a few kilobytes on the stack in each stack frame, but I don't worry if I occasionally put more than that.


And we come full circle - if you are putting that much data on your stack, your design is fundamentally wrong.

In the case of this thread, it absolutely seems like a case of "a little knowledge is a dangerous thing". Particularly you seem convinced that the stack is faster than the heap - which stops being true as soon as you start using the stack as a heap. Go do some actual testing, get some real numbers.
Kllrnohj wrote:
setrlimit is a POSIX thing, not necessarily a Linux thing. Linux isn't actually 100% POSIX compatible.

I like how you reject pthreads saying it was designed to support various implementations, then turn around and reference a POSIX man page. You realize pthreads is POSIX threads, right? Same standard.

Yes, I know it is the same standard. I was referring to the Linux-specific man page for setrlimit, which documents the behavior of this system call on Linux. Let's see what POSIX actually says about setrlimit:
setrlimit according to POSIX wrote:
RLIMIT_STACK
This is the maximum size of the initial thread's stack, in bytes. The implementation does not automatically grow the stack beyond this limit. If this limit is exceeded, SIGSEGV shall be generated for the thread. If the thread is blocking SIGSEGV, or the process is ignoring or catching SIGSEGV and has not made arrangements to use an alternate stack, the disposition of SIGSEGV shall be set to SIG_DFL before it is generated.

Even though Linux is not 100% POSIX compliant, this behavior is pretty fundamental in the way virtually any Unix system since the 1980's works, excluding flavors written for no-MMU systems.

Besides, if Linux were to allocate 8MB for the stack in every process, that would be extremely wasteful and slow. I currently have 320 processes (or 575 if you count threads individually). If each thread had a fixed 8MB stack, that would be 4.5GB set aside just for stacks! Do you really think there is actually that much stack space reserved on my system? I didn't think so. What happens in Linux is what I've said happens. The stack grows as a process needs it, up to the limits defined by setrlimit.

I'm not being self-contradictory here, either, regarding POSIX. Unlike for setrlimit, the standard does not specify that the stack grows for pthread_attr_setstacksize(), only that the function sets the minimum stack size. A thread can use at least up to the minimum size defined by this attribute, but an implementation would be free to dynamically grow the stack up to at least that size. I was being speculative about this function using setrlimit() in the Linux implementation, but it logically seems like it could.

Quote:
Quote:
A similar thing happens with the heap (where the upper limit is defined by brk()).


Uh, no. You have to ask for heap space. You can't just start using some random memory address and have the kernel just give it to you. Because it won't.

Um, yes. I had it right. Read the emphasized part again. brk() and sbrk() change the data segment size, which is "ask[ing] for heap space". Likewise, setrlimit(RLIMIT_STACK, ...) changes the maximum stack size. With both stack and data segment, the kernel doesn't actually allocate pages for either one until the process touches them. You could call sbrk(1024*1024*1024) to expand the data segment by 1GB, but no pages will be allocated until you touch those pages (through the magic of page faults).
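
For example (a sketch of the sbrk() behavior just described, on Linux/glibc):

Code:
#include <cstdio>
#include <unistd.h>

int main()
{
    void* before = sbrk(0);                        // current program break
    if (sbrk(1024L * 1024 * 1024) == (void*)-1) {  // grow the data segment 1 GB
        std::perror("sbrk");
        return 1;
    }
    std::printf("break moved from %p to %p\n", before, sbrk(0));
    // no physical pages are allocated yet; touching one now would fault it in:
    // static_cast<char*>(before)[0] = 1;
    return 0;
}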

Quote:
Quote:
Pthreads on Linux probably set the maximum stack size of threads using setrlimit(RLIMIT_STACK, ...).


Nope. Pthreads only affects the thread created with that attr you set.

Yeah, and pthreads on Linux could very well use setrlimit to set the maximum stack size for that thread upon its creation, since each thread is a process in Linux.

Quote:
Most programs in Unix/Linux are single-threaded? Bwahahahaha, no, no they aren't. Most programs are multi-threaded. The vast majority are multithreaded. Sure, those command line utils aren't, but there aren't that many of them.

Threading is *insanely* critical in modern systems, even for programs that don't appear threaded (as in, they don't use much CPU).

You *cannot* ignore threading, it is hugely important and continues to become more and more important as time goes on.

Of the 3843 programs that I have under /usr/bin, only 1385 (36%) are multi-threaded, so the majority are single-threaded. Even those programs which can run jobs in parallel, such as "make", are single-threaded. Since neither of us can provide meaningful statistics about all Unix/Linux programs as a whole, this point is mostly moot.

I agree that threading is important, but it's frankly not as important in the Unix world as you make it out to be. Forking is cheap in modern Unix, making it practical to use full processes in place of threads in many cases (there's little distinction between threads and processes, after all). Contrast that with Windows, where process creation is relatively expensive, so you see threads used much more heavily there*. That's one reason that Cygwin has poor performance for some tasks (eg, running a program inside a loop in the shell).

Quote:
Quote:
You may have a point here too. Then again, gcc has the option -fstack-check which will cause a SIGSEGV upon stack overflow (which you can catch in a signal handler by using sigaltstack()).


You still have no way of gracefully handling that situation. You can either crash hard or crash with a nice error message, but you're pretty much hosed if you run out of stack, whereas in a surprising number of cases you can gracefully recover from not enough heap.

That's true. But allocating even 16KB on the stack is not going to give you any problems (I do admit that allocating many megabytes on the stack at a time is silly for practical purposes). With the default 8MB maximum stack size, you'd have to nest your function calls about 500 times before you overflow the stack. If you're writing a recursive function, it'd be stupid to allocate that much space on the stack. With any other non-recursive function, it's no problem!

Quote:
In the case of this thread, it absolutely seems like a case of "a little knowledge is a dangerous thing". Particularly you seem convinced that the stack is faster than the heap - which stops being true as soon as you start using the stack as a heap. Go do some actual testing, get some real numbers.

I never said that allocating on the stack is faster than on the heap. I mentioned once that under some circumstances, it may be faster, but I don't believe that to be the general case. In the general case, they should be about the same speed.


* Incidentally, I'm currently writing a multi-threaded Windows service in C++ at work. It's several hundred lines long so far (it's more or less a prototype at this stage), but had I been writing it for Linux it would've been a much shorter script to do the same thing. I'd probably fork the script (using & in Bash, for example) to run a few different parts simultaneously (as "threads"), and also call external programs to do some of the work for me. Overall performance would be lower than in C++, but development time would also be much lower. Performance wouldn't really be that much of an issue anyway, since this (hypothetical) script could easily keep up with what it needs to do.
christop wrote:
Besides, if Linux were to allocate 8MB for the stack in every process, that would be extremely wasteful and slow. I currently have 320 processes (or 575 if you count threads individually). If each thread had a fixed 8MB stack, that would be 4.5GB set aside just for stacks! Do you really think there is actually that much stack space reserved on my system? I didn't think so. What happens in Linux is what I've said happens. The stack grows as a process needs it, up to the limits defined by setrlimit.


Absolutely. That's exactly what happens. It isn't using real memory, of course, but each one of those has an 8MB stack (except for the ones that changed it, of course).

Quote:
Um, yes. I had it right.


Sorry, I did misread that. I thought you were implying that the heap would automatically grow as you attempted to use it - which is not true.

Quote:
Of the 3843 programs that I have under /usr/bin, only 1385 (36%) are multi-threaded, so the majority are single-threaded.


And how did you arrive at those numbers?

Quote:
I agree that threading is important, but it's frankly not as important in the Unix world as you make it out to be. Forking is cheap in modern Unix, making it practical to use full processes in place of threads in many cases (there's little distinction between threads and processes, after all). Contrast that with Windows, where process creation is relatively expensive, so you see threads used much more heavily there*. That's one reason that Cygwin has poor performance for some tasks (eg, running a program inside a loop in the shell).


Forking isn't actually cheap, it is just hidden cheap. The actual call to fork is fairly cheap (pretty much the same as spawning a thread). However, all of your initial memory writes are extra slow, as they are copy-on-write. Threads are overall cheaper than forks by quite a bit (they use less memory as well).

Quote:
That's true. But allocating even 16KB on the stack is not going to give you any problems (I do admit that allocating many megabytes on the stack at a time is silly for practical purposes). With the default 8MB maximum stack size, you'd have to nest your function calls about 500 times before you overflow the stack. If you're writing a recursive function, it'd be stupid to allocate that much space on the stack. With any other non-recursive function, it's no problem!


True, but it's a terrible habit to get into with zero benefit. Thus, don't do it.

Quote:
* Incidentally, I'm currently writing a multi-threaded Windows service in C++ at work. It's several hundred lines long so far (it's more or less a prototype at this stage), but had I been writing it for Linux it would've been a much shorter script to do the same thing. I'd probably fork the script (using & in Bash, for example) to run a few different parts simultaneously (as "threads"), and also call external programs to do some of the work for me. Overall performance would be lower than in C++, but development time would also be much lower. Performance wouldn't really be that much of an issue anyway, since this (hypothetical) script could easily keep up with what it needs to do.


Your problem there is C++ vs. scripting language, not Windows vs. Linux. Then again, why on earth are you writing C++ on Windows for something that isn't performance critical? Seriously, Win32 is one of the worst APIs known to man - why would you subject yourself to that torture when the bliss of C# is so close? Hell, if you can get away with a scripting language use Python. Or if by "service" you mean a web service, check out NodeJS - which I believe now has a Windows port.
Kllrnohj wrote:
Absolutely. That's exactly what happens. It isn't using real memory, of course, but each one of those has an 8MB stack (except for the ones that changed it, of course).

I think you and I agree on the mechanisms and end results but disagree on the exact meaning of "stack size". I say it means the peak usage of stack space (which obviously can expand during a process's lifetime), whereas you (I think) say it means the most stack space that the process can use. Am I right about that?

Kllrnohj wrote:

Quote:
Of the 3843 programs that I have under /usr/bin, only 1385 (36%) are multi-threaded, so the majority are single-threaded.


And how did you arrive at those numbers?


Code:
for f in /usr/bin/*; do if ldd "$f" | grep -q libpthread; then echo "$f"; fi; done | wc -l

I made at least three assumptions here: that only pthreads are used, that linking to libpthread means the program is multi-threaded, and that none of the programs are statically-linked with pthreads. I think these are fairly safe assumptions, though.

Kllrnohj wrote:
Quote:
That's true. But allocating even 16KB on the stack is not going to give you any problems (I do admit that allocating many megabytes on the stack at a time is silly for practical purposes). With the default 8MB maximum stack size, you'd have to nest your function calls about 500 times before you overflow the stack. If you're writing a recursive function, it'd be stupid to allocate that much space on the stack. With any other non-recursive function, it's no problem!


True, but it's a terrible habit to get into with zero benefit. Thus, don't do it.

It has the benefit of allowing the programmer not to worry too much about where objects are being allocated. How often do you figure out how large your C++ classes (or third-party C++ classes) are before deciding whether to allocate them directly on the stack or with "new"? Granted, a large class probably points to a flawed class implementation more than anything else, but I think my point still stands.

Kllrnohj wrote:
Quote:
* Incidentally, I'm currently writing a multi-threaded Windows service in C++ at work. It's several hundred lines long so far (it's more or less a prototype at this stage), but had I been writing it for Linux it would've been a much shorter script to do the same thing. I'd probably fork the script (using & in Bash, for example) to run a few different parts simultaneously (as "threads"), and also call external programs to do some of the work for me. Overall performance would be lower than in C++, but development time would also be much lower. Performance wouldn't really be that much of an issue anyway, since this (hypothetical) script could easily keep up with what it needs to do.


Your problem there is C++ vs. scripting language, not Windows vs. Linux.

I know it. But scripting feels more "natural" and more readily accessible in Linux than in Windows. Heck, even the system startup scripts in Linux are, well, scripts.

Quote:
Then again, why on earth are you writing C++ on Windows for something that isn't performance critical? Seriously, Win32 is one of the worst APIs known to man - why would you subject yourself to that torture when the bliss of C# is so close? Hell, if you can get away with a scripting language use Python. Or if by "service" you mean a web service, check out NodeJS - which I believe now has a Windows port.

I do mean a Windows service, not a web service. This service will be running on our client's Windows XP terminals which don't have .NET or any scripting facilities (besides, ugh, VBScript), and our service needs to be fully self-contained if possible for one reason or another. I even suggested using a scripting language like Python to the project manager, but he shot down that idea. For this project I just have to live with the awful Win32 API (it is terrible!).
christop wrote:
I think you and I agree on the mechanisms and end results but disagree on the exact meaning of "stack size". I say it means the peak usage of stack space (which obviously can expand during a process's lifetime), whereas you (I think) say it means the most stack space that the process can use. Am I right about that?


Yes.

Quote:
It has the benefit of allowing the programmer not to worry too much about where objects are being allocated. How often do you figure out how large your C++ classes (or third-party C++ classes) are before deciding whether to allocate them directly on the stack or with "new"? Granted, a large class probably points to a flawed class implementation more than anything else, but I think my point still stands.


Or you could just allocate most everything on the heap and use auto_ptr, sidestepping the need to figure out how large a C++ class is entirely AND preventing stack abuse AND not need to worry about calling free.

It always amazes me how little people actually know about the C++ STL Sad
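
A minimal sketch of that suggestion ("Big" is a hypothetical large class; auto_ptr was the standard smart pointer at the time, and has since been deprecated and removed in favor of unique_ptr):

Code:
#include <memory>

struct Big { char data[1024 * 1024]; };   // too big for the stack

int main()
{
    std::auto_ptr<Big> p(new Big);    // the object lives on the heap
    p->data[0] = 1;
    return 0;
}                                     // delete happens automatically here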

Quote:
I do mean a Windows service, not a web service. This service will be running on our client's Windows XP terminals which don't have .NET or any scripting facilities (besides, ugh, VBScript), and our service needs to be fully self-contained if possible for one reason or another. I even suggested using a scripting language like Python to the project manager, but he shot down that idea. For this project I just have to live with the awful Win32 API (it is terrible!).


Kllrnohj wrote:
Or you could just allocate most everything on the heap and use auto_ptr, sidestepping the need to figure out how large a C++ class is entirely AND preventing stack abuse AND not need to worry about calling free.

It always amazes me how little people actually know about the C++ STL Sad

To be honest, I didn't know about auto_ptr or pretty much any of the STL. I've hardly used C++ at all since I started learning it about 12 years ago, and my C++ college course a few years ago didn't touch on any STL functionality either. I don't think that course even mentioned classes (I also had to correct the instructor on a few other things about the language, such as the semantics of the comma operator Smile).

Quote:

Very classy.
  